Namazu-devel-ja (old)

Re: EUC-JP strings in perl scripts

From: Tadamasa Teranishi <yw3t-trns@xxxxxxxxxxxxxxx>
Date: Sun, 01 Feb 2004 17:28:13 +0900
X-ml-name: namazu-devel-ja
X-mail-count: 03632
References: <873ca1j00b.wl@knok.daionet.gr.jp> <200401270903.SAA05150@x81002.hsba.go.jp> <87y8rthhoz.wl@knok.daionet.gr.jp> <20040127.192747.103135326.at@gclab.org> <871xpghngz.wl@knok.daionet.gr.jp> <401B4917.99FF8B59@asahi-net.or.jp> <8765eri9l7.wl@knok.daionet.gr.jp> <401CA53A.9688E7D0@asahi-net.or.jp>

This is Teranishi.

Tadamasa Teranishi wrote:
> 
> Separately, for version 2.1, I will publish a little later a patch that
> substantially simplifies and rewrites namazu-2.0.12-fixinutf8.patch.

Here is that patch.

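> (With apologies to Akira TAGOH, the original had a few problems that
> needed fixing. I also got ambitious and made further changes, so the
> result is quite different. Of course, I wanted to make those changes
> precisely because the original program and data were useful, so I do
> not think Akira TAGOH's effort has been wasted.)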

# Or rather, the core part is used as-is.

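Changes:

- The entries are mostly regexp patterns, but some are ordinary strings,
  so I now treat them all as string resources.
  (The actual processing is the same.)
- Renamed langspec.txt to resource.txt
  (hereafter "the resource file").
- Moved the resource file from filter/ to pl/resource/.
  (I created a subdirectory as a precaution, in case per-language files
  are added in the future.)
- The resource file format is unchanged, except that blank lines and
  lines beginning with '#' are now treated as comments.
- A warning is now printed when the resource file contains an invalid
  entry.
- Split util::read_langspec_char_from_file() into
  util::loadresourcefile() and util::loadstring().
- util::loadresourcefile() is called at the start of mknmz and reads
  the resource file.
  If the file cannot be read, mknmz exits.
  (The resource file may seem unnecessary in English mode, but string
  resources are in fact used for English processing as well.)
- util::loadstring() looks up a string in the resources by string id.
  A missing key is a system error (a programming mistake), so mknmz
  exits.
- Changed some string ids and added new resources.
- Removed the utf8 file and the switching feature, since they did not
  work. As a result, I18N::Langinfo is no longer used and does not need
  to be installed.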
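
The resource file handling described in the list above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from the patch: parse_resource_text() and lookup_string() are hypothetical names, the sample ids and values are made up, and the sketch parses from a string and collects warnings in a list instead of reading resource.txt and printing. It also assumes each entry is split at the first ": " on the line.

```perl
use strict;
use warnings;

# Hypothetical helper: parse resource text of the form "id: value".
# Blank lines and lines starting with '#' are skipped; any other line
# that does not match is recorded as a warning and ignored.
sub parse_resource_text {
    my ($text) = @_;
    my (%resource, @warnings);
    my $lineno = 0;
    for my $line (split /\n/, $text) {
        $lineno++;
        next if $line =~ /^#|^\s*$/;        # comment or blank line
        if ($line =~ /^(.*?): (.*)$/) {     # split at the first ": "
            $resource{$1} = $2;
        } else {
            push @warnings, "Syntax error at line $lineno: $line";
        }
    }
    return (\%resource, \@warnings);
}

# Hypothetical lookup: a missing id is treated as a programming error.
sub lookup_string {
    my ($resource, $id) = @_;
    die "undefined id: $id\n" unless defined $resource->{$id};
    return $resource->{$id};
}

# Sample text; the ids and values are invented for illustration only.
my $sample = join "\n",
    '# sample resource file',
    '',
    'example_greeting: hello',
    'example_pattern: ^(?:NAME)\s*\n',
    'this line has no separator',
    '';

my ($res, $warn) = parse_resource_text($sample);
print lookup_string($res, 'example_greeting'), "\n";
print "$_\n" for @$warn;
```

With the sample text, the lookup returns "hello" and one warning is collected for line 5. Asking for an id that is not in the table dies, mirroring the "missing key is a programming mistake" rule above.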
-- 
=====================================================================
寺西 忠勝(TADAMASA TERANISHI)  yw3t-trns@xxxxxxxxxxxxxxx
http://www.asahi-net.or.jp/~yw3t-trns/index.htm
Key fingerprint =  474E 4D93 8E97 11F6 662D  8A42 17F5 52F4 10E7 D14E

Index: namazu/configure.in
===================================================================
RCS file: /storage/cvsroot/namazu/configure.in,v
retrieving revision 1.157
diff -u -r1.157 configure.in
--- namazu/configure.in	1 Aug 2003 08:18:10 -0000	1.157
+++ namazu/configure.in	1 Feb 2004 07:30:58 -0000
@@ -408,6 +408,7 @@
 	   pl/Makefile
 	   pl/var.pl
 	   pl/conf.pl
+	   pl/resource/Makefile
 	   po/Makefile.in 
 	   scripts/Makefile 
 	   scripts/bnamazu
Index: namazu/filter/hnf.pl
===================================================================
RCS file: /storage/cvsroot/namazu/filter/hnf.pl,v
retrieving revision 1.10
diff -u -r1.10 hnf.pl
--- namazu/filter/hnf.pl	29 Nov 2001 09:29:20 -0000	1.10
+++ namazu/filter/hnf.pl	1 Feb 2004 07:30:58 -0000
@@ -69,8 +69,8 @@
 
     my $mark = "# ";
     my $end  = "--";
-    $mark = "■" if util::islang("ja");
-    $end  = "▼" if util::islang("ja");
+    $mark = util::loadstring("hnf.pl_filter_mark") if util::islang("ja");
+    $end  = util::loadstring("hnf.pl_filter_end") if util::islang("ja");
 
     get_uri($cfile, $fields);
     hnf_filter($contref, $weighted_str, $fields, $headings, $cfile, 
Index: namazu/filter/mailnews.pl
===================================================================
RCS file: /storage/cvsroot/namazu/filter/mailnews.pl,v
retrieving revision 1.28
diff -u -r1.28 mailnews.pl
--- namazu/filter/mailnews.pl	7 Oct 2003 05:48:56 -0000	1.28
+++ namazu/filter/mailnews.pl	1 Feb 2004 07:30:58 -0000
@@ -180,9 +180,11 @@
     my @tmp = split(/\n/, $$contref);
     $$contref = "";
 
+    my $regexp = util::loadstring("mailnews.pl_mailnews_citation_filter_1");
+
     # Greeting at the beginning (first one or two lines)
     for (my $i = 0; $i < 2 && defined($tmp[$i]); $i++) {
-	if ($tmp[$i] =~ /(^\s*((([\xa1-\xfe][\xa1-\xfe]){1,8}|([\x21-\x7e]{1,16}))\s*(。|．|\.|，|,|、|\@|＠|の)\s*){0,2}\s*(([\xa1-\xfe][\xa1-\xfe]){1,8}|([\x21-\x7e]{1,16}))\s*(です|と申します|ともうします|といいます)(.{0,2})?\s*$)/) {
+	if ($tmp[$i] =~ /$regexp/) {
 	    # for searching debug info by perl -n00e 'print if /^<<<</'
 	    util::dprint("\n\n<<<<$tmp[$i]>>>>\n\n");
 	    $omake .= $tmp[$i] . "\n";
@@ -206,13 +208,14 @@
     # Isolate meaningless message such as "foo wrote:".
     @tmp = split(/\n\n+/, $$contref);
     $$contref = "";
+    $regexp = util::loadstring("mailnews.pl_mailnews_citation_filter_2");
     my $i = 0;
     for my $line (@tmp) {
 	# Complete excluding is impossible. I tnink it's good enough.
         # Process only first five paragrahs.
 	# And don't handle the paragrah which has five or longer lines.
 	# Hmm, this regex looks very hairly.
-	if ($i < 5 && ($line =~ tr/\n/\n/) <= 5 && $line =~ /(^\s*(Date:|Subject:|Message-ID:|From:|件名|差出人|日時))|(^.+(返事です|reply\s*です|曰く|いわく|書きました|言いました|話で|wrote|said|writes|says)(.{0,2})?\s*$)|(^.*In .*(article|message))|(<\S+\@([\w\-.]\.)+\w+>)/im) {
+	if ($i < 5 && ($line =~ tr/\n/\n/) <= 5 && $line =~ /$regexp/im) {
 	    util::dprint("\n\n<<<<$line>>>>\n\n");
 	    $omake .= $line . "\n";
 	    $line = "";
Index: namazu/filter/man.pl
===================================================================
RCS file: /storage/cvsroot/namazu/filter/man.pl,v
retrieving revision 1.28
diff -u -r1.28 man.pl
--- namazu/filter/man.pl	23 Sep 2002 08:52:32 -0000	1.28
+++ namazu/filter/man.pl	1 Feb 2004 07:30:58 -0000
@@ -142,15 +142,15 @@
     my $weight = $conf::Weight{'html'}->{'title'};
     $$weighted_str .= "\x7f$weight\x7f$title\x7f/$weight\x7f\n";
 
-    if ($$contref =~ /^(?:NAME|名前|名称)\s*\n(.*?)\n\n/ms) {
+    my $regexp = util::loadstring("man.pl_man_filter_name");
+    if ($$contref =~ /$regexp/ms) {
 	$name = "$1::\n";
 	$weight = $conf::Weight{'html'}->{'h1'};
 	$$weighted_str .= "\x7f$weight\x7f$1\x7f/$weight\x7f\n";
     }
 
-    if ($$contref =~ 
-	s/\A(.+^(?:DESCRIPTION 解説|DESCRIPTIONS?|SHELL GRAMMAR|INTRODUCTION|【概要】|解説|説明|機能説明|基本機能説明)\s*\n)//ims) 
-    {
+    $regexp = util::loadstring("man.pl_man_filter_description");
+    if ($$contref =~ s/$regexp//ims) {
 	$$contref = $name . $$contref;
 	$$weighted_str .= "\x7f1\x7f$1\x7f/1\x7f\n";
     }
Index: namazu/pl/Makefile.am
===================================================================
RCS file: /storage/cvsroot/namazu/pl/Makefile.am,v
retrieving revision 1.18
diff -u -r1.18 Makefile.am
--- namazu/pl/Makefile.am	11 Jan 2004 08:57:58 -0000	1.18
+++ namazu/pl/Makefile.am	1 Feb 2004 07:30:59 -0000
@@ -2,6 +2,10 @@
 
 AUTOMAKE_OPTIONS = 1.4 no-dependencies
 
+SUBDIRS = resource
+
+DIST_SUBDIRS = resource
+
 localedir   = $(prefix)/$(DATADIRNAME)/locale
 perllibdir = $(pkgdatadir)/pl
 
Index: namazu/pl/gfilter.pl
===================================================================
RCS file: /storage/cvsroot/namazu/pl/gfilter.pl,v
retrieving revision 1.3
diff -u -r1.3 gfilter.pl
--- namazu/pl/gfilter.pl	30 Jan 2004 14:22:16 -0000	1.3
+++ namazu/pl/gfilter.pl	1 Feb 2004 07:30:59 -0000
@@ -89,13 +89,14 @@
     return undef unless defined($$text);
 
     my @tmp = split(/\n/, $$text);
+    my $regexp = util::loadstring("gfilter.pl_line_adjust_filter");
     for my $line (@tmp) {
 	$line .= "\n";
 	$line =~ s/^[ \>\|\#\:]+//;
 	$line =~ s/ +$//;
 	$line =~ s/\n// if (($line =~ /[\xa1-\xfe]\n*$/) &&
 			    (length($line) >=40));
-	$line =~ s/(。|、)$/$1\n/;
+	$line =~ s/$regexp/$1\n/;
 	$line =~ s/([a-z])-\n/$1/;  # for hyphenation.
     }
     $$text = join('', @tmp);
Index: namazu/pl/util.pl
===================================================================
RCS file: /storage/cvsroot/namazu/pl/util.pl,v
retrieving revision 1.28
diff -u -r1.28 util.pl
--- namazu/pl/util.pl	11 Jan 2004 08:57:58 -0000	1.28
+++ namazu/pl/util.pl	1 Feb 2004 07:30:59 -0000
@@ -313,4 +313,44 @@
     return $_[0] =~ /^[a-z]+:/;
 }
 
+my $PKGDATADIR    = $ENV{'pkgdatadir'} || "/usr/local/share/namazu";
+my $LIBDIR        = $PKGDATADIR . "/pl";      # directory where library etc. are in.
+my $RESOURCEDIR   = $LIBDIR . "/resource";    # directory where resource are in.
+
+my %resource;
+
+sub loadresourcefile () {
+    my $cont;
+    my $fh = util::efopen("$RESOURCEDIR/resource.txt");
+    $cont = util::readfile($fh);
+    $fh->close();
+
+    my @string = split(/\n/, $cont);
+
+    my $line = 1;
+    for my $string (@string) {
+        if ($string =~ /^#|^\s*$/) {
+            # comment
+        } elsif ($string =~ /^(.*?): (.*)$/) {
+            $resource{$1} = $2;
+        } else {
+            chomp($string);
+            printf("Warning: Syntax error." . 
+                " $RESOURCEDIR/resource.txt(%d): %s\n", $line, $string);
+        }
+        $line++;
+    }
+
+    return undef;
+}
+
+sub loadstring ($) {
+    my ($strid) = @_;
+    my $str = $resource{$strid};
+    
+    cdie("Warning: undefined id: $strid\n") unless (defined $str);
+
+    return $str;
+}
+
 1;
Index: namazu/pl/wakati.pl
===================================================================
RCS file: /storage/cvsroot/namazu/pl/wakati.pl,v
retrieving revision 1.12
diff -u -r1.12 wakati.pl
--- namazu/pl/wakati.pl	15 Jul 2003 04:48:15 -0000	1.12
+++ namazu/pl/wakati.pl	1 Feb 2004 07:30:59 -0000
@@ -54,7 +54,8 @@
     # Collect only noun words when -m option is specified.
     if ($var::Opt{'noun'}) {
 	$$content = "";
-	$$content .= shift(@tmp) =~ /(.+ )名詞/ ? $1 : "" while @tmp; 
+        my $regexp = util::loadstring("wakati.pl_wakatize_japanese");
+	$$content .= shift(@tmp) =~ /$regexp/ ? $1 : "" while @tmp; 
     } else {
 	$$content = join("\n", @tmp);
     }
Index: namazu/scripts/mknmz.in
===================================================================
RCS file: /storage/cvsroot/namazu/scripts/mknmz.in,v
retrieving revision 1.129
diff -u -r1.129 mknmz.in
--- namazu/scripts/mknmz.in	30 Jan 2004 16:41:29 -0000	1.129
+++ namazu/scripts/mknmz.in	1 Feb 2004 07:31:05 -0000
@@ -82,6 +82,8 @@
     # At first, loading pl/conf.pl to prevent overriding some variables.
     preload_modules();
 
+    util::loadresourcefile();
+
     # set LANG and bind textdomain
     util::set_lang();
     textdomain('namazu', $util::LANG_MSG);
@@ -726,7 +728,7 @@
 
     if (util::islang_msg("ja")) {
 	if ($msg eq "unknown") {
-	    return "不明";
+	    return util::loadstring("mknmz.in_getmsg_unknown");
 	}
     }
     return $msg;

Attachment: resource.tar.gz
Description: GNU Zip compressed data
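
The patch replaces inline regex literals with patterns fetched by id and interpolated at the call site. The sketch below shows that usage in isolation. It is an illustration under assumptions: the local loadstring() and its one-entry table stand in for util::loadstring() and resource.txt, and the pattern value is a simplified ASCII-only version of the literal removed from man.pl, not the actual resource entry.

```perl
use strict;
use warnings;

# Stand-in for util::loadstring(); the real function reads resource.txt.
# The table and its single value are assumptions made for this sketch.
my %resource = (
    'man.pl_man_filter_name' => '^(?:NAME)\s*\n(.*?)\n\n',
);

sub loadstring {
    my ($id) = @_;
    die "undefined id: $id\n" unless defined $resource{$id};
    return $resource{$id};
}

my $text = "NAME\nls - list directory contents\n\nSYNOPSIS\nls [OPTION]...\n";

# The pattern is plain text until it is interpolated into the match.
# The /ms flags are supplied at the call site, so the resource entry
# itself carries only the pattern body.
my $regexp = loadstring('man.pl_man_filter_name');
my $name = '';
if ($text =~ /$regexp/ms) {
    $name = $1;    # the capture group comes from the loaded pattern
}
print "name: $name\n";
```

Because callers such as the substitution in gfilter.pl refer to $1, each resource pattern used this way has to keep the capture group that the original literal had.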

