[Namazu-users-en] Re: Problems with mknmz and Perl 5.8.6

Tadamasa Teranishi yw3t-trns at asahi-net.or.jp
Mon Jun 13 06:09:49 JST 2005


Earl Hood wrote:
> 
> > Please use it by ASCII text-only.
...
> It is worth noting that namazu-users-en/2000-07/msg00000.html is
> actually in ASCII in the raw form (something mhonarc does by default).
> The problem is with the unicode character entity references >= 256.
> I.e.  MHonArc, by default, converts the iso-2022-jp character data
> into raw ASCII, using unicode character entity references for Japanese
> characters.  So the raw HTML input is ASCII-only.

My earlier wording, "ASCII text only", was not accurate. 
It invited misunderstanding. 

Even if the text consists only of ASCII characters, it is still a 
problem when it contains character entity references. 
Namazu handles only pure ASCII text, without character entity 
references. 

Please use pure ASCII text only.

> What kind of consideration has there been for supporting Unicode
> (UTF-8) in namazu?

Unicode support is still an open problem in Namazu. 
Text in ja_JP.UTF-8 can be processed by combining Namazu with 
nkf 2.0.5, as long as use is limited to a Japanese environment. 
Complete ja_JP.UTF-8 support is scheduled for Namazu 2.2.X. 
Complete Unicode support in Namazu will come later still. 

Because Unicode is difficult to handle, there is a workaround for 
the time being: discard the numbered entities. 
Rewrite the decode_numbered_entity subroutine of filter/html.pl 
as follows. 

# Workaround: ignore the code point and replace every numbered entity with "?".
sub decode_numbered_entity ($) {
    my ($num) = @_;
    return "?";
}

With this change, every numbered entity is replaced by '?'. 
Replacing them with '?' probably has no harmful effect. 
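
If replacing every numbered entity is too coarse, a narrower variant 
might keep printable ASCII references and replace only the rest. 
The following is a minimal sketch, not Namazu's actual code: the name 
decode_numbered_entity_ascii_only, the assumption that $num is a 
decimal code point, and the substitution regex in the demonstration 
are all hypothetical. 

```perl
use strict;
use warnings;

# Hypothetical variant (not part of Namazu): keep printable ASCII,
# replace every other numbered entity with '?'.
# Assumes $num is the decimal code point from a reference like "&#12354;".
sub decode_numbered_entity_ascii_only {
    my ($num) = @_;
    # Anything that is not a plain decimal number becomes '?'.
    return '?' unless defined $num && $num =~ /\A\d+\z/;
    # Printable ASCII (space through tilde) is decoded normally.
    return chr($num) if $num >= 32 && $num <= 126;
    # Control characters and everything above ASCII become '?'.
    return '?';
}

# Demonstration on a sample string; this regex is an assumption and
# may differ from what filter/html.pl actually uses.
my $text = 'A&#66;C &#12354;&#12356;';
$text =~ s/&#(\d+);/decode_numbered_entity_ascii_only($1)/ge;
print "$text\n";   # prints "ABC ??"
```

Whether this is preferable to the simpler rewrite depends on how much 
the indexed documents rely on numbered entities for ASCII characters. 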
--
=====================================================================
TADAMASA TERANISHI
http://www.asahi-net.or.jp/~yw3t-trns/index.htm
Key fingerprint =  474E 4D93 8E97 11F6 662D  8A42 17F5 52F4 10E7 D14E


