[Namazu-users-en] Re: mknmz not working for Japanese language documents?
Tadamasa Teranishi
yw3t-trns at asahi-net.or.jp
Fri Jun 30 11:58:42 JST 2006
Darren Cook wrote:
>
> I've not used the perl modules, but I can tell you what I do on a site
> that isn't native EUC.
>
> For indexing an English UTF8 site I use:
> mknmz --indexing-lang=en.UTF-8 -e ...
That is a mistake.
Namazu doesn't support UTF-8.
> For indexing a Japanese UTF8 site I use (the -k means use kakasi):
> mknmz --indexing-lang=ja.UTF-8 -k -e ...
That is also a mistake.
Namazu doesn't support UTF-8.
(It can, however, handle documents written in ja_JP.UTF-8.)
You need to keep using the following:
$ mknmz --indexing-lang=ja_JP.eucjp -k -e ...
Documents in ISO-2022-JP, Shift_JIS, and EUC-JP can all be handled
even though ja_JP.eucjp is specified.
The --indexing-lang option does not specify the encoding of the
documents being handled.
> For searching (I'm using PHP module by the way) I convert the search
> keywords to EUC:
> $kw_euc=mb_convert_encoding($kw,"EUC-JP","UTF8");
Search keywords are supported only in ISO-2022-JP, Shift_JIS,
and EUC-JP.
(UTF-8 is not supported. Therefore, converting the keywords to
EUC-JP, as in this example, is recommended.)
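For readers not using PHP, the same keyword conversion can be sketched in
Python. This is only an illustration, not part of Namazu: the helper name
to_namazu_query is hypothetical, and it assumes the keyword arrives from
the web form as UTF-8 bytes.

```python
def to_namazu_query(keyword_utf8: bytes) -> bytes:
    """Re-encode a UTF-8 search keyword as EUC-JP (hypothetical helper)."""
    # Decode the incoming UTF-8 bytes into a Python string first.
    keyword = keyword_utf8.decode("utf-8")
    # Encode as EUC-JP; this raises UnicodeEncodeError if the keyword
    # contains a character that EUC-JP cannot represent.
    return keyword.encode("euc_jp")


# Example: a Japanese keyword submitted as UTF-8.
query = to_namazu_query("検索".encode("utf-8"))
```

The resulting bytes are what would be passed on as the search keyword.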
> Then do the search, then for each search hit I convert the result back
> from EUC to UTF8 ready for display, e.g.:
The search results are always in EUC-JP (on UNIX).
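Converting a result back for display on a UTF-8 page could look like the
following Python sketch. It is an assumption-laden illustration: the helper
name result_to_utf8 is hypothetical, and the sample bytes stand in for
whatever the search actually returns.

```python
def result_to_utf8(raw_result: bytes) -> bytes:
    """Convert EUC-JP result bytes to UTF-8 (hypothetical helper)."""
    # Decode from EUC-JP; undecodable bytes become U+FFFD instead of
    # raising, so one damaged hit does not break the whole result page.
    text = raw_result.decode("euc_jp", errors="replace")
    return text.encode("utf-8")


# Example: stand-in for one search hit returned in EUC-JP.
sample_hit = "日本語の文書".encode("euc_jp")
display_bytes = result_to_utf8(sample_hit)
```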
--
=====================================================================
TADAMASA TERANISHI yw3t-trns at asahi-net.or.jp
http://www.asahi-net.or.jp/~yw3t-trns/index.htm
Key fingerprint = 474E 4D93 8E97 11F6 662D 8A42 17F5 52F4 10E7 D14E