[Namazu-users-en] Re: mknmz not working for Japanese language documents?
Darren Cook
darren at dcook.org
Fri Jun 30 10:50:28 JST 2006
> What we can find is dependent on the character encoding setting used by
> the browser doing the search.
> The documents we built the index from are very likely to be using
> several different Japanese character encodings (e.g. Shift_JIS, EUC-JP).
I haven't used the Perl modules, but I can tell you what I do on a site
that isn't natively EUC-JP.
For indexing an English UTF-8 site I use:
mknmz --indexing-lang=en.UTF-8 -e ...
For indexing a Japanese UTF-8 site I use the following (-k tells mknmz to use KAKASI):
mknmz --indexing-lang=ja.UTF-8 -k -e ...
For searching (I'm using the PHP module, by the way) I convert the search
keywords to EUC-JP:
$kw_euc = mb_convert_encoding($kw, "EUC-JP", "UTF-8");
Then I run the search, and for each hit I convert the result fields back
from EUC-JP to UTF-8 for display, e.g.:
$title = mb_convert_encoding(
    nmz_result_field($hlist, $n, 'subject'),
    'UTF-8', 'EUC-JP');
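A minimal sketch of the same round trip, wrapped in two helpers so every keyword and result field goes through one place. It assumes the index holds EUC-JP text and the page is served as UTF-8; the names to_index() and to_page() are hypothetical (they are not part of the Namazu PHP module), and only mbstring's mb_convert_encoding() is used.

```php
<?php
// Sketch only: assumes the Namazu index stores EUC-JP text and the web page
// is UTF-8. to_index() and to_page() are hypothetical helper names.

const INDEX_ENCODING = 'EUC-JP';
const PAGE_ENCODING = 'UTF-8';

// Convert a search keyword from the browser's encoding to the index encoding.
// $from may also be a comma-separated list (e.g. 'UTF-8, SJIS') so mbstring
// detects the source encoding; detection on very short strings can guess wrong.
function to_index(string $text, string $from = PAGE_ENCODING): string
{
    return mb_convert_encoding($text, INDEX_ENCODING, $from);
}

// Convert a field returned by the search (EUC-JP) back to UTF-8 for display.
function to_page(string $text): string
{
    return mb_convert_encoding($text, PAGE_ENCODING, INDEX_ENCODING);
}

// Example: round-trip a Japanese keyword.
$kw_euc = to_index('日本語');
echo bin2hex($kw_euc), "\n"; // c6fccbdcb8ec
echo to_page($kw_euc), "\n"; // 日本語
```

With these helpers the title line above would read `$title = to_page(nmz_result_field($hlist, $n, 'subject'));`. If the form page is not UTF-8, passing the page's real encoding as `$from` (for example 'SJIS') is more dependable than relying on detection.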
Darren