[Namazu-users-en] Re: mknmz not working for Japanese language documents?

Darren Cook darren at dcook.org
Fri Jun 30 10:50:28 JST 2006


> What we can find depends on the character encoding setting used by
> the browser doing the search.
> The documents we built the index from are very likely to be using
> several different Japanese character encodings (e.g. Shift_JIS, EUC-JP).

I've not used the Perl modules, but I can tell you what I do on a site
that isn't natively EUC-JP.

For indexing an English UTF8 site I use:
  mknmz --indexing-lang=en.UTF-8 -e ...

For indexing a Japanese UTF8 site I use (-k tells mknmz to use KAKASI
to split the Japanese text into words):
  mknmz --indexing-lang=ja.UTF-8 -k -e ...

For searching (I'm using the PHP module, by the way), I convert the search
keywords from UTF-8 to EUC-JP:
  $kw_euc=mb_convert_encoding($kw,"EUC-JP","UTF8");
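If you do this in more than one place, it may be worth wrapping the
conversion in a small helper. The sketch below is only an illustration:
keyword_to_euc() is a hypothetical name (not part of the Namazu PHP
module), it assumes the mbstring extension and a modern PHP (7 or later,
for the type declarations), and it adds a validity check so that a
keyword that isn't really UTF-8 is rejected rather than silently mangled.

```php
<?php
// Hypothetical helper, not part of the Namazu PHP module.
// Converts a UTF-8 search keyword to EUC-JP before searching.
// Requires the mbstring extension.
function keyword_to_euc(string $kw): string
{
    // Refuse input that is not valid UTF-8 instead of converting garbage.
    if (!mb_check_encoding($kw, 'UTF-8')) {
        throw new InvalidArgumentException('keyword is not valid UTF-8');
    }
    return mb_convert_encoding($kw, 'EUC-JP', 'UTF-8');
}

// Example: a Japanese keyword as it might arrive from a UTF-8 form.
$kw_euc = keyword_to_euc('全文検索');
```

Plain ASCII keywords come through unchanged, since ASCII is the same
in both encodings.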

Then I run the search, and for each hit I convert the result fields back
from EUC-JP to UTF-8 ready for display, e.g.:
  $title=mb_convert_encoding(
	nmz_result_field($hlist,$n,'subject'),
	'UTF8','EUC-JP');
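When you display several fields per hit, the same back-conversion can be
applied to all of them in one go. This is a sketch under assumptions:
hit_fields_to_utf8() is a hypothetical helper, the input array is stand-in
data built by hand rather than real Namazu results, and it assumes the
mbstring extension on PHP 7 or later. In real code you would fill the
array from nmz_result_field() calls like the one above.

```php
<?php
// Hypothetical helper: convert every field of one search hit from
// EUC-JP to UTF-8. $fields maps field names (e.g. 'subject') to raw
// EUC-JP strings.
function hit_fields_to_utf8(array $fields): array
{
    $out = [];
    foreach ($fields as $name => $value) {
        $out[$name] = mb_convert_encoding($value, 'UTF-8', 'EUC-JP');
    }
    return $out;
}

// Stand-in data: EUC-JP strings built here instead of coming from Namazu.
$hit = [
    'subject' => mb_convert_encoding('日本語の文書', 'EUC-JP', 'UTF-8'),
    'author'  => 'Darren',
];
$display = hit_fields_to_utf8($hit);

// Escape for HTML output, now that the text is UTF-8.
echo htmlspecialchars($display['subject'], ENT_QUOTES, 'UTF-8'), "\n";
```

Converting first and escaping afterwards keeps htmlspecialchars() working
on UTF-8, which is the encoding the page is served in.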

Darren

