Re: 2.0.13 and wide characters problem + index exists but no results problem

Julien Gourdon julio at cnedra.org
Thu Aug 12 07:18:29 JST 2004

Hi Takatsugu !

knok at daionet.gr.jp wrote:

> Unfortunately, current stable Namazu has no support for all UTF-8
> characters.

Does that mean that we're doomed? ;)
I've got a few questions regarding the issues Arkadiusz pointed out in
his mail of 27 June.
I'm archiving a French newsgroup with mhonarc; I then generate
an index with mknmz and use namazu as the search tool.
Recently I reached more than 250,000 text files in my database, and
I've seen lots of these errors when running mknmz:

Malformed UTF-8 character (unexpected continuation byte XYZ, with no
preceding start byte) messages


Wide character in print at /usr/bin/mknmz line 2475.

As a result, the answers to my queries are now always empty !!
With the full index:
With a temporary index I'm currently rebuilding:

I've spent weeks and weeks generating my index (it is now 370 MB),
so I need to know whether I have to rebuild the whole thing...
Would using the Fedora patch solve my problem? Is there a way to
"clean" the index? A mknmz option perhaps? (I haven't found one, btw.)
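
In case it helps narrow things down: a minimal sketch (in Python, not part
of Namazu) for listing which archived files contain bytes that are not
valid UTF-8. It assumes, without proof, that the "Malformed UTF-8
character" warnings come from such files, for instance Latin-1 encoded
posts. The function names are made up for illustration, and this only
diagnoses the input; it does not repair an existing index.

```python
import os


def first_invalid_utf8_offset(data: bytes):
    """Return the byte offset of the first invalid UTF-8 sequence, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the index of the first byte that could not be decoded.
        return err.start
    return None


def find_invalid_files(root: str):
    """Yield (path, offset) for each file under root that is not valid UTF-8."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            # Read raw bytes so that no implicit decoding hides the problem.
            with open(path, "rb") as handle:
                offset = first_invalid_utf8_offset(handle.read())
            if offset is not None:
                yield path, offset
```

Looping over find_invalid_files() on the archive directory gives the
offending paths together with the offset of the first bad byte, which
could then be compared against the files mknmz complains about.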

Namazu is really a great tool (and thanks a lot for bringing it to us
!), but it would really be a pain if I had to build my index again...

For your info:
12/08 0:04 julio at gourdon /mnt/frcd% mknmz -C
Loaded rcfile: /etc/namazu/mknmzrc
System: linux
Namazu: 2.0.13
Perl: 5.008004
File-MMagic: 1.22
NKF: module_nkf
KAKASI: module_kakasi -ieuc -oeuc -w
ChaSen: module_chasen -j -F '%m '
Wakati: module_kakasi -ieuc -oeuc -w
Lang_Msg: en_IE
Lang: en_IE
Coding System: euc
CONFDIR: /etc/namazu
LIBDIR: /usr/share/namazu/pl
FILTERDIR: /usr/share/namazu/filter
TEMPLATEDIR: /usr/share/namazu/template
Supported media types:   (22)
Unsupported media types: (11) marked with minus (-) probably missing
application in your $path.
- application/excel: excel.pl
  application/ichitaro5: taro56.pl
  application/ichitaro6: taro56.pl
- application/ichitaro7: taro7_10.pl
  application/macbinary: macbinary.pl
- application/msword: msword.pl
- application/pdf: pdf.pl
  application/postscript: postscript.pl
- application/powerpoint: powerpoint.pl
- application/rtf: rtf.pl
  application/vnd.sun.xml.calc: ooo.pl
  application/vnd.sun.xml.draw: ooo.pl
  application/vnd.sun.xml.impress: ooo.pl
  application/vnd.sun.xml.writer: ooo.pl
  application/x-apache-cache: apachecache.pl
  application/x-bzip2: bzip2.pl
  application/x-compress: compress.pl
  application/x-deb: deb.pl
- application/x-dvi: dvi.pl
  application/x-gzip: gzip.pl
- application/x-js-taro: taro7_10.pl
- application/x-rpm: rpm.pl
- application/x-tex: tex.pl
- audio/mpeg: mp3.pl
  message/news: mailnews.pl
  message/rfc822: mailnews.pl
  text/hnf: hnf.pl
  text/html: html.pl
  text/html; x-type=mhonarc: mhonarc.pl
  text/plain; x-type=rfc: rfc.pl
  text/x-hdml: hdml.pl
  text/x-roff: man.pl
zsh: 1803 exit 1     mknmz -C

Thanks a lot !

Julien Gourdon <julio at cnedra.org>

