[Namazu-users-en] Namazu indexing issues for PDF

Saravanan G gsaravananinfy at gmail.com
Wed Jan 26 23:13:52 JST 2011


Hi,
I was using an older version of Namazu.
Recently, I upgraded Namazu to the latest version, along with the latest
versions of the following related software:
Perl 5.12.0
NKF 2.1.0
latest KAKASI
latest Text-Kakasi
latest File-MMagic
latest ChaSen, Darts, etc.
GCC is also the latest version.

The problem I encounter is this: when I generate the index for a PDF that
contains a graph with a list of double-width (full-width) katakana words
separated by single-width (half-width) spaces, the NMZ.w file drops the
single-width spaces and treats the whole list of katakana words as a single
word.
For example:
the PDF contains イタリア インド ポルトガル
but the word indexed in NMZ.w is
イタリアインドポルトガル
So when I search for イタリア, I do not get a hit.
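To make the symptom concrete, here is a minimal Python sketch. It assumes, purely hypothetically, that some preprocessing step removes whitespace between two katakana characters (the way a filter that rejoins wrapped Japanese lines might). It is not Namazu's actual code. Under that assumption, the three words collapse into one token, and an exact lookup for イタリア fails. The names `normalize` and `tokenize` are illustrative only.

```python
import re

# Hypothetical normalization step (an assumption, NOT Namazu's real code):
# delete any whitespace run that sits between two characters from the
# Unicode katakana block (U+30A0..U+30FF).
KATAKANA = "\u30a0-\u30ff"
JOIN_RE = re.compile(rf"(?<=[{KATAKANA}])\s+(?=[{KATAKANA}])")


def normalize(text: str) -> str:
    """Remove whitespace between adjacent katakana characters."""
    return JOIN_RE.sub("", text)


def tokenize(text: str) -> list[str]:
    """Stand-in word splitter: split on whitespace only."""
    return text.split()


line = "イタリア インド ポルトガル"
print(tokenize(line))             # ['イタリア', 'インド', 'ポルトガル']
print(tokenize(normalize(line)))  # ['イタリアインドポルトガル']
print("イタリア" in tokenize(normalize(line)))  # False: the search misses
```

If the real indexer behaves like `normalize` here, the spaces are gone before any word splitting happens. That would explain why converting the PDF to plain text first gives the same result.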

This problem does not occur with the old version of Namazu that was
installed back in 2003.

The new and the old Namazu installations use the same xpdf.

I also converted the PDF to a text file and indexed that, but the index
still contained イタリアインドポルトガル.

I also installed the old NKF version and reinstalled Namazu, but that did
not solve the problem.

I suspect the new Perl version may be the cause.

Any help in this regard would be appreciated.

Alan

