[Namazu-users-en] Re: namazu stopped working

IEM - network operating center noc at iem.at
Mon Nov 28 19:24:16 JST 2005

Tadamasa Teranishi wrote:
> IEM - network operating center wrote:
> Namazu supports only English (and Japanese).
> Spanish cannot be processed correctly.
> In short, correct operation with Spanish input is not guaranteed.

But the funny thing is that back when I managed to get the search engine
running again manually, the Spanish name I suspected of causing the
trouble was searchable and even produced some results.
(Wow, that was a long sentence. In short: as written in my first mail,
the search engine stopped working two weeks ago; manually rebuilding the
index fixed it back then, although that no longer works now; and after
rebuilding the index I was able to search for and find a name like "João".)

>>I guess it is a problem with some multi-byte characters.
> The cause might be something else.
> If you can identify the document that causes the trouble and
> we can get that file, we will likely be able to pinpoint the cause.

I am not sure what you mean here.
Should I try to find the document (or one of the documents) that causes
the trouble?

I was able to track the problem down to the following line:
sorry, whenever did you think iĺl put windows code in mine?

(character &#314; is a an "l" with some accent originating from a 
spanish person who has obviously some dead-keys on his keyboard (trying 
to type "i'll"))

If you need the entire document, I can of course send it.

> By the way,
> I think the following correction should get rid of that warning
> (no guarantee):
> -    $$contref =~ tr/\x00-\x08\x0b-\x0c\x0e-\x1a/ /;
> +    $$contref =~ tr/\x00-\x08\x0b-\x0c\x0e-\x1a\x80-\xff/ /;

Unfortunately, this did not help.
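For reference, here is a minimal sketch of what the suggested change does. It assumes the problem line is stored as UTF-8, so "ĺ" is the two bytes 0xC4 0xBA; the sample string and the `$content` variable are hypothetical, and only `$$contref` and the `tr///` lists come from the patch:

```perl
use strict;
use warnings;

# Hypothetical sample: the offending line stored as UTF-8 bytes, where
# "l with acute" (U+013A) is the two-byte sequence 0xC4 0xBA.
my $content = "whenever did you think i\xC4\xBAl put windows code in mine?";
my $contref = \$content;    # the patch operates on a scalar reference

# Suggested filter: replace the control bytes 0x00-0x08, 0x0B-0x0C and
# 0x0E-0x1A (tab, LF and CR are left alone) plus every byte 0x80-0xFF
# with a space. A short replacement list repeats its last character.
$$contref =~ tr/\x00-\x08\x0b-\x0c\x0e-\x1a\x80-\xff/ /;

# Each high byte became its own space, so "i<C4><BA>l" is now "i  l".
print "$$contref\n";
```

Whether the file is really UTF-8 or a single-byte encoding such as ISO-8859-2 is an assumption here; either way the filter blanks the bytes and leaves the rest of the line untouched.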

More information on my system:
the locale is set to "en_GB.ISO-8859-15". I have no idea how this
happened, since I am in Austria (so there is no need for a British
locale). Could this be related to the problem? Should I choose one with



