[Namazu-users-en] Re: namazu stopped working
IEM - network operating center
noc at iem.at
Fri Nov 25 01:28:46 JST 2005
IEM - network operating center wrote:
>
> debugging output:
> running "namazu" from the command-line with debugging options gives me
> following ("OSC" is a keyword which pops up every now and then)
>
> %> namazu -c -d --config=pd-list "OSC"
> namazu(debug): NAMAZUNORC: ''
> namazu(debug): load_rcfile: /etc/namazu/namazurc loaded
> namazu(debug): 5: Directive: [Index]
> namazu(debug): Argument 1: [/var/lib/namazu/index/pd-list]
> namazu(debug): 12: Directive: [Template]
> namazu(debug): Argument 1: [/var/lib/namazu/index/pd-list]
> namazu(debug): 14: Directive: [Replace]
> namazu(debug): Argument 1: [/var/lib/mailman/archives/private]
> namazu(debug): Argument 2: [/pipermail]
> namazu(debug): 20: Directive: [Logging]
> namazu(debug): Argument 1: [on]
> namazu(debug): load_rcfile: pd-list loaded
> namazu(debug): -n: 20
> namazu(debug): -w: 0
> namazu(debug): query: [OSC]
> namazu(debug): Index name [0]: /var/lib/namazu/index/pd-list
> namazu(debug): set_phrase_trick: OSC
> namazu(debug): set_regex_trick: OSC
> namazu(debug): query.tokennum: 1
> namazu(debug): query.tab[0]: OSC
> namazu(debug): size of /var/lib/namazu/index/pd-list/NMZ.t: 132748
> namazu(debug): before nmz_strlower: [OSC]
> namazu(debug): after nmz_strlower: [osc]
> namazu(debug): do WORD search
> namazu(debug): size of /var/lib/namazu/index/pd-list/NMZ.ii: 1492960
> namazu(debug): l:0: !
> namazu(debug): r:373239: µÎ¬
> namazu(debug): searching: ..)
> namazu(debug): searching:
> namazu(debug): searching: khz.
...
so after a bit more research i found that NMZ.ii does not return the
correct offset.
as far as i understand it, search::nmz_binsearch() performs a binary
search for the keyword, using NMZ.wi to look up the byte offset of a given
line in NMZ.w (which has each keyword on a separate line).
it first starts with line 186620 [=(373239+1)/2=(r+1)/2], which in fact
contains "clean;", but namazu thinks that it contains "..)".
more research revealed that the byte offset returned from NMZ.wi points
into the middle of a line "clean....)"; since the term found this way
is "..)", the binary search fails miserably.
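
A minimal sketch of the lookup described above, to show why a single bad offset breaks the search. It is not namazu's code: the function names are hypothetical, and the layout (a word file with one term per line, plus an index of 4-byte big-endian byte offsets) is an assumption made for illustration. The helper misaligned() lists index entries whose offset does not land at the start of a line, which is the symptom reported here.

```python
import struct

def read_offsets(index_bytes):
    # Assumption: the index is a flat array of 4-byte big-endian integers.
    count = len(index_bytes) // 4
    return list(struct.unpack(">%dI" % count, index_bytes[:count * 4]))

def word_at(words_bytes, offset):
    # Return the bytes from `offset` up to (not including) the next newline.
    end = words_bytes.find(b"\n", offset)
    if end == -1:
        end = len(words_bytes)
    return words_bytes[offset:end]

def binsearch(words_bytes, offsets, key):
    # Classic binary search; each probe seeks into the word file
    # using the byte offset stored in the index.
    lo, hi = 0, len(offsets) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        term = word_at(words_bytes, offsets[mid])
        if term == key:
            return mid
        if term < key:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1

def misaligned(words_bytes, offsets):
    # Entries whose offset does not point at the first byte of a line.
    return [i for i, off in enumerate(offsets)
            if off != 0 and words_bytes[off - 1:off] != b"\n"]

# Toy data (made up): four sorted terms, one per line.
WORDS = b"apple\nclean;\nkhz.\nosc\n"
GOOD = [0, 6, 13, 18]
BAD = [0, 6, 15, 18]   # third entry points into the middle of "khz."
```

With the GOOD offsets, binsearch(WORDS, GOOD, b"khz.") finds the term; with the BAD offsets the probe reads "z." instead of "khz." and the search gives up, while misaligned(WORDS, BAD) reports the broken entry. Running a check like misaligned() over the real index files would require knowing their actual on-disk format.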
i guess it is a problem with some multi-byte characters.
(which reminds me that when i build the index i get some warnings:
"Wide character in print at /usr/bin/mknmz line 2447, <GEN7162> line
158600.")
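
For context, perl prints "Wide character in print" when a string containing code points above 255 is written to a filehandle that has no encoding layer. The sketch below only illustrates the multi-byte guess; it does not claim to show what mknmz does. It builds an offset table twice for a made-up word list, once by counting bytes and once by counting characters. The function name and the sample words are hypothetical.

```python
def line_offsets(words, measure):
    # Offset of each line start, given a function that measures one word.
    offsets, pos = [], 0
    for w in words:
        offsets.append(pos)
        pos += measure(w) + 1   # +1 for the trailing newline
    return offsets

# Made-up word list; the second word contains one two-byte UTF-8 character.
words = ["clean", "caf\u00e9", "khz.", "osc"]

by_bytes = line_offsets(words, lambda w: len(w.encode("utf-8")))
by_chars = line_offsets(words, len)

# Index of the first line whose two offsets disagree.
first_drift = next(i for i, (b, c) in enumerate(zip(by_bytes, by_chars)) if b != c)
```

Every line after the first multi-byte word gets a character-based offset that is too small, so a seek using it lands inside the previous line. If the index were written with character counts while the word file holds multi-byte data, that would produce the mid-line offsets seen above.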
any hints how i should proceed?
mfg.asdr
IOhannes