[Namazu-users-en] Re: Malformed UTF-8 character

Tadamasa Teranishi yw3t-trns at asahi-net.or.jp
Wed Jun 15 04:19:40 JST 2005


Earl Hood wrote:
> 
> So non-printable characters and some whitespace characters do not
> constitute word boundaries?  You realize that characters like tab
> (ASCII 9) and form-feed (ASCII 12) are not being treated as word
> boundaries.  I think this is a mistake.

It may depend on the case.
Perhaps there is no single correct answer.

In the sample, the control codes from 0 to 31 are deleted, 
so those characters do not pass through the filter.

> The code you have will combine two words into one.  For example:
> 
>   hello	there
> 
> Will get filtered to:
> 
>   hellothere
> 
> Using '?' for the replacement will have:
> 
>   hello?there
> 
> which, hopefully, will cause mknmz to treat "hello" and "there"
> as two separate words.

If that is the goal, it would be better to convert the character 
into TAB rather than "?". 
Or converting it into SPACE may be better; I do not know which.
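
To illustrate the difference between deleting the control codes and
converting them into SPACE, here is a minimal sketch in Python. It is
not the sample code discussed in this thread and not mknmz code; the
function names delete_controls and controls_to_space are hypothetical,
and the range 0 to 31 is assumed from the description above.

```python
import re

# Assumption: "control codes" means code points 0-31, as described above.
CONTROL_CHARS = re.compile(r"[\x00-\x1f]")

def delete_controls(text: str) -> str:
    """Delete control codes outright; words on either side get joined."""
    return CONTROL_CHARS.sub("", text)

def controls_to_space(text: str) -> str:
    """Replace each control code with a space so word boundaries survive."""
    return CONTROL_CHARS.sub(" ", text)

sample = "hello\tthere"
print(delete_controls(sample))    # hellothere
print(controls_to_space(sample))  # hello there
```

Note that this range also covers LF (10) and CR (13), so a real filter
would probably need to treat line breaks separately.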
-- 
=====================================================================
TADAMASA TERANISHI
http://www.asahi-net.or.jp/~yw3t-trns/index.htm
Key fingerprint =  474E 4D93 8E97 11F6 662D  8A42 17F5 52F4 10E7 D14E


