Namazu-users-en(old)


Re: Stop Words



On Sat, 3 Nov 2001, Subramanian Radhakrishnan wrote:

> How to implement stop words in Namazu search. what are all the files
> are need to be modified for this purpose...

Before you try my way, I'd strongly suggest pre-processing the query 
string with something like Perl before sending it to namazu.  If that 
doesn't work for you, then try this:
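
For the pre-processing route, the idea is just to drop stop words from 
the query string before it ever reaches namazu.  Here is a minimal 
sketch, written in C to match the rest of this message (a Perl script 
would do just as well).  The name strip_stop_words and its parameters 
are made up for illustration, and it assumes the query is plain 
whitespace-separated words, so it knows nothing about namazu's 
operators or phrase syntax.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Case-insensitive comparison of a token of length n against a
 * NUL-terminated word.  Returns 1 on a match, 0 otherwise. */
static int token_equals(const char *tok, size_t n, const char *word)
{
    size_t i;

    if (strlen(word) != n)
        return 0;
    for (i = 0; i < n; i++) {
        if (tolower((unsigned char) tok[i]) !=
            tolower((unsigned char) word[i]))
            return 0;
    }
    return 1;
}

/* Hypothetical helper: copy query into out, dropping every word found
 * in stops.  Remaining words are joined with single spaces.
 * Returns 0 on success, -1 if out is too small. */
int strip_stop_words(const char *query, const char *const *stops,
                     size_t nstops, char *out, size_t outsz)
{
    size_t used = 0;
    const char *p = query;

    if (outsz == 0)
        return -1;
    out[0] = '\0';

    while (*p != '\0') {
        const char *start;
        size_t len, k;
        int stop = 0;

        /* Skip leading whitespace, then find the end of the token. */
        while (*p != '\0' && isspace((unsigned char) *p))
            p++;
        start = p;
        while (*p != '\0' && !isspace((unsigned char) *p))
            p++;
        len = (size_t) (p - start);
        if (len == 0)
            break;

        for (k = 0; k < nstops; k++) {
            if (token_equals(start, len, stops[k])) {
                stop = 1;
                break;
            }
        }
        if (stop)
            continue;

        /* Need room for an optional separator, the token, and a NUL. */
        if (used + (used > 0) + len + 1 > outsz)
            return -1;
        if (used > 0)
            out[used++] = ' ';
        memcpy(out + used, start, len);
        used += len;
        out[used] = '\0';
    }
    return 0;
}
```

The cleaned-up string in out is what you would then hand to namazu in 
place of the original query.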

Instructions follow:

These are for namazu-2.0.5; I have not tested them with later versions.  
This is also not the best way to do it.  I can think of better ways, but 
I haven't tried them yet.  I will try to get this to work similarly to 
the rest of namazu, but for now, it works for me.

I also have an implementation of synonyms along the same lines.

I have attached two files - stop-list.c and stop-list.h

Additionally, you will need to create a text file called stopwords.txt 
with one word per line.  This file goes in the same directory as your 
index.

Put stop-list.c and stop-list.h in the nmz/ directory, and add the 
following to nmz/query.c:

#include "stop-list.h"         (at the top)


In nmz_make_query():

after:
    /* If too much items in query, return with error */
    if (tokennum > QUERY_TOKEN_MAX) {
        return ERR_TOO_MANY_TOKENS;
    }

add:
    /* Read stop list from file */
    read_stop_list();


after:
        if (query.str[i] != '\0')
            query.str[i++] = '\0';

add:
        /* If the word is in the stop list, then purge it */
        if(is_stop_word(query.tab[tokennum])) {
                query.tab[tokennum] = (char *) NULL;
        }

after the end of the for loop, add:
    /* Clear stop list */
    clear_word_list();
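
If you don't have the attachment to hand, the three functions called 
above could look roughly like the sketch below.  This is not the 
attached code, only a guess at a minimal implementation with the same 
names.  The int return values, the stop_list_path variable, the 
256-byte line buffer, and the exact (case-sensitive) matching are all 
my assumptions; real code would presumably build the path from the 
index directory.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Assumption: the file location is a plain global in this sketch. */
const char *stop_list_path = "stopwords.txt";

static char **stop_words = NULL;   /* loaded words, one per entry */
static size_t stop_count = 0;      /* number of loaded words      */

/* Free the loaded list.  Safe to call when nothing is loaded. */
void clear_word_list(void)
{
    size_t i;

    for (i = 0; i < stop_count; i++)
        free(stop_words[i]);
    free(stop_words);
    stop_words = NULL;
    stop_count = 0;
}

/* Load one word per line from stop_list_path, skipping blank lines.
 * Returns the number of words loaded, or -1 on error. */
int read_stop_list(void)
{
    char line[256];
    FILE *fp;

    clear_word_list();
    fp = fopen(stop_list_path, "r");
    if (fp == NULL)
        return -1;

    while (fgets(line, sizeof line, fp) != NULL) {
        char **grown;
        char *copy;

        line[strcspn(line, "\r\n")] = '\0';   /* strip the line ending */
        if (line[0] == '\0')
            continue;

        grown = realloc(stop_words, (stop_count + 1) * sizeof *grown);
        if (grown == NULL) {
            fclose(fp);
            clear_word_list();
            return -1;
        }
        stop_words = grown;

        copy = malloc(strlen(line) + 1);
        if (copy == NULL) {
            fclose(fp);
            clear_word_list();
            return -1;
        }
        strcpy(copy, line);
        stop_words[stop_count++] = copy;
    }
    fclose(fp);
    return (int) stop_count;
}

/* Return 1 if word exactly matches a loaded stop word, else 0. */
int is_stop_word(const char *word)
{
    size_t i;

    if (word == NULL)
        return 0;
    for (i = 0; i < stop_count; i++) {
        if (strcmp(stop_words[i], word) == 0)
            return 1;
    }
    return 0;
}
```

A linear scan is fine for a list of a few dozen words; a much longer 
list would want a sorted array with bsearch() or a hash table.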



-- 
The program isn't debugged until the last user is dead.


Visit my webpage at http://www.ncst.ernet.in/~philip/
Read my writings at http://www.ncst.ernet.in/~philip/writings/

  MSN  philiptellis                         Yahoo!  philiptellis
  AIM  philiptellis                         ICQ     129711328

Attachment: stop-list.tar.gz
Description: GNU Zip compressed data