Trying to hook into Namazu

Hello all,

Let me first introduce myself: I am a graduate student in computer science at
Eindhoven University of Technology, the Netherlands. I am working on a document
retrieval research project, and for my research I need a search engine that can
report the following things about its results:

- some kind of document ID
- document frequency for all search terms (i.e. in how many documents does term A occur)
- term frequency for all search terms in each found document (i.e. how many times does term A occur in document i)
- the length (preferably in characters) of each found document
- size of the searched document collection
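
To make the list above concrete, here is a minimal sketch of the statistics I mean,
computed over a toy in-memory collection. It does not use any Namazu code: the function
name collection_stats, the dictionary layout, and the whitespace tokenisation are all
assumptions made purely for illustration, and "size of the collection" is taken here to
mean the number of documents.

```python
from collections import Counter

def collection_stats(docs, query_terms):
    """docs maps a document ID to its text; returns the statistics listed above.

    Hypothetical helper for illustration only, not part of Namazu.
    """
    # Assumption: naive tokenisation (lowercase, split on whitespace).
    tokens = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    terms = [t.lower() for t in query_terms]

    # Document frequency: in how many documents does each term occur?
    df = {t: sum(1 for toks in tokens.values() if t in toks) for t in terms}

    results = []
    for doc_id, toks in tokens.items():
        counts = Counter(toks)
        # Term frequency: how often does each term occur in this document?
        tf = {t: counts[t] for t in terms if counts[t] > 0}
        if tf:  # treat a document as "found" if it contains any query term
            results.append({
                "doc_id": doc_id,
                "tf": tf,
                "length": len(docs[doc_id]),  # document length in characters
            })

    return {"collection_size": len(docs), "df": df, "results": results}

docs = {
    1: "namazu is a full text search engine",
    2: "a search engine builds an index",
    3: "unrelated text",
}
stats = collection_stats(docs, ["search", "text"])
```

A real index would of course read these numbers from its postings rather than rescan
the documents; the sketch only pins down what each number means.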

I have been digging through the Namazu source code, looking for a place where I could
add a hook that attaches these things to the search result, but I found that it will cost me
a lot of time just to work out what some of the functions do without an idea of the overall architecture.

I think I could change nmz_get_hlist() to add raw term frequencies for all search terms
to the nmz_data struct it is building. But I'm not sure, because I don't understand what
nmz_read_unpackw() and nmz_get_unpackw() do.

Could someone please give me an idea of how feasible the things I'm trying to do are?


	Roel Brand