[Namazu-devel-en] making filter/mailnews.pl understand "no archive"
ot at w3.org
Tue Sep 7 17:35:58 JST 2004
Hello Namazu developers,
I have been researching how indexing/archiving software can be told to
ignore certain RFC (2)822 mail/news messages.
A similar idea is already implemented for HTML documents in the form of
"noindex, nofollow" directives in the robots exclusion protocol. Most of
the mail archiving software I could find (e.g. MHonArc, hypermail)
implements an "X-no-archive" directive, which makes it ignore the
specific message.
Namazu already appears to implement the robots exclusion protocol nicely
for HTML (as seen in the filter() subroutine in filter/html.pl). I am
planning to hack Namazu to make it behave similarly for mailnews, and I
was wondering if you would be able to answer a few questions.
* If I understood correctly, if the filter() subroutine of any given
filter returns an (error) string, then indexing of this file aborts.
Could you confirm this?
* There is no consensus on whether "X-no-archive: " is the only header
that should trigger such a mechanism. Arguably, for indexing, it could
also be "X-no-index:", and other headers such as "X-Spam-Status: Yes"
would be nice, too. Would it be OK if the name(s) of the trigger
header(s) were made an option in mknmzrc?
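To make the second question concrete, here is a minimal sketch of the kind of check I have in mind. It is written in Python purely for illustration (the real filter would be Perl, in filter/mailnews.pl), and everything in it is an assumption: the `TRIGGER_HEADERS` table, the `skip_reason` name, and the convention of returning a string to mean "skip this message" (which is exactly what the first question asks you to confirm). In the actual patch the table would be read from mknmzrc rather than hard-coded.

```python
from email import message_from_string

# Hypothetical defaults: header name -> value prefix that triggers a skip.
# In the proposed patch these would come from an mknmzrc option instead.
TRIGGER_HEADERS = {
    "X-No-Archive": "yes",
    "X-No-Index": "yes",
    "X-Spam-Status": "yes",
}


def skip_reason(raw_message, triggers=TRIGGER_HEADERS):
    """Return a reason string if the message should be skipped, else None."""
    msg = message_from_string(raw_message)
    for name, wanted in triggers.items():
        # Header lookup is case-insensitive; a header may occur several times.
        for value in msg.get_all(name, []):
            if value.strip().lower().startswith(wanted):
                return f"{name}: {value.strip()}"
    return None
```

With these defaults, a message carrying "X-No-Archive: yes" or "X-Spam-Status: Yes, score=7.2" would be skipped, while one carrying "X-Spam-Status: No, ..." would be indexed as usual.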
And finally... when I am done with this patch, would you be interested
in including it in the Namazu distribution?
Thank you very much.