[Namazu-devel-en] Re: making filter/mailnews.pl understand "no archive" directive
Earl Hood
earl at earlhood.com
Sat Sep 11 00:24:26 JST 2004
On September 7, 2004 at 17:35, Olivier Thereaux wrote:
> Namazu appears to already nicely implement the robots exclusion
> protocol (for HTML) (as seen in the filter() subroutine in
> filter/html.pl), and I am planning to hack namazu to make it behave
> similarly (for mailnews), and I was wondering if you would be able to
> answer these few questions.
You do encounter some semantic problems with adding no-archive support
directly into namazu. Namazu is a search indexing tool, so the
concept of an "archive" is separate from namazu, even though many use
it to index mail archives.
One case where no-archive support is not desired is when namazu is
used to index personal mail folders. In this case, the user wants
messages with a no-archive indicator to be indexed.
IMHO, if one does not want messages with a no-archive designator to
be indexed, those messages should not be part of namazu's input.
For example, if namazu is being used to index a mail archive, messages
with a no-archive indicator should not be placed in the archive in
the first place.
It may help if you provide some context on why you desire such a
feature in namazu to see if patching namazu is the best solution for
your problem.
> * If I understood correctly, if the filter() subroutine of any given
> filter returns an (error) string, then indexing of this file aborts.
> Could you confirm this?
I cannot answer this one. Of course, a simple test can be done to
confirm the behavior. Unfortunately, the filtering aspects of namazu
are not documented that well.
> * There is no consensus on whether "X-no-archive: " is the only header
> that should trigger such a mechanism. Arguably, for indexing, it could
> also be "X-no-index:", and other headers such as "X-Spam-Status: Yes"
> would be nice, too.
MHonArc also looks for:
    Restrict: no-external-archive
The Restrict: header field is the formal way of indicating no archiving,
but X-no-archive is still widely used.
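
As a rough illustration of checking for both header fields before a
message ever reaches the indexer, here is a minimal sketch in Python.
It is not namazu or MHonArc code: the function name is_no_archive is
hypothetical, and the matching rules (a value starting with "yes" for
X-No-Archive, a comma-separated token list for Restrict) are
assumptions made for the example.

```python
from email import message_from_string


def is_no_archive(raw_message: str) -> bool:
    """Return True if the message carries a no-archive indicator.

    Hypothetical helper: the accepted values below are assumptions,
    not a statement of what namazu or MHonArc actually match.
    """
    msg = message_from_string(raw_message)

    # Header names are matched case-insensitively by the email module.
    for value in msg.get_all("X-No-Archive", []):
        if value.strip().lower().startswith("yes"):
            return True

    # Assume Restrict: may carry several comma-separated tokens.
    for value in msg.get_all("Restrict", []):
        tokens = {token.strip().lower() for token in value.split(",")}
        if "no-external-archive" in tokens:
            return True

    return False
```

A wrapper script could call this on each message and skip the ones
for which it returns True, so that they never become part of the
indexer's input.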
--ewh