[Namazu-users-en] Re: Problems with date sorts
fmouse-namazu at fmp.com
Sun Jan 14 03:14:51 JST 2007
On Sat, 2007-01-13 at 13:58 +0900, Tadamasa Teranishi wrote:
> Lindsay Haisley wrote:
> > > Does date information accurately follow the form of RFC2822 by
> > > all documents of MailMan?
> > >
> > > Is there mail with an illegal Date: field ?
> > >
> > > Please show the Date: field of the mail.
> > OK, here is an example. I used the following query:
> > http://www.kca-tx.org/mailman/kca/namazu.cgi?query=Laptop&submit=Search%21&idxname=kca&max=100&result=short&sort=date%3Aearly
> > Here's the result:
> > 1. win 98SE (score: 2)
> > /pipermail/kca/2002-September/000192.html (4,152 bytes)
> > You can see from path names that these are out of order. Here are the Date
> > fields in each of these, copy-n-pasted from the files themselves:
> > Sun Sep 22 14:07:51 CDT 2002
> > Fri Jul 19 11:54:00 CDT 2002
> > Fri Aug 23 08:44:57 CDT 2002
> > Mon Mar 3 10:14:12 CST 2003
> > Tue Jan 14 17:37:11 CST 2003
> To begin with, Date of pipermail was not RFC2822 form.
> However, Date of pipermail is correctly reflected in the field 'Date'.
The pipermail format contains no RFC822 header, but morphs the "Date"
header from the original post into an HTML element, and the date format
is different from what was in the original message/rfc822 format. This
is apparently correctly parsed by mknmz. For instance, the original header was:
Date: Sun, 22 Sep 2002 14:07:51 -0500
whereas Mailman's pipermail conversion turns this into:
<I>Sun Sep 22 14:07:51 CDT 2002</I>
Namazu must handle this difference. Pipermail files, at least those
generated by Mailman, aren't in RFC822 format, nor can the program
reasonably expect them to be in this format.
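To make the difference between the two date forms concrete, here is a minimal Python sketch (not namazu's code, which is Perl) that reads both into comparable values. The function names and the CST/CDT offset table are assumptions made for illustration; a real archive would need a fuller zone table, and the month parsing assumes an English/C locale.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

# Assumed UTC offsets (in hours) for the zone abbreviations seen in this archive.
TZ_OFFSETS = {"CST": -6, "CDT": -5}


def parse_rfc2822(value):
    """Parse an original header date, e.g. 'Sun, 22 Sep 2002 14:07:51 -0500'."""
    return parsedate_to_datetime(value)


def parse_pipermail(value):
    """Parse a pipermail date, e.g. 'Sun Sep 22 14:07:51 CDT 2002'."""
    parts = value.split()
    if len(parts) != 6:
        raise ValueError(f"unexpected pipermail date: {value!r}")
    _weekday, month, day, clock, zone, year = parts
    # Parse everything except the zone abbreviation, then attach the offset.
    naive = datetime.strptime(f"{month} {day} {clock} {year}", "%b %d %H:%M:%S %Y")
    return naive.replace(tzinfo=timezone(timedelta(hours=TZ_OFFSETS[zone])))


# The five dates quoted earlier in this thread, in the order namazu returned them.
dates = [
    "Sun Sep 22 14:07:51 CDT 2002",
    "Fri Jul 19 11:54:00 CDT 2002",
    "Fri Aug 23 08:44:57 CDT 2002",
    "Mon Mar 3 10:14:12 CST 2003",
    "Tue Jan 14 17:37:11 CST 2003",
]
in_order = sorted(dates, key=parse_pipermail)
```

Under these assumptions both forms of the 22 Sep 2002 date resolve to the same instant, and sorting on the parsed value puts the five posts in posting order.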
> Then, let's examine the time stamp of pipermail file next.
> For instance, please confirm the following file and confirm the date
> by 'ls -alF'.
The Unix time stamp is irrelevant! The entire pipermail archive file
hierarchy can be rebuilt using the Mailman "arch" utility (arch --wipe
listname) in which case the time stamp on these files will be the time of the
rebuild, not the posting time. No indexing utility (or any other utility for
that matter) can expect to get valid posting date/time data from the Unix file
time stamp.
> One is a method of changing the time stamp of the file according
> to information on Date of contents of the file.
No accessory utility such as namazu should ever make any changes to the
data source it's analyzing, even if it's only a matter of changing the
Unix time stamp.
Even if this weren't bad software design, it's a rather excessive and
inefficient way to do the job. If namazu can correctly parse out the date from
a pipermail html file, as it seems to be able to do in pipermail.pl, then it
can certainly store this information in an index so it can be used to sort the
files in correct date order. Moreover, although I haven't looked at all the
code, nowhere in the code I looked at did it look as if namazu was trying to
sequence files based on the Unix time stamp!
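What I have in mind is roughly the following Python sketch. It is not how namazu actually stores or sorts its fields; the index layout, the file names and the zone offsets are all hypothetical. The point is only that the sort key comes from the date text extracted at index time, so the file's time stamp never enters into it and the source files are never touched.

```python
from datetime import datetime, timedelta, timezone

# Assumed UTC offsets (in hours) for the zone abbreviations in this archive.
TZ_OFFSETS = {"CST": -6, "CDT": -5}


def date_key(date_text):
    """Turn a pipermail date such as 'Fri Jul 19 11:54:00 CDT 2002' into a sort key."""
    _weekday, month, day, clock, zone, year = date_text.split()
    naive = datetime.strptime(f"{month} {day} {clock} {year}", "%b %d %H:%M:%S %Y")
    return naive.replace(tzinfo=timezone(timedelta(hours=TZ_OFFSETS[zone])))


# Hypothetical index: document name -> date text captured when the file was indexed.
index = {
    "post-a.html": "Sun Sep 22 14:07:51 CDT 2002",
    "post-b.html": "Fri Jul 19 11:54:00 CDT 2002",
    "post-c.html": "Mon Mar 3 10:14:12 CST 2003",
}


def sort_by_date(entries, late_first=False):
    """Return document names ordered by their stored date, earliest first by default."""
    return sorted(entries, key=lambda name: date_key(entries[name]), reverse=late_first)


early = sort_by_date(index)
late = sort_by_date(index, late_first=True)
```

Rebuilding the archive with "arch --wipe" would change every file's time stamp but leave an ordering like this one unchanged, since it depends only on what was stored in the index.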
I'm not sure what you're saying here. Does namazu expect to be able to do date
sorts based on the Unix timestamp on pipermail files? I find this hard to
believe since the namazu code looks pretty intelligent otherwise. I'll run
some tests to see if this is true, but if it is, we'll have to abandon our
project to integrate namazu into mailman since this kind of behavior would
constitute a serious design flaw.
> Another one is a method of using the field sorting UTC function.
> Do you know the method of setting the field sorting UTC function?
I'm not familiar with the field sorting UTC function. What does namazu
use, and where in the code is the sorting done?
Are you one of the authors of namazu?
Lindsay Haisley | "The voice of dissent | PGP public key
FMP Computer Services | was arrested before the | available at
512-259-1190 | president cleared his | http://pubkeys.fmp.com
http://www.fmp.com | throat to speak |
| of freedom" |
| (Chris Chandler) |