[Namazu-users-en] Re: html data not indexed in text/html mails
Yukio USUDA
usuda at hsba.go.jp
Thu Aug 4 21:36:21 JST 2005
swati wrote:
> Hello all,
> This is a sample mail that I was trying to index and search on.
>
snip
>
> In this mail I am able to search on words like rober and streams, which exist in the header part. But words like fear, member, or primers, which exist inside the HTML part of the mail, are not indexed or searchable. I also tried the new version of Namazu (namazu-2.0.15pre1), and with that I am still not able to index or search this type of mail.
>
> Can anyone give some suggestions as to how I can get these mails indexed and searched as well?
>
I made a patch for this type of mail (against namazu-2.0.15pre1).
bash$ diff -ub filter/mailnews.pl.org filter/mailnews.pl
--- filter/mailnews.pl.org Mon Jun 6 14:41:42 2005
+++ filter/mailnews.pl Thu Aug 4 21:13:53 2005
@@ -65,7 +65,7 @@
util::vprint("Processing mail/news file ...\n");
uuencode_filter($cont);
- mailnews_filter($cont, $weighted_str, $fields);
+ mailnews_filter($cont, $weighted_str, $headings, $fields);
mailnews_citation_filter($cont, $weighted_str);
gfilter::line_adjust_filter($cont);
@@ -79,11 +79,12 @@
# Original of this code was contributed by <furukawa at tcp-ip.or.jp>.
sub mailnews_filter ($$$) {
- my ($contref, $weighted_str, $fields) = @_;
+ my ($contref, $weighted_str, $headings, $fields) = @_;
my $boundary = "";
my $line = "";
my $partial = 0;
+ my $htmlmail = "";
$$contref =~ s/^\s+//;
# Don't handle if first like does'nt seem like a mail/news header.
@@ -125,6 +126,10 @@
# contributed by Hiroshi Kato <tumibito at mm.rd.nttdata.co.jp>
$partial = $1;
util::dprint("((partial: $partial))\n");
+ } elsif ($line =~ m!text/html!i) {
+ # The simplest form of an HTML email message.
+ util::dprint("text/html mail\n");
+ $htmlmail = "yes";
} elsif ($line !~ m!text/plain!i) {
$$contref = '';
return;
@@ -161,6 +166,9 @@
multipart_process($contref, $boundary, $weighted_str, $fields);
}
+ if ($htmlmail) {
+ html::html_filter($contref, $weighted_str, $fields, $headings);
+ }
}
# Prototype declaration for avoiding
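To illustrate the idea behind the patch above outside of Namazu, here is a minimal standalone sketch. It is not Namazu code: strip_html and body_for_index are hypothetical helper names, the sample message is invented, and strip_html is only a crude stand-in for html::html_filter, which does considerably more. The sketch shows the same decision the patch adds: a single-part text/html message has its body passed through an HTML filter instead of being discarded.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical, crude stand-in for html::html_filter:
# removes tags and decodes a handful of entities only.
sub strip_html {
    my ($html) = @_;
    $html =~ s/<(script|style)\b.*?<\/\1\s*>//gis;  # drop script/style blocks
    $html =~ s/<[^>]*>/ /g;                          # drop remaining tags
    $html =~ s/&nbsp;/ /gi;
    $html =~ s/&lt;/</gi;
    $html =~ s/&gt;/>/gi;
    $html =~ s/&amp;/&/gi;
    $html =~ s/\s+/ /g;                              # collapse whitespace
    $html =~ s/^ | $//g;                             # trim both ends
    return $html;
}

# Returns the text to index, or '' for content types other than
# text/plain and text/html (mirroring the early return in the patch).
sub body_for_index {
    my ($mail) = @_;
    my ($header, $body) = split /\r?\n\r?\n/, $mail, 2;
    $header = '' unless defined $header;
    $body   = '' unless defined $body;
    $header =~ s/\r?\n[ \t]+/ /g;                    # unfold continuation lines
    my ($ctype) = $header =~ /^Content-Type:\s*([^;\s]+)/mi;
    $ctype = 'text/plain' unless defined $ctype;     # default when header is absent
    return strip_html($body) if $ctype =~ m!^text/html$!i;
    return $body             if $ctype =~ m!^text/plain$!i;
    return '';
}

# Invented sample message, for illustration only.
my $mail = join "\n",
    'From: sender@example.com',
    'Subject: streams',
    'Content-Type: text/html; charset=us-ascii',
    '',
    '<html><body><p>No <b>fear</b>, member.</p></body></html>';

print body_for_index($mail), "\n";
```

With the sample message, the printed text contains the words fear and member without any markup, which is what lets an indexer pick them up. Multipart messages are not handled here; in Namazu they go through multipart_process.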
Yukio USUDA