[Namazu-users-en] Re: html data not indexed in text/html mails

Yukio USUDA usuda at hsba.go.jp
Thu Aug 4 21:36:21 JST 2005


swati wrote:

> Hello all,
> This is a sample mail that I was trying to index and search.
> 

snip

> 
> In this mail I am able to search for words like rober and streams, which exist in the header part. But words like fear, member, or primers, which exist inside the HTML part of the mail, are not indexed or searchable. I also tried the new version of Namazu (namazu-2.0.15pre1), but even with that I am not able to index or search this type of mail.
> 
> Can anyone give some suggestions as to how I can make these mails indexed and searchable as well?
> 

I made a patch for this type of mail (against namazu-2.0.15pre1).

bash$ diff -ub filter/mailnews.pl.org filter/mailnews.pl
--- filter/mailnews.pl.org      Mon Jun  6 14:41:42 2005
+++ filter/mailnews.pl  Thu Aug  4 21:13:53 2005
@@ -65,7 +65,7 @@
     util::vprint("Processing mail/news file ...\n");
 
     uuencode_filter($cont);
-    mailnews_filter($cont, $weighted_str, $fields);
+    mailnews_filter($cont, $weighted_str, $headings, $fields);
     mailnews_citation_filter($cont, $weighted_str);
 
     gfilter::line_adjust_filter($cont);
@@ -79,11 +79,12 @@
 
 # Original of this code was contributed by <furukawa at tcp-ip.or.jp>. 
 sub mailnews_filter ($$$) {
-    my ($contref, $weighted_str, $fields) = @_;
+    my ($contref, $weighted_str, $headings, $fields) = @_;
 
     my $boundary = "";
     my $line     = "";
     my $partial  = 0;
+    my $htmlmail = "";
 
     $$contref =~ s/^\s+//;
     # Don't handle if first like does'nt seem like a mail/news header.
@@ -125,6 +126,10 @@
                 # contributed by Hiroshi Kato <tumibito at mm.rd.nttdata.co.jp>
                 $partial = $1;
                 util::dprint("((partial: $partial))\n");
+            } elsif ($line =~ m!text/html!i) {
+               # The simplest form of an HTML email message.
+               util::dprint("text/html mail\n");
+               $htmlmail = "yes";
             } elsif ($line !~ m!text/plain!i) {
                 $$contref = '';
                 return;
@@ -161,6 +166,9 @@
        multipart_process($contref, $boundary, $weighted_str, $fields);
 
     }
+    if ($htmlmail) {
+       html::html_filter($contref, $weighted_str, $fields, $headings);
+    }
 }
 
 # Prototype declaration for avoiding
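
The header test that this patch adds (a top-level Content-Type containing text/html) can also be approximated from the shell, for example to find which mails in an archive are of this type. The following is only a rough sketch: mail_body_kind is a hypothetical helper, not part of Namazu, and it assumes the Content-Type value sits on the same line as the header name (folded headers are not handled).

```shell
#!/bin/sh
# Hypothetical helper (not part of Namazu): print a rough classification of
# a mail file based on its top-level Content-Type header.
mail_body_kind() {
    # Look at the header block only: sed stops at the first blank line.
    ctype=$(sed -e '/^[[:space:]]*$/q' "$1" | grep -i '^content-type:' | head -n 1)

    # Compare case-insensitively, as the m!text/html!i match in the patch does.
    case $(printf '%s' "$ctype" | tr '[:upper:]' '[:lower:]') in
        '')            echo plain ;;      # assumption: no header means plain text
        *multipart/*)  echo multipart ;;
        *text/html*)   echo html ;;       # the case the patch adds
        *text/plain*)  echo plain ;;
        *)             echo other ;;
    esac
}

# Usage sketch (the directory is a placeholder):
# for f in /path/to/mails/*; do
#     [ "$(mail_body_kind "$f")" = html ] && echo "$f"
# done
```

Files reported as html are the ones whose body the unpatched filter discards and which the patched filter passes on to html::html_filter.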


Yukio USUDA


