[Namazu-devel-en] Parsing of MIME multipart messages' boundary too strict

Olivier Thereaux ot at w3.org
Mon Feb 21 13:09:32 JST 2005


Hello Namazu developers,

I have recently noticed that Namazu (using the mailnews.pl filter
through the -h option at indexing time) fails to index some messages
properly, in particular messages with attachments sent from Apple's
Mail.app. Searching for any content from these messages returns no
results at all.

I tracked the issue down to the way the boundary is declared. When
sending a message with attachments, Mail.app (like most mail clients)
sends a multipart/mixed MIME message with a boundary declaration. What
Mail.app does not do (and most other mailers do) is enclose the
boundary in double quotes, e.g. it uses
Content-Type: multipart/mixed; boundary=gc0p4Jq0M2Yt08j34c0p
instead of
Content-Type: multipart/mixed; boundary="gc0p4Jq0M2Yt08j34c0p"

Both forms are legitimate, as long as the boundary contains no
characters that require quoting, although the quoted form is
considered safer.
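
In RFC 2045 a parameter value is either a bare token or a
quoted-string, so a parser has to accept both. Below is a minimal Perl
sketch of that, assuming the Content-Type value is already unfolded onto
one line. The helper name extract_boundary is hypothetical and not part
of Namazu; it also ignores backslash escapes inside the quotes, which
RFC 2046 boundary characters never need.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Namazu): return the boundary parameter
# from a Content-Type value, accepting both the quoted-string and the
# bare token form allowed by RFC 2045.
sub extract_boundary {
    my ($content_type) = @_;
    # Quoted form: boundary="..." ; capture everything up to the closing quote.
    return $1 if $content_type =~ /\bboundary\s*=\s*"([^"]*)"/i;
    # Token form: boundary=... ; stop at whitespace or the next ';'.
    return $1 if $content_type =~ /\bboundary\s*=\s*([^\s";]+)/i;
    return undef;    # no boundary parameter present
}

for my $ct ('multipart/mixed; boundary=gc0p4Jq0M2Yt08j34c0p',
            'multipart/mixed; boundary="gc0p4Jq0M2Yt08j34c0p"') {
    print extract_boundary($ct), "\n";    # prints gc0p4Jq0M2Yt08j34c0p both times
}
```

Trying the quoted form first keeps a quoted boundary that contains
spaces intact, while the token form stops at the next parameter.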

Quoting RFC 2046: [[ WARNING TO IMPLEMENTORS: The grammar for parameters
on the Content-type field is such that it is often necessary to enclose
the boundary parameter values in quotes on the Content-type line. This
is not always necessary, but never hurts. Implementors should be sure
to study the grammar carefully in order to avoid producing invalid
Content-type fields. ]] -- http://www.faqs.org/rfcs/rfc2046.html
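
On the generating side, quoting is only required when the boundary
contains a character outside the RFC 2045 token set (SPACE, control
characters, or one of the tspecials). A minimal Perl sketch of that
check follows; boundary_needs_quotes and content_type_header are
hypothetical names used only for illustration, not Namazu or library
functions.

```perl
use strict;
use warnings;

# Hypothetical helper: a bare token may not contain SPACE, control
# characters, or any RFC 2045 tspecial:  ( ) < > @ , ; : \ " / [ ] ? =
sub boundary_needs_quotes {
    my ($boundary) = @_;
    return $boundary =~ m{[\x00-\x20\x7F()<>\@,;:\\"/\[\]?=]} ? 1 : 0;
}

# Hypothetical helper: build the header, quoting when required, or always
# when $always_quote is set (the "never hurts" option from RFC 2046).
sub content_type_header {
    my ($boundary, $always_quote) = @_;
    my $value = ($always_quote || boundary_needs_quotes($boundary))
        ? qq{"$boundary"}
        : $boundary;
    return "Content-Type: multipart/mixed; boundary=$value";
}

# Content-Type: multipart/mixed; boundary=gc0p4Jq0M2Yt08j34c0p
print content_type_header('gc0p4Jq0M2Yt08j34c0p', 0), "\n";
# Content-Type: multipart/mixed; boundary="simple boundary"
print content_type_header('simple boundary', 0), "\n";
```

Mail.app's boundaries fall in the first case, which is why the unquoted
form is still valid.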


Namazu's filter/mailnews.pl is therefore too "safe" in its parsing of
the boundary: it only recognizes the quoted form, so it ends up
ignoring multipart messages it should index.

A very simple patch (against current HEAD) would be something like:

--- mailnews_orig.pl    Mon Feb 21 12:44:47 2005
+++ mailnews.pl Mon Feb 21 12:54:02 2005
@@ -203,8 +203,9 @@
                if ($contenttype =~ m!text/plain!){
                    $$contref .= $body;
                } elsif ($contenttype =~ m!multipart/alternative!){
-                   if ($head =~ /boundary="(.*?)"/i){
+                   if ($head =~ /boundary=("[^"]*"|[^\s;]+)/i){
                        my $boundary2 = $1;
+                       $boundary2 =~ s/^"(.*)"$/$1/;
                        util::dprint("((boundary: $boundary2))\n");
                        $boundary2 =~ s/(\W)/\\$1/g;
                        multipart_process(\$body, $boundary2, $weighted_str, $fields);
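
One detail to watch when relaxing the pattern: simply dropping the
quotes from /boundary="(.*?)"/i leaves a lazy (.*?) with nothing after
it, which is satisfied by zero characters, so $1 comes back empty even
when the header carries a boundary. The standalone Perl check below
shows this, using the boundary from the example above in an
illustrative $head value. The alternation ("[^"]*"|[^\s;]+) is one
possible fix, a sketch that has not been run against the rest of
mailnews.pl.

```perl
use strict;
use warnings;

my $head = 'Content-Type: multipart/alternative; boundary=gc0p4Jq0M2Yt08j34c0p';

# A lazy (.*?) with nothing after it matches zero characters, so the
# match succeeds but the captured boundary is empty.
my $lazy = $head =~ /boundary=(.*?)/i ? $1 : undef;
print "lazy:  '$lazy'\n";     # prints lazy:  ''

# Matching either a quoted-string or a bare token captures the real value.
my $fixed = $head =~ /boundary=("[^"]*"|[^\s;]+)/i ? $1 : undef;
$fixed =~ s/^"(.*)"$/$1/ if defined $fixed;    # strip surrounding quotes, if any
print "fixed: '$fixed'\n";    # prints fixed: 'gc0p4Jq0M2Yt08j34c0p'
```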


Would you mind checking the proposed patch and applying it if it looks OK?

Thank you,
-- 
Olivier Thereaux
http://www.w3.org/People/olivier/ 
http://yoda.zoy.org/

