Namazu-users-en(old)



Re: Mailman & "Charactères Français"



> How about your environment? The following is mine:
> 
> Debian GNU/Linux (today's unstable)
> Linux 2.4.21-pre4
> glibc 2.3.1

My environment is

Red Hat Linux 9
Linux Kernel 2.4.20-8
glibc 2.3.2

I think the problem has something to do with pipermail.pl. I use an sh script
that does the following:

cd /var/mailman/namazu
/usr/bin/mknmz --media-type='text/html; x-type=pipermail' \
     /var/mailman/archives/public/*

I have cut and pasted (using the pico editor) the relevant sections of 1) the
original mail, 2) the Namazu search result, which drops the accented
characters, and 3) the file that Namazu builds the search index from.

1) ************************************************** This is the email: ***

Content-Type: text/plain; charset=UTF-8
Subject: Mailman & =?iso-8859-1?q?=22Charact=E8res_Fran=E7ais=22?=
Content-Transfer-Encoding: 8bit

SVP ignorer... ceci est un test.

[ àéèêçôî ]

2) ******************************** This is the HTML of a Namazu search: ***

<strong> Total <!-- HIT -->1<!-- HIT --> documents matching your query.</strong></p>

<dl><dt>1. <strong><a
href="http://[snip]/pipermail//cipt/2003-June/000000.html";>Mailman
&amp;"Charactres Franais"</a></strong> (score : 2)
</dt><dd><strong>Auteur</strong> :
<em>[snip]?Subject=Mailman%20%26%20%3D%3Fiso-8859-1%3Fq%3F%3D22Charact%3DE8res_Fran%3DE7ais%3D22%3F%3D&amp;In-Reply-To=</em>
</dd><dd><strong>Date</strong> : <em>Tue, 03 Jun 2003 09:58:31</em>

</dd><dd>Mailman &amp;"Charactres Franais" SVP ignorer... ceci est un <strong
class="keyword">test</strong>. [ ]
</dd><dd><a
href="http://[snip]/pipermail//cipt/2003-June/000000.html";>http://[snip]/pipermail//cipt/2003-June/000000.html</a>
(1,860 octets)

3) ************ /var/mailman/archives/private/cipt/2003-June/000000.html ***

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
<META http-equiv="Content-Type" content="text/html; charset=us-ascii">
<TITLE> Mailman &amp;&quot;Charact&#232;res Fran&#231;ais&quot;</TITLE>

<!--beginarticle-->
<PRE>SVP ignorer... ceci est un test.

[ &#224;&#233;&#232;&#234;&#231;&#244;&#238; ]


</PRE>
<!--endarticle-->

*****************************************************************************

Notice the &#integer; references that Namazu is removing from its search results.
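
To make clear what I mean by &#integer;, here is a minimal Python sketch of
the decoding I would expect to happen somewhere before indexing. It is only an
illustration and is not taken from Namazu or pipermail.pl; the function name
decode_numeric_refs is made up, and the sample strings are copied from the
archive file in 3) above.

```python
import html


def decode_numeric_refs(text: str) -> str:
    """Turn references such as &#232; back into the characters they stand for.

    html.unescape also decodes named entities like &amp; and &quot;,
    which is what we want for the <TITLE> line of the archive file.
    """
    return html.unescape(text)


# Sample strings copied from the pipermail-generated HTML file.
title = "Mailman &amp;&quot;Charact&#232;res Fran&#231;ais&quot;"
body = "[ &#224;&#233;&#232;&#234;&#231;&#244;&#238; ]"

print(decode_numeric_refs(title))
print(decode_numeric_refs(body))
```

If the references were decoded like this, the title and body would contain the
same accented characters as the original mail in 1), instead of the gaps seen
in the search result in 2).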

Any insight would be appreciated!