From gauthier.v at skynet.be Sun Dec 3 05:29:03 2006
From: gauthier.v at skynet.be (Gauthier Vandemoortele)
Date: Sun Dec 3 05:29:11 2006
Subject: [Namazu-users-en] What about images/EXIF ?
Message-ID: <20061202202903.GA2123@fantasio>
Hello,
First of all, thanks to all the developers for this software.
I'm just beginning to use it, and I see nothing in the filters about
images. Since some image formats such as JPEG (EXIF tags) or GIF can
carry textual data, I think it would be very interesting to search
through image collections with Namazu.
Has anything been done, or is anything planned, in that direction? I've searched via Google
(and Namazu, yes :-) on www.namazu.org and found nothing in English
(but it seems to be discussed in Japanese:
http://emacs-w3m.namazu.org/ml/msg06711.html)
So?
Thanks in advance,
--
Gauthier
From yw3t-trns at asahi-net.or.jp Sun Dec 3 05:42:42 2006
From: yw3t-trns at asahi-net.or.jp (Tadamasa Teranishi)
Date: Sun Dec 3 05:42:57 2006
Subject: [Namazu-users-en] Re: What about images/EXIF ?
References: <20061202202903.GA2123@fantasio>
Message-ID: <4571E542.F50730C4@asahi-net.or.jp>
Gauthier Vandemoortele wrote:
>
> I'm just beginning to use it, and I see nothing in the filters about
> images. Since some image formats such as JPEG (EXIF tags) or GIF can
> carry textual data, I think it would be very interesting to search
> through image collections with Namazu.
Mr. koi_san is making an image filter, though I do not know whether
it also examines EXIF tags.
http://www.interq.or.jp/japan/koi_san/trash/2004/namazu_filter2.htm
--
=====================================================================
TADAMASA TERANISHI yw3t-trns@asahi-net.or.jp
http://www.asahi-net.or.jp/~yw3t-trns/index.htm
Key fingerprint = 474E 4D93 8E97 11F6 662D 8A42 17F5 52F4 10E7 D14E
From gauthier.v at skynet.be Sun Dec 3 18:47:31 2006
From: gauthier.v at skynet.be (Gauthier Vandemoortele)
Date: Sun Dec 3 18:47:46 2006
Subject: [Namazu-users-en] Re: What about images/EXIF ?
In-Reply-To: <4571E542.F50730C4@asahi-net.or.jp>
References: <20061202202903.GA2123@fantasio>
<4571E542.F50730C4@asahi-net.or.jp>
Message-ID: <20061203094731.GA1304@fantasio>
Hello,
On Sun 3 Dec, Tadamasa Teranishi wrote:
> Mr. koi_san is making an image filter, though I do not know whether
> it also examines EXIF tags.
>
> http://www.interq.or.jp/japan/koi_san/trash/2004/namazu_filter2.htm
Many thanks (and thanks to Google's translation page :-)
I've installed this filter with the others in /usr/share/namazu/filter/
and installed the required modules Image::Info and IO::String (I ran
"make test" and all seemed OK).
Image::Info comes with a directory of sample images and a test script
that dumps info from these images. There too, all seemed OK.
I've adapted the mknmzrc sample (see below) and tried as root:
mknmz -d -V -f /etc/namazu/mknmzrc.img -O /var/namazu/index/img/ /root/Image-Info-1.16/img/
and no file is indexed:
@@ Reading rcfile:
@@ Reading rcfile:
@@ /etc/namazu/mknmzrc.img
// Invoked: /usr/bin/wvWare --version
// Invoked: /usr/bin/pdftotext
// Invoked: /usr/bin/pdfinfo
// tmpnam: /var/namazu/index/img//NMZ.tmp_i.tmp
// tmpnam: /var/namazu/index/img//NMZ.tmp_p.tmp
// tmpnam: /var/namazu/index/img//NMZ.tmp_pi.tmp
// tmpnam: /var/namazu/index/img//NMZ.tmp_w.tmp
// tmpnam: /var/namazu/index/img//NMZ.checkpoint.tmp
// tmpnam: /var/namazu/index/img//NMZ.flist.tmp
// tmpnam: /var/namazu/index/img//NMZ.i.tmp
// tmpnam: /var/namazu/index/img//NMZ.ii.tmp
// tmpnam: /var/namazu/index/img//NMZ.p.tmp
// tmpnam: /var/namazu/index/img//NMZ.pi.tmp
// tmpnam: /var/namazu/index/img//NMZ.r.tmp
// tmpnam: /var/namazu/index/img//NMZ.t.tmp
// tmpnam: /var/namazu/index/img//NMZ.w.tmp
// tmpnam: /var/namazu/index/img//NMZ.wi.tmp
// NMZ: /var/namazu/index/img//NMZ.tmp_i.tmp
// NMZ: /var/namazu/index/img//NMZ.tmp_p.tmp
// NMZ: /var/namazu/index/img//NMZ.tmp_pi.tmp
// NMZ: /var/namazu/index/img//NMZ.tmp_w.tmp
// NMZ: /var/namazu/index/img//NMZ.checkpoint.tmp
// NMZ: /var/namazu/index/img//NMZ.flist.tmp
// NMZ: /var/namazu/index/img//NMZ.i.tmp
// NMZ: /var/namazu/index/img//NMZ.ii.tmp
// NMZ: /var/namazu/index/img//NMZ.p.tmp
// NMZ: /var/namazu/index/img//NMZ.pi.tmp
// NMZ: /var/namazu/index/img//NMZ.r.tmp
// NMZ: /var/namazu/index/img//NMZ.t.tmp
// NMZ: /var/namazu/index/img//NMZ.w.tmp
// NMZ: /var/namazu/index/img//NMZ.wi.tmp
// NMZ: /var/namazu/index/img//NMZ.body
// NMZ: /var/namazu/index/img//NMZ.err
// NMZ: /var/namazu/index/img//NMZ.field
// NMZ: /var/namazu/index/img//NMZ.foot
// NMZ: /var/namazu/index/img//NMZ.head
// NMZ: /var/namazu/index/img//NMZ.i
// NMZ: /var/namazu/index/img//NMZ.ii
// NMZ: /var/namazu/index/img//NMZ.lock
// NMZ: /var/namazu/index/img//NMZ.lock2
// NMZ: /var/namazu/index/img//NMZ.log
// NMZ: /var/namazu/index/img//NMZ.msg
// NMZ: /var/namazu/index/img//NMZ.p
// NMZ: /var/namazu/index/img//NMZ.pi
// NMZ: /var/namazu/index/img//NMZ.r
// NMZ: /var/namazu/index/img//NMZ.result
// NMZ: /var/namazu/index/img//NMZ.slog
// NMZ: /var/namazu/index/img//NMZ.status
// NMZ: /var/namazu/index/img//NMZ.t
// NMZ: /var/namazu/index/img//NMZ.tips
// NMZ: /var/namazu/index/img//NMZ.version
// NMZ: /var/namazu/index/img//NMZ.w
// NMZ: /var/namazu/index/img//NMZ.wi
Looking for indexing files...
@@ find_target starting: Sun Dec 3 10:35:15 2006
@@ Denied: /root/Image-Info-1.16/img/test.jpg
@@ Not allowed: /root/Image-Info-1.16/img/test.svg
@@ Denied: /root/Image-Info-1.16/img/gps.jpg
@@ Denied: /root/Image-Info-1.16/img/test.png
@@ Not allowed: /root/Image-Info-1.16/img/test.rle
@@ Not allowed: /root/Image-Info-1.16/img/test.xbm
@@ Not allowed: /root/Image-Info-1.16/img/test.ppm
@@ Not allowed: /root/Image-Info-1.16/img/tiny.pgm
@@ Not allowed: /root/Image-Info-1.16/img/test.pgm
@@ Denied: /root/Image-Info-1.16/img/test.gif
@@ Not allowed: /root/Image-Info-1.16/img/test.xpm
@@ Not allowed: /root/Image-Info-1.16/img/test.pbm
@@ find_target finished: Sun Dec 3 10:35:15 2006
@@ Target Files: 0 (Scan Performance: Elapsed Sec.: 1, Files/sec: 0.0)
@@ Possible: 12, Not allowed: 8, Denied: 4, Excluded: 0
@@ MTIME too old: 0, MTIME too new: 0
No files to index.
Here is the result of mknmz -C; as you can see, the image types are preceded
by a minus (???). I've recompiled Namazu after installing the
new libraries. And to be sure that my previous /etc/namazu/mknmzrc
doesn't interfere, I've renamed it /etc/namazu/mknmzrc.all (for all other
MIME types).
Do you see any explanation?
System: linux
Namazu: 2.0.16
Perl: 5.008004
File-MMagic: 1.25
NKF: no
KAKASI: no
ChaSen: no
MeCab: no
Lang_Msg: C
Lang: C
Coding System: euc
CONFDIR: /etc/namazu
LIBDIR: /usr/share/namazu/pl
FILTERDIR: /usr/share/namazu/filter
TEMPLATEDIR: /usr/share/namazu/template
Supported media types: (37)
Unsupported media types: (11) marked with minus (-) probably missing application in your $path.
application/excel: excel.pl
application/gnumeric: gnumeric.pl
application/ichitaro5: taro56.pl
application/ichitaro6: taro56.pl
- application/ichitaro7: taro7_10.pl
application/macbinary: macbinary.pl
application/msword: msword.pl
application/pdf: pdf.pl
application/postscript: postscript.pl
application/powerpoint: powerpoint.pl
- application/rtf: rtf.pl
application/vnd.kde.kivio: koffice.pl
application/vnd.kde.kpresenter: koffice.pl
application/vnd.kde.kspread: koffice.pl
application/vnd.kde.kword: koffice.pl
application/vnd.oasis.opendocument.graphics: ooo.pl
application/vnd.oasis.opendocument.presentation: ooo.pl
application/vnd.oasis.opendocument.spreadsheet: ooo.pl
application/vnd.oasis.opendocument.text: ooo.pl
application/vnd.sun.xml.calc: ooo.pl
application/vnd.sun.xml.draw: ooo.pl
application/vnd.sun.xml.impress: ooo.pl
application/vnd.sun.xml.writer: ooo.pl
application/x-apache-cache: apachecache.pl
application/x-bzip2: bzip2.pl
application/x-compress: compress.pl
- application/x-deb: deb.pl
- application/x-dvi: dvi.pl
application/x-gzip: gzip.pl
- application/x-js-taro: taro7_10.pl
application/x-rpm: rpm.pl
- application/x-tex: tex.pl
application/x-zip: zip.pl
- audio/mpeg: mp3.pl
- image/bmp: image.pl
- image/gif: image.pl
- image/jpeg: image.pl
- image/png: image.pl
message/news: mailnews.pl
message/rfc822: mailnews.pl
text/hnf: hnf.pl
text/html: html.pl
text/html; x-type=mhonarc: mhonarc.pl
text/html; x-type=pipermail: pipermail.pl
text/plain
text/plain; x-type=rfc: rfc.pl
text/x-hdml: hdml.pl
text/x-roff: man.pl
Here is my /etc/namazu/mknmzrc.img
#
# This is a Namazu configuration file for mknmz.
#
package conf; # Don't remove this line!
#===================================================================
#
# Administrator's email address
#
$ADDRESS = 'gauthier@courrier.adt';
#===================================================================
#
# Regular Expression Patterns
#
#
# This pattern specifies HTML suffixes.
#
# $HTML_SUFFIX = "html?|[ps]html|html\\.[a-z]{2}";
#
# This pattern specifies file names which will be targeted.
# NOTE: It can be specified by --allow=regex option.
# Do NOT use `$' or `^' anchors.
# Case-insensitive.
#
$ALLOW_FILE = ".*\\.jpg|.*\\.jpeg" . # Jpeg files
"|.*\\.png" . #
"|.*\\.gif" #
;
# This pattern specifies fields which used for field-specified
# searching. NOTE: case-insensitive
#
# $SEARCH_FIELD = "message-id|subject|from|date|uri|newsgroups|to|summary|size";
#
# This pattern specifies meta tags which used for field-specified
# searching. NOTE: case-insensitive
#
$META_TAGS = "keywords|description";
#
# This pattern specifies aliases for NMZ.field.* files.
# NOTE: Editing NOT recommended.
#
# %FIELD_ALIASES = ('title' => 'subject', 'author' => 'from');
#
# This pattern specifies HTML elements which should be replaced with
# null string when removing them. Normally, the elements are replaced
# with a single space character.
#
$NON_SEPARATION_ELEMENTS = 'A|TT|CODE|SAMP|KBD|VAR|B|STRONG|I|EM|CITE|FONT|U|'.
'STRIKE|BIG|SMALL|DFN|ABBR|ACRONYM|Q|SUB|SUP|SPAN|BDO';
#
# This pattern specifies attribute of a HTML tag which should be
# searchable.
#
$HTML_ATTRIBUTES = 'ALT|SUMMARY|TITLE';
#===================================================================
#
# Critical Numbers
#
#
# The max size of files which can be loaded in memory at once.
# If you have much memory, you can increase the value.
# If you have less memory, you can decrease the value.
#
$ON_MEMORY_MAX = 5000000;
#
# The max file size for indexing. Files larger than this
# will be ignored.
# NOTE: This value is usually larger than TEXT_SIZE_MAX because
# binary-formated files such as PDF, Word are larger.
#
$FILE_SIZE_MAX = 2000000;
#
# The max text size for indexing. Files larger than this
# will be ignored.
#
$TEXT_SIZE_MAX = 600000;
#
# The max length of a word. the word longer than this will be ignored.
#
$WORD_LENG_MAX = 128;
#
# Weights for HTML elements which are used for term weightning.
#
%Weight =
(
'html' => {
'title' => 16,
'h1' => 8,
'h2' => 7,
'h3' => 6,
'h4' => 5,
'h5' => 4,
'h6' => 3,
'a' => 4,
'strong' => 2,
'em' => 2,
'kbd' => 2,
'samp' => 2,
'var' => 2,
'code' => 2,
'cite' => 2,
'abbr' => 2,
'acronym'=> 2,
'dfn' => 2,
},
'metakey' => 32, # for
'headers' => 8, # for Mail/News' headers
);
#
# The max length of a HTML-tagged string which can be processed for
# term weighting.
# NOTE: There are not a few people has a bad manner using
# for changing a font size.
#
# $INVALID_LENG = 128;
#
# The max length of a field.
# This MUST be smaller than libnamazu.h's BUFSIZE (usually 1024).
#
$MAX_FIELD_LENGTH = 200;
#===================================================================
#
# Softwares for handling a Japanese text
#
#
# Network Kanji Filter nkf v1.71 or later
#
$NKF = "no";
#
# KAKASI 2.x or later
# Text::Kakasi 1.05 or later
#
$KAKASI = "no";
#
# ChaSen 2.02 or later (simple wakatigaki)
# Text::ChaSen 1.03
#
$CHASEN = "no";
#
# ChaSen 2.02 or later (with noun words extraction)
#
$CHASEN_NOUN = "no";
#
# MeCab
#
$MECAB = "no";
#
# Default Japanese processer: KAKASI or ChaSen.
#
$WAKATI = $none;
#===================================================================
#
# Directories
#
# $LIBDIR = "@PERLLIBDIR@";
# $FILTERDIR = "@FILTERDIR@";
# $TEMPLATEDIR = "@TEMPLATEDIR@";
# 1;
--
Gauthier Vandemoortele
From yw3t-trns at asahi-net.or.jp Sun Dec 3 19:54:31 2006
From: yw3t-trns at asahi-net.or.jp (Tadamasa Teranishi)
Date: Sun Dec 3 19:54:45 2006
Subject: [Namazu-users-en] Re: What about images/EXIF ?
References: <20061202202903.GA2123@fantasio>
<4571E542.F50730C4@asahi-net.or.jp> <20061203094731.GA1304@fantasio>
Message-ID: <4572ACE7.4D7D1A63@asahi-net.or.jp>
Gauthier Vandemoortele wrote:
>
> I've installed this filter with the others in /usr/share/namazu/filter/
> and installed the required modules Image::Info and IO::String (I ran
> "make test" and all seemed OK).
First of all, image.pl was not made by the Namazu Project, so
support for it is off-topic here.
That said, I suspect that the installation of Image::Info or
IO::String failed.
> Do you see any explanation?
>
> System: linux
> Namazu: 2.0.16
> Perl: 5.008004
> File-MMagic: 1.25
> NKF: no
> KAKASI: no
> ChaSen: no
> MeCab: no
> Lang_Msg: C
> Lang: C
> Coding System: euc
> CONFDIR: /etc/namazu
> LIBDIR: /usr/share/namazu/pl
> FILTERDIR: /usr/share/namazu/filter
> TEMPLATEDIR: /usr/share/namazu/template
> Supported media types: (37)
> Unsupported media types: (11) marked with minus (-) probably missing application in your $path.
> application/excel: excel.pl
...
> - image/bmp: image.pl
> - image/gif: image.pl
> - image/jpeg: image.pl
> - image/png: image.pl
The image.pl filter is disabled.
What happens when you run the following commands?
$ perl -e "use Image::Info;"
$ perl -e "use IO::String;"
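The two one-liners above can also be combined into a small script. This is only a sketch: the `module_loads` helper is a hypothetical name introduced here, and the script reports only whether each module can be loaded from the current @INC, not why a load failed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if the named module can be loaded, else 0.
sub module_loads {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;   # Image::Info -> Image/Info.pm
    return eval { require $file; 1 } ? 1 : 0;
}

# The two modules the image.pl filter is said to require in this thread.
for my $module (qw(Image::Info IO::String)) {
    if (module_loads($module)) {
        print "$module: ok\n";
    } else {
        print "$module: failed to load\n";
    }
}
```

If either module is reported as failing, mknmz -C would be expected to keep listing the image types with a minus sign.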
> Here is my /etc/namazu/mknmzrc.img
...
> #
> # This pattern specifies file names which will be targeted.
> # NOTE: It can be specified by --allow=regex option.
> # Do NOT use `$' or `^' anchors.
> # Case-insensitive.
> #
> $ALLOW_FILE = ".*\\.jpg|.*\\.jpeg" . # Jpeg files
> "|.*\\.png" . #
> "|.*\\.gif" #
> ;
You also need to set $DENY_FILE. If it is not set, the default
$DENY_FILE includes image files.
$DENY_FILE = ".*\\.tar\\.gz|core|.*\\.bak|.*~|\\..*|\x23.*";
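To illustrate why the earlier log shows "Denied" for test.jpg but "Not allowed" for test.svg, here is a minimal Perl sketch. It is not mknmz's actual code. It assumes that mknmz matches each file name against the whole pattern, case-insensitively, checking $ALLOW_FILE first and $DENY_FILE second. The `$default_deny` pattern is a hypothetical stand-in for a built-in default that lists image suffixes, and `classify` is a helper name made up for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Allow pattern from the mknmzrc.img posted in this thread.
my $allow = ".*\\.jpg|.*\\.jpeg|.*\\.png|.*\\.gif";

# ASSUMPTION: a default-like deny pattern that also lists image suffixes.
my $default_deny =
    ".*\\.(gif|png|jpg|jpeg)|.*\\.tar\\.gz|core|.*\\.bak|.*~|\\..*|\x23.*";

# Deny pattern suggested in this thread (no image suffixes).
my $fixed_deny = ".*\\.tar\\.gz|core|.*\\.bak|.*~|\\..*|\x23.*";

# Hypothetical helper: returns the label the debug log would show.
sub classify {
    my ($name, $allow_re, $deny_re) = @_;
    return 'Not allowed' if $name !~ /^(?:$allow_re)$/i;
    return 'Denied'      if $name =~ /^(?:$deny_re)$/i;
    return 'Target';
}

for my $name (qw(test.jpg test.svg test.png test.gif)) {
    printf "%-10s default: %-12s fixed: %s\n", $name,
        classify($name, $allow, $default_deny),
        classify($name, $allow, $fixed_deny);
}
```

Under these assumptions, a file must pass the allow pattern and also escape the deny pattern, so setting $ALLOW_FILE alone is not enough while the deny pattern still covers image suffixes.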
--
=====================================================================
TADAMASA TERANISHI yw3t-trns@asahi-net.or.jp
http://www.asahi-net.or.jp/~yw3t-trns/index.htm
Key fingerprint = 474E 4D93 8E97 11F6 662D 8A42 17F5 52F4 10E7 D14E
From gauthier.v at skynet.be Sun Dec 3 22:52:56 2006
From: gauthier.v at skynet.be (Gauthier Vandemoortele)
Date: Sun Dec 3 22:53:14 2006
Subject: [Namazu-users-en] Re: What about images/EXIF ?
In-Reply-To: <4572ACE7.4D7D1A63@asahi-net.or.jp>
References: <20061202202903.GA2123@fantasio>
<4571E542.F50730C4@asahi-net.or.jp>
<20061203094731.GA1304@fantasio>
<4572ACE7.4D7D1A63@asahi-net.or.jp>
Message-ID: <20061203135256.GA2007@fantasio>
On Sun 3 Dec, Tadamasa Teranishi wrote:
> That said, I suspect that the installation of Image::Info or
> IO::String failed.
> What happens when you run the following commands?
>
> $ perl -e "use Image::Info;"
> $ perl -e "use IO::String;"
You're right: I had mixed up the IO::String module and
had chosen another module with a similar name (String) from the CPAN
archives.
> You also need to set $DENY_FILE. If it is not set, the default
> $DENY_FILE includes image files.
>
> $DENY_FILE = ".*\\.tar\\.gz|core|.*\\.bak|.*~|\\..*|\x23.*";
Maybe I'm wrong, but I didn't notice in the docs that a default
value applies if this variable isn't set. Now it works.
> First of all, image.pl was not made by the Namazu Project, so
> support for it is off-topic here.
So, even more thanks! I must say that I was a little disappointed when image.pl
finally worked, because the comment fields in images were not indexed,
and I thought that was the most important part!
So I looked inside image.pl and saw the following lines:
$$cont = $info->{'file_media_type'};
$$cont .= " Size: " . $info->{'width'} . "x" . $info->{'height'};
$$cont .= " DateTimeOriginal: " . $info->{'DateTimeOriginal'} if ($info->{'D
And I simply added this one:
$$cont .= " Comment: " . $info->{'Comment'} if ($info->{'Comment'});
Now the comments are indexed :-)
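The change can be tried without Namazu using a stand-alone sketch like the one below. It builds the indexed text from a plain hash shaped like the result the filter reads, so Image::Info itself is not needed to run it. The `image_text` helper and the sample data are hypothetical. The condition on the DateTimeOriginal line is a guess, since the quoted line is cut off. The array handling assumes that several comments in one file may arrive as an array reference, which the one-line change above would index as a reference string rather than as text.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper mirroring the lines quoted from image.pl: builds the
# text to index from a hash shaped like the image information the filter uses.
sub image_text {
    my ($info) = @_;
    my $text = $info->{'file_media_type'};
    $text .= " Size: " . $info->{'width'} . "x" . $info->{'height'};
    # ASSUMPTION: the truncated line guards on the field being present.
    $text .= " DateTimeOriginal: " . $info->{'DateTimeOriginal'}
        if $info->{'DateTimeOriginal'};
    if (my $comment = $info->{'Comment'}) {
        # ASSUMPTION: several comments may arrive as an array reference.
        $comment = join(' ', @$comment) if ref($comment) eq 'ARRAY';
        $text .= " Comment: " . $comment;
    }
    return $text;
}

# Hand-made sample data, not read from a real image.
my %sample = (
    file_media_type => 'image/jpeg',
    width           => 640,
    height          => 480,
    Comment         => [ 'holiday', 'beach at sunset' ],
);
print image_text(\%sample), "\n";
```

Other fields could be appended in the same way, as long as each one is guarded against being absent.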
I hope this helps someone.
Regards,
--
Gauthier
Van Gogh may have been mad,
but when you see the painting of his bedroom,
he did make his bed.
(Brèves de comptoir - J-M Gourio)
From knok at daionet.gr.jp Tue Dec 12 13:56:06 2006
From: knok at daionet.gr.jp (NOKUBI Takatsugu)
Date: Tue Dec 12 13:55:59 2006
Subject: [Namazu-users-en] namazu.org will stop because of moving
Message-ID: <87ac1t90gp.wl%knok@daionet.gr.jp>
All namazu.org services will be stopped while the server is moved to
another IDC. The outage period is as follows:
2006-12-15 (Fri) 13:00 - 2006-12-17 (Sun) 13:00 JST
We apologize for the inconvenience.
From saptec at bfst.bund.de Tue Dec 12 18:37:09 2006
From: saptec at bfst.bund.de (saptec@bfst.bund.de)
Date: Tue Dec 12 18:45:09 2006
Subject: [Namazu-users-en] NMZ.w File too large
Message-ID:
An HTML attachment was scrubbed...
URL: http://www.namazu.org/pipermail/namazu-users-en/attachments/20061212/5fd4cdcf/attachment.htm
From earl at earlhood.com Wed Dec 13 01:50:19 2006
From: earl at earlhood.com (Earl Hood)
Date: Wed Dec 13 02:09:15 2006
Subject: [Namazu-users-en] Re: NMZ.w File too large
In-Reply-To:
References:
Message-ID: <200612121650.kBCGoJq5011426@gator.earlhood.com>
On December 12, 2006 at 10:37, saptec@bfst.bund.de wrote:
> We are using Namazu 2.0.15 on SuSe Linux.
> Error from command namazu -d test:
> ...
> namazu(debug): /usr/local/var/namazu/index/NMZ.w: File too large
> Results:
> References: [ (can't open the index) ]
> ...
> The size of NMZ.w is 2421356497.
2GB is a common maximum for file sizes; however, Linux
should support sizes larger than that.
However, for sizes larger than 2GB, programs sometimes have to
use different system calls, and as for perl, it must be compiled
to support large file sizes.
Since Namazu uses perl, this could be the cause of the error. Run
the following command:
perl -V
And search for "uselargefiles" in the output to see if your perl
has been compiled with large file support. If it has not, you can
rebuild and reinstall perl with large-file support enabled.
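The same check can be scripted with perl's core Config module, which exposes the values that perl -V prints. The sketch below also compares the NMZ.w size reported above with the signed 32-bit limit. The `needs_large_file_support` helper is a hypothetical name, and the script says nothing about the C parts of Namazu.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Config;   # core module exposing the build settings shown by "perl -V"

# Size of NMZ.w reported in this thread, and the signed 32-bit limit.
my $nmz_w_size = 2421356497;
my $limit_32   = 2**31 - 1;   # 2147483647 bytes

# Hypothetical helper: true if a file of this size exceeds the 32-bit limit.
sub needs_large_file_support {
    my ($size) = @_;
    return $size > $limit_32 ? 1 : 0;
}

# uselargefiles is 'define' when perl was built with large-file support.
my $lfs = defined $Config{uselargefiles}
       && $Config{uselargefiles} eq 'define';

printf "perl built with uselargefiles: %s\n", $lfs ? 'yes' : 'no';
printf "NMZ.w (%s bytes) exceeds the 2GB limit: %s\n", $nmz_w_size,
    needs_large_file_support($nmz_w_size) ? 'yes' : 'no';
```

Running perl -V:uselargefiles prints just that one setting, which avoids searching the full perl -V output by hand.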
Parts of namazu are written in C, so those parts could also be
a source of the error depending on how the code was written and
if support for large files was put in. I do not recall if the
build process for namazu has an option for large-file support.
A possible workaround is to split up your search index. If the
files you are indexing can easily be indexed separately (e.g. files are
in different directories and easily divided into separate indexable
areas), you can create a separate search index for each area.
When doing a search, you can specify each index to search against.
Namazu supports searching multiple indexes at once.
--ewh