[Namazu-users-en] Re: What about images/EXIF ?

Gauthier Vandemoortele gauthier.v at skynet.be
Sun Dec 3 18:47:31 JST 2006


Hello,

On Sun 3 Dec, Tadamasa Teranishi wrote:
> Mr. koi_san is making the image filter though it is not understood 
> whether to examine even EXIF-tag. 
> 
> http://www.interq.or.jp/japan/koi_san/trash/2004/namazu_filter2.htm

Many thanks (and thanks to Google's translation page :-)

I've installed this filter alongside the others in /usr/share/namazu/filter/
and installed the required modules Image::Info and IO::String (I ran "make
test" for each and everything seemed OK).
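
A quick way to double-check that the modules are really visible to Perl is
sketched below. It is only a sketch: it assumes that mknmz runs under the
same perl that is found first in $PATH, which may not hold on every system,
and the function name check_modules is made up for this example.

```shell
#!/bin/sh
# Sketch only: report whether the perl found first in $PATH can load each module.
# Assumption: mknmz runs under this same perl; check its "#!" line if unsure.
check_modules() {
    for m in "$@"; do
        # "perl -MModule -e 1" exits non-zero if the module cannot be loaded
        if perl -M"$m" -e 1 2>/dev/null; then
            echo "$m: loadable"
        else
            echo "$m: NOT loadable"
        fi
    done
}

check_modules Image::Info IO::String
```

A "NOT loadable" line would point at an installation problem rather than at
the Namazu configuration.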

The Image::Info distribution comes with a directory of sample images and a
test script that dumps info from these images. There too, everything seemed OK.
I've adapted the sample mknmzrc (see below) and tried, as root:

mknmz -d -V -f /etc/namazu/mknmzrc.img -O /var/namazu/index/img/ \
    /root/Image-Info-1.16/img/

and no file is indexed:


@@ Reading rcfile: 
@@ Reading rcfile: 
@@  /etc/namazu/mknmzrc.img
// Invoked: /usr/bin/wvWare --version
// Invoked: /usr/bin/pdftotext
// Invoked: /usr/bin/pdfinfo
// tmpnam: /var/namazu/index/img//NMZ.tmp_i.tmp
// tmpnam: /var/namazu/index/img//NMZ.tmp_p.tmp
// tmpnam: /var/namazu/index/img//NMZ.tmp_pi.tmp
// tmpnam: /var/namazu/index/img//NMZ.tmp_w.tmp
// tmpnam: /var/namazu/index/img//NMZ.checkpoint.tmp
// tmpnam: /var/namazu/index/img//NMZ.flist.tmp
// tmpnam: /var/namazu/index/img//NMZ.i.tmp
// tmpnam: /var/namazu/index/img//NMZ.ii.tmp
// tmpnam: /var/namazu/index/img//NMZ.p.tmp
// tmpnam: /var/namazu/index/img//NMZ.pi.tmp
// tmpnam: /var/namazu/index/img//NMZ.r.tmp
// tmpnam: /var/namazu/index/img//NMZ.t.tmp
// tmpnam: /var/namazu/index/img//NMZ.w.tmp
// tmpnam: /var/namazu/index/img//NMZ.wi.tmp
// NMZ: /var/namazu/index/img//NMZ.tmp_i.tmp
// NMZ: /var/namazu/index/img//NMZ.tmp_p.tmp
// NMZ: /var/namazu/index/img//NMZ.tmp_pi.tmp
// NMZ: /var/namazu/index/img//NMZ.tmp_w.tmp
// NMZ: /var/namazu/index/img//NMZ.checkpoint.tmp
// NMZ: /var/namazu/index/img//NMZ.flist.tmp
// NMZ: /var/namazu/index/img//NMZ.i.tmp
// NMZ: /var/namazu/index/img//NMZ.ii.tmp
// NMZ: /var/namazu/index/img//NMZ.p.tmp
// NMZ: /var/namazu/index/img//NMZ.pi.tmp
// NMZ: /var/namazu/index/img//NMZ.r.tmp
// NMZ: /var/namazu/index/img//NMZ.t.tmp
// NMZ: /var/namazu/index/img//NMZ.w.tmp
// NMZ: /var/namazu/index/img//NMZ.wi.tmp
// NMZ: /var/namazu/index/img//NMZ.body
// NMZ: /var/namazu/index/img//NMZ.err
// NMZ: /var/namazu/index/img//NMZ.field
// NMZ: /var/namazu/index/img//NMZ.foot
// NMZ: /var/namazu/index/img//NMZ.head
// NMZ: /var/namazu/index/img//NMZ.i
// NMZ: /var/namazu/index/img//NMZ.ii
// NMZ: /var/namazu/index/img//NMZ.lock
// NMZ: /var/namazu/index/img//NMZ.lock2
// NMZ: /var/namazu/index/img//NMZ.log
// NMZ: /var/namazu/index/img//NMZ.msg
// NMZ: /var/namazu/index/img//NMZ.p
// NMZ: /var/namazu/index/img//NMZ.pi
// NMZ: /var/namazu/index/img//NMZ.r
// NMZ: /var/namazu/index/img//NMZ.result
// NMZ: /var/namazu/index/img//NMZ.slog
// NMZ: /var/namazu/index/img//NMZ.status
// NMZ: /var/namazu/index/img//NMZ.t
// NMZ: /var/namazu/index/img//NMZ.tips
// NMZ: /var/namazu/index/img//NMZ.version
// NMZ: /var/namazu/index/img//NMZ.w
// NMZ: /var/namazu/index/img//NMZ.wi
Looking for indexing files...
@@ find_target starting: Sun Dec  3 10:35:15 2006
@@ Denied:	/root/Image-Info-1.16/img/test.jpg
@@ Not allowed:	/root/Image-Info-1.16/img/test.svg
@@ Denied:	/root/Image-Info-1.16/img/gps.jpg
@@ Denied:	/root/Image-Info-1.16/img/test.png
@@ Not allowed:	/root/Image-Info-1.16/img/test.rle
@@ Not allowed:	/root/Image-Info-1.16/img/test.xbm
@@ Not allowed:	/root/Image-Info-1.16/img/test.ppm
@@ Not allowed:	/root/Image-Info-1.16/img/tiny.pgm
@@ Not allowed:	/root/Image-Info-1.16/img/test.pgm
@@ Denied:	/root/Image-Info-1.16/img/test.gif
@@ Not allowed:	/root/Image-Info-1.16/img/test.xpm
@@ Not allowed:	/root/Image-Info-1.16/img/test.pbm
@@ find_target finished: Sun Dec  3 10:35:15 2006
@@ Target Files: 0 (Scan Performance: Elapsed Sec.: 1, Files/sec: 0.0)
@@   Possible: 12, Not allowed: 8, Denied: 4, Excluded: 0
@@   MTIME too old: 0, MTIME too new: 0
No files to index.
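
As a rough cross-check of the allow pattern from my rcfile, here is a small
sketch that tests the twelve file names from the log against it with grep.
It is not how mknmz itself is implemented. It assumes the pattern is applied
case-insensitively to the whole file name, which is how I read the rcfile
comments ("Case-insensitive", no `^' or `$' anchors).

```shell
#!/bin/sh
# Sketch only: approximates the $ALLOW_FILE test with grep, not mknmz itself.
# Assumption: the pattern is matched case-insensitively against the whole name.
allow='.*\.jpg|.*\.jpeg|.*\.png|.*\.gif'

# File names copied from the find_target log.
files='test.jpg test.svg gps.jpg test.png test.rle test.xbm test.ppm tiny.pgm test.pgm test.gif test.xpm test.pbm'

matched=0
unmatched=0
for f in $files; do
    # -E: extended regex, -i: ignore case, -x: whole-line match, -q: quiet
    if printf '%s\n' "$f" | grep -Eqix -e "$allow"; then
        matched=$((matched + 1))
    else
        unmatched=$((unmatched + 1))
    fi
done
echo "match ALLOW_FILE: $matched, do not match: $unmatched"
# prints: match ALLOW_FILE: 4, do not match: 8
```

The eight names that do not match correspond to the "Not allowed" count, and
the four that do match are exactly the files the log reports as "Denied", so
the allow pattern by itself does not seem to be what rejects them.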

Here is the output of mknmz -C; as you can see, the image types are preceded
by a minus (???). I recompiled Namazu after installing the new libraries.
And to be sure that my previous /etc/namazu/mknmzrc doesn't interfere,
I've renamed it /etc/namazu/mknmzrc.all (for all the other MIME types).

Do you see any explanation?


System: linux
Namazu: 2.0.16
Perl: 5.008004
File-MMagic: 1.25
NKF: no
KAKASI: no
ChaSen: no
MeCab: no
Lang_Msg: C
Lang: C
Coding System: euc
CONFDIR: /etc/namazu
LIBDIR: /usr/share/namazu/pl
FILTERDIR: /usr/share/namazu/filter
TEMPLATEDIR: /usr/share/namazu/template
Supported media types:   (37)
Unsupported media types: (11) marked with minus (-) probably missing application in your $path.
  application/excel: excel.pl
  application/gnumeric: gnumeric.pl
  application/ichitaro5: taro56.pl
  application/ichitaro6: taro56.pl
- application/ichitaro7: taro7_10.pl
  application/macbinary: macbinary.pl
  application/msword: msword.pl
  application/pdf: pdf.pl
  application/postscript: postscript.pl
  application/powerpoint: powerpoint.pl
- application/rtf: rtf.pl
  application/vnd.kde.kivio: koffice.pl
  application/vnd.kde.kpresenter: koffice.pl
  application/vnd.kde.kspread: koffice.pl
  application/vnd.kde.kword: koffice.pl
  application/vnd.oasis.opendocument.graphics: ooo.pl
  application/vnd.oasis.opendocument.presentation: ooo.pl
  application/vnd.oasis.opendocument.spreadsheet: ooo.pl
  application/vnd.oasis.opendocument.text: ooo.pl
  application/vnd.sun.xml.calc: ooo.pl
  application/vnd.sun.xml.draw: ooo.pl
  application/vnd.sun.xml.impress: ooo.pl
  application/vnd.sun.xml.writer: ooo.pl
  application/x-apache-cache: apachecache.pl
  application/x-bzip2: bzip2.pl
  application/x-compress: compress.pl
- application/x-deb: deb.pl
- application/x-dvi: dvi.pl
  application/x-gzip: gzip.pl
- application/x-js-taro: taro7_10.pl
  application/x-rpm: rpm.pl
- application/x-tex: tex.pl
  application/x-zip: zip.pl
- audio/mpeg: mp3.pl
- image/bmp: image.pl
- image/gif: image.pl
- image/jpeg: image.pl
- image/png: image.pl
  message/news: mailnews.pl
  message/rfc822: mailnews.pl
  text/hnf: hnf.pl
  text/html: html.pl
  text/html; x-type=mhonarc: mhonarc.pl
  text/html; x-type=pipermail: pipermail.pl
  text/plain
  text/plain; x-type=rfc: rfc.pl
  text/x-hdml: hdml.pl
  text/x-roff: man.pl

Here is my /etc/namazu/mknmzrc.img

#
# This is a Namazu configuration file for mknmz.
#
package conf;  # Don't remove this line!

#===================================================================
#
# Administrator's email address
#
$ADDRESS = 'gauthier at courrier.adt';


#===================================================================
#
# Regular Expression Patterns
#

#
# This pattern specifies HTML suffixes.
#
#  $HTML_SUFFIX = "html?|[ps]html|html\\.[a-z]{2}";

#
# This pattern specifies file names which will be targeted.
# NOTE: It can be specified by --allow=regex option.
#       Do NOT use `$' or `^' anchors.
#       Case-insensitive.
#
  $ALLOW_FILE =	".*\\.jpg|.*\\.jpeg" .             # Jpeg files
  		"|.*\\.png" .                      # 
  		"|.*\\.gif"                        #
;

# This pattern specifies fields which used for field-specified 
# searching.  NOTE: case-insensitive
# 
# $SEARCH_FIELD = "message-id|subject|from|date|uri|newsgroups|to|summary|size";

#
# This pattern specifies meta tags which used for field-specified 
# searching.  NOTE: case-insensitive
#
  $META_TAGS = "keywords|description";

#
# This pattern specifies aliases for NMZ.field.* files.
# NOTE: Editing NOT recommended.
#
# %FIELD_ALIASES = ('title' => 'subject', 'author' => 'from');

#
# This pattern specifies HTML elements which should be replaced with 
# null string when removing them. Normally, the elements are replaced 
# with a single space character.
#
  $NON_SEPARATION_ELEMENTS = 'A|TT|CODE|SAMP|KBD|VAR|B|STRONG|I|EM|CITE|FONT|U|'.
                         'STRIKE|BIG|SMALL|DFN|ABBR|ACRONYM|Q|SUB|SUP|SPAN|BDO';

#
# This pattern specifies attribute of a HTML tag which should be 
# searchable.
#
  $HTML_ATTRIBUTES = 'ALT|SUMMARY|TITLE';


#===================================================================
# 
# Critical Numbers
# 

# 
# The max size of files which can be loaded in memory at once.
# If you have much memory, you can increase the value.
# If you have less memory, you can decrease the value.
#
  $ON_MEMORY_MAX   = 5000000;

#
# The max file size for indexing. Files larger than this 
# will be ignored.
# NOTE: This value is usually larger than TEXT_SIZE_MAX because 
#       binary-formated files such as PDF, Word are larger.
#
  $FILE_SIZE_MAX   = 2000000;

#
# The max text size for indexing. Files larger than this 
# will be ignored.
#
  $TEXT_SIZE_MAX   =  600000;

#
# The max length of a word. the word longer than this will be ignored.
#
  $WORD_LENG_MAX   = 128;


#
# Weights for HTML elements which are used for term weightning.
#
 %Weight = 
     (
      'html' => {
          'title'  => 16,
          'h1'     => 8,
          'h2'     => 7,
          'h3'     => 6,
          'h4'     => 5,
          'h5'     => 4,
          'h6'     => 3,
          'a'      => 4,
          'strong' => 2,
          'em'     => 2,
          'kbd'    => 2,
          'samp'   => 2,
          'var'    => 2,
          'code'   => 2,
          'cite'   => 2,
          'abbr'   => 2,
          'acronym'=> 2,
          'dfn'    => 2,
      },
      'metakey' => 32, # for <meta name="keywords" content="foo bar">
      'headers' => 8,  # for Mail/News' headers
 );

#
# The max length of a HTML-tagged string which can be processed for
# term weighting. 
# NOTE: There are not a few people has a bad manner using 
#       <h[1-6]> for changing a font size.
#
# $INVALID_LENG = 128; 

#
# The max length of a field.
# This MUST be smaller than libnamazu.h's BUFSIZE (usually 1024).
#
  $MAX_FIELD_LENGTH = 200;


#===================================================================
#
# Softwares for handling a Japanese text
#

#
# Network Kanji Filter nkf v1.71 or later
#
  $NKF = "no"; 

#
# KAKASI 2.x or later
# Text::Kakasi 1.05 or later
#
  $KAKASI = "no";

#
# ChaSen 2.02 or later (simple wakatigaki)
# Text::ChaSen 1.03
#
  $CHASEN = "no";

#
# ChaSen 2.02 or later (with noun words extraction)
#
  $CHASEN_NOUN = "no";

#
# MeCab
#
  $MECAB = "no";

#
# Default Japanese processer: KAKASI or ChaSen.
#
  $WAKATI  = $none;


#===================================================================
#
# Directories
#
# $LIBDIR = "@PERLLIBDIR@";
# $FILTERDIR = "@FILTERDIR@";
# $TEMPLATEDIR = "@TEMPLATEDIR@";

# 1;


-- 
Gauthier Vandemoortele <gauthier.vandemoortele at skynet.be>

