Namazu-devel-ja (old)



Re: filter/postscript.pl



 From: baba@xxxxxxxxxxxxxxxxxxxxxx
 Subject: [namazu-devel-ja] filter/postscript.pl
 Date: Wed, 27 Dec 2000 15:46:05 +0900

 > There is a problem with file detection

This turned out to be just a matter of making $ALLOW_FILE in mknmzrc
match *.ps. The file is still reported as text/plain, though. Is that
acceptable?

===================================================================
RCS file: /storage/cvsroot/namazu/pl/conf.pl.in,v
retrieving revision 1.27
diff -u -u -r1.27 conf.pl.in
--- conf.pl.in  2000/03/16 13:00:14     1.27
+++ conf.pl.in  2000/12/27 09:54:41
@@ -28,8 +28,10 @@
 #
 $ALLOW_FILE =  ".*\\.(?:$HTML_SUFFIX)|.*\\.txt" . # HTML, plain text
                "|.*\\.gz|.*\\.Z|.*\\.bz2" .       # Compressed files
-               "|.*\\.pdf" .                      # PDF
                "|.*\\.tex" .                      # TeX
+               "|.*\\.dvi" .                      # DVI
+               "|.*\\.ps" .                       # PostScript
+               "|.*\\.pdf" .                      # PDF
                "|.*\\.doc|.*\\.xls" .             # Word, Excel
                "|.*\\.j[sab]w" .                  # Ichitaro 4, 5, 6
                "|\\d+|[-\\w]+\\.[1-9n]";          # Mail/News, man
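
As a quick check of what the patched pattern accepts, here is a minimal
sketch. It assumes the pattern is matched against the whole file name
(anchored at both ends), uses a simplified stand-in value for
$HTML_SUFFIX, and introduces a hypothetical helper is_allowed(); none
of these come from the Namazu sources, so mknmz itself may behave
differently.

```perl
use strict;
use warnings;

# Assumption: simplified stand-in for $HTML_SUFFIX from conf.pl.
my $HTML_SUFFIX = "html?";

# The pattern as it reads after the patch above.
my $ALLOW_FILE = ".*\\.(?:$HTML_SUFFIX)|.*\\.txt" . # HTML, plain text
                 "|.*\\.gz|.*\\.Z|.*\\.bz2" .       # Compressed files
                 "|.*\\.tex" .                      # TeX
                 "|.*\\.dvi" .                      # DVI
                 "|.*\\.ps" .                       # PostScript
                 "|.*\\.pdf" .                      # PDF
                 "|.*\\.doc|.*\\.xls" .             # Word, Excel
                 "|.*\\.j[sab]w" .                  # Ichitaro 4, 5, 6
                 "|\\d+|[-\\w]+\\.[1-9n]";          # Mail/News, man

# Hypothetical helper: true if the whole file name matches the pattern.
sub is_allowed {
    my ($name) = @_;
    return $name =~ /^(?:$ALLOW_FILE)$/ ? 1 : 0;
}

for my $name (qw(paper.ps paper.dvi paper.pdf image.png)) {
    printf "%-10s %s\n", $name, is_allowed($name) ? "allowed" : "skipped";
}
```

Under these assumptions the .ps and .dvi names are accepted only
because of the two added alternatives. The pattern decides which files
are picked up at all; it has no bearing on the media type that is
reported afterwards.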


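While I am at it, I think text can be extracted from dvi files as well,
but I have no idea which dviware is the best choice. On top of that,
there are many tools that share a name yet differ slightly. There are
probably plenty of others too, but I cannot tell them apart.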

  dvi2tty
  ftp://ftp.web.ad.jp/pub/TeX/akiu/dviwares/dvi2tty/
  ftp://ftp.iis.u-tokyo.ac.jp/pub/TeX/CTAN/dviware/dvi2tty/
  ftp://contrib.redhat.com/pub/contrib/libc5/SRPMS/dvi2tty-5.1-1.src.rpm

  jdvi2tty
  http://www.geocities.co.jp/SiliconValley/7231/jdvi2tty.htm
  Possibly an unauthorized repost of the version from Nifty?

  dvi2text
  http://www.toc.lcs.mit.edu/~dmjones/dvi2text/
  A Perl script that apparently uses TeX::DVI::TXT, TeX::DVI::BYTE and
  the like. Not tested.


For now, if you fetch the akiu dvi2tty listed first, apply the bundled
Japanese-localization patch, and build it, you get jdvi2tty. With that
and the filter/dvi.pl below, it more or less seems to work.
--
馬場  肇 ( Hajime BABA )            E-mail: baba@xxxxxxxxxxxxxxxxxxxxxx
Department of Astronomy, Faculty of Science, Kyoto University (doctoral course)
--



#
# -*- Perl -*-
# $Id: dvi.pl,v 1.16 2000/03/23 10:41:04 knok Exp $
# Copyright (C) 2000 Namazu Project All rights reserved.
#     This is free software with ABSOLUTELY NO WARRANTY.
#
#  This program is free software; you can redistribute it and/or modify
#  it under the terms of the GNU General Public License as published by
#  the Free Software Foundation; either version 2, or (at your option)
#  any later version.
# 
#  This program is distributed in the hope that it will be useful,
#  but WITHOUT ANY WARRANTY; without even the implied warranty of
#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#  GNU General Public License for more details.
#
#  You should have received a copy of the GNU General Public License
#  along with this program; if not, write to the Free Software
#  Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA
#  02111-1307, USA.
#
#  This file must be encoded in EUC-JP encoding
#

package dvi;
use strict;
require 'util.pl';

my $dvipath = undef;

sub mediatype() {
    return ('application/x-dvi');
}

sub status() {
    if (util::islang("ja")) {
	$dvipath = util::checkcmd('jdvi2tty');
    } else {
	$dvipath = util::checkcmd('dvi2tty');
    }
    return 'no' unless (defined $dvipath);
    return 'yes';
}

sub recursive() {
    return 1;
}

sub pre_codeconv() {
    return 0;
}

sub post_codeconv () {
    return 0;
}

sub add_magic ($) {
    return;
}

sub filter ($$$$$) {
    my ($orig_cfile, $cont, $weighted_str, $headings, $fields)
      = @_;
    my $cfile = defined $orig_cfile ? $$orig_cfile : '';

    my $tmpfile = util::tmpnam('NMZ.dvi');
    my $tmpfile2 = util::tmpnam('NMZ.dvi2');

    # dvi2tty needs a .dvi suffix on its input file, so the data is
    # written to "$tmpfile.dvi".
    my $fh = util::efopen("> $tmpfile.dvi");
    print $fh $$cont;
    undef $fh;

    util::vprint("Processing dvi file ... (using '$dvipath')\n");
    system("$dvipath -q $tmpfile -o $tmpfile2");
    # Remove the file that was actually written ("$tmpfile.dvi"), not
    # the suffix-less name, so no temporary input is left behind.
    unlink("$tmpfile.dvi");
    return 'Unable to convert dvi file' unless (-e $tmpfile2);

    $fh = util::efopen("$tmpfile2");
    my $size = util::filesize($fh);
    if ($size > $conf::FILE_SIZE_MAX) {
	# Clean up the converter output before giving up.
	undef $fh;
	unlink($tmpfile2);
	return 'too_large_dvi_file';
    }
    $$cont = util::readfile($fh);
    undef $fh;
    unlink($tmpfile2);
    return undef;
}

1;