Problem with doc2html
Christian Fredrickson
2002-03-20 09:40:23 UTC
This same problem also occurs if I attempt to run parse_doc.pl as my Word
parser. While running, I get the following error:
!! Cannot load charset cp1251 - file not found
When I use doc2html.pl to parse, the .DOC files are placed into the
index; however, the only portion of each .DOC file that is indexed is its
name, so the body of the .DOC files is not parsed at all.

I can attach anything you need to help me solve this problem. I have
followed all of the steps I could find in the list.
Gilles Detillieux
2002-03-20 10:13:02 UTC
The error comes from "catdoc", which you most likely have not installed
correctly. It comes with a directory of several *.txt files that contain
the charset definitions, and you must install these where it expects to
find them. Read through the directions for compiling and installing
catdoc.
--
Gilles R. Detillieux E-mail: <***@scrc.umanitoba.ca>
Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba Phone: (204)789-3766
Winnipeg, MB R3E 3J7 (Canada) Fax: (204)789-3930
Gilles Detillieux
2002-03-20 10:41:02 UTC
> I did install and have tested catdoc. It is set up. Where do I set the
> charset definitions for catdoc?
If you're running version 0.90.1 of catdoc, as I am, then a ./configure,
make and make install would install all the charset definition files in
/usr/local/lib/catdoc/ by default. I expect other versions, if there
are more recent ones, would do something much the same.
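A minimal Python sketch of that check, assuming the default layout described above (charset files under <prefix>/lib/catdoc) and assuming each charset is stored in a file named after it, e.g. cp1251.txt for the charset in the error message. The helper names are hypothetical and not part of catdoc or htdig.

```python
import os


def charset_dir(prefix="/usr/local"):
    """Directory where `make install` is assumed to put the charset files."""
    # Assumption: catdoc installs its charset definitions in <prefix>/lib/catdoc.
    return os.path.join(prefix, "lib", "catdoc")


def missing_charsets(directory, names):
    """Return the charset names that have no <name>.txt file in `directory`."""
    # Assumption: a charset such as cp1251 is defined in a file named cp1251.txt.
    return [
        name for name in names
        if not os.path.isfile(os.path.join(directory, name + ".txt"))
    ]
```

For example, missing_charsets(charset_dir(), ["cp1251"]) returns an empty list when cp1251.txt is present in /usr/local/lib/catdoc; for a non-default install, pass the same prefix that was given to ./configure.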

If the files are there, make sure they're readable by the user ID under
which you run htdig. E.g., if you did the make install as root, with
a umask of 77, then the files and/or directories leading up to them may
have wound up accessible only to root. Check the permissions on the files
and directories to make sure anyone can read them. Also, test catdoc
directly on one of the Word files that's causing the error, running catdoc
using the same user ID as you use when running htdig.
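The permission check above can be sketched in Python as follows. This is a simplified, hypothetical helper: it only inspects the "other" permission bits (the ones a umask of 077 strips), so it can be stricter than necessary if the htdig user shares a group with the files, and running catdoc as the htdig user remains the definitive test.

```python
import os
import stat
from pathlib import Path


def world_access_problems(charset_dir, base="/"):
    """List paths that would stop a non-owner (e.g. the htdig user) reading charsets.

    Only the permission bits for "other" users are checked.
    """
    charset_dir = Path(charset_dir).resolve()
    base = Path(base).resolve()
    problems = []

    # Every directory from `base` down to the charset directory must be
    # searchable (o+x); otherwise the files below it cannot be opened.
    chain = [base]
    for part in charset_dir.relative_to(base).parts:
        chain.append(chain[-1] / part)
    for directory in chain:
        if not os.stat(directory).st_mode & stat.S_IXOTH:
            problems.append(str(directory))

    # Each charset definition file must be readable (o+r).
    for path in sorted(charset_dir.glob("*.txt")):
        if not os.stat(path).st_mode & stat.S_IROTH:
            problems.append(str(path))
    return problems
```

An empty result from world_access_problems("/usr/local/lib/catdoc") means every directory on the way down is searchable and every *.txt file is readable by other users.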
Gilles Detillieux
2002-03-20 13:13:02 UTC
> That is the issue. I installed it to a different directory. I installed it
> to $HOME/htdig/contrib so the charsets are in the
> $HOME/htdig/contrib/lib/catdoc directory. So what do I have to modify to
> tell catdoc where the charset directory is?
... and in a later message (also off-list)...
> That is great info and you were correct that the catdoc files were the
> problem, only when I did the config - make - make install for catdoc, I
> pointed it to another directory and the lib/catdoc directory is there with
> all the correct files, so why is catdoc failing to find those files?
Well, I'm assuming that when you ran ./configure, you gave it the argument
--prefix=$HOME/htdig/contrib, or did you point it to this directory by some
other means? Is it possible that your home directory, i.e. the value of
$HOME, has changed between when you compiled catdoc and now? If you gave
the --prefix option above to ./configure, it would have expanded $HOME to
wherever your home directory was at the time, so when catdoc was compiled,
it would have had a compile time option of

-DCHARSETPATH="/path/to/home/yourname/htdig/contrib/lib/catdoc"

given to it by src/Makefile, where /path/to/home/yourname would actually
have been the true location of your home directory at the time. If that
location has changed since, catdoc would still have the old location
compiled in. You could probably find out what that location was by
using the command:

strings $HOME/htdig/contrib/bin/catdoc | grep lib/catdoc

and seeing if that's different from what "echo $HOME" reports. If it's
different, you'll need to reconfigure, recompile and reinstall catdoc.
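The same comparison can be sketched in Python, roughly emulating `strings ... | grep lib/catdoc` (runs of at least four printable ASCII bytes, like the strings default) and then comparing the hits against the directory you expect. The helper names are hypothetical, and the sketch assumes any compiled-in path that does not lie under the expected directory is stale.

```python
import os
import re


def compiled_charset_paths(binary_path, min_len=4):
    """Roughly emulate `strings BINARY | grep lib/catdoc`."""
    with open(binary_path, "rb") as fh:
        data = fh.read()
    # Runs of printable ASCII at least `min_len` bytes long.
    runs = re.findall(rb"[\x20-\x7e]{%d,}" % min_len, data)
    return [run.decode("ascii") for run in runs if b"lib/catdoc" in run]


def stale_charset_paths(binary_path, expected_dir):
    """Return compiled-in lib/catdoc paths that do not live under `expected_dir`."""
    expected = os.path.normpath(expected_dir)
    stale = []
    for path in compiled_charset_paths(binary_path):
        norm = os.path.normpath(path)
        if norm != expected and not norm.startswith(expected + os.sep):
            stale.append(path)
    return stale
```

For example, calling stale_charset_paths on os.path.expanduser("~/htdig/contrib/bin/catdoc") with os.path.expanduser("~/htdig/contrib/lib/catdoc") as the expected directory and getting a non-empty list back would point to the same home-directory mismatch described above.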