I'm attempting to use htdig to index a secure intranet site, and I am
running into a problem with indexing MS Word files. It seems that the
program cannot handle any files other than HTML, PDF, PS and TXT when
running in local_urls_only mode. There is a check for those extensions
in Document.cc. If none of them is found, it returns
Document_not_local to Retriever.cc, which marks the file as not found.
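For illustration, here is a rough C++ sketch of the kind of extension
check described above. It is not the actual Document.cc code. The
function names check_local_file and extension_of, and the Document_ok
value, are hypothetical. Only Document_not_local and the four file types
come from the description above. A .doc file falls through the
whitelist and is reported as not local, and that is how Word files end
up marked as not found.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical result codes for this sketch. Only the name
// Document_not_local comes from the report; Document_ok is made up.
enum LocalResult { Document_ok, Document_not_local };

// Return the lower-cased extension after the last '.' in the final path
// component, or an empty string if there is none.
std::string extension_of(const std::string& path) {
    const std::string::size_type slash = path.rfind('/');
    const std::string::size_type dot = path.rfind('.');
    if (dot == std::string::npos ||
        (slash != std::string::npos && dot < slash)) {
        return "";
    }
    std::string ext = path.substr(dot + 1);
    std::transform(ext.begin(), ext.end(), ext.begin(),
                   [](unsigned char c) {
                       return static_cast<char>(std::tolower(c));
                   });
    return ext;
}

// Whitelist check of the kind described above: a file whose extension is
// not one of the four listed types is reported as "not local", and the
// caller then treats it as not found, even though the file exists.
LocalResult check_local_file(const std::string& path) {
    static const std::string allowed[] = {"html", "pdf", "ps", "txt"};
    const std::string ext = extension_of(path);
    for (const std::string& a : allowed) {
        if (ext == a) {
            return Document_ok;
        }
    }
    return Document_not_local;
}
```

With this logic, check_local_file("/intranet/report.doc") returns
Document_not_local even though the file exists, while an .html or .pdf
path returns Document_ok.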

It seems like this is fixed in 3.2.0, but that is still in beta. Could
the 4.8 FAQ entry on the ht://Dig site be updated with something like:
"If you are using 3.1.6 along with local_urls_only, you will not be
able to index files other than HTML, PDF, PS or TXT. You must use the
3.2.0 series." Then maybe someone else won't have to spend time
tracking down this bug again.

If this topic has been covered to death, sorry. The SF mailing list
search is currently down for me, so I couldn't search on it. I was
also impressed by how understandable the code is for this project. It
was incredibly easy to find the relevant parts of the code that dealt
with the errors I was having.

Thanks
Josh
--
Lake Agassiz Regional Library - Moorhead MN larl.org
Josh Stompro | Office 218.233.3757 EXT-139
LARL Network Administrator | Cell 218.790.2110
_______________________________________________
ht://Dig general mailing list: <htdig-general lists.sourceforge.net>
ht://Dig FAQ: http://htdig.sourceforge.net/FAQ.html
List information (subscribe/unsubscribe, etc.)
https://lists.sourceforge.net/lists/listinfo/htdig-general