Author: Dan Bauman
Email: npm-music sil.org.pg
Message:
Hi,
Thank you for providing this search engine. Over the years I have kept
running into a problem with mnogosearch accessing files on an FTP
server, and I would like to get to the root of it. Here are the specs
for our system:
Linux: Fedora Core 4 - 2.6.11-1.1369_FC4smp #1 SMP
Mysql: Ver 14.7 Distrib 4.1.14, for redhat-linux-gnu (i386) using readline 4.3
Mnogosearch: 3.2.38
When indexing an FTP site, indexer gets a Status 200 on some files and
a 404 on most others. I have tried authenticating anonymously and with
valid user accounts. For files that indexer could not get, I copied the
URL and ran wget on it, and wget retrieved the file. Any ideas would be
appreciated.
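
For anyone who wants to repeat the wget check from a script, here is a
minimal Python sketch. It splits an ftp:// URL into host, directory and
file name, decodes percent-escapes such as %20, and asks the server for
the file size. The names split_ftp_url and probe are made up for this
example and are not part of mnogosearch. Whether escaped spaces have
anything to do with the 404s is only a guess that this makes easy to
test.

```python
from ftplib import FTP, all_errors
from urllib.parse import urlsplit, unquote
import posixpath


def split_ftp_url(url):
    """Return (host, directory, filename) with percent-escapes decoded."""
    parts = urlsplit(url)
    # FTP commands take the literal name, so "%20" must become a space.
    path = unquote(parts.path)
    directory, filename = posixpath.split(path)
    return parts.hostname, directory or "/", filename


def probe(url, user="anonymous", password="anonymous@"):
    """Hypothetical helper: return the file size, or the error text."""
    host, directory, filename = split_ftp_url(url)
    try:
        with FTP(host, timeout=30) as ftp:
            ftp.login(user, password)
            ftp.cwd(directory)
            ftp.voidcmd("TYPE I")  # many servers refuse SIZE in ASCII mode
            return ftp.size(filename)
    except all_errors as exc:
        return str(exc)
```

Calling probe() on a URL that indexer reports as 404 and on one it
reports as 200 shows whether the server itself treats the two
differently.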
Below are parts of the indexer.conf file and the -v 5 output from
indexer. If you need more info, let me know.
indexer.conf
====
...
AuthBasic login:password
Server ftp://ftp.x.org.pg/
...
indexer output
====
indexer from mnogosearch-3.2.34-mysql started with
'/usr/local/mnogosearch/etc/indexer.conf'
...
[3234] URL: ftp://ftp.x.org.pg/Public/Documents/
[3234] Server Allow 'ftp://ftp.x.org.pg/'
[3234] Allow by default
[3234] Request.Accept-Encoding: gzip,deflate,compress
[3234] Request.Authorization: Basic TlBNLU1hbmFnZXI6MmxlZnRmZWV0
[3234] Request.Host: ftp.x.org.pg
[3234] Request.User-Agent: MnoGoSearch/3.2.34
[3234] Response.AuthBasic: TlBNLU1hbmFnZXI6MmxlZnRmZWV0
[3234] Response.body: <NULL>
[3234] Response.case_sense: 1
[3234] Response.Charset: <NULL>
[3234] Response.Content-Language: <NULL>
[3234] Response.Content-Length: 26
[3234] Response.Content-Type: text/html
[3234] Response.crc32: 0
[3234] Response.crc32old: 0
[3234] Response.crosswords: <NULL>
[3234] Response.DetectClones: 1
[3234] Response.Follow: 1
[3234] Response.HoldBadHrefs: 259200
[3234] Response.Hops: 0
[3234] Response.ID: 2047218
[3234] Response.match_type: 1
[3234] Response.MaxDocPerSite: 0
[3234] Response.MaxHops: 256
[3234] Response.meta.description: <NULL>
[3234] Response.meta.keywords: <NULL>
[3234] Response.Method: Allow
[3234] Response.nomatch: 0
[3234] Response.Period: 3600
[3234] Response.PrevStatus: 200
[3234] Response.ResponseLine: HTTP/1.1 200 OK
[3234] Response.ResponseSize: 70
[3234] Response.Server_id: -8347684
[3234] Response.Site_id: -8347684
[3234] Response.Status: 200
[3234] Response.title: <NULL>
[3234] Response.URL: ftp://ftp.x.org.pg/Public/Documents/
[3234] Response.url.file: <NULL>
[3234] Response.url.host: <NULL>
[3234] Response.url.path: <NULL>
[3234] Response.url.proto: <NULL>
[3234] Response.URL_ID: -1842586386
[3234] Status: 200 OK
[3234] Guesser: 794h:196m en-iso-8859-1
[3234] Guesser: Lang: en, Charset: iso-8859-1
...
[3234] URL: ftp://ftp.x.org.pg/Departments/HighSchool/Registrar/Course%20Category%20Placement.doc
[3234] Server Allow 'ftp://ftp.x.org.pg/'
[3234] Allow by default
[3234] Request.Accept-Encoding: gzip,deflate,compress
[3234] Request.Authorization: Basic TlBNLU1hbmFnZXI6MmxlZnRmZWV0
[3234] Request.Host: ftp.x.org.pg
[3234] Request.User-Agent: MnoGoSearch/3.2.34
[3234] Response.AuthBasic: TlBNLU1hbmFnZXI6MmxlZnRmZWV0
[3234] Response.body: <NULL>
[3234] Response.case_sense: 1
[3234] Response.Charset: <NULL>
[3234] Response.Content-Language: <NULL>
[3234] Response.Content-Length: 0
[3234] Response.Content-Type: application/msword
[3234] Response.crc32: 0
[3234] Response.crc32old: 0
[3234] Response.crosswords: <NULL>
[3234] Response.DetectClones: 1
[3234] Response.Follow: 1
[3234] Response.HoldBadHrefs: 259200
[3234] Response.Hops: 4
[3234] Response.ID: 2045732
[3234] Response.match_type: 1
[3234] Response.MaxDocPerSite: 0
[3234] Response.MaxHops: 256
[3234] Response.meta.description: <NULL>
[3234] Response.meta.keywords: <NULL>
[3234] Response.Method: Allow
[3234] Response.nomatch: 0
[3234] Response.Period: 3600
[3234] Response.PrevStatus: 404
[3234] Response.ResponseLine: HTTP/1.1 404 OK
[3234] Response.ResponseSize: 19
[3234] Response.Server_id: -8347684
[3234] Response.Site_id: -8347684
[3234] Response.Status: 404
[3234] Response.title: <NULL>
[3234] Response.URL: ftp://ftp.x.org.pg/Departments/HighSchool/Registrar/Course%20Category%20Placement.doc
[3234] Response.url.file: <NULL>
[3234] Response.url.host: <NULL>
[3234] Response.url.path: <NULL>
[3234] Response.url.proto: <NULL>
[3234] Response.URL_ID: 1089293553
[3234] Status: 404 Not found
...
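
With a long -v 5 log it can help to reduce the output to one status per
URL, so the 200s and 404s can be compared side by side. The sketch
below assumes only the line shapes visible in the excerpt above
("[pid] URL: ..." followed later by "[pid] Status: NNN ..."). The
function name statuses_by_url is made up for this example. It also
tolerates URLs that a mail client has wrapped onto the following lines.

```python
import re

# Matches "[3234] rest of line" and captures the part after the pid.
LINE = re.compile(r"^\[(\d+)\]\s+(.*)$")


def statuses_by_url(log_text):
    """Map each indexed URL to the numeric status indexer reported."""
    results = {}
    current = None       # URL of the document being processed
    collecting = False   # True while a wrapped URL may continue
    for raw in log_text.splitlines():
        line = raw.strip()
        match = LINE.match(line)
        if not match:
            # A line without "[pid]" right after "URL:" continues the URL.
            if collecting and line and line != "...":
                current += line
            continue
        collecting = False
        body = match.group(2)
        if body.startswith("URL:"):
            current = body[4:].strip()
            collecting = True
        elif body.startswith("Status:") and current is not None:
            results[current] = int(body.split()[1])
            current = None
    return results
```

Sorting the result by status, or filtering for URLs that contain "%",
is then a quick way to see whether the failing files share a trait.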
Reply: <http://www.mnogosearch.org/board/message.php?id=18241>
---------------------------------------------------------------------
To unsubscribe, e-mail: general-unsubscribe mnogosearch.org
For additional commands, e-mail: general-help mnogosearch.org