The debug info indicates that you are successfully connecting to the
server hosting the site, but that server is choosing to tell you that
the pages do not exist. Is this a site that you control or have some
agreement with? One explanation that fits the facts is that the
server has been configured to deny you access and redirect your
requests to a 404 error page (perhaps based on IP address or
similar). One way to at least partially test this possibility would
be to ssh back to your server and try requesting the page with a text
browser. If there is a block based solely on IP address or
server/domain name, you should see the same 404 response.
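
If no text browser is installed on the server, the same check can be
roughed out with a short Python script run from the ssh session. This
is only a sketch under some assumptions: the function names are made
up for illustration, the example.com address in the User-Agent string
is a placeholder, and it sends a plain HTTP/1.0 GET shaped like the
one in the htdig debug output rather than reproducing htdig's exact
behavior.

```python
import socket

# Placeholder User-Agent modeled on the one in the debug output; the
# contact address here is an assumption, not the real one.
HTDIG_AGENT = "htdig/3.1.6 (webmaster@example.com)"


def build_request(host, path="/", user_agent=HTDIG_AGENT):
    """Build an HTTP/1.0 GET request similar to the one htdig logged."""
    lines = [
        f"GET {path} HTTP/1.0",
        f"User-Agent: {user_agent}",
        f"Host: {host}",
        "",  # blank line ends the header block
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_status(response):
    """Return the numeric status code from the first line of a response."""
    status_line = response.split(b"\r\n", 1)[0].decode("latin-1")
    parts = status_line.split(" ", 2)
    if len(parts) < 2 or not parts[0].startswith("HTTP/"):
        raise ValueError(f"not an HTTP status line: {status_line!r}")
    return int(parts[1])


def fetch_status(host, path="/", user_agent=HTDIG_AGENT, port=80, timeout=10):
    """Send the request over a plain socket and return the status code."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_request(host, path, user_agent))
        data = b""
        # Read only until the status line is complete.
        while b"\r\n" not in data:
            chunk = sock.recv(4096)
            if not chunk:
                break
            data += chunk
    return parse_status(data)
```

Calling fetch_status("www.realtree.com") from the server that runs
htdig and again from a desktop machine would show whether the two
locations get different status codes. Passing a browser-like
user_agent as a second trial is a guess at what "or similar" might
cover; nothing in the debug output shows that the User-Agent matters.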

If you are sure that you aren't being blocked, you might try copying
your config file, changing the start_url, and indexing some other
site just to make sure all the settings are sane.

I tried using htdig (3.1.6) to start indexing this site and had no
problem retrieving pages with a nearly stock configuration.

Jim

On Mar 8, 2007, at 2:00 PM, Clint Davis wrote:

> I ran rundig from an ssh session to the server. I can pull up the
> first page from my desktop with no problem. I can also retrieve the
> robots.txt with no problem via my desktop browser.
>
> Any other ideas?
>
>
> On 3/8/07 2:51 PM, "Jim Cole" <lists yggdrasill.net> wrote:
>
>> For some reason htdig was unable to retrieve the first page from the
>> site in question. The server is claiming that the file does not exist
>> (404 response). If this only happened at one time, or is always
>> happening at the same time, it might be due to a server problem,
>> server maintenance, etc. If it is happening all the time, a first
>> step would be to fire up a browser on the machine that runs htdig and
>> make sure you can load the page from there.
>>
>> The "DB2 problem..." message is just due to the fact there was
>> nothing in the database when htmerge ran.
>>
>> Jim
>>
>> On Mar 8, 2007, at 9:46 AM, Clint Davis wrote:
>>
>>
>>> After using Htdig for years, I just noticed that one of my sites
>>> hasn't been indexed properly in a while.
>>>
>> ...
>>
>>> pick: www.realtree.com, # servers = 1
>>> 0:0:0:http://www.realtree.com/: GET / HTTP/1.0
>>> User-Agent: htdig/3.1.6 (webmaster grayloon.com)
>>> Host: www.realtree.com
>>>
>>> Header line: HTTP/1.1 404 Not Found
>>>
>> ...
>>
>>> htmerge: Sorting...
>>> htmerge: Removing doc #0
>>> DB2 problem...: missing or empty key value specified
>>>
>>> Deleted, no excerpt: 0/http://www.realtree.com/
>