One particular site doesn't show up in cached.jsp as a cached page.
It shows this message in the cache instead:
Display of this content was administratively prohibited by the
webmaster. You may visit the original page instead: http://journals/
The robots.txt file for this website is:
User-Agent: *
Disallow: /directory.bml
#
# Blocked journals aren't listed here because robots.txt files
# can't be above 50k or so, depending on the spider.
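
As a sanity check on those rules, here is a minimal Python sketch. It uses the standard urllib.robotparser module, not Nutch's own robots.txt handling, and the agent name "Nutch" and the helper is_allowed are only assumed values for illustration. It suggests that only /directory.bml is excluded, so the robots.txt file by itself does not seem to explain the message.

```python
from urllib.robotparser import RobotFileParser

# The two rules copied from the site's robots.txt quoted above.
ROBOTS_TXT = """\
User-Agent: *
Disallow: /directory.bml
"""


def is_allowed(path, agent="Nutch"):
    """Return True if `agent` may fetch `path` under ROBOTS_TXT.

    `agent` defaults to an assumed crawler name; the rules apply to
    every agent because the file only has a `User-Agent: *` record.
    """
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    print(is_allowed("/directory.bml"))  # False: explicitly disallowed
    print(is_allowed("/"))               # True: no rule matches
```
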
This site is based on LiveJournal, an open source blogging
application. Why hasn't Nutch cached the content of this
page?
- B. Hugh