Thread: No results in cached.jsp ; Why?




No results in cached.jsp ; Why?
user name
2007-09-27 07:28:53
One particular site doesn't show up in cached.jsp as a cached page.
Instead, the cache view shows this message:

Display of this content was administratively prohibited by the
webmaster. You may visit the original page instead: http://journals/



The robots.txt file for this website is

User-Agent: *
Disallow: /directory.bml

#
# Blocked journals aren't listed here because robots.txt files
# can't be above 50k or so, depending on the spider.



This site is based on livejournal, an open source blogging
application. Why hasn't Nutch cached the content of this page?

- B. Hugh

Re: No results in cached.jsp ; Why?
user name
2007-09-27 07:32:36
> This site is based on livejournal, an open source blogging
> application. Why hasn't Nutch cached the content of this page?
>


https://issues.apache.org/jira/browse/NUTCH-167

Lots (all?) of livejournals put a noarchive meta tag in their HTML,
which Nutch honors. That is why cached.jsp refuses to display the
page even though robots.txt does not block it.
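
To check whether a given page carries the tag, look in its source for
something like <meta name="robots" content="noarchive">. Below is a
minimal sketch of that check, written in Python for brevity rather than
Nutch's Java. It is not Nutch's actual code; RobotsMetaParser and
has_noarchive are hypothetical names made up for this illustration.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags.

    Hypothetical helper for illustration only; not part of Nutch.
    """

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names.
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").strip().lower() != "robots":
            return
        # The content attribute is a comma-separated list of directives.
        for directive in attr_map.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def has_noarchive(html):
    """Return True if the page asks crawlers not to serve a cached copy."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noarchive" in parser.directives
```

Fetch the page source by whatever means you like and pass it to
has_noarchive(). If it returns True, the "administratively prohibited"
message in cached.jsp is the expected behaviour rather than a crawl
failure.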


