I am only interested in searching across a corpus of injected domains. The problem with this, however, is that two of the most valuable elements for ranking accuracy won't be there: incoming anchor text and the authority level each page inherits from the sites linking to it.
I can get backlink information for each URL I'm interested in from Yahoo Site Explorer or Alexa's set of web search tools. If I started the crawl at these backlink URLs, I would capture the anchor text and authority levels of the pages I'm really interested in - but I would then have to remove the pages I'm not interested in.
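To make that last filtering step concrete, here is a rough Python sketch of what I have in mind. It is not Nutch code: the (source, target, anchor text) tuples are a hypothetical export of the link data gathered during the wider crawl, and the function names and example URLs are made up for illustration.

```python
from collections import defaultdict
from urllib.parse import urlparse


def host_of(url):
    """Return the lowercased host of a URL, without a leading 'www.'."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def collect_inlinks(links, target_hosts):
    """Group (source, anchor text) pairs by target URL.

    links is an iterable of (source_url, target_url, anchor_text) tuples
    (a hypothetical export format, not something Nutch produces as-is).
    Only links pointing at one of target_hosts are kept.
    """
    targets = {h.lower() for h in target_hosts}
    anchors = defaultdict(list)
    for source, target, text in links:
        if host_of(target) not in targets:
            continue  # link points outside the domains I care about
        anchors[target].append((source, text.strip()))
    return dict(anchors)


def pages_to_keep(crawled_urls, target_hosts):
    """Drop every crawled page that is not on one of the target hosts."""
    targets = {h.lower() for h in target_hosts}
    return [u for u in crawled_urls if host_of(u) in targets]


if __name__ == "__main__":
    links = [
        ("http://blog.example.org/post", "http://www.mysite.com/a", " Great widgets "),
        ("http://blog.example.org/post", "http://other.net/", "something else"),
    ]
    print(collect_inlinks(links, ["mysite.com"]))
    print(pages_to_keep(["http://blog.example.org/post", "http://mysite.com/b"], ["mysite.com"]))
```

The idea is that the anchor text (and whatever authority score gets computed) is recorded from the linking pages first, and only afterwards are the linking pages themselves dropped from what gets searched.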
I'm wondering if anyone has ever tried to do what I'm trying to do - and if so, please share any tips/ideas that might make the process a little less painful.
Thanks!
--
View this message in context: http://www.nabble.com/Mimicking-Anchor-Text-Relevance---Authority-On-a-Focused-Crawl-tf4668564.html#a13336338
Sent from the Nutch - User mailing list archive at Nabble.com.