[ https://issues.apache.org/jira/browse/NUTCH-353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12466285 ]
Chris A. Mattmann commented on NUTCH-353:
-----------------------------------------
Doug,
Let's see what you got. I'd be happy to take a look at it.
Cheers,
Chris
> pages that serverside forwards will be refetched every time
> -----------------------------------------------------------
>
> Key: NUTCH-353
> URL: https://issues.apache.org/jira/browse/NUTCH-353
> Project: Nutch
> Issue Type: Bug
> Affects Versions: 0.8.1, 0.9.0
> Reporter: Stefan Groschupf
> Assigned To: Andrzej Bialecki
> Priority: Blocker
> Fix For: 0.9.0
>
> Attachments: doNotRefecthForwarderPagesV1.patch
>
>
> Pages that do a server-side forward are not written back into the crawlDb with a status change. Also, the nextFetchTime is not changed.
> This causes the same page to be refetched again and again. As a result, Nutch is not polite: it refetches the forwarding page and the target page in each segment iteration. It also affects scoring, since the forwarding page contributes its score to all outlinks.
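
The behaviour described above can be illustrated with a toy model. This is only a sketch and not Nutch code: `CrawlEntry`, `run_iteration`, `simulate`, `FETCH_INTERVAL`, the "redirected" status and the example.com URLs are all hypothetical names chosen for illustration. It assumes a crawl database that maps each URL to a status and a next fetch time, and it compares what happens when the forwarding page's entry is, or is not, written back after a fetch.

```python
from dataclasses import dataclass

FETCH_INTERVAL = 30  # hypothetical refetch interval, in days


@dataclass
class CrawlEntry:
    """Hypothetical stand-in for one crawl database record."""
    status: str = "unfetched"
    next_fetch_time: int = 0  # day on which the URL is due again


def run_iteration(db, forwards, now, write_back_forwards):
    """Fetch every due URL once and return the list of URLs fetched.

    `forwards` maps a forwarding URL to its target. When
    `write_back_forwards` is False, the forwarding URL's entry is left
    untouched, which models the reported behaviour.
    """
    fetched = []
    # Snapshot the due URLs first so the database can be updated in the loop.
    due = [url for url, e in db.items() if e.next_fetch_time <= now]
    for url in due:
        fetched.append(url)
        is_forward = url in forwards
        if is_forward:
            target = forwards[url]
            fetched.append(target)  # the forward is followed immediately
            db[target] = CrawlEntry("fetched", now + FETCH_INTERVAL)
            if not write_back_forwards:
                continue  # entry keeps its old status and next_fetch_time
        status = "redirected" if is_forward else "fetched"
        db[url] = CrawlEntry(status, now + FETCH_INTERVAL)
    return fetched


def simulate(write_back_forwards, iterations=3):
    """Run several iterations (one per day) and collect what each fetched."""
    db = {"http://example.com/old": CrawlEntry()}
    forwards = {"http://example.com/old": "http://example.com/new"}
    return [run_iteration(db, forwards, day, write_back_forwards)
            for day in range(iterations)]
```

In this model, `simulate(False)` fetches both the forwarding page and its target in every iteration, while `simulate(True)` fetches them once and then leaves them alone until the interval has elapsed.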
--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira