
Thread: Updated: (NUTCH-353) pages that serverside forwards will be refetched every time




2007-03-19 18:51:32
     [ https://issues.apache.org/jira/browse/NUTCH-353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrzej Bialecki  updated NUTCH-353:
------------------------------------

    Priority: Major  (was: Blocker)

This is partially fixed so that page status is consistent.
LinkDb related changes will be implemented later.

> pages that serverside forwards will be refetched every time
>
-----------------------------------------------------------
>
>                 Key: NUTCH-353
>                 URL: https://issues.apache.org/jira/browse/NUTCH-353
>             Project: Nutch
>          Issue Type: Bug
>    Affects Versions: 0.8.1, 0.9.0
>            Reporter: Stefan Groschupf
>         Assigned To: Andrzej Bialecki 
>             Fix For: 0.9.0
>
>         Attachments: doNotRefecthForwarderPagesV1.patch
>
>
> Pages that do a server-side forward are not written back into the crawlDb with a status change. Also, the nextFetchTime is not changed.
> This causes the same page to be refetched again and again. As a result, Nutch is not polite and refetches the forwarding page and the target page in each segment iteration. It also affects scoring, since the forwarding page contributes its score to all outlinks.
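
The refetch loop described in the report can be illustrated with a toy model. The sketch below is not Nutch code: the Entry class, the status strings, FETCH_INTERVAL, the example.com URLs, and the record_forwards switch are all hypothetical names chosen for illustration. It models only the forwarding URL (not the forward target or scoring) and assumes a URL is selected for fetching whenever its next fetch time has passed.

```python
from dataclasses import dataclass

FETCH_INTERVAL = 30  # days; hypothetical value for this sketch


@dataclass
class Entry:
    status: str = "unfetched"   # hypothetical status names, not Nutch's
    next_fetch_time: int = 0    # day number when the URL is due again


def generate(db, now):
    """Return the URLs that are due for fetching at time `now`."""
    return sorted(url for url, e in db.items() if e.next_fetch_time <= now)


def update(db, url, forwarded, now, record_forwards):
    """Write one fetch result back into the toy crawl db."""
    if forwarded and not record_forwards:
        # Behaviour described in the report: nothing is written back,
        # so status and next_fetch_time keep their old values.
        return
    entry = db[url]
    entry.status = "redirected" if forwarded else "fetched"
    entry.next_fetch_time = now + FETCH_INTERVAL


def crawl(record_forwards, iterations=3):
    """Run a few generate/update rounds and return the fetch list per round."""
    db = {"http://example.com/old": Entry(), "http://example.com/page": Entry()}
    forwards = {"http://example.com/old"}  # URLs that answer with a forward
    history = []
    for day in range(iterations):
        due = generate(db, day)
        history.append(due)
        for url in due:
            update(db, url, url in forwards, day, record_forwards)
    return history
```

With record_forwards=False the forwarding URL appears in every round's fetch list; with record_forwards=True it is fetched once and then waits for its next fetch time like any other page.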

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

