[ http://issues.apache.org/jira/browse/NUTCH-411?page=comments#action_12454649 ]
Dogacan Güney commented on NUTCH-411:
-------------------------------------
This is my not-necessarily-correct patch for this issue. It adds
the new url as a newly discovered url (so it gets initialScore),
which differs from what happens if we parse in the fetcher.
I believe that in the long term, Nutch should associate the
source url with the redirected url. But this patch (or a
more correct version of it) can be applied so that we do
not lose urls in the short term.
> Parse ignores meta refresh redirection
> --------------------------------------
>
> Key: NUTCH-411
> URL: http://issues.apache.org/jira/browse/NUTCH-411
> Project: Nutch
> Issue Type: Bug
> Affects Versions: 0.9.0
> Reporter: Dogacan Güney
> Priority: Minor
>
> If fetching and parsing are run as separate jobs, then a
> redirection coming from a meta refresh tag (e.g.
> <meta http-equiv="refresh" content="0;url=foo/">) is ignored,
> resulting in the loss of that ("foo/") url.
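For illustration only, here is a minimal Python sketch of how the redirect target in the example above could be pulled out of the content attribute and resolved against the page url. This is not Nutch code and not the patch: the function name meta_refresh_target, the regular expression, and the example.com base url are all assumptions made for the sketch.

```python
import re
from urllib.parse import urljoin

# Hypothetical helper, not part of Nutch: matches "<delay>;url=<target>".
_REFRESH_RE = re.compile(r"^\s*(\d+)\s*[;,]\s*url\s*=\s*(.+?)\s*$", re.IGNORECASE)


def meta_refresh_target(content, base_url):
    """Return (delay_seconds, absolute_url), or None if no redirect target is given."""
    match = _REFRESH_RE.match(content)
    if match is None:
        # A bare delay such as "5" only reloads the page; there is no new url.
        return None
    delay = int(match.group(1))
    # The target may be wrapped in quotes inside the attribute value.
    target = match.group(2).strip("'\"")
    # Relative targets such as "foo/" are resolved against the page's own url.
    return delay, urljoin(base_url, target)


if __name__ == "__main__":
    # The base url here is an assumed example, not taken from the issue.
    print(meta_refresh_target("0;url=foo/", "http://example.com/dir/page.html"))
```

A parse job that sees such a target would then have to record the resolved url somewhere, which is the step the issue reports as missing when parsing runs separately from fetching.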
--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the
administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira