Dennis, +1
On 6/25/07 4:42 PM, "Dennis Kubes" <kubes apache.org> wrote:
> If no one has any objections, I will go ahead and commit this.
>
> Dennis Kubes
>
> Dennis Kubes (JIRA) wrote:
>> [ https://issues.apache.org/jira/browse/NUTCH-497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>>
>> Dennis Kubes updated NUTCH-497:
>> -------------------------------
>>
>> Attachment: nested-tags-trap3.patch
>>
>> Added nested-tags-trap3.patch with Apache grant.
>>
>>> Extreme Nested Tags causes StackOverflowException in DomContentUtils...Spider Trap
>>>
>>> ------------------------------------------------------------------------
>>>
>>> Key: NUTCH-497
>>> URL: https://issues.apache.org/jira/browse/NUTCH-497
>>> Project: Nutch
>>> Issue Type: Bug
>>> Components: fetcher
>>> Affects Versions: 0.8.1, 0.9.0, 1.0.0
>>> Environment: all
>>> Reporter: Dennis Kubes
>>> Assignee: Dennis Kubes
>>> Fix For: 1.0.0
>>>
>>> Attachments: ExtremeNestedTags.patch, nested-tags-trap.patch,
>>> nested-tags-trap2.patch, nested-tags-trap3.patch
>>>
>>>
>>> Some webpages contain a form of spider trap: tags nested thousands of
>>> levels deep. When extracting outlinks, DomContentUtils traverses the HTML
>>> DOM with a recursive method, and nesting this deep causes a
>>> StackOverflowException.
>>
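For illustration only: a minimal Java sketch of the usual remedy for this class of bug, which replaces the recursion with an explicit stack so that nesting depth is bounded by heap rather than thread stack size. (In Java the failure itself surfaces as `java.lang.StackOverflowError`.) The class and method names (`IterativeOutlinks`, `collectHrefs`, `buildNestedTrap`) are hypothetical. They are not the `DomContentUtils` API, and this is not the code in the attached patches.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical helper, not part of Nutch: collects the href of every <a>
// element with an explicit stack instead of recursion.
public class IterativeOutlinks {

  public static List<String> collectHrefs(Node root) {
    List<String> hrefs = new ArrayList<>();
    Deque<Node> stack = new ArrayDeque<>();
    stack.push(root);
    while (!stack.isEmpty()) {
      Node node = stack.pop();
      // Compare case-insensitively: HTML parsers may upper-case tag names.
      if (node.getNodeType() == Node.ELEMENT_NODE
          && "a".equalsIgnoreCase(node.getNodeName())) {
        String href = ((Element) node).getAttribute("href"); // "" if absent
        if (!href.isEmpty()) {
          hrefs.add(href);
        }
      }
      // Push children last-to-first so they are popped in document order.
      for (Node child = node.getLastChild(); child != null;
           child = child.getPreviousSibling()) {
        stack.push(child);
      }
    }
    return hrefs;
  }

  // Hypothetical test fixture: `depth` nested <div>s with one link at the bottom.
  public static Document buildNestedTrap(int depth, String href) {
    Document doc;
    try {
      doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
    } catch (ParserConfigurationException e) {
      throw new IllegalStateException(e);
    }
    // Build bottom-up: each new parent is not yet attached to the deep tree.
    Element current = doc.createElement("a");
    current.setAttribute("href", href);
    for (int i = 0; i < depth; i++) {
      Element parent = doc.createElement("div");
      parent.appendChild(current);
      current = parent;
    }
    doc.appendChild(current);
    return doc;
  }

  public static void main(String[] args) {
    Document trap = buildNestedTrap(100_000, "http://example.com/");
    System.out.println(collectHrefs(trap)); // prints [http://example.com/]
  }
}
```

A naive recursive walk over the same 100,000-level document would typically exhaust the default thread stack. The explicit `ArrayDeque` trades that for heap memory proportional to the number of pending siblings, which stays small for a single deep chain like this one.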