
Thread: Re: Updated: (NUTCH-497) Extreme Nested Tags causes StackOverflowException in DomContentUtils




Re: Updated: (NUTCH-497) Extreme Nested Tags causes StackOverflowException in DomContentUtils
United States
2007-06-25 18:42:47
If no one has any objections, I will go ahead and commit this.

Dennis Kubes

Dennis Kubes (JIRA) wrote:
>      [ https://issues.apache.org/jira/browse/NUTCH-497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
> 
> Dennis Kubes updated NUTCH-497:
> -------------------------------
> 
>     Attachment: nested-tags-trap3.patch
> 
> Added nested-tags-trap3.patch with Apache grant.
> 
>> Extreme Nested Tags causes StackOverflowException in DomContentUtils...Spider Trap
>> ----------------------------------------------------------------------------------
>>
>>                 Key: NUTCH-497
>>                 URL: https://issues.apache.org/jira/browse/NUTCH-497
>>             Project: Nutch
>>          Issue Type: Bug
>>          Components: fetcher
>>    Affects Versions: 0.8.1, 0.9.0, 1.0.0
>>         Environment: all
>>            Reporter: Dennis Kubes
>>            Assignee: Dennis Kubes
>>             Fix For: 1.0.0
>>
>>         Attachments: ExtremeNestedTags.patch, nested-tags-trap.patch, nested-tags-trap2.patch, nested-tags-trap3.patch
>>
>>
>> Some webpages contain a form of spider trap: tags nested thousands of
>> layers deep, which causes a StackOverflowException in DomContentUtils.
>> When extracting outlinks, DomContentUtils uses a recursive method to parse
>> the HTML, adding one stack frame per nesting level, so with this depth of
>> nesting it runs out of stack and errors out.
> 
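For illustration only: the failure described above comes from recursing once per nesting level, so a page with thousands of nested tags exhausts the thread's call stack (in Java this surfaces as `java.lang.StackOverflowError`). A common general remedy is to keep the traversal state in an explicit, heap-allocated stack instead of the call stack. The sketch below is not the code from the attached patches; the class `IterativeOutlinks`, its methods, and the example URL are hypothetical, it uses only the standard `org.w3c.dom` API, and it assumes "outlinks" simply means the `href` values of `<a>` elements.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class IterativeOutlinks {

    /** Hypothetical helper: collects href values of <a> elements in document order, without recursion. */
    public static List<String> collectHrefs(Node root) {
        List<String> hrefs = new ArrayList<>();
        // The explicit stack lives on the heap, so nesting depth is bounded by
        // heap size rather than by the thread's stack size.
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            Node node = stack.pop();
            if (node.getNodeType() == Node.ELEMENT_NODE
                    && "a".equalsIgnoreCase(node.getNodeName())) {
                // getAttribute returns "" when the attribute is absent.
                String href = ((Element) node).getAttribute("href");
                if (!href.isEmpty()) {
                    hrefs.add(href);
                }
            }
            // Push children last-to-first so the first child is popped next,
            // which preserves pre-order (document) order.
            for (Node child = node.getLastChild(); child != null; child = child.getPreviousSibling()) {
                stack.push(child);
            }
        }
        return hrefs;
    }

    /** Hypothetical spider-trap page: `depth` nested <div>s with a single link at the bottom. */
    public static Document buildNestedTrap(int depth) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        // Build bottom-up: each appendChild attaches an already-built, detached
        // subtree, so construction stays linear and itself needs no deep recursion.
        Element current = doc.createElement("a");
        current.setAttribute("href", "http://example.com/next");
        for (int i = 0; i < depth; i++) {
            Element div = doc.createElement("div");
            div.appendChild(current);
            current = div;
        }
        Element html = doc.createElement("html");
        html.appendChild(current);
        doc.appendChild(html);
        return doc;
    }

    public static void main(String[] args) {
        Document doc = buildNestedTrap(100_000);
        System.out.println(collectHrefs(doc)); // prints [http://example.com/next]
    }
}
```

The traversal's memory use still grows with nesting depth, just on the heap instead of the call stack, so a crawler may also want a separate cap on depth or document size.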

Re: Updated: (NUTCH-497) Extreme Nested Tags causes StackOverflowException in DomContentUtils
United States
2007-06-25 19:09:02
Dennis, +1


On 6/25/07 4:42 PM, "Dennis Kubes" <kubesapache.org> wrote:

> If no one has any objections, I will go ahead and commit this.
> 
> Dennis Kubes
> 


