Thread: OOM error during parsing with nekohtml




OOM error during parsing with nekohtml
2007-07-16 05:04:39
Hi All,

We are getting an OOM exception while processing
http://www.fotofinity.com/cgi-bin/homepages.cgi . We have also applied the
NUTCH-497 patch to our source code, but the error actually occurs in the
parse method.
Does anybody have any idea about this? Here is the complete stack trace:

java.lang.OutOfMemoryError: Java heap space
	at java.lang.String.toUpperCase(String.java:2637)
	at java.lang.String.toUpperCase(String.java:2660)
	at org.cyberneko.html.filters.NamespaceBinder.bindNamespaces(NamespaceBinder.java:443)
	at org.cyberneko.html.filters.NamespaceBinder.startElement(NamespaceBinder.java:252)
	at org.cyberneko.html.HTMLTagBalancer.callStartElement(HTMLTagBalancer.java:1009)
	at org.cyberneko.html.HTMLTagBalancer.startElement(HTMLTagBalancer.java:639)
	at org.cyberneko.html.HTMLTagBalancer.startElement(HTMLTagBalancer.java:646)
	at org.cyberneko.html.HTMLScanner$ContentScanner.scanStartElement(HTMLScanner.java:2343)
	at org.cyberneko.html.HTMLScanner$ContentScanner.scan(HTMLScanner.java:1820)
	at org.cyberneko.html.HTMLScanner.scanDocument(HTMLScanner.java:789)
	at org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:478)
	at org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:431)
	at org.cyberneko.html.parsers.DOMFragmentParser.parse(DOMFragmentParser.java:164)
	at org.apache.nutch.parse.html.HtmlParser.parseNeko(HtmlParser.java:265)
	at org.apache.nutch.parse.html.HtmlParser.parse(HtmlParser.java:229)
	at org.apache.nutch.parse.html.HtmlParser.getParse(HtmlParser.java:168)
	at org.apache.nutch.parse.ParseUtil.parse(ParseUtil.java:84)
	at org.apache.nutch.parse.ParseSegment.map(ParseSegment.java:75)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:175)
	at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1445)


Regards,
Shailendra
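
Since the trace above ends in "Java heap space" inside the map task's child JVM, one first diagnostic step might be to check how much heap that JVM really has before it starts parsing. The following is only a minimal sketch using plain JDK calls. The class name HeapReport, the method likelyFits, the 5 MiB page size and the expansion factor of 20 are all hypothetical illustrations, not part of Nutch, Hadoop or NekoHTML, and the factor is a guess rather than a measured value.

```java
/**
 * Hypothetical diagnostic helper (not part of Nutch, Hadoop or NekoHTML).
 * Reports the JVM heap limit and makes a rough guess at whether a page of a
 * given size could be turned into a DOM without exhausting the heap.
 */
final class HeapReport {

    /** Maximum heap the JVM will attempt to use, in MiB. */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    /** Heap still available: the limit minus what is currently in use. */
    static long availableHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory();
        return rt.maxMemory() - used;
    }

    /**
     * Rough guard: assumes the parsed DOM needs about
     * contentBytes * expansionFactor bytes. The factor is an assumption.
     */
    static boolean likelyFits(long contentBytes, long availableBytes, int expansionFactor) {
        return contentBytes * expansionFactor <= availableBytes;
    }

    public static void main(String[] args) {
        long pageBytes = 5L * 1024L * 1024L; // assumed size of the fetched page
        int factor = 20;                     // assumed DOM expansion factor

        System.out.println("Max heap (MiB): " + maxHeapMiB());
        System.out.println("Available heap (bytes): " + availableHeapBytes());
        System.out.println("Page likely fits: "
                + likelyFits(pageBytes, availableHeapBytes(), factor));
    }
}
```

If the reported maximum turns out to be small, the heap setting of the child task JVM would be the first thing to look at. If the heap is generous and the parse still fails, the page itself (its size or its markup) is the more likely culprit.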
RE: OOM error during parsing with nekohtml
United States
2007-07-16 05:45:59
I successfully ran the whole-web crawl on my new Ubuntu OS, and I am
ready to fix the bug. I need someone to guide me to the most recent
source code and the bug assignment.

Thank you in advance!! 

Adam Shuy, President
ePacific Web Design & Hosting
Professional Web/Software developer
TEL: 408-272-6946
www.epacificweb.com
-----Original Message-----
From: Shailendra Mudgal [mailto:mudgal.shailendra@gmail.com]
Sent: Monday, July 16, 2007 3:05 AM
To: nutch-user@lucene.apache.org; nutch-dev@lucene.apache.org
Subject: OOM error during parsing with nekohtml

Hi All,

We are getting an OOM exception while processing
http://www.fotofinity.com/cgi-bin/homepages.cgi . We have also applied the
NUTCH-497 patch to our source code, but the error actually occurs in the
parse method.
Does anybody have any idea about this? Here is the complete stack trace:

java.lang.OutOfMemoryError: Java heap space
	at java.lang.String.toUpperCase(String.java:2637)
	at java.lang.String.toUpperCase(String.java:2660)
	at org.cyberneko.html.filters.NamespaceBinder.bindNamespaces(NamespaceBinder.java:443)
	at org.cyberneko.html.filters.NamespaceBinder.startElement(NamespaceBinder.java:252)
	at org.cyberneko.html.HTMLTagBalancer.callStartElement(HTMLTagBalancer.java:1009)
	at org.cyberneko.html.HTMLTagBalancer.startElement(HTMLTagBalancer.java:639)
	at org.cyberneko.html.HTMLTagBalancer.startElement(HTMLTagBalancer.java:646)
	at org.cyberneko.html.HTMLScanner$ContentScanner.scanStartElement(HTMLScanner.java:2343)
	at org.cyberneko.html.HTMLScanner$ContentScanner.scan(HTMLScanner.java:1820)
	at org.cyberneko.html.HTMLScanner.scanDocument(HTMLScanner.java:789)
	at org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:478)
	at org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:431)
	at org.cyberneko.html.parsers.DOMFragmentParser.parse(DOMFragmentParser.java:164)
	at org.apache.nutch.parse.html.HtmlParser.parseNeko(HtmlParser.java:265)
	at org.apache.nutch.parse.html.HtmlParser.parse(HtmlParser.java:229)
	at org.apache.nutch.parse.html.HtmlParser.getParse(HtmlParser.java:168)
	at org.apache.nutch.parse.ParseUtil.parse(ParseUtil.java:84)
	at org.apache.nutch.parse.ParseSegment.map(ParseSegment.java:75)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:175)
	at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1445)


Regards,
Shailendra

