Thread: Re: linkdb - Out of Memory Error




Re: linkdb - Out of Memory Error
user name
2007-10-09 11:55:27
Try setting your child opts (mapred.child.java.opts) to -Xmx512M or
higher.  This config variable is found in hadoop-default.xml.  AFAIK
there is no way to change the memory options for a single stage.
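
For reference, a minimal sketch of what such an override might look like. It assumes the setting meant here is the mapred.child.java.opts property, and that the override goes in conf/hadoop-site.xml (the usual place for site-specific values) rather than being edited directly in hadoop-default.xml; check both against your Hadoop version.

```xml
<!-- conf/hadoop-site.xml: site-specific overrides of hadoop-default.xml
     (assumed location; adjust to your installation) -->
<configuration>
  <property>
    <name>mapred.child.java.opts</name>
    <!-- JVM options for each child task; 512M is the value suggested above -->
    <value>-Xmx512M</value>
  </property>
</configuration>
```

The value applies to every child task JVM, so it raises the heap for all stages, not only linkdb, which matches the point that there is no per-stage memory option.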

Dennis Kubes

Daniel Clark wrote:
> I received the following error during the linkdb stage of indexing.  Has
> anyone encountered this before?  Is there a way of increasing memory for
> this stage in config file?  Is there a known linkdb memory leak problem?
>
> 2007-10-09 10:56:37,787 INFO  crawl.LinkDb - LinkDb: starting
> 2007-10-09 10:56:37,788 INFO  crawl.LinkDb - LinkDb: linkdb: crawl/linkdb
> 2007-10-09 10:56:37,788 INFO  crawl.LinkDb - LinkDb: URL normalize: true
> 2007-10-09 10:56:37,788 INFO  crawl.LinkDb - LinkDb: URL filter: true
> 2007-10-09 10:56:37,886 INFO  crawl.LinkDb - LinkDb: adding segment: /user/daclark/crawl/segments/20071008185033
> 2007-10-09 10:56:39,977 WARN  util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 2007-10-09 10:56:42,495 WARN  util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 2007-10-09 10:56:51,415 WARN  mapred.TaskTracker - Error running child
> java.lang.OutOfMemoryError: Java heap space
>         at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:95)
>         at java.io.DataOutputStream.write(DataOutputStream.java:90)
>         at org.apache.hadoop.io.Text.writeString(Text.java:399)
>         at org.apache.nutch.crawl.Inlink.write(Inlink.java:48)
>         at org.apache.nutch.crawl.Inlinks.write(Inlinks.java:54)
>         at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:315)
>         at org.apache.nutch.crawl.LinkDb.map(LinkDb.java:167)
>         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:175)
>         at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1445)
> 2007-10-09 10:57:40,654 FATAL crawl.LinkDb - LinkDb: java.io.IOException: Job failed!
>         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:604)
>         at org.apache.nutch.crawl.LinkDb.invert(LinkDb.java:232)
>         at org.apache.nutch.crawl.LinkDb.run(LinkDb.java:377)
>         at org.apache.hadoop.util.ToolBase.doMain(ToolBase.java:189)
>         at org.apache.nutch.crawl.LinkDb.main(LinkDb.java:333)
>
> ~~~~~~~~~~~~~~~~~~~~~
> Daniel Clark, President
> DAC Systems, Inc.
> (703) 403-0340
> ~~~~~~~~~~~~~~~~~~~~~

Re: linkdb - Out of Memory Error
user name
2007-10-16 09:57:37
I am getting the same out-of-memory exception in linkdb. I have a
configuration of 4 machines running Nutch 0.9 trunk.

Please let me know if you found a way to resolve this issue. All tasks
(master and slaves) are running with the -Xmx1000m option, and I am
reluctant to increase the heap size further.

Thanks.
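
Before raising the heap further, it may be worth confirming how much heap a JVM actually receives for a given option. A minimal sketch using the standard Runtime.maxMemory() call; the class name HeapCheck and the method maxHeapMb are hypothetical and are not part of Nutch or Hadoop.

```java
// Hypothetical helper: reports the maximum heap the current JVM may use.
public class HeapCheck {

    // Maximum heap in megabytes, as reported by the running JVM.
    public static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap available to this JVM: " + maxHeapMb() + " MB");
    }
}
```

Running it as `java -Xmx1000m HeapCheck` should print a figure in the neighbourhood of 1000 MB; the exact number depends on the JVM and garbage collector. Note that it reports only the JVM it runs in, so it says nothing by itself about the heap given to child task processes.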

Dennis Kubes <kubesapache.org> wrote:
  Try setting your child opts to -Xmx512M or higher. This config variable
  is found in the hadoop-default.xml. AFAIK there is no way to change the
  memory options for a single stage.

Dennis Kubes



       