Try setting your child JVM options (mapred.child.java.opts) to -Xmx512M or
higher. This config variable is defined in hadoop-default.xml. AFAIK there is
no way to change the memory options for a single stage.
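For reference, here is a minimal sketch of the override. It assumes a Hadoop 0.x setup where the child-task heap is controlled by the mapred.child.java.opts property, and where local overrides go in conf/hadoop-site.xml instead of editing hadoop-default.xml directly. The 512m value is only the suggested starting point. Raise it if the linkdb map tasks still run out of heap.

```xml
<?xml version="1.0"?>
<!-- conf/hadoop-site.xml: site-specific overrides (assumes the Hadoop 0.x config layout) -->
<configuration>
  <!-- JVM options passed to each map/reduce child task: raise the max heap to 512 MB -->
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx512m</value>
  </property>
</configuration>
```

Because the property is global, the larger heap applies to the child tasks of every job run with this configuration, not only the linkdb stage.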
Dennis Kubes
Daniel Clark wrote:
> I received the following error during the linkdb stage of indexing. Has
> anyone encountered this before? Is there a way of increasing memory for
> this stage in the config file? Is there a known linkdb memory leak problem?
>
> 2007-10-09 10:56:37,787 INFO crawl.LinkDb - LinkDb: starting
> 2007-10-09 10:56:37,788 INFO crawl.LinkDb - LinkDb: linkdb: crawl/linkdb
> 2007-10-09 10:56:37,788 INFO crawl.LinkDb - LinkDb: URL normalize: true
> 2007-10-09 10:56:37,788 INFO crawl.LinkDb - LinkDb: URL filter: true
> 2007-10-09 10:56:37,886 INFO crawl.LinkDb - LinkDb: adding segment: /user/daclark/crawl/segments/20071008185033
> 2007-10-09 10:56:39,977 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 2007-10-09 10:56:42,495 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 2007-10-09 10:56:51,415 WARN mapred.TaskTracker - Error running child
> java.lang.OutOfMemoryError: Java heap space
>         at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:95)
>         at java.io.DataOutputStream.write(DataOutputStream.java:90)
>         at org.apache.hadoop.io.Text.writeString(Text.java:399)
>         at org.apache.nutch.crawl.Inlink.write(Inlink.java:48)
>         at org.apache.nutch.crawl.Inlinks.write(Inlinks.java:54)
>         at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:315)
>         at org.apache.nutch.crawl.LinkDb.map(LinkDb.java:167)
>         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:175)
>         at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1445)
> 2007-10-09 10:57:40,654 FATAL crawl.LinkDb - LinkDb: java.io.IOException: Job failed!
>         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:604)
>         at org.apache.nutch.crawl.LinkDb.invert(LinkDb.java:232)
>         at org.apache.nutch.crawl.LinkDb.run(LinkDb.java:377)
>         at org.apache.hadoop.util.ToolBase.doMain(ToolBase.java:189)
>         at org.apache.nutch.crawl.LinkDb.main(LinkDb.java:333)
>
> ~~~~~~~~~~~~~~~~~~~~~
> Daniel Clark, President
> DAC Systems, Inc.
> (703) 403-0340
> ~~~~~~~~~~~~~~~~~~~~~