
Thread: Commented: (NUTCH-433) java.io.EOFException in newer nightlies in mergesegs or indexing from




2007-01-25 11:08:49
[ https://issues.apache.org/jira/browse/NUTCH-433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12467471 ]

Brian Whitman commented on NUTCH-433:
-------------------------------------

This is still not fixed in the latest nightly -- http://people.apache.org/builds/lucene/nutch/nightly/nutch-2007-01-25.tar.gz -- same error. Also tried the svn trunk, no change. I imagine it's because it's a hadoop issue and not a nutch one, but the nutch nightly package should include the latest hadoop as well.

> java.io.EOFException in newer nightlies in mergesegs or indexing from hadoop.io.DataOutputBuffer
> ------------------------------------------------------------------------------------------------
>
>                 Key: NUTCH-433
>                 URL: https://issues.apache.org/jira/browse/NUTCH-433
>             Project: Nutch
>          Issue Type: Bug
>          Components: generator, indexer
>    Affects Versions: 0.9.0
>         Environment: Both Linux/i686 and Mac OS X PPC/Intel, but platform independent
>            Reporter: Brian Whitman
>         Assigned To: Sami Siren
>            Priority: Critical
>             Fix For: 0.9.0
>
>
> The nightly builds have not been working at all for the past couple of weeks. Sami Siren has narrowed it down to HADOOP-331.
> To replicate: download the nightly, then:
> bin/nutch inject crawl/crawldb urls/ # a single URL is in urls/urls -- http://apache.org
> bin/nutch generate crawl/crawldb crawl/segments
> bin/nutch fetch crawl/segments/2007...
> bin/nutch updatedb crawl/crawldb crawl/segments/2007...
> # generate a new segment with 5 URIs
> bin/nutch generate crawl/crawldb crawl/segments -topN 5
> bin/nutch fetch crawl/segments/2007... # new segment
> bin/nutch updatedb crawl/crawldb crawl/segments/2007... # new segment
> # merge the segments and index
> bin/nutch mergesegs crawl/merged -dir crawl/segments
> ..
> We get a crash in the mergesegs. This crash, with the exact same script and start URI, configuration and plugins, does not happen on a nightly from early January.
> 2007-01-18 14:57:11,411 INFO  segment.SegmentMerger - Merging 2 segments to crawl/merged_07_01_18_14_56_22/20070118145711
> 2007-01-18 14:57:11,482 INFO  segment.SegmentMerger - SegmentMerger:   adding crawl/segments/20070118145628
> 2007-01-18 14:57:11,489 INFO  segment.SegmentMerger - SegmentMerger:   adding crawl/segments/20070118145641
> 2007-01-18 14:57:11,495 INFO  segment.SegmentMerger - SegmentMerger: using segment data from: content crawl_generate crawl_fetch crawl_parse parse_data parse_text
> 2007-01-18 14:57:11,594 INFO  mapred.InputFormatBase - Total input paths to process : 12
> 2007-01-18 14:57:11,819 INFO  mapred.JobClient - Running job: job_5ug2ip
> 2007-01-18 14:57:12,073 WARN  mapred.LocalJobRunner - job_5ug2ip
> java.io.EOFException
>         at java.io.DataInputStream.readFully(DataInputStream.java:178)
>         at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:57)
>         at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:91)
>         at org.apache.hadoop.io.UTF8.readChars(UTF8.java:212)
>         at org.apache.hadoop.io.UTF8.readString(UTF8.java:204)
>         at org.apache.hadoop.io.ObjectWritable.readObject(ObjectWritable.java:173)
>         at org.apache.hadoop.io.ObjectWritable.readFields(ObjectWritable.java:61)
>         at org.apache.nutch.metadata.MetaWrapper.readFields(MetaWrapper.java:100)
>         at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.spill(MapTask.java:427)
>         at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpillToDisk(MapTask.java:385)
>         at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$200(MapTask.java:239)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:188)
>         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:109)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
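
For anyone who wants to rerun the replication steps quoted above without pasting segment names by hand, they could be wrapped roughly as follows. This is only a sketch, not the reporter's actual script: it assumes it is run from the root of an unpacked nightly with a seed list in urls/, and it assumes segment directories are named by timestamp so that the last name in sorted order is the newest one (that stands in for the elided `crawl/segments/2007...` paths). The names `latest_segment` and `replicate_nutch_433` are hypothetical.

```shell
#!/bin/sh
# Sketch of the replication steps from the issue description.
# Assumption: segment directories have timestamp names, so the
# lexicographically last entry is the most recently generated one.

# Hypothetical helper: print the newest segment directory under $1.
latest_segment() {
    ls -d "$1"/* | sort | tail -n 1
}

# Hypothetical wrapper: run the reported steps in order and stop
# at the first command that fails.
replicate_nutch_433() {
    bin/nutch inject crawl/crawldb urls/ || return 1

    # First segment.
    bin/nutch generate crawl/crawldb crawl/segments || return 1
    seg=$(latest_segment crawl/segments)
    bin/nutch fetch "$seg" || return 1
    bin/nutch updatedb crawl/crawldb "$seg" || return 1

    # Second segment, limited to 5 URIs.
    bin/nutch generate crawl/crawldb crawl/segments -topN 5 || return 1
    seg=$(latest_segment crawl/segments)
    bin/nutch fetch "$seg" || return 1
    bin/nutch updatedb crawl/crawldb "$seg" || return 1

    # Merge the segments; this is the step where the crash was reported.
    bin/nutch mergesegs crawl/merged -dir crawl/segments
}
```

After sourcing the file, calling `replicate_nutch_433` from the nightly's root directory runs the whole sequence; a non-zero return status from the final step would correspond to the mergesegs failure described in the report.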