There was no result because the command never completes: the
process just hangs with zero CPU utilization. Nothing was
written to the logs, but I took a thread dump before killing
the process. Here it is:
Full thread dump Java HotSpot(TM) 64-Bit Server VM (diablo-1.5.0_07-b01 mixed mode):

"Low Memory Detector" daemon prio=5 tid=0x00000000006cfc00 nid=0x6d5800 runnable [0x0000000000000000..0x0000000000000000]

"CompilerThread1" daemon prio=9 tid=0x00000000006c9c00 nid=0x6cf800 waiting on condition [0x0000000000000000..0x00007fffff1f4320]

"CompilerThread0" daemon prio=9 tid=0x00000000006c3c00 nid=0x6c9800 waiting on condition [0x0000000000000000..0x00007fffff2f5400]

"AdapterThread" daemon prio=9 tid=0x00000000006bac00 nid=0x6c3800 waiting on condition [0x0000000000000000..0x0000000000000000]

"Signal Dispatcher" daemon prio=9 tid=0x00000000006a7c00 nid=0x6ba800 waiting on condition [0x0000000000000000..0x0000000000000000]

"Finalizer" daemon prio=8 tid=0x00000000006a7000 nid=0x6a7800 in Object.wait() [0x00007fffff5f9000..0x00007fffff5f9910]
    at java.lang.Object.wait(Native Method)
    - waiting on <0x00000008b7860ad0> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:116)
    - locked <0x00000008b7860ad0> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:132)
    at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)

"Reference Handler" daemon prio=10 tid=0x000000000062b800 nid=0x62bc00 in Object.wait() [0x00007fffff6fa000..0x00007fffff6fac90]

"main" prio=5 tid=0x0000000000516800 nid=0x516000 waiting on condition [0x00007fffffffc000..0x00007fffffffd2f0]
    at java.lang.Thread.sleep(Native Method)
    at org.apache.nutch.segment.SegmentReader.get(SegmentReader.java:348)
    at org.apache.nutch.segment.SegmentReader.main(SegmentReader.java:590)

"VM Thread" prio=9 tid=0x000000000065f200 nid=0x62b400 runnable

"GC task thread#0 (ParallelGC)" prio=5 tid=0x0000000000527c00 nid=0x5af400 runnable

"GC task thread#1 (ParallelGC)" prio=5 tid=0x00000000005b5200 nid=0x5bd000 runnable

"VM Periodic Task Thread" prio=9 tid=0x0000000000527800 nid=0x6dc800 waiting on condition
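
In case anyone wants to gather the same information on their side: on a Unix JVM, sending SIGQUIT (kill -QUIT <pid>) prints a dump like the one above to the JVM's standard output. The same data is also available from inside the process through Thread.getAllStackTraces(), which has existed since Java 5. Below is a minimal sketch of that. The class and method names (ThreadDumpSketch, dumpAllThreads) are made up for illustration and are not part of Nutch, and the output format only approximates what HotSpot prints.

```java
import java.util.Map;

// Hypothetical helper, not part of Nutch: collects a stack trace
// for every live thread in the current JVM.
class ThreadDumpSketch {

    // Builds a plain-text dump, one header line per thread followed
    // by its stack frames (threads without Java frames get no frames).
    static String dumpAllThreads() {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e
                : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            out.append('"').append(t.getName()).append('"')
               .append(t.isDaemon() ? " daemon" : "")
               .append(" prio=").append(t.getPriority())
               .append(" state=").append(t.getState())
               .append('\n');
            for (StackTraceElement frame : e.getValue()) {
                out.append("    at ").append(frame).append('\n');
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(dumpAllThreads());
    }
}
```

In the dump above, the only non-JVM thread shown is main, and it is sleeping inside SegmentReader.get (SegmentReader.java:348), which is consistent with the zero CPU usage.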
----- Original Message ----
From: Doğacan Güney <dogacan gmail.com>
To: nutch-user lucene.apache.org
Sent: Sunday, June 17, 2007 8:28:39 AM
Subject: Re: Indexing problems in nutch-nightly
On 6/17/07, Sean Dean <seandean rogers.com> wrote:
> After the change to Indexer.java here is the more verbose log of the
> error that seems to be happening during the indexing phase;
>
> 2007-06-15 21:53:06,098 INFO indexer.Indexer - url=http://acadisc.com/, parseData=Version: 5
> Status: failed(2,200): sun.io.MalformedInputException: Missing byte-order mark
> Title:
> Outlinks: 0
> Content Metadata:
> Parse Metadata:
> 2007-06-15 21:53:06,101 WARN mapred.LocalJobRunner - job_73pqhd
> java.lang.NullPointerException: value cannot be null
>     at org.apache.lucene.document.Field.<init>(Field.java:188)
>     at org.apache.lucene.document.Field.<init>(Field.java:164)
>     at org.apache.nutch.indexer.Indexer.reduce(Indexer.java:200)
>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:313)
>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:155)
> 2007-06-15 21:53:06,845 FATAL indexer.Indexer - Indexer: java.io.IOException: Job failed!
>     at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:604)
>     at org.apache.nutch.indexer.Indexer.index(Indexer.java:280)
>     at org.apache.nutch.indexer.Indexer.run(Indexer.java:302)
>     at org.apache.hadoop.util.ToolBase.doMain(ToolBase.java:189)
>     at org.apache.nutch.indexer.Indexer.main(Indexer.java:285)
>
>
Can you also run

readseg -get <segment> "http://acadisc.com/" -nogenerate -noparsetext -noparsedata

and send the result?
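
In the meantime, the trace shows Indexer.reduce (Indexer.java:200) handing a null value to the Lucene Field constructor, which throws on null. I do not know yet which value is null; the failed parse for this URL is a likely suspect, but that is a guess. If it turns out that a value can legitimately be missing, a guard along these lines would skip the field instead of failing the whole job. This is only a sketch: SimpleField is a stand-in that mimics the null check, and addIfPresent and the field names are hypothetical, not Nutch or Lucene code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these types exist in Nutch or Lucene.
class NullSafeFieldSketch {

    // Stand-in for a Lucene-style field. Like the constructor in the
    // trace, it rejects null values with a NullPointerException.
    static final class SimpleField {
        final String name;
        final String value;

        SimpleField(String name, String value) {
            if (name == null) throw new NullPointerException("name cannot be null");
            if (value == null) throw new NullPointerException("value cannot be null");
            this.name = name;
            this.value = value;
        }
    }

    // Adds the field only when a value is present.
    // Returns true if the field was added, false if it was skipped.
    static boolean addIfPresent(List<SimpleField> doc, String name, String value) {
        if (value == null) {
            return false; // skip instead of letting the constructor throw
        }
        doc.add(new SimpleField(name, value));
        return true;
    }

    public static void main(String[] args) {
        List<SimpleField> doc = new ArrayList<SimpleField>();
        addIfPresent(doc, "url", "http://acadisc.com/");
        addIfPresent(doc, "title", null); // skipped, no exception
        System.out.println(doc.size() + " field(s) added");
    }
}
```

Whether skipping the field is the right behavior, or whether documents with a failed parse should not reach the indexer at all, depends on what the readseg output shows.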
--
Doğacan Güney