
Thread: Re: Indexing problems in nutch-nightly




Re: Indexing problems in nutch-nightly
Sean Dean
Canada
2007-06-17 16:58:31
There was no result, because the command never completes: the
process just hangs with zero processor utilization.
There was nothing in the logs to show you, but I took a
stack trace before killing the process, and here it is:

Full thread dump Java HotSpot(TM) 64-Bit Server VM (diablo-1.5.0_07-b01 mixed mode):
"Low Memory Detector" daemon prio=5 tid=0x00000000006cfc00 nid=0x6d5800 runnable [0x0000000000000000..0x0000000000000000]
"CompilerThread1" daemon prio=9 tid=0x00000000006c9c00 nid=0x6cf800 waiting on condition [0x0000000000000000..0x00007fffff1f4320]
"CompilerThread0" daemon prio=9 tid=0x00000000006c3c00 nid=0x6c9800 waiting on condition [0x0000000000000000..0x00007fffff2f5400]
"AdapterThread" daemon prio=9 tid=0x00000000006bac00 nid=0x6c3800 waiting on condition [0x0000000000000000..0x0000000000000000]
"Signal Dispatcher" daemon prio=9 tid=0x00000000006a7c00 nid=0x6ba800 waiting on condition [0x0000000000000000..0x0000000000000000]
"Finalizer" daemon prio=8 tid=0x00000000006a7000 nid=0x6a7800 in Object.wait() [0x00007fffff5f9000..0x00007fffff5f9910]
        at java.lang.Object.wait(Native Method)
        - waiting on <0x00000008b7860ad0> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:116)
        - locked <0x00000008b7860ad0> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:132)
        at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)
"Reference Handler" daemon prio=10 tid=0x000000000062b800 nid=0x62bc00 in Object.wait() [0x00007fffff6fa000..0x00007fffff6fac90]
"main" prio=5 tid=0x0000000000516800 nid=0x516000 waiting on condition [0x00007fffffffc000..0x00007fffffffd2f0]
        at java.lang.Thread.sleep(Native Method)
        at org.apache.nutch.segment.SegmentReader.get(SegmentReader.java:348)
        at org.apache.nutch.segment.SegmentReader.main(SegmentReader.java:590)
"VM Thread" prio=9 tid=0x000000000065f200 nid=0x62b400 runnable
"GC task thread#0 (ParallelGC)" prio=5 tid=0x0000000000527c00 nid=0x5af400 runnable
"GC task thread#1 (ParallelGC)" prio=5 tid=0x00000000005b5200 nid=0x5bd000 runnable
"VM Periodic Task Thread" prio=9 tid=0x0000000000527800 nid=0x6dc800 waiting on condition
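
A dump like the one above is what a HotSpot JVM prints to its stdout when it receives SIGQUIT (kill -QUIT <pid>). If a program needs to capture similar information itself, Thread.getAllStackTraces() (available since Java 5) returns every live thread with its stack. A minimal sketch, assuming nothing beyond the JDK; the class name ThreadDumpSketch is hypothetical, and the output only approximates the HotSpot format (no tid/nid or lock details):

```java
import java.util.Map;

/**
 * Hypothetical helper: builds a text dump of all live threads, roughly
 * resembling the "Full thread dump" HotSpot prints on SIGQUIT.
 */
public class ThreadDumpSketch {

    /** Returns name, daemon flag, priority, state and stack of every live thread. */
    static String dumpAllThreads() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            sb.append('"').append(t.getName()).append('"')
              .append(t.isDaemon() ? " daemon" : "")
              .append(" prio=").append(t.getPriority())
              .append(" state=").append(t.getState())
              .append('\n');
            // One "at ..." line per stack frame, like the JVM's own dump.
            for (StackTraceElement frame : e.getValue()) {
                sb.append("        at ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(dumpAllThreads());
    }
}
```

Unlike the SIGQUIT dump, this does not show which monitors a thread holds or waits on; it only shows thread states and stack frames.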




----- Original Message ----
From: Doğacan Güney <dogacangmail.com>
To: nutch-userlucene.apache.org
Sent: Sunday, June 17, 2007 8:28:39 AM
Subject: Re: Indexing problems in nutch-nightly


On 6/17/07, Sean Dean <seandeanrogers.com> wrote:
> After the change to Indexer.java, here is the more verbose log of the
> error that seems to be happening during the indexing phase:
>
> 2007-06-15 21:53:06,098 INFO  indexer.Indexer - url=http://acadisc.com/, parseData=Version: 5
> Status: failed(2,200): sun.io.MalformedInputException: Missing byte-order mark
> Title:
> Outlinks: 0
> Content Metadata:
> Parse Metadata:
> 2007-06-15 21:53:06,101 WARN  mapred.LocalJobRunner - job_73pqhd
> java.lang.NullPointerException: value cannot be null
>         at org.apache.lucene.document.Field.<init>(Field.java:188)
>         at org.apache.lucene.document.Field.<init>(Field.java:164)
>         at org.apache.nutch.indexer.Indexer.reduce(Indexer.java:200)
>         at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:313)
>         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:155)
> 2007-06-15 21:53:06,845 FATAL indexer.Indexer - Indexer: java.io.IOException: Job failed!
>         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:604)
>         at org.apache.nutch.indexer.Indexer.index(Indexer.java:280)
>         at org.apache.nutch.indexer.Indexer.run(Indexer.java:302)
>         at org.apache.hadoop.util.ToolBase.doMain(ToolBase.java:189)
>         at org.apache.nutch.indexer.Indexer.main(Indexer.java:285)
>

Can you also run

readseg -get <segment> "http://acadisc.com/" -nogenerate -noparsetext -noparsedata

and send the result?

-- 
Doğacan Güney
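
In the log quoted above, the NullPointerException ("value cannot be null") is thrown by Lucene's Field constructor, called from Indexer.reduce, right after a record whose parse failed and whose Parse Metadata is empty. One plausible reading, not confirmed in this thread, is that a field value looked up from that empty metadata is null. A minimal sketch of a null guard, assuming a plain Map stands in for a Lucene Document; the addIfPresent helper and the "segment" key are hypothetical, not Nutch or Lucene API:

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch: skip null field values instead of handing them to a
 * constructor that rejects null (as Lucene's Field does in the log above).
 * A Map stands in for a Lucene Document here.
 */
public class NullSafeFieldSketch {

    /** Adds name=value only when value is non-null; returns whether it was added. */
    static boolean addIfPresent(Map<String, String> doc, String name, String value) {
        if (value == null) {
            return false; // skip rather than fail the whole reduce task
        }
        doc.put(name, value);
        return true;
    }

    public static void main(String[] args) {
        // Empty, like the "Parse Metadata:" section of the failed record.
        Map<String, String> parseMeta = new LinkedHashMap<String, String>();
        Map<String, String> doc = new LinkedHashMap<String, String>();

        addIfPresent(doc, "url", "http://acadisc.com/");
        addIfPresent(doc, "segment", parseMeta.get("segment")); // hypothetical key; null, so skipped

        System.out.println(doc); // prints {url=http://acadisc.com/}
    }
}
```

An alternative design is to drop records whose parse status is "failed" before building a document at all; which one fits depends on whether partially parsed pages should still be searchable.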
Re: Indexing problems in nutch-nightly
Doğacan Güney
2007-06-18 01:01:14
On 6/18/07, Sean Dean <seandeanrogers.com> wrote:
> There was no result due to the fact it does not complete and the process
> just hangs with zero processor utilization. There was nothing in the logs
> to show you, but I took a stack trace before killing the process completely
> and here it is;
> [full thread dump snipped; identical to the dump in the message above]

Ah, non-debuggable problems... so much fun.

Anyway, it seems you are running into the problem described here:
http://www.nabble.com/bug-in-SegmentReader-tf3788992.html
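
The dump shows "main" sleeping inside SegmentReader.get while no other application threads are alive, i.e. the caller keeps waiting for work that nothing is left to finish. The linked report and patch are not reproduced here, so the following is only a generic illustration of that failure mode, not the patch's contents: if worker threads signal completion only on their success path, a single exception strands the waiting thread forever. Counting down a latch in a finally block, and bounding the wait with a timeout, avoids that. Class and method names are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical sketch (not the SegmentReader patch): a coordinator waits for
 * worker threads without hanging if one of them throws.
 */
public class WorkerWaitSketch {

    /** Runs one thread per task; returns true if all finished within the timeout. */
    static boolean runAll(Runnable[] tasks, long timeoutMillis) throws InterruptedException {
        final CountDownLatch done = new CountDownLatch(tasks.length);
        for (final Runnable task : tasks) {
            Thread worker = new Thread(new Runnable() {
                public void run() {
                    try {
                        task.run();
                    } finally {
                        done.countDown(); // signal completion even if task.run() threw
                    }
                }
            });
            worker.start();
        }
        // Bounded wait instead of an open-ended sleep-and-poll loop.
        return done.await(timeoutMillis, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        Runnable ok = new Runnable() { public void run() { } };
        Runnable failing = new Runnable() {
            public void run() { throw new RuntimeException("simulated reader failure"); }
        };
        boolean finished = runAll(new Runnable[] { ok, failing }, 5000);
        System.out.println("all workers finished: " + finished);
    }
}
```

Running it prints the failing worker's uncaught exception to stderr and then "all workers finished: true"; without the finally block, the latch would never reach zero and the call would only return when the timeout expired.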

I have put up a "patchified" version here:
http://www.ceng.metu.edu.tr/~e1345172/segment_reader_hang.patch

Can you retry with this patch?

Thanks!

-- 
Doğacan Güney
