Thanks for your reply. I'm wondering, is it possible to skip this dedup
phase then, or to not acquire a lock? The reason I'd like to use 0.14
code is that I've instrumented it to add some tracing and I'd like to
collect traces of how Nutch uses Hadoop. It may be possible to port the
changes back to 0.12 but I'd prefer not to because I may have other apps
that use things in 0.14 and because I want to trace the best-performing
Hadoop version possible.

Matei

On Oct 18, 2007, at 12:58 AM, Nguyen Manh Tien wrote:
> You should use Hadoop 0.12.3, for example, to dedup. The current
> version, 0.14.x, doesn't support the Lock operation.
>
> 2007/10/18, Matei Zaharia <matei eecs.berkeley.edu>:
>>
>> Hi,
>>
>> I'm sometimes getting the following error in the dedup 3 job when
>> running Nutch 0.9 on top of Hadoop 0.14.2:
>>
>> java.io.IOException: Lock obtain timed out: Lock hdfs://r37:54310/user/matei/crawl4/indexes/part-00000/write.lock
>>     at org.apache.lucene.store.Lock.obtain(Lock.java:69)
>>     at org.apache.lucene.index.IndexReader.aquireWriteLock(IndexReader.java:526)
>>     at org.apache.lucene.index.IndexReader.deleteDocument(IndexReader.java:551)
>>     at org.apache.nutch.indexer.DeleteDuplicates.reduce(DeleteDuplicates.java:378)
>>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:322)
>>     at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1782)
>>
>> Other times, it works just fine. Do you know why this is happening?
>>
>> Thanks,
>>
>> Matei Zaharia
>>