
Thread: is crawldb format in Nutch 0.8 compatible with Nutch0.7




is crawldb format in Nutch 0.8 compatible with Nutch0.7
user name
2007-01-23 13:26:35
Hi guys,

I am running into some nightmares when trying to iterate over values in the Nutch 0.8.2 crawldb. I am getting a Hadoop exception such as the following:

07/01/23 18:33:56 INFO conf.Configuration: parsing jar:file:/C:/nutch-0.8.2-dev/lib/hadoop-0.4.0-patched.jar!/hadoop-default.xml
07/01/23 18:33:56 INFO conf.Configuration: parsing jar:file:/C:/nutch-0.8.2-dev/nutch-0.8.2-dev.jar!/nutch-default.xml
07/01/23 18:33:56 INFO conf.Configuration: parsing jar:file:/C:/nutch-0.8.2-dev/nutch-0.8.2-dev.jar!/nutch-site.xml
Exception in thread "main" java.lang.ArithmeticException: / by zero
        at org.apache.hadoop.mapred.lib.HashPartitioner.getPartition(HashPartitioner.java:33)
        at org.apache.hadoop.mapred.MapFileOutputFormat.getEntry(MapFileOutputFormat.java:88)
        at org.apache.nutch.crawl.CrawlDbReader.get(CrawlDbReader.java:321)

Therefore, if I can iterate over the values contained in the crawldb using the Nutch 0.7 API, I think this will fix the issue. So the question is: is Nutch 0.8 backward compatible with Nutch 0.7.2?

Thanks,
Armel

-------------------------------------------------
Armel T. Nene
iDNA Solutions
Tel: +44 (207) 257 6124
Mobile: +44 (788) 695 0483
http://blog.idna-solutions.com
Re: is crawldb format in Nutch 0.8 compatible with Nutch0.7
user name
2007-01-23 14:55:21
Armel T. Nene wrote:
> therefore, if I can iterate over the values contained in the crawldb using
> Nutch 0.7 API, I should think this will fix the issue. So the question is;
>
> is Nutch 0.8 backward compatible with Nutch 0.7.2


This subject was discussed several times in the past. The answer is
still no - Nutch 0.8 is NOT compatible with 0.7. It is possible to write
converters for some of the data, but it's a lot of tedious work. The
recommended upgrade path is to dump your 0.7 WebDB to a text file, and
then inject this text file to a 0.8 CrawlDb.
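
As a rough illustration of the dump-then-inject path, here is a minimal sketch (not part of Nutch) that turns a text dump into a seed list with one URL per line. It assumes the dump marks each page's address with a line starting with "URL: "; that label, and the helper names extract_urls and write_seed_file, are assumptions made for this example, so check them against your actual dump. Only the URLs are carried over; any other per-page data in the dump is dropped.

```python
from pathlib import Path
from typing import Iterable, List

# Assumed field label in the 0.7 text dump; adjust it to match your dump.
URL_PREFIX = "URL: "


def extract_urls(dump_lines: Iterable[str]) -> List[str]:
    """Return the unique URLs found in the dump, in first-seen order."""
    seen = set()
    urls = []
    for line in dump_lines:
        line = line.strip()
        if not line.startswith(URL_PREFIX):
            continue  # skip every field other than the page URL
        url = line[len(URL_PREFIX):].strip()
        if url and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


def write_seed_file(urls: Iterable[str], seed_dir: str, name: str = "seed.txt") -> Path:
    """Write one URL per line into seed_dir/name and return the file path."""
    target_dir = Path(seed_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    seed_file = target_dir / name
    seed_file.write_text("".join(u + "\n" for u in urls), encoding="utf-8")
    return seed_file


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    with open("webdb-dump.txt", encoding="utf-8") as dump:
        print(write_seed_file(extract_urls(dump), "urls"))
```

The resulting directory of URL files would then be handed to the 0.8 inject step in place of a hand-written seed list.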

-- 
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||/|  Information Retrieval, Semantic Web
___|||__||  |  ||  |  Embedded Unix, System Integration
http://www.sigram.com 
Contact: info at sigram dot com


