Nutch Dedup Question
United States
2007-09-20 10:36:16
Hi,

I am a little confused about what exactly dedup does.

a. Does dedup delete duplicate documents from the index and the segments?

b. Is there a way to delete duplicate documents across two segments?

Let me know. Thanks.

-- 
View this message in context: http://www.nabble.com/Nutch-Dedup-Question-tf4488321.html#a12799680
Sent from the Nutch - User mailing list archive at Nabble.com.


Re: Nutch Dedup Question
Poland
2007-09-20 11:47:27
karthik085 wrote:
> Hi,
>
> I am a little confused about what exactly dedup does.
>
> a. Does dedup delete duplicate documents from the index and the segments?

Only from the index.
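
For readers following along, here is a minimal sketch of running dedup against a crawl's index directory. The `bin/nutch dedup <indexes>` argument form is assumed from 0.9-era releases, and `NUTCH_HOME`, `INDEXES`, and the `dedup_indexes` helper are hypothetical names introduced for illustration. As the reply says, this touches only the index; the segments are left as they were.

```shell
#!/bin/sh
# Sketch only: NUTCH_HOME and INDEXES are hypothetical, adjust to your setup.
NUTCH_HOME="${NUTCH_HOME:-.}"
INDEXES="${INDEXES:-crawl/indexes}"

dedup_indexes() {
    # Run the real command only if the nutch launcher is present;
    # otherwise just print what would have been executed.
    if [ -x "$NUTCH_HOME/bin/nutch" ]; then
        "$NUTCH_HOME/bin/nutch" dedup "$1"
    else
        echo "would run: bin/nutch dedup $1"
    fi
}

dedup_indexes "$INDEXES"
```

Running `bin/nutch dedup` with no arguments should print the usage for your version, which is worth checking before relying on the form above.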

> 
> b. Is there a way to delete duplicate documents across two segments?

bin/nutch mergesegs
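
A possible invocation, as a sketch: the reply names only the command, so the argument order (output directory first, then the segments to merge) is assumed from the usage that 0.9-era releases print. The segment names, the output path, and the `merge_segments` helper are hypothetical.

```shell
#!/bin/sh
# Sketch only: all paths and segment names below are made up for illustration.
NUTCH_HOME="${NUTCH_HOME:-.}"

merge_segments() {
    out_dir="$1"
    shift
    # Run the real command only if the nutch launcher is present;
    # otherwise just print what would have been executed.
    if [ -x "$NUTCH_HOME/bin/nutch" ]; then
        "$NUTCH_HOME/bin/nutch" mergesegs "$out_dir" "$@"
    else
        echo "would run: bin/nutch mergesegs $out_dir $*"
    fi
}

merge_segments crawl/segments_merged \
    crawl/segments/20070919101530 \
    crawl/segments/20070920093012
```

Writing the merged result to a new directory leaves the original segments in place, so the output can be checked before anything is deleted.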


-- 
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||/|  Information Retrieval, Semantic Web
___|||__||  |  ||  |  Embedded Unix, System Integration
http://www.sigram.com 
Contact: info at sigram dot com


Re: Nutch Dedup Question
United States
2007-09-20 11:55:47
Thanks - that's much clearer.


Andrzej Bialecki wrote:
> 
> karthik085 wrote:
>> Hi,
>> 
>> I am a little confused about what exactly dedup does.
>>
>> a. Does dedup delete duplicate documents from the index and the segments?
> 
> Only from the index.
> 
>> 
>> b. Is there a way to delete duplicate documents across two segments?
> 
> bin/nutch mergesegs

-- 
View this message in context: http://www.nabble.com/Nutch-Dedup-Question-tf4488321.html#a12801358
Sent from the Nutch - User mailing list archive at Nabble.com.

