Thread: MD5 vs TextProfile Signature

user name
2007-11-06 18:27:45
Hi,

I am wondering which does a better job: the MD5 signature or the
TextProfile signature? From what I gather from the APIs, when a page
has content, MD5 computes a hash over the raw binary content of the
page, while TextProfile computes a hash over a plain-text profile of
the page. I believe the computed values are used to delete duplicates.

Wouldn't it be better if unwanted characters were removed from a page
before it is hashed? The content of two pages could be the same except
for these unwanted characters, in which case a hash of the raw content
would treat them as different pages.
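
To make the question concrete, here is a minimal sketch of the idea, written in Python rather than Nutch's Java. It is not Nutch's MD5 or TextProfile implementation; the function names `raw_signature` and `normalized_signature` are hypothetical, and the definition of "unwanted characters" (anything other than ASCII letters and digits) is an assumption chosen only for illustration.

```python
import hashlib
import re


def raw_signature(content: bytes) -> str:
    """MD5 over the raw bytes: any byte-level difference changes the hash."""
    return hashlib.md5(content).hexdigest()


def normalized_signature(content: bytes, encoding: str = "utf-8") -> str:
    """MD5 over a cleaned-up text form of the page (illustrative only).

    Assumption: "unwanted characters" means everything except ASCII
    letters and digits; a real crawler would choose its own rules.
    """
    text = content.decode(encoding, errors="ignore").lower()
    # Replace every run of non-alphanumeric characters with a single space,
    # then trim leading and trailing spaces.
    text = re.sub(r"[^a-z0-9]+", " ", text).strip()
    return hashlib.md5(text.encode("utf-8")).hexdigest()


# Two pages with the same words but different punctuation and whitespace.
page_a = b"Hello,   World!\r\n"
page_b = b"hello world"

print(raw_signature(page_a) == raw_signature(page_b))                # False
print(normalized_signature(page_a) == normalized_signature(page_b))  # True
```

The trade-off is that more aggressive cleaning catches more near-duplicates but also raises the chance of treating genuinely different pages as the same.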

Thanks,
Karthik
-- 
View this message in context: http://www.nabble.com/MD5-vs-TextProfile-Signature-tf4761944.html#a13619085
Sent from the Nutch - Dev mailing list archive at Nabble.com.

