Matija Grabnar writes
> That is going to lead to trouble. Some years ago I had occasion to
> calculate checksums of very large number of files (looking to remove
> duplicates). I discovered, to my dismay, that
> a) I was getting collisions (same checksum) on files which were
> obviously different (because they were different size).

With checksums, collisions are to be expected. The purpose of a
checksum is to ensure that a file hasn't been damaged after being
transmitted through a network or copied onto a medium, for example.
It's not meant to identify duplicates between a large number of files.
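
To make the point about checksums concrete, here is a minimal sketch using
a toy 16-bit additive checksum. The function name sum16 and the sample
byte strings are made up for illustration; this is not the checksum
Matija used, just an example of how two inputs of different size can end
up with the same value.

```python
def sum16(data: bytes) -> int:
    """Toy checksum (hypothetical): sum of all bytes, modulo 2**16."""
    return sum(data) & 0xFFFF


# Two inputs of different length whose bytes add up to the same total.
a = b"\x01\x02"   # 2 bytes, byte sum 3
b = b"\x03"       # 1 byte,  byte sum 3

print(len(a), len(b), sum16(a) == sum16(b))
```

A checksum like this still catches most accidental corruption of a single
file, which is all it is meant to do.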

But SHA1 is not a checksum, it's a cryptographic hash.
I'm not an expert in the field, but from reading what is available on
the web, I gather that the probability of two given different files
"accidentally" sharing the same SHA1 hash is about 1/2^160, and that
roughly 2^80 files would be needed before an accidental collision
somewhere among them becomes likely.
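
For a rough feel of the numbers, the chance of at least one accidental
collision among n files can be estimated with the usual birthday
approximation, p ~ 1 - exp(-n(n-1) / 2^(bits+1)). The sketch below
assumes the hash behaves like a uniformly random 160-bit value and says
nothing about deliberately constructed collisions; the function name is
made up for illustration.

```python
import math


def accidental_collision_probability(n_files: int, hash_bits: int = 160) -> float:
    """Birthday approximation for at least one collision among n_files.

    Assumes hash outputs are uniformly random hash_bits-bit values.
    """
    # Number of file pairs divided by the size of the hash space.
    exponent = n_files * (n_files - 1) / 2 ** (hash_bits + 1)
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for very small x.
    return -math.expm1(-exponent)


print(accidental_collision_probability(10**9))   # a billion files
print(accidental_collision_probability(2**80))   # around the birthday bound
```

With a billion files the estimate is far below 1e-30, and it only
approaches even odds at around 2^80 files.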
--
Daniel
PostgreSQL-powered mail user agent and storage:
http://www.manitou-mail.org
_______________________________________________
DBmail mailing list
DBmail dbmail.org
https://mailman.fastxs.nl/mailman/listinfo/dbmail