List Info

Thread: Re: Re: DBMail 2.3.0 released




Re: Re: DBMail 2.3.0 released
country flaguser name
United States
2007-12-29 02:49:43
Paul J Stevens wrote:
> and you can select
> the hash(es) you trust not to generate collisions. 
Any hash you choose (as long as it shorter than the
messages) WILL 
generate collisions. It is a mathematical fact. You can not
represent 
all possible attachments with a short hash value.
 (If you could, it wouldn't be a hash algorithm, it would be
a 
compression algorithm 

I re-iterate: regardless of which digest algorithm is
chosen, the code 
MUST be able to
detect and correctly handle collisions. Collisions WILL
occur, 
regardless of the algorithm
chosen. It is a mathematically provable fact.
_______________________________________________
DBmail mailing list
DBmaildbmail.org
htt
ps://mailman.fastxs.nl/mailman/listinfo/dbmail

Re: Re: DBMail 2.3.0 released
country flaguser name
Netherlands
2007-12-29 08:11:27
Matija Grabnar wrote:
> Paul J Stevens wrote:
>> and you can select
>> the hash(es) you trust not to generate collisions.

> Any hash you choose (as long as it shorter than the
messages) WILL
> generate collisions. It is a mathematical fact. You can
not represent
> all possible attachments with a short hash value.
> (If you could, it wouldn't be a hash algorithm, it
would be a
> compression algorithm 

I think I understand the math. Adding support for stronger
hashes, and
compound hashes was done mostly to take the pressure off.

> I re-iterate: regardless of which digest algorithm is
chosen, the code
> MUST be able to
> detect and correctly handle collisions. Collisions WILL
occur,
> regardless of the algorithm
> chosen. It is a mathematically provable fact.

The basic concept of single-instance storage relies on being
able to
identify unique blobs. Using a hash is one way of quickly
generating
them. Detecting and dealing with collisions can be done
without too much
loss in performance, but it will take some really care and I
wont rush
it if you dont mind.



-- 
 
____________________________________________________________
____
  Paul Stevens                                      paul at
nfg.nl
  NET FACILITIES GROUP                     GPG/PGP:
1024D/11F8CD31
  The Netherlands________________________________http://www.nfg.nl
_______________________________________________
DBmail mailing list
DBmaildbmail.org
htt
ps://mailman.fastxs.nl/mailman/listinfo/dbmail

Re: Re: DBMail 2.3.0 released
country flaguser name
United States
2007-12-29 13:09:10
On Sat, Dec 29, 2007, Matija Grabnar <matija+dbmailserverflow.com> said:

> Paul J Stevens wrote:
>> and you can select
>> the hash(es) you trust not to generate collisions.

> Any hash you choose (as long as it shorter than the
messages) WILL 
> generate collisions. It is a mathematical fact. You can
not represent 
> all possible attachments with a short hash value.
>  (If you could, it wouldn't be a hash algorithm, it
would be a 
> compression algorithm 
> 
> I re-iterate: regardless of which digest algorithm is
chosen, the code 
> MUST be able to
> detect and correctly handle collisions. Collisions WILL
occur, 
> regardless of the algorithm
> chosen. It is a mathematically provable fact.

The risk of collision is very small, but is real -- every
has acknowledged
that. Then there's the evaluation of the risk, which we also
all know to
be extremely small, and the cost of mitigation, which
involves a little
decision making and some engineering time (to do
byte-by-byte checking, or
to double hash, or both, or some other approach to be
thought up). It'll
happen, really.

Aaron
_______________________________________________
DBmail mailing list
DBmaildbmail.org
htt
ps://mailman.fastxs.nl/mailman/listinfo/dbmail

Re: Re: DBMail 2.3.0 released
country flaguser name
Netherlands
2007-12-31 09:38:23
Matija Grabnar wrote:
> I re-iterate: regardless of which digest algorithm is
chosen, the code
> MUST be able to
> detect and correctly handle collisions. Collisions WILL
occur,
> regardless of the algorithm
> chosen. It is a mathematically provable fact.

For those of you who have been following this discussion:
I've done this
thing.

- we now use the cryptographic hash only to quickly locate
possibly
duplicate mime-parts, If the hash doesn't occur yet, a new
mimepart is
stored using the hash, but generating an auto-increment
bigint as it's
primary key. If the hash does occur, the insertion code
compares the
blobs to make sure no hash collision occurs on different
blobs.

- I've added support for a whole dumpload of hashes: we now
support md5,
sha1, sha256, sha512, tiger and whirlpool. Since I'm relying
on mhash
for this, it would be trivial to add other hashes like
ghost, but I'm
currently restricting things to the ones documenten on the
nessie (EU)
pages. Looking back, adding all these was probably not
really necessary
for single-instance storage, but libmhash is rock-solid and
widely
available, and I have a hunch they might come in handy along
the road.


-- 
 
____________________________________________________________
____
  Paul Stevens                                      paul at
nfg.nl
  NET FACILITIES GROUP                     GPG/PGP:
1024D/11F8CD31
  The Netherlands________________________________http://www.nfg.nl
_______________________________________________
DBmail mailing list
DBmaildbmail.org
htt
ps://mailman.fastxs.nl/mailman/listinfo/dbmail

[1-4]

about | contact  Other archives ( Real Estate discussion Medical topics )