Thread: Re: Re: DBMail 2.3.0 released




Re: Re: DBMail 2.3.0 released
United States
2007-12-31 15:12:38
While there has been quite a bit of discussion already on this topic, and
I know the purpose of this was to avoid collisions, has there been any
thought of a configuration option to skip the blob comparison and assume
the blobs are the same? Perhaps a size threshold that has to be exceeded
to skip the check. My only concern is that a large attachment - say
100-200 MB - has to be read from the db server and into memory. This
would be very taxing on the network and memory of the whole system. For
small shops with the db on the same box, it's not that big of an issue,
but for larger shops with a lot of email traffic, it could become an
issue. A value of "0" should mean check every part, while a value of
"16777216" would mean check all parts less than 16 MB. As file sizes
increase, the likelihood of a size and hash collision is going to
decrease, especially since larger attachments are rare compared to the
1.5 MB jpeg.

-Jon
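
For illustration only, the threshold rule proposed above could be sketched
as follows. The function name and the skip_threshold parameter are
hypothetical; this is not an existing DBMail option, just one reading of
the "0" and "16777216" examples in the message.

```python
def should_compare_blobs(part_size: int, skip_threshold: int) -> bool:
    """Decide whether a stored blob should be fetched and compared byte for byte.

    skip_threshold is a hypothetical setting (not a real DBMail option):
      0          -> always compare, whatever the part size
      N (N > 0)  -> compare only parts smaller than N bytes
    """
    if skip_threshold == 0:
        return True
    return part_size < skip_threshold
```

With a threshold of 16777216, a 1.5 MB jpeg would still be compared, while
a 200 MB attachment would be assumed identical on a hash match. That
assumption is the trade-off under discussion: skipping the comparison
saves the transfer but no longer detects a collision.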


Paul J Stevens wrote:
> Matija Grabnar wrote:
>
>> I re-iterate: regardless of which digest algorithm is chosen, the code
>> MUST be able to detect and correctly handle collisions. Collisions WILL
>> occur, regardless of the algorithm chosen. It is a mathematically
>> provable fact.
>
> For those of you who have been following this discussion: I've done this
> thing.
>
> - we now use the cryptographic hash only to quickly locate possibly
> duplicate mime-parts. If the hash doesn't occur yet, a new mimepart is
> stored using the hash, but generating an auto-increment bigint as its
> primary key. If the hash does occur, the insertion code compares the
> blobs to make sure no hash collision occurs on different blobs.
>
> - I've added support for a whole dumpload of hashes: we now support md5,
> sha1, sha256, sha512, tiger and whirlpool. Since I'm relying on mhash
> for this, it would be trivial to add other hashes like ghost, but I'm
> currently restricting things to the ones documented on the nessie (EU)
> pages. Looking back, adding all these was probably not really necessary
> for single-instance storage, but libmhash is rock-solid and widely
> available, and I have a hunch they might come in handy along the road.
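
The insertion logic quoted above (hash lookup first, blob comparison on a
hit, auto-increment key for new parts) can be sketched roughly as below.
This is a minimal in-memory illustration, not DBMail's actual code or
schema: the MimePartStore class, its fields, and sha256_hex are all
hypothetical names, and Python's hashlib stands in for mhash.

```python
import hashlib
from typing import Callable, Dict, List


def sha256_hex(data: bytes) -> str:
    """Default digest; any function mapping bytes to a string key would do."""
    return hashlib.sha256(data).hexdigest()


class MimePartStore:
    """In-memory stand-in for a mimeparts table (hypothetical, for illustration)."""

    def __init__(self, digest: Callable[[bytes], str] = sha256_hex) -> None:
        self._digest = digest
        self._next_id = 1                          # plays the auto-increment key
        self._blobs: Dict[int, bytes] = {}         # id -> stored blob
        self._by_hash: Dict[str, List[int]] = {}   # hash -> ids sharing that hash

    def insert(self, blob: bytes) -> int:
        """Return the id of an identical stored part, or store a new one."""
        key = self._digest(blob)
        for part_id in self._by_hash.get(key, []):
            # The hash only narrows the search; equality is decided on the bytes.
            if self._blobs[part_id] == blob:
                return part_id
        # Either the hash is new or every candidate differed (a collision).
        part_id = self._next_id
        self._next_id += 1
        self._blobs[part_id] = blob
        self._by_hash.setdefault(key, []).append(part_id)
        return part_id
```

Because the primary key is separate from the hash, two different blobs
that happen to share a hash simply end up as two rows. Passing a
deliberately weak digest, for example one that uses only the first byte
of the blob, is an easy way to force collisions and exercise that path.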



-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.

_______________________________________________
DBmail mailing list
DBmail@dbmail.org
https://mailman.fastxs.nl/mailman/listinfo/dbmail

RE: Re: DBMail 2.3.0 released
Portugal
2007-12-31 20:04:26
On message size, I think most systems won't accept more than 20-30 MB
per email. Most email clients, if not all, will crash with a message of
that size, especially the *outlook family.


-----Original Message-----
From: dbmail-bounces@dbmail.org [mailto:dbmail-bounces@dbmail.org] On Behalf
Of Jonathan Feally
Sent: Monday, 31 December 2007 21:13
To: DBMail mailinglist
Subject: Re: [Dbmail] Re: DBMail 2.3.0 released

