Thread: Private FETCH items

Private FETCH items
United States, 2007-08-22 14:05:18

Folks,

I was recently given a patch to Cyrus IMAP which adds three new per-message FETCH items (GUID, RFC822.MD5, and RFC822.FILESIZE) that help with consistency checking when a message store is replicated.

I'm considering applying this patch, since it will be useful for CMU and others, but I'd like to solicit opinions regarding adding new FETCH items vs. making these items new ANNOTATE entries (although Cyrus doesn't currently implement ANNOTATE).

FYI, the difference between RFC822.SIZE and RFC822.FILESIZE is that the former uses the cached size in the message index file, and the latter stat()s for the file size at the time of the request. I'm not thrilled with the name of this item, so I'd be open to suggestions.
--
Kenneth Murchison
Systems Programmer
Project Cyrus Developer/Maintainer
Carnegie Mellon University
_______________________________________________
Imap-protocol mailing list
Imap-protocol@u.washington.edu
https://mailman1.u.washington.edu/mailman/listinfo/imap-protocol

Re: Private FETCH items
Canada, 2007-08-22 14:27:15

On Aug 22, 2007, at 12:05 PM, Ken Murchison wrote:
> FYI, the difference between RFC822.SIZE and RFC822.FILESIZE is that
> the former uses the cached size in the message index file, and the
> latter stat()s for the file size at the time of the request.

.FILESIZE is just asking for trouble. Nobody cares what the on-disk representation is, and comparing file sizes is NOT the same as comparing message content.

.MD5 is useful in the general sense as a fast way to compare messages for equality, although I think it would be better served in that context if you could MD5 the body content without the 822 headers, to be able to handle Received: differences for the same message delivered to two or more folders via different [SL]MTP transactions. It would probably make more sense to be able to generically MD5 by MIME body part.
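
A minimal Python sketch of the two alternatives suggested here, using only the standard `email` and `hashlib` modules. The function names `body_md5` and `part_md5s` are hypothetical and reflect nothing in the Cyrus patch: the first hashes everything after the header/body separator, the second hashes the decoded content of each leaf MIME part. Two deliveries of one message that differ only in their Received: headers should then produce identical digests.

```python
import hashlib
from email import message_from_bytes, policy


def body_md5(raw: bytes) -> str:
    """MD5 of the body only: everything after the first empty line.

    Headers such as Received: are excluded, so the same message delivered
    via different SMTP/LMTP transactions hashes identically.
    """
    # Accept CRLF CRLF (RFC 822 form) or bare LF LF, whichever comes first.
    hits = [(raw.find(sep), sep) for sep in (b"\r\n\r\n", b"\n\n")]
    hits = [(i, sep) for i, sep in hits if i != -1]
    if not hits:
        body = b""  # header-only message
    else:
        i, sep = min(hits)
        body = raw[i + len(sep):]
    return hashlib.md5(body).hexdigest()


def part_md5s(raw: bytes) -> list:
    """(content type, MD5 of decoded content) for every leaf MIME part."""
    msg = message_from_bytes(raw, policy=policy.default)
    digests = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # multipart containers carry no content of their own
        payload = part.get_payload(decode=True) or b""
        digests.append((part.get_content_type(), hashlib.md5(payload).hexdigest()))
    return digests


copy1 = b"Received: from mx1\r\nSubject: hi\r\n\r\nHello\r\n"
copy2 = b"Received: from mx2\r\nReceived: from relay\r\nSubject: hi\r\n\r\nHello\r\n"
print(body_md5(copy1) == body_md5(copy2), part_md5s(copy1) == part_md5s(copy2))  # True True
```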

What's a GUID?

But overall I guess I have to question why it's necessary to burden the protocol with this just because somebody wrote some broken code. I have done an awful lot of server migrations over the years and I have never yet come up against the problem this is trying to detect. I realize you intend this as private data, but once it's in the wild, it will get used, for better and worse.
--lyndon

Re: Private FETCH items
United Kingdom, 2007-08-22 16:13:42

On Wed Aug 22 20:27:15 2007, Lyndon Nerenberg wrote:
> What's a GUID?

Past participle of the verb "to gooey".

> But overall I guess I have to question why it's necessary to burden
> the protocol with this just because somebody wrote some broken
> code. I have done an awful lot of server migrations over the
> years and I have never yet come up against the problem this is
> trying to detect. I realize you intend this as private data, but
> once it's in the wild, it will get used, for better and worse.

I can understand why adding private fetchable data would be in some way useful, though, irrespective of this particular case.

I'd personally opt for an X- prefix, at the very least. X-CMU- might be better. Then you have X-CMU-UID, X-CMU-CRC, X-CMU-STATSIZE, perhaps.

Note that I've proposed "CRC" instead of explicitly naming MD5, partly in case people are tempted to use it in clients, and partly to stop the flurry of emails you'll otherwise get telling you that MD5 is insecure.
Dave.
--
Dave Cridland - mailto:dave cridland.net - xmpp:dwd jabber.org
 - acap://acap.dave.cridland.net/byowner/user/dwd/bookmarks/
 - http://dave.cridland.net/
Infotrope Polymer - ACAP, IMAP, ESMTP, and Lemonade

Re: Private FETCH items
United States, 2007-08-22 16:14:14

On Wed, 22 Aug 2007, Lyndon Nerenberg wrote:
> .FILESIZE is just asking for trouble. Nobody cares what the on-disk
> representation is, and comparing file sizes is NOT the same as comparing
> message content.

I agree. The c-client library (the beating heart of UW imapd, Pine, etc.) has a concept of an internal size, but that depends upon the mail store and thus is only used by mail store drivers. It never gets as far as imapd (or any other application), much less gets exported in the protocol.

Let us be clear here: any .FILESIZE value in any server which supports multiple mail store formats could (and WOULD) vary depending upon the mailbox format, with no obvious indication to the user as to why. This, by itself, should be enough cause to reject it.
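
To illustrate why a stat()-based size is format-dependent, the sketch below assumes two hypothetical mailbox formats: one that stores messages with bare LF line endings and one that keeps the CRLF form. RFC822.SIZE counts the octets of the message in RFC 822 form, where every line ends in CRLF, so it is the same for both; the file sizes are not. Real formats can diverge in other ways too (mbox "From " separator lines, for instance). All function names are illustrative only.

```python
import os
import tempfile


def rfc822_size(message: bytes) -> int:
    """Octets in canonical form (every line ending in CRLF), as RFC822.SIZE counts."""
    return len(message.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n"))


def store(path: str, message: bytes, newline: bytes) -> None:
    """Write the message the way a (hypothetical) mailbox format would."""
    data = message.replace(b"\r\n", b"\n")
    if newline == b"\r\n":
        data = data.replace(b"\n", b"\r\n")
    with open(path, "wb") as f:
        f.write(data)


def stat_size(path: str) -> int:
    """What a stat()-based FILESIZE item would report."""
    return os.stat(path).st_size


def demo():
    msg = b"Subject: hi\r\n\r\nHello\r\n"
    with tempfile.TemporaryDirectory() as d:
        lf_path = os.path.join(d, "lf-format")
        crlf_path = os.path.join(d, "crlf-format")
        store(lf_path, msg, b"\n")
        store(crlf_path, msg, b"\r\n")
        return rfc822_size(msg), stat_size(lf_path), stat_size(crlf_path)


print(demo())  # (22, 19, 22): one RFC822.SIZE, two different file sizes
```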

> .MD5 is useful in the general sense as a fast way to compare messages for
> equality, although I think it would be better served in that context if you
> could MD5 the body content without the 822 headers to be able to handle
> Received: differences for the same message delivered to two or more folders
> via different [SL]MTP transactions. It would probably make more sense to be
> able to generically MD5 by MIME body part.

IMHO, this functionality belongs as part of CONVERT. An MD5 checksum is, in effect, a conversion.

> What's a GUID?

Presumably this is a global UID which uniquely and permanently identifies the message on the server.

I think that IMAP should eventually have a GUID, but it needs careful design and review. I don't think that we should take the first patch that comes along.

> But overall I guess I have to question why it's necessary to burden the
> protocol with this just because somebody wrote some broken code. I have done
> an awful lot of server migrations over the years and I have never yet come up
> against the problem this is trying to detect. I realize you intend this as
> private data, but once it's in the wild, it will get used, for better and
> worse.

I agree with this sentiment.
-- Mark --
http://staff.washington.edu/mrc
Science does not emerge from voting, party politics, or public debate.
Si vis pacem, para bellum.

Re: Private FETCH items
United States, 2007-08-23 09:32:53

Lyndon Nerenberg wrote:
> On Aug 22, 2007, at 12:05 PM, Ken Murchison wrote:
>
>> FYI, the difference between RFC822.SIZE and RFC822.FILESIZE is that
>> the former uses the cached size in the message index file, and the
>> latter stat()s for the file size at the time of the request.
>
> .FILESIZE is just asking for trouble. Nobody cares what the on-disk
> representation is, and comparing file sizes is NOT the same as comparing
> message content.

True. The author of the patch may be using this for something other than what I was thinking. I'll have to ask.

> .MD5 is useful in the general sense as a fast way to compare messages
> for equality, although I think it would be better served in that context
> if you could MD5 the body content without the 822 headers to be able to
> handle Received: differences for the same message delivered to two or
> more folders via different [SL]MTP transactions. It would probably make
> more sense to be able to generically MD5 by MIME body part.

I probably wasn't clear in the use case. Cyrus now has the ability to replicate a mailstore in near realtime. Several sites (including CMU in the near future) are using this to have redundant mailstores in case of a hardware failure. Rather than create a new tool or protocol to verify the consistency of the mailstore pairs, extending IMAP with a couple of new FETCH items seems like a reasonable thing to do.
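
The kind of check described here might be sketched as follows, assuming the per-UID digests have already been fetched from both servers (for example through the proposed RFC822.MD5 item) and that replication keeps UIDs identical on both sides. `digest_map` and `compare_replicas` are hypothetical helpers, not part of Cyrus; `digest_map` merely stands in for the FETCH step by hashing raw message bytes locally.

```python
import hashlib


def digest_map(messages: dict) -> dict:
    """UID -> MD5 hex digest of the raw message (stand-in for FETCH RFC822.MD5)."""
    return {uid: hashlib.md5(raw).hexdigest() for uid, raw in messages.items()}


def compare_replicas(primary: dict, replica: dict):
    """Compare UID -> digest maps; return sorted UID lists of discrepancies."""
    missing = sorted(primary.keys() - replica.keys())  # present on primary only
    extra = sorted(replica.keys() - primary.keys())    # present on replica only
    differs = sorted(uid for uid in primary.keys() & replica.keys()
                     if primary[uid] != replica[uid])  # same UID, different content
    return missing, extra, differs


primary = digest_map({1: b"a", 2: b"b", 3: b"c"})
replica = digest_map({1: b"a", 2: b"b!", 4: b"d"})
print(compare_replicas(primary, replica))  # ([3], [4], [2])
```

Comparing fixed-size digests rather than whole messages keeps the traffic between the checker and each server small, which is the point of exposing the digest as a FETCH item.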

> What's a GUID?

As Mark deduced, it's a Globally Unique ID.
--
Kenneth Murchison
Systems Programmer
Project Cyrus Developer/Maintainer
Carnegie Mellon University

Re: Private FETCH items
United States, 2007-08-23 09:34:06

Arnt Gulbrandsen wrote:
> Sounds to me as the patch author really wants a FETCH COOKIE instead of
> MD5/FILESIZE, returning an opaque cookie which should be the same if the
> same message is present in two mailboxes/servers and different otherwise.

Correct.
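
One way a server might produce such an opaque cookie, sketched under our own assumption that the cookie is a SHA-1 digest of the message after normalizing line endings to CRLF, so the on-disk convention cannot leak into it. Because clients would only compare cookies for equality and never interpret them, the server remains free to change the underlying algorithm. `fetch_cookie` is a hypothetical name, not an existing FETCH item.

```python
import hashlib


def fetch_cookie(raw: bytes) -> str:
    """Opaque per-message cookie.

    Equal for identical message content; different (with overwhelming
    probability) otherwise.
    """
    # Canonicalize line endings first so the storage format does not leak through.
    canonical = raw.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")
    return hashlib.sha1(canonical).hexdigest()


# The same message stored with LF and with CRLF endings yields one cookie.
print(fetch_cookie(b"Subject: hi\n\nHello\n")
      == fetch_cookie(b"Subject: hi\r\n\r\nHello\r\n"))  # True
```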

> What does the GUID do?

Uniquely identifies a message so that if the same message is present in more than one mailbox, its contents only get replicated once.
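
A sketch of how a per-message GUID allows each message body to be copied only once, even when the message is filed in several mailboxes. The inputs (a map from mailbox name to (UID, GUID) pairs, plus the set of GUIDs the replica already holds) and the name `plan_replication` are assumptions for illustration; this does not describe Cyrus's actual replication protocol.

```python
def plan_replication(mailboxes: dict, replica_guids: set):
    """Decide which message bodies to copy and which index records to create.

    mailboxes: mailbox name -> list of (uid, guid) pairs on the primary
    replica_guids: GUIDs whose message bodies the replica already stores
    """
    have = set(replica_guids)
    bodies_to_copy = []  # each GUID appears at most once
    records = []         # (mailbox, uid, guid) entries to create on the replica
    for mailbox, entries in mailboxes.items():
        for uid, guid in entries:
            if guid not in have:
                bodies_to_copy.append(guid)
                have.add(guid)
            records.append((mailbox, uid, guid))
    return bodies_to_copy, records


bodies, records = plan_replication(
    {"INBOX": [(1, "g1"), (2, "g2")], "Archive": [(7, "g1"), (8, "g3")]},
    replica_guids={"g3"},
)
print(bodies)        # ['g1', 'g2']
print(len(records))  # 4
```

Message "g1" is filed twice but its body is transferred once; only the cheap index record is created per mailbox.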
--
Kenneth Murchison
Systems Programmer
Project Cyrus Developer/Maintainer
Carnegie Mellon University

Re: Private FETCH items
United States, 2007-08-24 15:42:18

Arnt Gulbrandsen wrote:
> Ken Murchison writes:
>> Arnt Gulbrandsen wrote:
>>> What does the GUID do?
>>
>> Uniquely identifies a message so that if the same message is present
>> in more than one mailbox, its contents only get replicated once.
>
> I'm afraid I phrased the question badly. What does the code do when
> asked for a GUID?

Pulls the message GUID out of the index file for the mailbox.
--
Kenneth Murchison
Systems Programmer
Project Cyrus Developer/Maintainer
Carnegie Mellon University

Re: Private FETCH items
United Kingdom, 2007-08-27 05:31:20

On Thu, 23 Aug 2007, Ken Murchison wrote:
> I probably wasn't clear in the use case. Cyrus now has the ability to
> replicate a mailstore in near realtime. Several sites (including CMU in the
> near future) are using this to have redundant mailstores in case of a hardware
> failure. Rather than create a new tool or protocol to verify the consistency
> of the mailstore pairs, extending IMAP with a couple of new FETCH items seems
> like a reasonable thing to do.

I should note that the MD5 replica consistency check has been very useful in finding lurking bugs in the code, both in the original version and across versions as the code has evolved. It's also good for discovering bit-smashing hardware failures that haven't (yet) affected lower levels of the OS/driver/RAID stack. You can only be sure of data integrity if you test it end-to-end.
Tony.
--
f.a.n.finch <dot dotat.at> http://dotat.at/
IRISH SEA: SOUTHERLY, BACKING NORTHEASTERLY FOR A TIME, 3 OR 4.
SLIGHT OR MODERATE. SHOWERS. MODERATE OR GOOD, OCCASIONALLY POOR.

Re: Private FETCH items
Canada, 2007-12-27 16:16:07

On 2007-Dec-27, at 15:09, Bron Gondwana wrote:
> You're lucky if you've never seen file corruption due to bitrot on
> large drive arrays. We have about 30TB online at the moment and that's
> growing fairly rapidly. The research out there suggests we're likely to
> have a block level corruption every couple of months with the current
> level of affordable component reliability.

But that's a local hardware issue, and should be handled locally on the host, i.e., have the IMAP server checksum the files when writing to disk, and verify that checksum when reading the data. Doing random checks via the IMAP protocol won't prevent you from offering corrupted data to clients. You have the right solution, but you're applying it at the wrong layer.
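
The alternative proposed here, checksumming on write and verifying on every read inside the server, could look roughly like this minimal sketch. Storing the digest in a sidecar `.sha256` file is an assumption made to keep the example self-contained (a real server would more plausibly keep it in its index), and the function names are hypothetical. Unlike periodic checks over IMAP, a failed verification here stops corrupted data before it reaches a client.

```python
import hashlib
import os
import tempfile


class ChecksumError(Exception):
    """Stored data no longer matches the checksum recorded at write time."""


def write_with_checksum(path: str, data: bytes) -> None:
    with open(path, "wb") as f:
        f.write(data)
    with open(path + ".sha256", "w") as f:
        f.write(hashlib.sha256(data).hexdigest())


def read_verified(path: str) -> bytes:
    with open(path, "rb") as f:
        data = f.read()
    with open(path + ".sha256") as f:
        expected = f.read().strip()
    if hashlib.sha256(data).hexdigest() != expected:
        # Refuse to hand corrupted data to a client.
        raise ChecksumError(f"checksum mismatch for {path}")
    return data


def demo():
    msg = b"Subject: hi\r\n\r\nHello\r\n"
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "message")
        write_with_checksum(path, msg)
        clean_ok = read_verified(path) == msg
        with open(path, "r+b") as f:  # simulate bitrot: flip one bit of byte 0
            f.write(bytes([msg[0] ^ 0x20]))
        try:
            read_verified(path)
            caught = False
        except ChecksumError:
            caught = True
    return clean_ok, caught


print(demo())  # (True, True)
```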
--lyndon