Thread: SEARCH SUBJECT
|
|
| SEARCH SUBJECT |
  Finland |
2008-02-21 13:43:43 |
SEARCH SUBJECT is defined to match the envelope's subject field.
ENVELOPE's SUBJECT may not be an exact match for the Subject: header
itself. RFC 3501 doesn't seem to give a clear definition of how it should
be generated, so many servers at least compress LWSP to single spaces.
SEARCH HEADER is defined to match the header's value.

So if we have a message:

Subject: hello<TAB>world

and the server returns ENVELOPE's subject field with the <TAB> replaced
by a space ("hello world"), then I think SEARCHes should work like this:

SEARCH SUBJECT "hello world" -> match
SEARCH SUBJECT "hello<TAB>world" -> non-match
SEARCH HEADER subject "hello world" -> non-match
SEARCH HEADER subject "hello<TAB>world" -> match

Right?
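
The reading proposed above can be sketched in a few lines of Python. This
is only an illustration under stated assumptions: envelope_subject() is a
hypothetical stand-in for a server that compresses runs of spaces and tabs
when it builds ENVELOPE, and both searches use the case-insensitive
substring test that RFC 3501 describes for string keys.

```python
import re


def envelope_subject(raw_subject: str) -> str:
    """Hypothetical ENVELOPE generation: squash each space/tab run to one space."""
    return re.sub(r"[ \t]+", " ", raw_subject)


def contains(field: str, key: str) -> bool:
    """Case-insensitive substring test."""
    return key.lower() in field.lower()


raw = "hello\tworld"          # value of the Subject: header as stored
env = envelope_subject(raw)   # what this assumed server reports in ENVELOPE


def search_subject(key: str) -> bool:
    """SEARCH SUBJECT, if it is matched against the ENVELOPE subject."""
    return contains(env, key)


def search_header_subject(key: str) -> bool:
    """SEARCH HEADER subject, matched against the raw header value."""
    return contains(raw, key)


for key in ("hello world", "hello\tworld"):
    print(repr(key), search_subject(key), search_header_subject(key))
```

Under this strict reading the two commands disagree on both keys, which is
exactly the situation the question is about.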
_______________________________________________
Imap-protocol mailing list
Imap-protocol u.washington.edu
https://mailman1.u.washington.edu/mailman/listinfo/imap-protocol
|
|
| Re: SEARCH SUBJECT |
  United States |
2008-02-21 14:37:06 |
My opinion is that most searches are user-originated. A few, like flag
searches, are used in server processes (to discover all the unseen
messages, for instance, or to limit retrieval to a certain date
threshold), but most are user-originated. Because of that, the point is
more to do what's likely to make users happy, to give them what they
expect... than it is to exactly match some picky spec.

Therefore, I'd say that any server that normalized white space in a text
search would be doing everyone a favour, whether or not it's
to-the-letter "compliant" to anything. The same for a search that
spanned "lines" (CRLF boundaries), treated email addresses
intelligently, or normalized parts of speech or conjugations (treating
"swim", "swims", and "swam" as the same word, say).

None of that sort of thing is likely to have any interoperability
effect. It's just likely to help users find what they're looking for.

Barry
|
|
| Re: SEARCH SUBJECT |
  United States |
2008-02-21 15:56:21 |
I agree. I feel that a server is allowed to implement fuzzy string
searching, with case-independence being the only absolute requirement.

More specifically, I never intended to forbid fuzzy matching, and
deliberately left it open-ended to allow implementations to experiment
with what worked best. [Google considered it good news when I told them
this was something in their server that I thought did NOT need fixing!]

This means that SEARCH compliance testing can only test for false
negatives; that is, for failure to match cases that both a rigid and a
fuzzy server would catch.

Clearly, if a message has
Subject: Hello<tab>world
then
tag SEARCH SUBJECT "Hello<tab>world"
and
tag SEARCH HEADER "SUBJECT" "Hello<tab>world"
and
tag SEARCH SUBJECT "HELLO<tab>WORLD"
and
tag SEARCH HEADER "SUBJECT" "hello<tab>WoRlD"
should all match, but it is server-dependent if
tag SEARCH SUBJECT "Hello world"
and
tag SEARCH HEADER "SUBJECT" "Hello, world"
and
tag SEARCH SUBJECT "hi, planet!"
and
tag SEARCH HEADER "SUBJECT" "konnichi ha, seikai"
match. [The last two being extreme examples that I wouldn't expect to
work.]
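
A compliance check along these lines could be sketched as follows. This is
a rough illustration, not an existing test suite: rigid_match() and
false_negatives() are hypothetical names, and the rigid baseline is assumed
to be a plain case-insensitive substring test. Extra hits from a fuzzy
server are deliberately not counted as failures.

```python
def rigid_match(field: str, key: str) -> bool:
    """Baseline that any server, rigid or fuzzy, is expected to catch."""
    return key.lower() in field.lower()


def false_negatives(subjects: dict[int, str], key: str, server_hits: set[int]) -> set[int]:
    """Message numbers the baseline requires but the server failed to return."""
    required = {num for num, subj in subjects.items() if rigid_match(subj, key)}
    return required - server_hits


subjects = {1: "Hello\tworld", 2: "Weekly report"}

# A fuzzy server that also returns message 2 still passes.
print(false_negatives(subjects, "HELLO\tWORLD", {1, 2}))

# A server that misses message 1 fails the check.
print(false_negatives(subjects, "hello\tWoRlD", {2}))
```

The check is one-sided on purpose: it reports only required messages that
are missing, never additional ones.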

On Thu, 21 Feb 2008, Barry Leiba wrote:
> My opinion is that most searches are user-originated. A few, like flag
> searches, are used in server processes (to discover all the unseen
> messages, for instance, or to limit retrieval to a certain date
> threshold), but most are user-originated. Because of that, the point is
> more to do what's likely to make users happy, to give them what they
> expect... than it is to exactly match some picky spec.
>
> Therefore, I'd say that any server that normalized white space in a text
> search would be doing everyone a favour, whether or not it's
> to-the-letter "compliant" to anything. The same for a search that
> spanned "lines" (CRLF boundaries), treated email addresses
> intelligently, or normalized parts of speech or conjugations (treating
> "swim", "swims", and "swam" as the same word, say).
>
> None of that sort of thing is likely to have any interoperability
> effect. It's just likely to help users find what they're looking for.
>
> Barry
-- Mark --
http://staff.washington.edu/mrc
Science does not emerge from voting, party politics, or public debate.
Si vis pacem, para bellum.
|
|
| Re: SEARCH SUBJECT |
  Finland |
2008-02-21 16:07:36 |
On Thu, 2008-02-21 at 13:56 -0800, Mark Crispin wrote:
> I agree. I feel that a server is allowed to implement fuzzy string
> searching, with case-independence being the only absolute requirement.
>
> More specifically, I never intended to forbid fuzzy matching, and
> deliberately left it open-ended to allow implementations to experiment
> with what worked best. [Google considered it good news when I told them
> this was something in their server that I thought did NOT need fixing!]

So, how is this related to what you said about substring searches a year
ago?
http://mailman1.u.washington.edu/pipermail/imap-protocol/2006-December/000328.html

I doubt Google (or anyone else implementing fuzzy matching) supports
substring matching.
|
|
| Re: SEARCH SUBJECT |
  United States |
2008-02-21 16:14:08 |
On Fri, 22 Feb 2008, Timo Sirainen wrote:
> So, how is this related to what you said about substring searches a year
> ago?
> http://mailman1.u.washington.edu/pipermail/imap-protocol/2006-December/000328.html

That dealt with false *negatives* due to failure to do substring matching.
I don't object to fuzzy matching that adds positives that a non-fuzzy
search would not match.

But that is a good question. It deserves clarification in the
specification. The principle should be "match what you are required to
match, but if you have some fuzzy algorithms that produce useful
additional matches, then go for it."

In spam filtering, we want to err on the side of false negatives. But in
IMAP searches, we err on the side of false positives.
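
One way to honour that principle is to make the fuzzy path strictly
additive: run the required match first and consult the fuzzy algorithm
only as an extra source of hits. The sketch below assumes that the
required match is a case-insensitive substring test and uses whitespace
normalization as a stand-in for "some fuzzy algorithm"; all function names
are hypothetical.

```python
import re


def required_match(field: str, key: str) -> bool:
    """What the server must match: case-insensitive substring."""
    return key.lower() in field.lower()


def squash(text: str) -> str:
    """Collapse every whitespace run to a single space and lowercase."""
    return re.sub(r"\s+", " ", text).lower()


def fuzzy_match(field: str, key: str) -> bool:
    """Illustrative fuzzy algorithm: ignore differences in whitespace."""
    return squash(key) in squash(field)


def server_match(field: str, key: str) -> bool:
    """Required matches always count; fuzzy matches can only add hits."""
    return required_match(field, key) or fuzzy_match(field, key)


print(server_match("Hello\tworld", "Hello world"))   # added by the fuzzy path
print(server_match("neotokyo", "tokyo"))             # required substring match
```

Because the two tests are combined with "or", no fuzzy rule can turn a
required match into a miss; it can only produce false positives.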

> I doubt Google (or anyone else implementing fuzzy matching) supports
> substring matching.

They claimed that it works.

-- Mark --
|
|
| Re: SEARCH SUBJECT |
  Finland |
2008-02-21 18:35:14 |
On Thu, 2008-02-21 at 14:14 -0800, Mark Crispin wrote:
> On Fri, 22 Feb 2008, Timo Sirainen wrote:
> > So, how is this related to what you said about substring searches a
> > year ago?
> > http://mailman1.u.washington.edu/pipermail/imap-protocol/2006-December/000328.html
>
> That dealt with false *negatives* due to failure to do substring
> matching.
>
> I don't object to fuzzy matching that adds positives that a non-fuzzy
> search would not match.
>
> But that is a good question. It deserves clarification in the
> specification. The principle should be "match what you are required to
> match, but if you have some fuzzy algorithms that produce useful
> additional matches, then go for it."

What do you think the fuzzy matching fields could be?

- SUBJECT, TEXT, BODY at least
- FROM, TO, CC, BCC real name fields, user domain maybe?
- HEADER x y? HEADER message-id, in-reply-to, references (and maybe
  others?) probably a bad idea.
- SMALLER, LARGER probably not? (So server couldn't decide that 1MB+1
  wouldn't match with SMALLER 1048576)
- Date searches not(?)
- Keywords not

> > I doubt Google (or anyone else implementing fuzzy matching) supports
> > substring matching.
>
> They claimed that it works.

Not at least in the current public implementation:

x search subject different
* SEARCH 1
x OK SEARCH completed (Success)
x search subject ifferent
* SEARCH
x OK SEARCH completed (Success)
x search body thanks
* SEARCH 1 5 8
x OK SEARCH completed (Success)
x search body hanks
* SEARCH
x OK SEARCH completed (Success)
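
Results like these are what one would expect from a search built on a
whole-word index. The sketch below contrasts that model with substring
matching. It is only one model that fits the transcript, not a description
of how the tested server actually works; the subject text and function
names are made up for illustration.

```python
import re


def token_match(text: str, key: str) -> bool:
    """Whole-word lookup, as a word index would answer it (assumed model)."""
    words = set(re.findall(r"\w+", text.lower()))
    return key.lower() in words


def substring_match(text: str, key: str) -> bool:
    """Case-insensitive substring test."""
    return key.lower() in text.lower()


subject = "Something different"   # hypothetical subject of message 1

for key in ("different", "ifferent"):
    print(key, token_match(subject, key), substring_match(subject, key))
```

The whole-word model finds "different" but not "ifferent", while the
substring test finds both, so the word index produces exactly the kind of
false negative discussed earlier in the thread.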
|
|
| Re: SEARCH SUBJECT |
  United States |
2008-02-21 19:31:47 |
On Fri, 22 Feb 2008, Timo Sirainen wrote:
> - SUBJECT, TEXT, BODY at least

Yes.

> - FROM, TO, CC, BCC real name fields, user domain maybe?

Yes. For the address list, I canonicalize the names into RFC 2822
shortest form. I probably should always use phrase route-addr form in
order to make "<user example.com>" always match.

> - HEADER x y? HEADER message-id, in-reply-to, references (and maybe
> others?) probably a bad idea.

You're probably right here, but I don't want to commit to any definite
statement here since I haven't thoroughly considered all the
possibilities.

> - SMALLER, LARGER probably not? (So server couldn't decide that 1MB+1
> wouldn't match with SMALLER 1048576)
> - Date searches not(?)
> - Keywords not

Probably not for all of these. The client can easily broaden these if it
wanted a bit of fuzz.

>>> I doubt Google (or anyone else implementing fuzzy matching) supports
>>> substring matching.
> Not at least in the current public implementation:

Oh well... I hope that they fix that. I think that they would have a
difficult time arguing that a search for "tokyo" should not match
"neotokyo"...

-- Mark --
|
|
| Re: SEARCH SUBJECT |
  Germany |
2008-02-22 06:26:04 |
Timo Sirainen writes:
> What do you think the fuzzy matching fields could be?
>
> ...
> - FROM, TO, CC, BCC real name fields, user domain maybe?

Fuzzy, inexact, and exact matching are all useful. Addresses are
important in email, which means there are many different useful things
one can do with them ;)

> - HEADER x y?

I like the idea of fuzzy matching on unstructured fields. Not so keen on
fuzzily matching structured fields.

> ...
> - Date searches not(?)

Date searches are slightly fuzzy now. I'm not sure my code handles
timezone the way the RFC says to.
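
For reference, RFC 3501 describes the date keys as comparing dates while
disregarding time and timezone. The sketch below contrasts that reading
with an alternative that converts to UTC first; the second function is an
assumed example of how an implementation could drift from the RFC, not a
description of any particular server.

```python
from datetime import date, datetime, timedelta, timezone


def on_date_as_written(stamp: datetime, wanted: date) -> bool:
    """Compare the calendar date as written, ignoring time and UTC offset."""
    return stamp.date() == wanted


def on_date_via_utc(stamp: datetime, wanted: date) -> bool:
    """Assumed alternative: normalize to UTC before taking the date."""
    return stamp.astimezone(timezone.utc).date() == wanted


# 23:30 on 21 Feb 2008 at UTC-8 is already 22 Feb in UTC.
stamp = datetime(2008, 2, 21, 23, 30, tzinfo=timezone(timedelta(hours=-8)))

print(on_date_as_written(stamp, date(2008, 2, 21)))
print(on_date_via_utc(stamp, date(2008, 2, 21)))
```

The two functions disagree for any timestamp whose local date differs from
its UTC date, which is one way date searches end up "slightly fuzzy".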
Arnt