Thread: Re: email.header.decode_header eats my spaces

2007-03-28 23:24:42

On Mar 28, 2007, at 8:13 PM, Tokio Kikuchi wrote:

> Well, it looks to me that RFC2047 prohibits this at least in header
> text.  An example for comment text in section 8 states:
>
>    (=?ISO-8859-1?Q?a?= b)                      (a b)
>
>            Within a 'comment', white space MUST appear between an
>            'encoded-word' and surrounding text.  [Section 5,
>            paragraph (2)].  However, white space is not needed between
>            the initial "(" that begins the 'comment', and the
>            'encoded-word'.
>
> The word MUST means there is no way to omit the spaces between an
> encoded-word and the surrounding ASCII text.  The '(' before the
> encoded-word appears to violate this, but it is a higher-level syntax
> token.
>
> The current email.header violates this example because we have no class
> that recognizes a comment in a structured header.

Thanks Tokio, I agree with all of this.  I think you're right in
identifying that the problem here is that we don't really have any
way to understand the semantics of a particular header's body.
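
For reference, the RFC 2047 section 8 example quoted above can be checked against a present-day interpreter with a short sketch like the one below. It assumes the Python 3 `email.header.decode_header` API, which returns (bytes, charset) pairs for input containing encoded-words; the 2007 module under discussion may behave differently. The `decoded_text` helper is a hypothetical name, not part of the email package.

```python
from email.header import decode_header

def decoded_text(raw):
    """Join the chunks returned by decode_header into one str.

    Hypothetical helper, not part of the email package.
    """
    pieces = []
    for chunk, charset in decode_header(raw):
        if isinstance(chunk, bytes):
            # Unencoded chunks come back with charset None; treat them as ASCII.
            chunk = chunk.decode(charset or "ascii")
        pieces.append(chunk)
    return "".join(pieces)

# The comment example from RFC 2047, section 8.
print(repr(decoded_text("(=?ISO-8859-1?Q?a?= b)")))
```

If the white space between the encoded-word and the following text is preserved, the printed value matches the right-hand column of the RFC example.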

> This current behavior is correct if '(' is in a *text field and the
> example is not appropriate.  The problem in the email.header module is
> that it cannot distinguish between structured and unstructured (text
> only) headers.  The Header class may have a member function like
> 'add_comment', IMHO.

I think we might want to try to address this in a more general and
extensible way, so that we can support future semantically meaningful
headers.
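
One way to picture the structured/unstructured distinction is a lookup that says whether a header body is plain text or has its own grammar. The sketch below is purely illustrative: `UNSTRUCTURED_HEADERS` and `is_unstructured` are hypothetical names, not part of the email package. The table lists only Subject and Comments, the two named fields that RFC 2822 defines as unstructured, and treating every unlisted field as structured is a simplifying assumption.

```python
# Hypothetical table: fields whose bodies RFC 2822 defines as unstructured text.
UNSTRUCTURED_HEADERS = frozenset({"subject", "comments"})

def is_unstructured(name):
    """Return True if the named header is listed as unstructured.

    Header names are case-insensitive, so compare in lower case.
    Anything not listed is assumed to be structured in this sketch.
    """
    return name.lower() in UNSTRUCTURED_HEADERS

for name in ("Subject", "To"):
    kind = "unstructured" if is_unstructured(name) else "structured"
    print(name, "->", kind)
```

A fuller design would also need to decide how to treat extension fields and which parser to hand each structured body to.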

>>  >>> h = Header()
>>  >>> h.append('hello', 'us-ascii')
>>  >>> h.append('world', 'us-ascii')
>>  >>> print h
>> hello world
>>  >>> print unicode(h)
>> helloworld
>> I think we're nearly correct here.  The unicode version is what
>> I'd expect, but the string version is not.  I think in both cases
>> we should print 'helloworld'.
>
> No.  The email.header module is not a word processor.  Because RFC2047
> is dealing with 'word's, we should treat these parts as 'word's for
> consistency.  The unicode() function should be fixed.  If these words
> are to be concatenated without a space, it should be done outside
> the header module.

Right, but these parts aren't being encoded, and yet we've still
stuck a space between the parts that didn't exist there before.  I'd
feel better about it if we encoded these chunks too.
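
A minimal sketch of the alternative Tokio describes, joining the words before they reach the header module, might look like this. It assumes the Python 3 `email.header.Header` class, where `str(h)` takes the place of the `unicode(h)` call in the quoted session; the variable names are illustrative.

```python
from email.header import Header

# Join the parts ourselves so the header sees a single chunk,
# leaving Header no chunk boundary at which to insert a space.
parts = ["hello", "world"]
h = Header()
h.append("".join(parts), "us-ascii")

print(str(h))      # text of the header as one string
print(h.encode())  # encoded form of the header value
```

This sidesteps the question of what the module should do between two appended chunks, at the cost of making the caller responsible for spacing.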

-Barry

_______________________________________________
Email-SIG mailing list
Email-SIG@python.org