
Thread: decode tool doesn't decode filename of uploaded files




decode tool doesn't decode filename of uploaded files
2007-05-01 12:56:18
Hello again,

It seems the decode tool doesn't decode the filename of uploaded files.

Is this the intended behavior? If not, how can we change it to correctly
decode the filename, and thus convert it to unicode?

I guess we should add the decoding code to the function decode_params
in the file lib/encoding.py.

Thanks for your advice about this issue,

-- Nicolas Grilly

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the
Google Groups "cherrypy-users" group.
To post to this group, send email to cherrypy-users@googlegroups.com
To unsubscribe from this group, send email to
cherrypy-users-unsubscribe@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/cherrypy-users?hl=en
-~----------~----~----~----~------~----~------~--~---


Re: decode tool doesn't decode filename of uploaded files
2007-05-01 13:19:56
Nicolas Grilly wrote:
> It seems the decode tool doesn't decode the filename of uploaded files.
>
> Is it the intended behavior? If not, how to change it to correctly
> decode the filename, and thus convert it to unicode?
>
> I guess we should add the decoding code in the function decode_params
> in file lib/encoding.py.

I wouldn't mind decoding the filename, as long as the charset is either
1) unambiguously declared in the payload, or 2) explicitly declared by
the developer.

AFAIK, declaring charset in the payload is already defined for
multipart/form-data. http://www.ietf.org/rfc/rfc2388.txt says:

   The original local file name may be supplied as well, either as a
   "filename" parameter either of the "content-disposition: form-data"
   header or, in the case of multiple files, in a "content-disposition:
   file" header of the subpart. The sending application MAY supply a
   file name; if the file name of the sender's operating system is not
   in US-ASCII, the file name might be approximated, or encoded using
   the method of RFC 2231.

and http://www.ietf.org/rfc/rfc2231.txt says:

   Specifically, an asterisk at the end of a parameter name acts as an
   indicator that character set and language information may appear at
   the beginning of the parameter value. A single quote is used to
   separate the character set, language, and actual value information in
   the parameter value string, and an percent sign is used to flag
   octets encoded in hexadecimal.  For example:

        Content-Type: application/x-stuff;
         title*=us-ascii'en-us'This%20is%20%2A%2A%2Afun%2A%2A%2A

...so implementing that should be straightforward. However, I'd be
surprised if your user-agent is doing this. Is it?
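
To make the RFC 2231 format concrete, here is a minimal sketch of decoding
such an extended parameter value. It is written for Python 3, where `str`
plays the role of the `unicode` type discussed in this thread. The function
name `decode_extended_value` is hypothetical and is not part of CherryPy;
the sketch also assumes the value is not split into continuations
(`title*0*`, `title*1*`, ...), which RFC 2231 also allows.

```python
from urllib.parse import unquote_to_bytes


def decode_extended_value(raw):
    """Decode an RFC 2231 extended value: charset'language'percent-encoded.

    Hypothetical helper for illustration only. Returns (text, language).
    """
    # Split on the first two single quotes only; any later quote would have
    # to be percent-encoded (%27) in a well-formed value.
    charset, language, encoded = raw.split("'", 2)
    # Turn %XX escapes back into raw octets, then decode them with the
    # declared charset (falling back to US-ASCII if none was declared).
    octets = unquote_to_bytes(encoded)
    return octets.decode(charset or "us-ascii"), language


text, language = decode_extended_value(
    "us-ascii'en-us'This%20is%20%2A%2A%2Afun%2A%2A%2A"
)
print(text, language)
```

A user-agent following the RFC could send a non-ASCII filename the same way,
for example `filename*=UTF-8''L%27%C3%A9t%C3%A9%20est%20beau.pdf`.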

In the absence of explicit declaration in the payload, the only hope is
for the developer to use or override the default of US-ASCII. If you
just use the default, there's little point in adding this to CP, since
unicode(val) uses the default encoding for Python, which tends to be
ASCII anyway:

    >>> unicode('\xF3')
    Traceback (most recent call last):
      File "<interactive input>", line 1, in ?
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xf3 in position 0: ordinal not in range(128)
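
For comparison, a small sketch of the same byte decoded with and without an
explicit charset. It uses Python 3 syntax (`bytes.decode` instead of
`unicode()`), and `try_decode` is a hypothetical helper name used only for
this illustration.

```python
raw = b"\xf3"  # the single byte from the traceback above


def try_decode(data, charset):
    """Return the decoded text, or None if the bytes are invalid in charset."""
    try:
        return data.decode(charset)
    except UnicodeDecodeError:
        return None


# ASCII only covers 0x00-0x7F, so 0xF3 cannot be decoded.
print(try_decode(raw, "ascii"))
# With an explicitly declared charset the same byte is a valid character.
print(try_decode(raw, "iso-8859-1"))
```

The point is only that the byte itself is fine; what is missing is a
declaration of which charset it was encoded in.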



Robert Brewer
System Architect
Amor Ministries
fumanchu@amor.org



Re: decode tool doesn't decode filename of uploaded files
2007-05-01 18:34:46
Robert Brewer wrote:
> AFAIK, declaring charset in the payload is already defined for
> multipart/form-data. http://www.ietf.org/rfc/rfc2388.txt says:
> ...
> ...so implementing that should be straightforward. However, I'd be
> surprised if your user-agent is doing this. Is it?

You're right: my user-agents don't respect the RFCs (I've checked with
Firefox 2.0 and Internet Explorer 7.0).

I've looked at the data sent by the user-agents. As expected, the
filename is given in the Content-Disposition header, but is not
encoded according to the RFCs. Here is a sample:

-----------------------------7d71f41c450754
Content-Disposition: form-data; name="your_file"; filename="L'été est beau.pdf"
Content-Type: application/pdf

I did some tests and observed that the filename encoding depends on the
encoding declared in the HTML page. For example, if the page is
encoded in ISO-8859-1, the filename is encoded in ISO-8859-1 too, and
if the page is encoded in UTF-8, the filename is encoded in UTF-8 too.

Can you confirm this behavior with your own user-agents? If most
user-agents behave like IE 7.0 and Firefox 2.0, we can change the
decode tool to decode the filename, using the encoding explicitly
specified by the developer when initializing the decode tool.
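
As a rough sketch of what that could look like (Python 3 syntax, and not
actual CherryPy code): the helper below pulls the raw filename bytes out of
a Content-Disposition header and decodes them with a developer-supplied
encoding. The name `filename_from_disposition` and the simple regular
expression are assumptions made for this illustration; real header parsing
would have to handle escaped quotes and other edge cases.

```python
import re

# Matches filename="..." in a raw header line; assumes no escaped quotes.
_FILENAME_RE = re.compile(rb'filename="([^"]*)"')


def filename_from_disposition(header, encoding):
    """Extract the filename from raw header bytes and decode it.

    `encoding` is the charset declared by the developer, typically the
    encoding of the HTML page that contained the upload form.
    """
    match = _FILENAME_RE.search(header)
    if match is None:
        raise ValueError("no filename parameter in header")
    return match.group(1).decode(encoding)


line = 'Content-Disposition: form-data; name="your_file"; filename="L\'été est beau.pdf"'

# The same header as sent from an ISO-8859-1 page and from a UTF-8 page.
latin1_header = line.encode("iso-8859-1")
utf8_header = line.encode("utf-8")

print(filename_from_disposition(latin1_header, "iso-8859-1"))
print(filename_from_disposition(utf8_header, "utf-8"))
```

Decoding with the wrong charset either raises UnicodeDecodeError (ISO-8859-1
bytes read as UTF-8) or silently yields garbled text (UTF-8 bytes read as
ISO-8859-1), which is why the encoding has to be declared explicitly.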

-- Nicolas Grilly


