Thread: Bug: Multipart posts with files named using UTF-8 characters
Bug: Multipart posts with files named using UTF-8 characters
2006-10-19 13:36:03
On Thu, 2006-10-19 at 14:29 +0200, Tumidajewicz, Przemyslaw wrote:
> Hello everyone,
>
> First post here, hope I'm doing it right ;)
>
> I've been having problems with sending multipart posts containing files
> named using UTF-8 characters - all non-ASCII characters are turned into
> question marks. I've tried to specify the charset when creating the
> FilePart like this
>
> FilePart fp = new FilePart(name, file, null, "UTF-8");
>
> as well as setting the charset later on like this
>
> fp.setCharSet("UTF-8");
>
> with no result. So I took a deeper look at the HttpClient code (thank
> god for open source!) and found that the loss of special characters
> happens in the FilePart.sendDispositionHeader method, at line
>
> out.write(EncodingUtil.getAsciiBytes(filename));
>
> where the filename is forced to fit into the US-ASCII charset.
>
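The question marks described above can be reproduced with the JDK alone. The following is a minimal sketch, assuming that EncodingUtil.getAsciiBytes behaves like the JDK's own US-ASCII encoder; the class and method names are made up for illustration and are not part of HttpClient.

```java
import java.nio.charset.StandardCharsets;

// Illustrative class; not part of HttpClient.
class AsciiFilenameLoss {

    // Encode the name to US-ASCII bytes and decode it again,
    // which is roughly what the receiving side ends up seeing.
    static String throughAscii(String filename) {
        byte[] ascii = filename.getBytes(StandardCharsets.US_ASCII);
        return new String(ascii, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // "za\u017C\u00F3\u0142\u0107.txt" contains four non-ASCII letters.
        String filename = "za\u017C\u00F3\u0142\u0107.txt";
        // String.getBytes(Charset) replaces every unmappable character
        // with the charset's replacement byte, which is '?' for US-ASCII.
        System.out.println(throughAscii(filename)); // prints za????.txt
    }
}
```

The loss is silent: no exception is thrown, so the sender has no indication that the name was altered.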
Przemyslaw,

This behavior is in line with the requirements of the MIME specification
as outlined in RFC 1521 and RFC 1522. The use of non-ASCII characters in
MIME headers is not permitted. One is supposed to escape non-ASCII
characters using BASE64 or Quoted-Printable encoding.

See this feature request for details

https://issues.apache.org/jira/browse/HTTPCLIENT-293

Oleg
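
As an illustration of the BASE64 escaping mentioned above, here is a minimal sketch of the "B" encoded-word form defined in RFC 2047 (the successor to RFC 1522). The class and method names are hypothetical. Two caveats: the sketch ignores the 75-character limit on a single encoded-word, and RFC 2047 itself restricts where encoded-words may appear, so whether a given server decodes one inside a filename parameter is an open question, not something this example establishes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Illustrative helper; not part of HttpClient.
class EncodedWordSketch {

    private static final String PREFIX = "=?UTF-8?B?";
    private static final String SUFFIX = "?=";

    // Build an encoded-word of the form =?charset?B?base64-text?=
    static String encodeB(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return PREFIX + Base64.getEncoder().encodeToString(utf8) + SUFFIX;
    }

    // Reverse of encodeB, for UTF-8 "B" encoded-words only.
    static String decodeB(String word) {
        if (!word.startsWith(PREFIX) || !word.endsWith(SUFFIX)) {
            throw new IllegalArgumentException("not a UTF-8 B encoded-word");
        }
        String b64 = word.substring(PREFIX.length(), word.length() - SUFFIX.length());
        return new String(Base64.getDecoder().decode(b64), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String encoded = encodeB("\u00E9");        // a single e-acute
        System.out.println(encoded);               // prints =?UTF-8?B?w6k=?=
        System.out.println(decodeB(encoded).equals("\u00E9")); // prints true
    }
}
```

The result contains only ASCII characters, so it survives an ASCII-only header writer unchanged.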
> My workaround for this problem is to substitute the above line with a
> charset-aware version:
>
> out.write(EncodingUtil.getBytes(filename, getCharSet()));
>
> but I'm not sure if it's the correct way to do it.
>
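To make the effect of the charset-aware workaround concrete, the sketch below writes a simplified Content-Disposition line with the filename encoded in a caller-chosen charset. It uses only the JDK; the class name, the method, and the exact header layout are assumptions for illustration and do not reproduce HttpClient's FilePart code.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative class; not part of HttpClient.
class DispositionHeaderSketch {

    // Append bytes without the checked IOException of OutputStream.write(byte[]).
    private static void put(ByteArrayOutputStream out, byte[] bytes) {
        out.write(bytes, 0, bytes.length);
    }

    // The fixed header text is ASCII; only the filename uses the given charset.
    static byte[] dispositionHeader(String name, String filename, Charset charset) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        put(out, "Content-Disposition: form-data; name=\"".getBytes(StandardCharsets.US_ASCII));
        put(out, name.getBytes(StandardCharsets.US_ASCII));
        put(out, "\"; filename=\"".getBytes(StandardCharsets.US_ASCII));
        put(out, filename.getBytes(charset));
        put(out, "\"".getBytes(StandardCharsets.US_ASCII));
        return out.toByteArray();
    }

    public static void main(String[] args) {
        String filename = "za\u017C\u00F3\u0142\u0107.txt";
        byte[] utf8 = dispositionHeader("file", filename, StandardCharsets.UTF_8);
        byte[] ascii = dispositionHeader("file", filename, StandardCharsets.US_ASCII);
        // A receiver that decodes the header as UTF-8 recovers the name intact.
        System.out.println(new String(utf8, StandardCharsets.UTF_8).contains(filename));    // true
        // The ASCII variant has already lost the four non-ASCII letters.
        System.out.println(new String(ascii, StandardCharsets.US_ASCII).contains("za????.txt")); // true
    }
}
```

Note that this only works when sender and receiver agree on the charset out of band, since the header itself carries no charset label.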
> What I'm quite sure of at this point is that it works for UTF-8 and
> results are consistent with what I get out of IE6 when posting the same
> file using a form like this:
>
> <form action="http://localhost:1235" method="POST"
>       enctype="multipart/form-data" accept-charset="UTF-8">
>   <input type="file" name="file"></input>
>   <input type="submit"></input>
> </form>
>
> It's also parsed correctly by FileUpload 1.1.
>
> I've had a look at the HTTPClient 3.1-alpha1 source and the problematic
> line in FilePart looks the same - which means that either my fix is a
> heresy and/or there is a better way of doing this - or that this bug has
> not been reported before (I failed to find anything on this in the archive).
>
> Please let me know if this is the right way of fixing this problem and
> if so, will this fix make it into HTTPClient 3.1?
>
> Thanks and best regards!
> --Przemek
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: httpclient-dev-unsubscribe jakarta.apache.org
> For additional commands, e-mail: httpclient-dev-help jakarta.apache.org

Bug: Multipart posts with files named using UTF-8 characters
2006-10-19 13:45:23
Guys,
Look at RFC 2047 which updates RFC 1521. This method is quite popular in
E-Mail traffic. Maybe real-world HTTP servers and clients support it?
Odi
--
[web] http://www.odi.ch/
[blog] http://www.odi.ch/weblog/
[pgp] key 0x81CF3416
finger print F2B1 B21F F056 D53E 5D79 A5AF 02BE 70F5 81CF 3416

Bug: Multipart posts with files named using UTF-8 characters
2006-10-19 15:07:49
Hi Odi,
> Look at RFC 2047 which updates RFC 1521. This method is quite popular in
> E-Mail traffic. Maybe real-world HTTP servers and clients support it?
Maybe, but MIME encoding is not really our line of work. If somebody
is willing to come up with a patch, I would suggest implementing
something similar to the non-ASCII HTTP headers we already have,
to be used at the application developer's risk.

http://jakarta.apache.org/commons/httpclient/apidocs/org/apache/commons/httpclient/params/HttpMethodParams.html#HTTP_ELEMENT_CHARSET
cheers,
Roland

Bug: Multipart posts with files named using UTF-8 characters
2006-10-19 16:51:28
I agree this is the way to go. We can add a mechanism to change the
default encoding, but leave things as they are by default.
Mike
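
One way to picture the opt-in mechanism proposed here is a small policy object that keeps US-ASCII as the default and lets the application override it, in the spirit of the HTTP_ELEMENT_CHARSET parameter linked earlier in the thread. This is only a sketch under that assumption: the class and its methods are invented for illustration and do not correspond to any actual HttpClient API or committed patch.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical policy object; not part of HttpClient.
class FilenameCharsetPolicy {

    // Default stays US-ASCII so existing behavior is unchanged.
    private Charset headerCharset = StandardCharsets.US_ASCII;

    // Opt in to another charset, at the application developer's risk.
    void setHeaderCharset(Charset charset) {
        if (charset == null) {
            throw new IllegalArgumentException("charset must not be null");
        }
        this.headerCharset = charset;
    }

    Charset getHeaderCharset() {
        return headerCharset;
    }

    // Encode a filename for the disposition header using the current policy.
    byte[] encodeFilename(String filename) {
        return filename.getBytes(headerCharset);
    }

    public static void main(String[] args) {
        String filename = "za\u017C\u00F3\u0142\u0107.txt";
        FilenameCharsetPolicy policy = new FilenameCharsetPolicy();

        // Default: non-ASCII letters degrade to '?', as before.
        System.out.println(new String(policy.encodeFilename(filename), StandardCharsets.US_ASCII));

        // After opting in, the UTF-8 bytes decode back to the original name.
        policy.setHeaderCharset(StandardCharsets.UTF_8);
        System.out.println(new String(policy.encodeFilename(filename), StandardCharsets.UTF_8).equals(filename));
    }
}
```

Keeping the default untouched means callers who rely on strictly ASCII headers see no change unless they ask for one.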