Thread: Chunked Transfer encoding on request content.

Chunked Transfer encoding on request content. | United States | 2007-03-04 17:28:26
The WSGI specification doesn't really say much about chunked transfer
encoding for content sent within the body of a request. The only thing
that appears to apply is the comment:

    WSGI servers must handle any supported inbound "hop-by-hop" headers
    on their own, such as by decoding any inbound Transfer-Encoding,
    including chunked encoding if applicable.

What does this really mean in practice though?

As a means of getting feedback on what the correct approach is, I'll go
through how the CherryPy WSGI server handles it. The problem is that the
CherryPy approach raises a few issues which make me wonder whether it is
doing it in the most appropriate way.

In CherryPy, when it sees that the Transfer-Encoding is set to 'chunked'
while parsing the HTTP headers, it will at that point, even before it
has called start_response for the WSGI application, read in all content
from the body of the request.

CherryPy reads in the content like this for two reasons. The first is so
that it can then determine the overall length of the content that was
available and set the CONTENT_LENGTH value in the WSGI environ. The
second reason is so that it can read in any additional HTTP header
fields that may occur in the trailer after the last data chunk and also
incorporate them into the WSGI environ.

The first issue with what it does is that it has read in all the
content. This denies a WSGI application the ability to stream content
from the body of a request and process it a bit at a time. If the
content is huge, the fact that it is buffered can also mean the
application's process size will grow significantly.

The second issue, although I am unsure whether the CherryPy WSGI server
actually implements this correctly, is that if the client was expecting
to see a 100 continue response, this will need to be sent back to the
client before any content can be read. When chunked transfer encoding is
not used, a good WSGI server would only send such a 100 continue
response when the WSGI application called read() on wsgi.input for the
first time. I.e., the 100 continue indicates that the application which
is consuming the data is actually ready to start processing it. What the
CherryPy WSGI server is doing circumvents that, and the client could
think the final consumer application is ready before it actually is.

Note that I am assuming here that 100 continue is still usable in
conjunction with chunked transfer encoding. In the CherryPy WSGI server
it only actually sends the 100 continue after it attempts to read
content in the presence of a chunked transfer encoding header. Not sure
if this is actually a bug or not.

The CherryPy WSGI server also doesn't wait until the first read() by the
WSGI application before sending back the 100 continue, and instead sends
it as soon as the headers are parsed. This may be fine, but is possibly
not optimal, as it denies an application the ability to fail a request
and so avoid the client sending the actual content.

Now, to my mind, the preferred approach would be that the content would
not be read up front like this and instead CONTENT_LENGTH would simply
be unset in the WSGI environ.

From prior discussions related to input filtering on the list, a WSGI
application shouldn't really be paying much attention to CONTENT_LENGTH
anyway and should just be using read() to get data until it returns an
empty string. Thus, for chunked data, the fact that it doesn't know the
content length up front shouldn't matter, as it should just call read()
until there is no more. BTW, it may not be this simple for something
like a proxy, but that is a discussion for another time.

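The "read until empty" pattern described above could look roughly like
the following. This is only a sketch in Python 3, assuming the server
makes wsgi.input return an empty bytes object at the end of the request
body, which is the behaviour being proposed here rather than something
every server guarantees. The names read_body_in_blocks and app, and the
8192-byte block size, are illustrative and not taken from any server.

```python
import io


def read_body_in_blocks(environ, block_size=8192):
    """Yield the request body a block at a time, ignoring CONTENT_LENGTH.

    Assumes wsgi.input.read() returns b"" once the body is exhausted.
    """
    stream = environ["wsgi.input"]
    while True:
        block = stream.read(block_size)
        if not block:  # empty result signals end of body
            break
        yield block


def app(environ, start_response):
    # Consume the body incrementally; nothing is buffered beyond one block.
    total = sum(len(block) for block in read_body_in_blocks(environ))
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [("received %d bytes\n" % total).encode("ascii")]
```

Because the loop never consults CONTENT_LENGTH, the same application
code would work whether the body arrived chunked or with a declared
length, provided the server signals the end of input.
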
Doing this also means that the 100 continue only gets sent when the
application is ready, and there is no need for the content to be
buffered up.

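One way a server might defer the 100 continue until the application
first reads is to wrap the input stream. The sketch below is
hypothetical: the class name LazyContinueInput, its constructor
arguments, and the idea that the server hands it the raw socket files
are all assumptions for illustration, not how CherryPy or any other
server is known to do it.

```python
import io


class LazyContinueInput:
    """Hypothetical wsgi.input wrapper that sends '100 Continue' lazily.

    rfile/wfile are assumed to be the connection's read and write file
    objects; expects_continue is True if the request carried
    'Expect: 100-continue'.
    """

    def __init__(self, rfile, wfile, expects_continue):
        self._rfile = rfile
        self._wfile = wfile
        self._pending = expects_continue

    def _send_continue_once(self):
        # Only the first read triggers the interim response.
        if self._pending:
            self._wfile.write(b"HTTP/1.1 100 Continue\r\n\r\n")
            self._wfile.flush()
            self._pending = False

    def read(self, size=-1):
        self._send_continue_once()
        return self._rfile.read(size)

    def readline(self, size=-1):
        self._send_continue_once()
        return self._rfile.readline(size)
```

With this arrangement an application that rejects the request without
ever touching wsgi.input never causes the 100 continue to be sent, so
the client need not transmit the body.
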
That it is the actual application which is consuming the data, and not
some intermediary, means that an application could implement some
mechanism whereby it reads some data, acts on that and starts sending
some data in response. The client then might send more data based on
that response, which the application only then reads, sends more data
in response, and so on. Thus an end to end communication stream can be
established where the actual overall content length of the request could
never be established up front.

The only problem with deferring any reading of data until the
application actually wants to read it is that, if the overall length of
content in the request is bounded, there is no way to get access to the
additional headers in the trailer of the request and have them available
in the WSGI environ, since processing of the WSGI environ has already
occurred before any data was read.

So, what gives? What should a WSGI server do for chunked transfer
encoding on a request?

I may not totally understand 100 continue and chunked transfer encoding,
and am happy to be corrected in my understanding of them, but what the
CherryPy WSGI server does doesn't seem right to me at first look.

Graham
_______________________________________________
Web-SIG mailing list
Web-SIG python.org
Web SIG: http://www.python.org/sigs/web-sig
Unsubscribe: http://mail.python.org/mailman/options/web-sig/bond%40yahoo.com

Re: Chunked Transfer encoding on request content. | 2007-03-04 18:55:04
I'm not quite aware of the 100 Continue semantics, but I know that
applications which request Transfer-Encoding: chunked should *not*
expect a Content-Length response header, nor should the WSGI thingie
doing the 'chunking' need to know it in advance.

'chunked' is actually very simple. Simplifying it a lot, it basically
needs to output '%x\r\n%s\r\n' % (len(chunk), chunk) for every chunk of
data except the last, which should be '0\r\n\r\n'. The only trick here
is ensuring that no chunk of length '0' is written except the last.

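As a rough illustration of that framing, here is a small Python 3
generator. It is a sketch only: the name encode_chunked is made up for
this example, and chunk extensions and trailers are left out.

```python
def encode_chunked(chunks):
    """Frame an iterable of bytes objects using chunked transfer coding.

    Empty chunks are skipped, because a zero-length chunk would be read
    by the receiver as the end of the body.
    """
    for chunk in chunks:
        if chunk:
            # Hex length, CRLF, the data itself, CRLF.
            yield b"%x\r\n%s\r\n" % (len(chunk), chunk)
    # The terminating zero-length chunk, with no trailers.
    yield b"0\r\n\r\n"
```

Joining the output for the chunks b"hello" and b"world!" gives
b"5\r\nhello\r\n6\r\nworld!\r\n0\r\n\r\n".
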
What might be happening is that CherryPy is outputting the whole
response body as a single chunk, and relying on the 'Content-Length'
header, which would be silly. I hope that's not what's happening, though
I haven't looked.

--
Sidnei da Silva
Enfold Systems http://enfoldsystems.com
Fax +1 832 201 8856 Office +1 713 942 2377 Ext 214

Re: Chunked Transfer encoding on request content. | United States | 2007-03-04 20:02:25

Graham Dumpleton wrote:
> In CherryPy, when it sees that the Transfer-Encoding
> is set to 'chunked' while parsing the HTTP headers,
> it will at that point, even before it has called
> start_response for the WSGI application, read in all
> content from the body of the request.
>
> CherryPy reads in the content like this for two reasons.
> The first is so that it can then determine the overall
> length of the content that was available and set the
> CONTENT_LENGTH value in the WSGI environ.
Right; IIRC the rfile just hangs if you try to read
past Content-Length. Perhaps that can be fixed inside
socket.makefile somewhere?
> The second reason is so that it can read in any
> additional HTTP header fields that may occur in
> the trailer after the last data chunk and also
> incorporate them into the WSGI environ.
Yeah; I didn't see any other way to get Trailers into
the environ. Perhaps that can be added to WSGI 2.0?
I also just haven't had time to write a dechunker
which worked on the fly. Patches welcome ;)
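
For illustration, an on-the-fly dechunker might be sketched as below.
This is a hypothetical Python 3 example, not CherryPy code: the class
name ChunkedReader and its trailers attribute are invented here, and
error handling is minimal. Trailers only become available once the
final chunk has been read, which is exactly why they cannot be placed in
the environ up front.

```python
import io


class ChunkedReader:
    """Hypothetical streaming decoder for a chunked request body."""

    def __init__(self, rfile):
        self._rfile = rfile
        self._remaining = 0  # bytes left in the current chunk
        self._done = False
        self.trailers = []   # (name, value) pairs, filled at end of body

    def _next_chunk(self):
        # Chunk-size line: hex size, optionally followed by ';extensions'.
        line = self._rfile.readline()
        self._remaining = int(line.split(b";", 1)[0].strip(), 16)
        if self._remaining == 0:
            # Last chunk: read trailer fields up to the blank line.
            while True:
                line = self._rfile.readline()
                if line in (b"\r\n", b"\n", b""):
                    break
                name, _, value = line.partition(b":")
                self.trailers.append((name.strip(), value.strip()))
            self._done = True

    def read(self, size=-1):
        out = []
        while not self._done and size != 0:
            if self._remaining == 0:
                self._next_chunk()
                continue
            want = self._remaining if size < 0 else min(size, self._remaining)
            data = self._rfile.read(want)
            if not data:
                raise OSError("chunked body ended unexpectedly")
            out.append(data)
            self._remaining -= len(data)
            if size > 0:
                size -= len(data)
            if self._remaining == 0:
                self._rfile.readline()  # consume the CRLF after chunk data
        return b"".join(out)
```

A server could expose such an object as wsgi.input and leave
CONTENT_LENGTH unset; how the trailers would then reach the application
is the open question raised above.
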
> When chunked transfer encoding is not used, such a
> 100 continue response would in a good WSGI server
> only be sent when the WSGI application called read()
> on wsgi.input for the first time.
Sounds reasonable. Again, patches welcome ;)
> Note that I am assuming here that 100 continue is
> still usable in conjunction with chunked transfer
> encoding. In CherryPy WSGI server it only actually
> sends the 100 continue after it attempts to try
> and read content in the presence of a chunked
> transfer encoding header. Not sure if this is
> actually a bug or not.
It looks like a bug. The Expect header should be
checked before decode_chunked (at least until the
100 response can be moved inside read()).
Thanks for catching those!
Robert Brewer
System Architect
Amor Ministries
fumanchu amor.org

Re: Chunked Transfer encoding on request content. | United States | 2007-09-04 21:04:15
Are you actually seeing chunked request bodies in the wild? If so, from
what UAs?

IME they're not very common, because of lack of support in most servers,
and some interop issues with proxies (IIRC).

Cheers,
--
Mark Nottingham http://www.mnot.net/

Re: Chunked Transfer encoding on request content. | 2007-09-05 06:55:14
On 05/09/07, Mark Nottingham <mnot mnot.net> wrote:
> Are you actually seeing chunked request bodies in the wild? If so,
> from what UAs?
>
> IME they're not very common, because of lack of support in most
> servers, and some interop issues with proxies (IIRC).

It has come up as an issue on mod_python list a couple of times. Agree
though that it isn't common. From memory the people were using custom
user agents designed for a special purpose.

Just because it isn't common doesn't mean that an attempt shouldn't be
made to support it, especially if it is part of the HTTP standard.

Also, the same solution for handling this would also be applicable in
cases where mutating input filters are used which change the length of
the request content but are unable to update the content length header.
Thus, like with chunked encoding, a way is needed in this circumstance
to indicate that there is content, but the length isn't known.

Graham