Thread: Re: CherryPy WSGI server and wsgi.input.read() with no argument.




Re: CherryPy WSGI server and wsgi.input.read() with no argument.
2007-03-29 17:09:49
I have cc'd this to the web-sig list in case anyone wants to shoot
me down.

On 30/03/07, Robert Brewer <fumanchuamor.org> wrote:
> > Robert, was doing some testing with CherryPy WSGI server and noted
> > that if read() is called with no arguments on wsgi.input that it just
> > seems to hang indefinitely. Is there a problem here or have I managed
> > to stuff up a very simple test? It works okay when I explicitly
> > specify the content length.
>
> That's right. We simply hand the (blocking, makefiled) socket to the app
> as wsgi.input. PEP 333 says:
>
>     "The server is not required to read past the client's
>     specified Content-Length, and is allowed to simulate
>     an end-of-file condition if the application attempts
>     to read past that point. The application should not
>     attempt to read more data than is specified by the
>     CONTENT_LENGTH variable."
>
> We chose to not simulate the EOF, requiring app authors to do that for
> themselves (mostly to give apps more flexibility). Note that the app
> side of CherryPy handles this for you by default. But since the spec
> clearly places the responsibility of checking content-length on the
> application side, it seemed redundant to perform the check both on the
> app side and the server side.

As I believe I have pointed out on the Python web-sig list before, the
statement:

"The application should not attempt to read more data than is
specified by the CONTENT_LENGTH variable."

is actually a bit bogus.

This is because a WSGI middleware component or web server could be
acting as an input filter and decompressing a request body that has a
content encoding of gzip. Since it knows the size will change but
cannot know the new size without buffering it all, it by rights
should remove CONTENT_LENGTH. This presents a problem for an
application, as it then has no CONTENT_LENGTH to rely on to know
whether it has read too much input. If you leave CONTENT_LENGTH intact,
the application may think it has read everything when there is in fact
more.
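
As a rough illustration of the input-filter case described above, here
is a minimal sketch of such a middleware. The name
gunzip_request_middleware is hypothetical, and the sketch assumes the
underlying wsgi.input signals EOF at the end of the compressed body,
which is exactly what the spec does not guarantee:

```python
import gzip
import io


def gunzip_request_middleware(app):
    """Hypothetical middleware: decompress gzip request bodies on the fly."""
    def wrapper(environ, start_response):
        if environ.get("HTTP_CONTENT_ENCODING", "").lower() == "gzip":
            # Stream-decompress instead of buffering the whole body.
            # Assumption: the wrapped stream returns EOF when the body ends.
            environ["wsgi.input"] = gzip.GzipFile(
                fileobj=environ["wsgi.input"], mode="rb")
            # The compressed length no longer describes what the app reads,
            # and the decompressed length is unknown without buffering.
            environ.pop("CONTENT_LENGTH", None)
            del environ["HTTP_CONTENT_ENCODING"]
        return app(environ, start_response)
    return wrapper
```

An application behind this filter has no length to check against, so
it can only read until the stream reports EOF.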

Also, with chunked transfer encoding you will not have CONTENT_LENGTH
either. I know you read it all in and buffer it so you can calculate
it, but that prevents streaming with chunked encoding, where the
content length may depend on a series of end-to-end communications.

Thus, an application should really just ignore CONTENT_LENGTH and
successively call read() in some way until it returns an empty
string. It can't really work reliably in any other way. I believe that
the WSGI adapter should be required (not just allowed) to simulate EOF
if it believes that no more input is available for that request. For
example, it knows at a low level that CONTENT_LENGTH was valid because
no filtering has happened by that point, or that with chunked encoding
the null block has been sent. The adapter is generally the only place
where this will be known.
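
One way an adapter could simulate EOF is sketched below. This is only
an illustration under stated assumptions: the class name
LengthLimitedInput is hypothetical, only read() is covered (not
readline() or iteration), and the length is assumed to be the
validated Content-Length of the current request. In Python 3 the
"empty string" is an empty bytes object.

```python
import io


class LengthLimitedInput:
    """Hypothetical wsgi.input wrapper that reports EOF after `length` bytes."""

    def __init__(self, stream, length):
        self._stream = stream
        self._remaining = length

    def read(self, size=-1):
        if self._remaining <= 0:
            return b""  # simulated EOF; the socket is not touched again
        if size is None or size < 0 or size > self._remaining:
            size = self._remaining  # never read past this request's body
        data = self._stream.read(size)
        self._remaining -= len(data)
        return data


# The raw stream holds more than one request; only 5 bytes belong to this one.
raw = io.BytesIO(b"helloNEXT REQUEST")
body = LengthLimitedInput(raw, 5)
chunks = []
while True:
    chunk = body.read(2)
    if not chunk:
        break
    chunks.append(chunk)
```

The application loop never looks at CONTENT_LENGTH, yet it stops at
the end of the request body and leaves the rest of the stream unread.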

The only time that CONTENT_LENGTH may be of interest to an application
is if it is acting as a proxy to a downstream web server, as it then
needs to put it in the downstream request. If there is no
CONTENT_LENGTH, or chunked transfer encoding was used, it would be
forced to use chunked encoding for the downstream request.

FWIW, the conclusion I have come to is that when read() is called with
no arguments, then rather than attempting to read all input in one go
based on some content length, the adapter should transparently insert
its own size argument underneath. This size would be based on some
block size deemed to give the best performance for the technology being
used. Thus read() with no arguments would always return potentially
partial data and not all data.
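
The idea can be sketched roughly as follows. The names BlockReadInput
and read_all, and the 8192-byte default, are assumptions made for
illustration only; the wrapped stream is assumed to report EOF itself.

```python
import io


class BlockReadInput:
    """Hypothetical adapter-side wrapper: bare read() returns at most one block."""

    def __init__(self, stream, block_size=8192):
        self._stream = stream
        self._block_size = block_size

    def read(self, size=None):
        if size is None or size < 0:
            size = self._block_size  # the adapter picks the size, not the app
        return self._stream.read(size)


def read_all(stream):
    """Application side: call read() until it returns an empty result."""
    chunks = []
    while True:
        chunk = stream.read()
        if not chunk:
            return b"".join(chunks)
        chunks.append(chunk)


stream = BlockReadInput(io.BytesIO(b"x" * 20000))
first = stream.read()      # one block only, not the whole body
rest = read_all(stream)    # the loop collects whatever remains
```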

This is valid because the semantics of read() for a file-like object
are that one should call it until it returns an empty string as the EOF
indicator. The WSGI PEP is ambiguous in that respect, as it says
wsgi.input is a file-like object but then says you aren't supposed to
read more than CONTENT_LENGTH and that an adapter doesn't have to
simulate EOF. One may say that this overrides the file-like object
properties, but the WSGI way will not work all the time.

Graham
_______________________________________________
Web-SIG mailing list
Web-SIGpython.org
Web SIG: http://www.python.org/sigs/web-sig
Unsubscribe: http://mail.python.org/mailman/options/web-sig/bond%40yahoo.com

Re: CherryPy WSGI server and wsgi.input.read() with no argument.
2007-03-29 17:52:41
On Mar 29, 2007, at 6:09 PM, Graham Dumpleton wrote:
> On 30/03/07, Robert Brewer <fumanchuamor.org> wrote:
>
>> We chose to not simulate the EOF, requiring app authors to do that for
>> themselves

CherryPy's developers are correct: they are following the WSGI spec.
It is your app that is broken.

> As I believe I have pointed out on the Python web-sig list before, the
> statement:
>
> "The application should not attempt to read more data than is
> specified by the CONTENT_LENGTH variable."
>
> is actually a bit bogus.

This requirement comes from CGI. CGI scripts cannot support unknown
data lengths (yes, this means no chunked transfer). CONTENT_LENGTH is
required to be provided if there is data, and the server is not
required to provide an EOF after reading CONTENT_LENGTH bytes. WSGI
inherits the same restrictions.

I do agree with you that this was a mistake. WSGI should require WSGI
servers/gateways to provide an EOF for read(), always, and should make
a break from CGI and declare that CONTENT_LENGTH=0 means no data and
CONTENT_LENGTH empty/missing means undefined length. This is
something which ought to be fixed for the next revision of WSGI. This
makes it a tiny bit harder to write a CGI gateway, of course, but
it's worth it in my opinion, for the reasons you describe.
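
To make the proposed semantics concrete, here is a small sketch of how
an application might interpret CONTENT_LENGTH under them. The helper
name declared_body_length is hypothetical, and this reflects the
proposal above, not PEP 333 as currently written.

```python
def declared_body_length(environ):
    """Return an int length, or None when the length is undefined.

    Follows the semantics proposed above, not PEP 333 as written:
    "0" means no data; empty or missing means undefined length.
    """
    raw = environ.get("CONTENT_LENGTH", "")
    if raw == "":
        return None  # undefined length: read until the server reports EOF
    length = int(raw)
    if length < 0:
        raise ValueError("negative CONTENT_LENGTH")
    return length
```

With this reading, an app would only skip reading the body when the
result is 0, and would read to EOF when the result is None.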

HOWEVER, given that the current WSGI spec does not specify that, apps
*cannot* depend upon that behavior. If your app does an unbounded
read(), it's wrong. And, by reference to the CGI spec, if a server
omits CONTENT_LENGTH, and there is data, it is wrong. The server ought
to return a 411 Length Required if you attempt to access a WSGI app
and provide chunked data.
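
A server would normally make this check itself, but the behavior can
be approximated in middleware. The sketch below is illustrative only:
require_content_length is a hypothetical name, and it assumes the
server exposes the Transfer-Encoding header to the application as
HTTP_TRANSFER_ENCODING, which not every server does.

```python
def require_content_length(app):
    """Hypothetical guard: refuse chunked request bodies with a 411."""
    def wrapper(environ, start_response):
        encoding = environ.get("HTTP_TRANSFER_ENCODING", "").lower()
        if "chunked" in encoding and not environ.get("CONTENT_LENGTH"):
            body = b"Length Required\n"
            start_response("411 Length Required", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        # A length is declared (or no body was announced): pass through.
        return app(environ, start_response)
    return wrapper
```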

And, indeed, server code I wrote is wrong in just this way: it can
omit CONTENT_LENGTH when given chunked data on input. Spec-compliant
WSGI apps would then assume there's no input data, which will then
cause data loss. Luckily nobody ever passes chunked data on input.

James

PS: what about the readline(size) problem? Are we just going to
continue indefinitely pretending that it's okay that the spec forbids
using readline(size) and that cgi.FieldStorage calls it? Perhaps a
WSGI 1.1 fixing these issues would be a good idea?

Re: CherryPy WSGI server and wsgi.input.read() with no argument.
2007-03-29 19:19:44
Graham Dumpleton wrote:
> "The application should not attempt to read more data than is
> specified by the CONTENT_LENGTH variable."
>
> is actually a bit bogus.
>
> This is because a WSGI middleware component or web server could be
> acting as an input filter and decompressing a content encoding of gzip
> for request. Since it knows the size will change but will not know
> what the new size would be, except by buffering it all, it by rights
> should remove CONTENT_LENGTH. This presents a problem for an
> application as no CONTENT_LENGTH then to rely on to know whether it
> has read too much input. If you leave CONTENT_LENGTH intact, it may
> think it has read everything when there is in fact more.

I thought leaving it out might be a good way to indicate
content-length-unknown, but now I'm not so sure.  I think a better
indication is "-1", which works with cgi.FieldStorage and lots of other
code, and generally .read(-1) means "give me everything you have".
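
A tiny sketch of why "-1" is convenient for code that passes the
declared length straight to read(). The variable names are
illustrative, and an in-memory stream stands in for wsgi.input; on a
stream that never reports EOF, read(-1) would still block.

```python
import io

stream = io.BytesIO(b"field=value")   # stand-in for wsgi.input
length = -1                           # hypothetical "length unknown" marker
data = stream.read(length)            # a negative size reads until EOF
```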


-- 
Ian Bicking | ianbcolorstudy.com | http://blog.ianbicking.org
            | Write code, do good | http://topp.openplans.org/careers

Re: CherryPy WSGI server and wsgi.input.read() with no argument.
2007-03-29 19:30:37
At 06:52 PM 3/29/2007 -0400, James Y Knight wrote:
>Perhaps a WSGI 1.1 fixing these issues would be a good idea?

I would personally rather see a WSGI 2.0 that also gets rid of
start_response(), write(), and perhaps adds better async support.

I suspect that the current approach to using yield boundaries to
indicate buffer flushing should be replaced with yielding an explicit
flush request object.  WSGI beginners seem to think that write() and
yield are like "print" in CGI, and thus end up writing code that
performs crappily on compliant servers.  In retrospect, the "server
push" use case is much less common and it's reasonable to have to do
something explicit to support it.  Middleware would also be happier if
it could tell when the application really wanted to flush the output.
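
Purely as an illustration of the idea, and not of any existing or
planned WSGI API, an explicit flush request might look something like
this. FLUSH, app_body and serve are all hypothetical names.

```python
FLUSH = object()  # hypothetical marker; not part of any WSGI spec


def app_body():
    """An app iterable that yields small pieces and one explicit flush."""
    yield b"<h1>Progress</h1>"
    yield b"<p>step 1 done</p>"
    yield FLUSH  # only here does the app ask for data to be sent
    yield b"<p>step 2 done</p>"


def serve(body, send):
    """Server side: buffer pieces; call send() only on FLUSH or at the end."""
    buffer = []
    for item in body:
        if item is FLUSH:
            if buffer:
                send(b"".join(buffer))
                buffer = []
        else:
            buffer.append(item)
    if buffer:
        send(b"".join(buffer))


sent = []
serve(app_body(), sent.append)
```

Here the three small yields result in two writes instead of three,
because the server no longer treats every yield as a flush boundary.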

Combining this with some way to yield "pauses" to better support async
servers would be ideal.  It would also be nice if you could cleanly
adapt WSGI 1.0 to 2.0 and vice versa, as long as you're using a
reasonable subset (i.e. a subset that doesn't care about some of the
quirks we need to fix).

