Thread: Re: Proposal: Avoiding Serialization When Stacking Middleware
Re: Proposal: Avoiding Serialization When Stacking Middleware | United States | 2007-03-06 21:43:43
Phillip J. Eby wrote:
> At 08:08 PM 3/6/2007 -0600, Ian Bicking wrote:
>> Posted here: http://wsgi.org/wsgi/Specifications/avoiding_serialization
>>
>> Text copied below for discussion:
>>
>>
>> :Title: Avoiding Serialization When Stacking Middleware
>> :Author: Ian Bicking <ianb colorstudy.com>
>> :Discussions-To: Python Web-SIG <web-sig python.org>
>> :Status: Proposed
>> :Created: 06-03-2007
>>
>> .. contents::
>>
>> Abstract
>> --------
>>
>> This proposal gives a strategy for avoiding unnecessary serialization
>> and deserialization of request and response bodies. It does so by
>> attaching attributes to ``wsgi.input`` and the ``app_iter``, as well as
>> a new environment key ``x-wsgiorg.want_parsed_response``.
>>
>> Rationale
>> ---------
>>
>> Output-transforming middleware often has to parse the upstream content,
>> transform it, then serialize it back to a string for output. The
>> original output may have already been in the parsed form that the
>> middleware wanted. Or there may be more middleware that does similar
>> transformations on the same kind of objects.
>
> HTTP already includes a mechanism for specifying what types are accepted
> by a content consumer: the "Accept" header. You can always add other
> values to it to indicate the parsed values you can accept.
>
> Of course, this doesn't really work well with WSGI - you want the result
> to actually *be* WSGI... so you can use the WSGI way of doing this,
> which is to have a standard wrapper for the specific content type you
> want to use.
Yeah, using Accept is clever, but not really accurate, since if you
serialize the WSGI request to HTTP the addition no longer makes sense.
> The wrapper (as with the wsgi "file wrapper") simply puts a WSGI face on
> a non-WSGI result body, converting it to an iterator of strings, and
> holding other attributes known to the middleware or other application
> object.
That just calls for a series of ad hoc techniques, basically, where each
object type results in a new key in the environment and a new ad hoc
specification to be made (e.g., wsgi.file_wrapper takes a block size,
which is specific only to that case).
> This could be implemented as an environ key containing a mapping from
> types to wrapper functions. Middleware that wants a type just copies
> the mapping and overwrites any entries it cares about. Applications
> that want to return a non-serialized result just look up the type (using
> __mro__ order) to find an applicable wrapper.
OK, the dict would avoid multiple different kinds of keys, and
presumably they'd all have the same signature. Block size doesn't
really make any sense to me as a common parameter. Content type should
be a common parameter, as something like an lxml object can be
serialized as either XML or HTML. I don't think any response headers
are likely to affect the serialization... though with my specification
that remains an application concern, so it doesn't have to be resolved
in the specification.

I hadn't really thought about MRO, though generally I don't trust
inheritance to be meaningful anyway -- I feel like I'd be more likely to
switch on the type than test inheritance.
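A minimal sketch of the type-to-wrapper mapping discussed above, in modern Python (bodies are bytes rather than 2007-era strings). The environ key `example.content_wrappers`, the `RichBody` class, and the helper names are all assumptions made for illustration; nothing in the thread or in WSGI fixes them.

```python
# Hypothetical environ key; the thread does not name one.
WRAPPERS_KEY = "example.content_wrappers"


class RichBody:
    """WSGI-compatible iterable that also exposes the unserialized object."""

    def __init__(self, content, serialize):
        self.content = content        # the rich (parsed) object
        self._serialize = serialize   # callable: object -> bytes

    def __iter__(self):
        # Serialization happens only if someone actually iterates the body.
        yield self._serialize(self.content)


def register_wrapper(environ, typ, wrapper):
    """Middleware side: copy the mapping, then overwrite the entry for typ."""
    wrappers = dict(environ.get(WRAPPERS_KEY, {}))
    wrappers[typ] = wrapper
    environ[WRAPPERS_KEY] = wrappers


def find_wrapper(environ, content):
    """Application side: walk type(content).__mro__ for a registered wrapper."""
    wrappers = environ.get(WRAPPERS_KEY, {})
    for klass in type(content).__mro__:
        if klass in wrappers:
            return wrappers[klass]
    return None


class Node:
    """Stand-in for a parsed document type such as an element tree node."""

    def __init__(self, text):
        self.text = text


class HtmlNode(Node):
    pass


environ = {}
register_wrapper(
    environ, Node,
    lambda node: RichBody(node, lambda n: n.text.encode("utf-8")),
)

# HtmlNode has no entry of its own; the MRO walk finds the Node wrapper.
document = HtmlNode("<p>hi</p>")
body = find_wrapper(environ, document)(document)
```

Here the subclass lookup succeeds through inheritance, which is the behaviour the MRO suggestion relies on; a "switch on the type" variant would replace the loop with an exact `type(content) in wrappers` test.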
> Notice that this approach doesn't require any special protocol for these
> wrappers -- just WSGI. It's simpler to specify, and simpler to
> implement than what you propose, while addressing some of the open issues.
The specification isn't particularly long or complicated, IMHO. The
implementation is complicated mostly for reasons unrelated to the
specification -- any output-transforming middleware will be similarly
complicated.
> Yes, it does have some problems with interface vs. implementation. ISTM
> that trying to solve that problem is effectively asking to revive or
> reinvent PEP 246, however. But we could explicitly allow the use of
> type names instead of the actual types.
When playing with implementation I used type names, and actually I
rather prefer them, but it's not always clear what name to use. For
instance, "lxml", "lxml.etree", "lxml.etree.Element", and
"lxml.etree._Element" all are reasonable names. Or "ElementTree",
"ElementTree.Element", "ElementTree._Element", "xml.etree",
"xml.etree.Element", and "xml.etree._Element". Or even something like
"IElement" could make sense in some context (e.g., what if you can
accept the overlapping interfaces of both lxml and ElementTree?)

At least the actual type object seems easy enough. OTOH, there are
actually cases when I'd like to say that I could accept a certain type
without having to import the type. E.g., if I wanted to do an XSLT
transformation, I *could* support several kinds of objects without
requiring any of them (e.g., lxml, 4DOM, and Genshi Markup).
>> The same things apply to the parsing of ``wsgi.input``, specifically
>> parsing form data. A similar strategy is presented to avoid
>> unnecessarily reparsing that data.
>
> I would rather offer an optional 'get_file_storage()' method or some
> such as a blessed WSGI extension, than have such an open-ended "get
> whatever you want from the input object" concept floating around. A
> strategy which reinvents half of PEP 246 (the *old* PEP 246, before it
> became almost as complicated as WSGI) seems like overkill to me.
I don't really understand what you are proposing. This part addresses
the same issues as presented in
http://wsgi.org/wsgi/Specifications/handling_post_forms

I really don't *want* to write every wsgi.input to a temporary file just
because someone else *might* want to reparse the input. I'd much rather
do it lazily, as 99% of the time reparsing won't happen.
>> Obviously the code is not simple, but this is the nature of WSGI
>> output-transforming middleware.
>
> Something I'd like to fix in WSGI 2.0, by getting rid of both
> "start_response" and "write", but that's a discussion for another time.
Yeah, that'd be nice, but another discussion for another time.
>> Other Possibilities
>> -------------------
>>
>> * You could simply parse everything every time.
>> * You could pass data through callbacks in the environment (but this can
>>   break non-aware middleware).
>> * You can make custom methods and keys for each case.
>> * You can use something other than WSGI.
>
> And you can use the established WSGI method for adding semantics to a
> response, using a middleware-supplied wrapper. I think this is actually
> the best alternative.
I really don't understand the advantage.
> In truth, it could be as simple as using the class's fully-qualified
> name as an environ key (perhaps with a prefix or suffix), with the value
> being a wrapper for objects implementing that protocol. No
> x-foobar-wsgiorg-whatchamacallit cruft needed.
>
> And, it's lightweight enough of a concept to be expressed as a simple
> "best practice" design pattern.
Best practice is fine, though of course still needs to be documented, as
this is hardly a practice that people would naturally think about or
implement. But I don't really think that practice would be any simpler
or easier to describe if done completely. In fact, I think it would
take exactly the same amount of space to describe.
--
Ian Bicking | ianb colorstudy.com | http://blog.ianbicking.org
_______________________________________________
Web-SIG mailing list
Web-SIG python.org
Web SIG: http://www.python.org/sigs/web-sig
Unsubscribe: http://mail.python.org/mailman/options/web-sig/bond%40yahoo.com
Re: Proposal: Avoiding Serialization When Stacking Middleware | United States | 2007-03-06 22:51:39
At 09:43 PM 3/6/2007 -0600, Ian Bicking wrote:
>Phillip J. Eby wrote:
>>The wrapper (as with the wsgi "file wrapper") simply puts a WSGI face on
>>a non-WSGI result body, converting it to an iterator of strings, and
>>holding other attributes known to the middleware or other application object.
>
>That just calls for a series of ad hoc techniques,
As is appropriate for a "series of tubes".
> basically, where each object type results in a new key in the
> environment and a new ad hoc specification to be made (e.g.,
> wsgi.file_wrapper takes a block size, which is specific only to that case).
Right. I'm specifically saying that a collection of individual
specifications is much *better* than a single overarching specification
generalized from a single example. Single use cases make bad general specs.
>OK, the dict would avoid multiple different kinds of keys, and presumably
>they'd all have the same signature. Block size doesn't really make any
>sense to me as a common parameter. Content type should be a common
>parameter, as something like an lxml object can be serialized as either
>XML or HTML. I don't think any response headers are likely to affect the
>serialization... though with my specification that remains an application
>concern, so it doesn't have to be resolved in the specification.
Please don't keep trying to generalize this. They're called
"specific-ations", not "general-izations".
>>Notice that this approach doesn't require any special protocol for these
>>wrappers -- just WSGI. It's simpler to specify, and simpler to implement
>>than what you propose, while addressing some of the open issues.
>
>The specification isn't particularly long or complicated, IMHO.
That's because it doesn't address any of the real issues -- they're all
deferred to your "open issues" section. That's why I don't think having
the specification adds any value over highlighting the existing WSGI
pattern for extending the response (i.e. server-supplied iterator-wrappers).
>When playing with implementation I used type names, and actually I rather
>prefer them, but it's not always clear what name to use. For instance,
>"lxml", "lxml.etree", "lxml.etree.Element", and "lxml.etree._Element" all
>are reasonable names. Or "ElementTree", "ElementTree.Element",
>"ElementTree._Element", "xml.etree", "xml.etree.Element", and
>"xml.etree._Element". Or even something like "IElement" could make sense
>in some context (e.g., what if you can accept the overlapping interfaces
>of both lxml and ElementTree?)
>
>At least the actual type object seems easy enough. OTOH, there are
>actually cases when I'd like to say that I could accept a certain type
>without having to import the type. E.g., if I wanted to do an XSLT
>transformation, I *could* support several kinds of objects without
>requiring any of them (e.g., lxml, 4DOM, and Genshi Markup).
These problems all stem from premature generalization. It's a trivial
problem to fix, however, if you are trying to share one particular content
type: just pick a key and use it!

Libraries such as wsgiref can support this pattern by providing a utility
function like "wrap_content(environ, content, default_wrapper, *keys)" that
looks up "keys" to find a wrapper to use in place of the default_wrapper.
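A rough sketch of what such a utility might look like. wsgiref does not ship a `wrap_content` function; the signature comes from the message above, while the key name `example.dict_wrapper`, the `Passthrough` class, and the "first key found wins" lookup rule are assumptions.

```python
def wrap_content(environ, content, default_wrapper, *keys):
    """Wrap content with the first wrapper found under keys, else the default."""
    for key in keys:
        wrapper = environ.get(key)
        if wrapper is not None:
            return wrapper(content)
    return default_wrapper(content)


def serialize(content):
    # Default wrapper: an ordinary WSGI body, i.e. an iterable of bytes.
    return [str(content).encode("utf-8")]


class Passthrough:
    """Hypothetical middleware-supplied wrapper that keeps the rich object."""

    def __init__(self, content):
        self.content = content

    def __iter__(self):
        # Still a valid WSGI body if something does iterate it.
        return iter(serialize(self.content))


KEY = "example.dict_wrapper"  # hypothetical; the point is "pick a key and use it"

# No middleware registered a wrapper: the application gets the default.
plain = wrap_content({}, {"a": 1}, serialize, KEY)

# Middleware registered a wrapper under the agreed key: it is used instead.
rich = wrap_content({KEY: Passthrough}, {"a": 1}, serialize, KEY)
```

The application never needs to know whether interested middleware is present; it calls the helper either way and returns the result as its response body.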
>>>The same things apply to the parsing of ``wsgi.input``, specifically
>>>parsing form data. A similar strategy is presented to avoid
>>>unnecessarily reparsing that data.
>>I would rather offer an optional 'get_file_storage()' method or some such
>>as a blessed WSGI extension, than have such an open-ended "get whatever
>>you want from the input object" concept floating around. A strategy
>>which reinvents half of PEP 246 (the *old* PEP 246, before it became
>>almost as complicated as WSGI) seems like overkill to me.
>
>I don't really understand what you are proposing.
That wsgi.input be allowed to have a 'get_file_storage()' method that can
be called by applications, and that calling it means the input stream must
not have been read and will no longer be readable.
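One way to read that contract, as a sketch only. The message fixes the method name and the two rules (the stream must be unread, and it is unreadable afterwards); the `StorageInput` class, the `parse` callable, and the use of `urllib.parse.parse_qs` as a stand-in for a form parser are assumptions.

```python
import io
from urllib.parse import parse_qs


class StorageInput:
    """Hypothetical wsgi.input wrapper offering get_file_storage()."""

    def __init__(self, stream, parse):
        self._stream = stream
        self._parse = parse      # assumed callable: stream -> parsed form data
        self._touched = False    # True once read() has been called
        self._consumed = False   # True once get_file_storage() has run

    def read(self, size=-1):
        if self._consumed:
            raise ValueError("input was consumed by get_file_storage()")
        self._touched = True
        return self._stream.read(size)

    def get_file_storage(self):
        # Rule 1: the stream must not have been read yet.
        if self._touched or self._consumed:
            raise ValueError("input stream was already read")
        # Rule 2: after this call the stream is no longer readable.
        self._consumed = True
        return self._parse(self._stream)


wsgi_input = StorageInput(
    io.BytesIO(b"a=1&b=2"),
    lambda stream: parse_qs(stream.read().decode("ascii")),
)
form = wsgi_input.get_file_storage()
```

A real wrapper would also need `readline()`, `readlines()` and iteration to honour the same two flags.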
>This part addresses the same issues as presented in
>http://wsgi.org/wsgi/Specifications/handling_post_forms
>
>I really don't *want* to write every wsgi.input to a temporary file just
>because someone else *might* want to reparse the input. I'd much rather
>do it lazily, as 99% of the time reparsing won't happen.
I don't understand your complaint, as it seems unrelated to what I propose.
>>>Other Possibilities
>>>-------------------
>>>
>>>* You could simply parse everything every time.
>>>* You could pass data through callbacks in the environment (but this can
>>>  break non-aware middleware).
>>>* You can make custom methods and keys for each case.
>>>* You can use something other than WSGI.
>>And you can use the established WSGI method for adding semantics to a
>>response, using a middleware-supplied wrapper. I think this is actually
>>the best alternative.
>
>I really don't understand the advantage.
It's simple: *specifications are a liability in the general case*. They
are supposed to be the record of negotiations between people who need to
co-operate, not an attempt to solve all possible problems.

So, if your spec is only about how relatively tight-coupled WFC's (WSGI
framework components) talk to each other, it seems more properly the
business of a web framework, not WSGI.

However, it *is* WSGI (wsgi-onic?) for the authors of certain components to
get together and say, "hey let's agree on this wrapper protocol"... or
better yet, a wrapper *implementation*.

This is way way better than having another spec. Every godforsaken new
spec attached to WSGI just makes the whole thing seem way too
complicated. In retrospect, I wish I hadn't supported some of the options
and doodads and whatnots that are in WSGI today. If I had it to do over,
WSGI would be a lot simpler.

However, it's not too late to stop adding new cruft -- and I consider the
idea of reinventing PEP 246 inside of WSGI to be cruft of a most horrible kind.
>Best practice is fine, though of course still needs to be documented, as
>this is hardly a practice that people would naturally think about or implement.
Well, it's in PEP 333.
> But I don't really think that practice would be any simpler or easier
> to describe if done completely. In fact, I think it would take exactly
> the same amount of space to describe.
Even if it *did*, it'd still be better. However, since it's not a spec, it
can be presented informally. Here's an example:

"If you want to give applications underneath your middleware a chance to
return rich responses (i.e., objects instead of strings), follow the
pattern used for the WSGI 'file wrapper' object. That is, have your server
or middleware add an environ key with a wrapper API that can convert the
richer objects you're expecting into a standard WSGI iterator. Then, your
server can simply inspect the iterators it receives to see if they are
instances of your wrapper type, and pull out the objects you want. In this
way, if there is middleware between you and the application returning the
rich response that modifies the response body, you will receive an iterator
of a different type, which you can process in the usual way. However, if
you receive an instance of your wrapper type, you will know that you can
access the rich data directly."
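The quoted pattern can be sketched roughly as follows, in modern Python with bytes bodies. The key `example.lines_wrapper` and the names `LinesWrapper`, `NumberLines`, and `upper` are made up for the illustration, and header handling (Content-Length and so on) and `close()` propagation are left out to keep it short.

```python
WRAPPER_KEY = "example.lines_wrapper"  # hypothetical published key


class LinesWrapper:
    """Puts a WSGI face on a list of text lines."""

    def __init__(self, lines):
        self.lines = lines

    def __iter__(self):
        for line in self.lines:
            yield (line + "\n").encode("utf-8")


class NumberLines:
    """Middleware that numbers the lines of the response body."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        environ[WRAPPER_KEY] = LinesWrapper   # offer the wrapper downstream
        result = self.app(environ, start_response)
        if isinstance(result, LinesWrapper):
            lines = result.lines              # rich data, no reparsing
        else:
            # Something in between changed the body: process it the usual way.
            lines = b"".join(result).decode("utf-8").splitlines()
        return [("%d: %s\n" % (n, line)).encode("utf-8")
                for n, line in enumerate(lines, 1)]


def app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    lines = ["alpha", "beta"]
    wrapper = environ.get(WRAPPER_KEY)
    if wrapper is not None:
        return wrapper(lines)
    return [("\n".join(lines) + "\n").encode("utf-8")]


def upper(application):
    """Non-aware middleware: rewrites the body, so the wrapper type is lost."""
    def middleware(environ, start_response):
        return [chunk.upper() for chunk in application(environ, start_response)]
    return middleware


def run(application):
    return b"".join(application({}, lambda status, headers: None))


direct = run(NumberLines(app))            # wrapper instance reaches NumberLines
indirect = run(NumberLines(upper(app)))   # falls back to parsing the bytes
```

Both paths produce a correct response; the only difference is whether `NumberLines` had to reparse the serialized body, which is the property the paragraph above describes.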
Now, can you expand this into more of a tutorial, give more hints and so
on? Absolutely. It'd be a great idea to. But the basic idea is simple
and doesn't require rigorous definitions -- it just needs people to publish
what keys they're using and the *specifications thereof*.

What you're trying to specify is effectively a *meta*-specification: much
more difficult to do well, and not nearly as useful to have in this case.