Re: problem with elementtree 1.2.6
2007-11-29 17:30:59
Fredrik Lundh wrote:
> Chris Withers wrote:
> 
>>> That's how escaping works, be it in XML, encodings, compression, whatever.
>> Well yes and no. I'd expect escaping to work such that whatever we're
>> dealing with can be round tripped, ie: parsed, serialized, parsed
>> again, etc.
> 
> that's exactly how it works in ET, of course.  

I didn't say it didn't 

> cdata is character data; see
> 
>      http://www.w3.org/TR/html401/types.html#h-6.2
> 
> that's not the same thing as a "CDATA section" (which is just one of
> several ways to store character data in an XML file).

Ug. How confusing :-(

> how things are
> stored doesn't matter; that's just a serialization detail:
> 
>      http://www.w3.org/TR/xml-infoset/#omitted
> 
>      What is not in the Information Set
> 
>      6. Whether characters are represented by character references.
>      19. The boundaries of CDATA marked sections.
>      ...

I'm not sure I follow what you're trying to say...

>> I and many others do not  When writing content into an html template,
>> that content often comes from other sources that spit out lumps of html.
>> Being able to insert them without escaping is a common use case.
> 
> HTML might be similar to XML, but an XML parser cannot parse HTML, so
> you cannot insert HTML fragments into an XML document without either
> escaping it, or pre-processing it to make sure it's well-formed.

What about xhtml?

> if you want to embed HTML fragments in an ET tree, use ElementTidy or
> ElementSoup (or equivalent) to turn the fragment into properly nested
> and properly namespaced XHTML.

Fair enough...

> if you want to do unstructured string handling, use a template library

I'm using/building a templating library, it just happens that ET is an
implementation detail of that template library

>> That's true, sometimes. That inserted lump may have come from a process
>> which can only spit out perfect html fragments, in which case you're
>> fine, or it may come from user input, in which case you're doomed but
>> will likely have happy customers
> 
> the hackers will be happy, at least:
> 
>      http://en.wikipedia.org/wiki/Cross_site_scripting

user -> content author in this case.
Since they usually own and run the system to which they're adding
content, a much more effective attack would just be to turn the box off :-P

cheers,

Chris

-- 
Simplistix - Content Management, Zope & Python Consulting
            - http://www.simplistix.co.uk
_______________________________________________
XML-SIG maillist  -  XML-SIG@python.org
http://mail.python.org/mailman/listinfo/xml-sig

Re: problem with elementtree 1.2.6
2007-11-30 02:05:23
>>      What is not in the Information Set
>>
>>      6. Whether characters are represented by character references.
>>      19. The boundaries of CDATA marked sections.
>>      ...
> 
> I'm not sure I follow what you're trying to say...

That it is irrelevant in XML whether the less-than character is
represented as &lt; or &#60; or <![CDATA[<]]>

So if some XML library chooses to represent < as &lt; you should
not be surprised.
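
As a minimal sketch of that point, the snippet below parses the three
spellings and compares the resulting text. It assumes the standard
library's xml.etree.ElementTree on Python 3 (not the standalone
elementtree 1.2.6 package this thread is about); the <p> wrapper and the
variable names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Three serializations of the same character data "a < b"
# (the <p> element is a hypothetical wrapper, not from the thread).
docs = [
    "<p>a &lt; b</p>",           # predefined entity reference
    "<p>a &#60; b</p>",          # numeric character reference
    "<p><![CDATA[a < b]]></p>",  # CDATA marked section
]

# After parsing, the tree holds only the characters, not their spelling.
texts = [ET.fromstring(d).text for d in docs]

# Serializing the CDATA variant: the library picks its own escaping.
serialized = ET.tostring(ET.fromstring(docs[2]), encoding="unicode")

# Parsing the serialized form again gives back the same text (round trip).
reparsed = ET.fromstring(serialized).text

print(texts)
print(serialized)
print(reparsed)
```

The CDATA boundaries are gone after the first parse, which is what
items 6 and 19 of the Infoset list describe, yet the text itself
survives the parse/serialize/parse cycle.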

It's not clear to me (perhaps because I missed the start of this
discussion) what the actual problem *is* that you are trying to
resolve.

>>> I and many others do not  When writing content into an html template,
>>> that content often comes from other sources that spit out lumps of html.
>>> Being able to insert them without escaping is a common use case.
>> HTML might be similar to XML, but an XML parser cannot parse HTML, so
>> you cannot insert HTML fragments into an XML document without either
>> escaping it, or pre-processing it to make sure it's well-formed.
> 
> What about xhtml?

It should be possible to insert XHTML fragments into XHTML documents,
in selected positions, assuming an appropriate definition of "to insert".

For ET (and any other tree-oriented XML implementation), replacing
text with serialized XHTML in the tree is not an appropriate
implementation of "to insert", as that will just insert less-than
characters, not markup. To insert markup (in particular, tags,
i.e. elements), you need to insert Element objects into the tree.
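
A rough sketch of the difference, again assuming the standard library's
xml.etree.ElementTree on Python 3 and a well-formed fragment; the <div>
container and the <b>bold</b> fragment are hypothetical examples, not
taken from the original problem report.

```python
import xml.etree.ElementTree as ET

# A well-formed fragment as it might arrive from another source
# (hypothetical example input).
fragment = "<b>bold</b>"

# Assigning the serialized fragment as text stores plain characters,
# so the less-than signs get escaped on output.
as_text = ET.Element("div")
as_text.text = fragment
text_out = ET.tostring(as_text, encoding="unicode")

# Parsing the fragment first and appending the resulting Element
# inserts real markup into the tree.
as_markup = ET.Element("div")
as_markup.append(ET.fromstring(fragment))
markup_out = ET.tostring(as_markup, encoding="unicode")

print(text_out)    # escaped characters, no child element
print(markup_out)  # a real <b> child element
```

This only works when the fragment parses as XML; HTML that is not
well-formed would first need the kind of clean-up mentioned earlier in
the thread.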


Regards,
Martin
