
Thread: HTML Parser problems with chunk parser if HTML keywords overlap chunk border




user name
2006-06-22 15:11:59
>please use an attachment, not in the mail body, mailers break
>body content.
<...>
>provide test example as attachment too, I will plug them
>in test/HTML

The attached tar.gz includes the contextual patch for HTMLparser.c from
libxml2-2.6.24 (now with htmlParseLookupSequence) and the test HTML file
"chunk-boundary-cdata.html". The test HTML file triggers the error in
libxml2 because it has the closing "</script>" tag exactly on the 4096
boundary. To reproduce the test, the number of characters in the test
HTML file and the number of bytes read by testHTML must not be
changed(!). The character alignment needs to match exactly to trigger
the error.
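
To illustrate why the alignment matters, here is a minimal sketch in
plain C. It is not the libxml2 code and not the attached patch; all
function names are hypothetical and the 4096 chunk size is simply taken
from the description above. It shows the general failure mode: a search
that looks at each chunk on its own cannot see a sequence whose bytes
are split across two chunks, while a search over all data received so
far (rescanning the last few bytes of the previous chunk) can.

```c
#include <assert.h>
#include <string.h>

#define CHUNK 4096 /* chunk size assumed from the report */

/* Return the offset of needle in hay[0..len), or -1 if absent. */
static long find_seq(const char *hay, size_t len, const char *needle)
{
    size_t n = strlen(needle);
    if (n == 0 || len < n)
        return -1;
    for (size_t i = 0; i + n <= len; i++)
        if (memcmp(hay + i, needle, n) == 0)
            return (long)i;
    return -1;
}

/* Naive lookup: each chunk is searched in isolation, so a needle
 * that straddles a chunk border is never found. */
static long find_per_chunk(const char *buf, size_t len, const char *needle)
{
    for (size_t off = 0; off < len; off += CHUNK) {
        size_t n = (len - off < CHUNK) ? len - off : CHUNK;
        long hit = find_seq(buf + off, n, needle);
        if (hit >= 0)
            return (long)off + hit;
    }
    return -1;
}

/* Border-aware lookup: after each chunk arrives, search the data
 * received so far, starting n-1 bytes before the previous border. */
static long find_accumulated(const char *buf, size_t len, const char *needle)
{
    size_t n = strlen(needle);
    size_t scanned = 0; /* no match can start before this offset */
    size_t avail = 0;   /* bytes "received" so far */

    if (n == 0)
        return -1;
    while (avail < len) {
        avail += (len - avail < CHUNK) ? len - avail : CHUNK;
        long hit = find_seq(buf + scanned, avail - scanned, needle);
        if (hit >= 0)
            return (long)scanned + hit;
        if (avail >= n - 1)
            scanned = avail - (n - 1);
    }
    return -1;
}

/* Fill buf with '.' and place needle so that its first `split`
 * bytes end the first chunk and the rest begins the second. */
static void make_straddling_doc(char *buf, size_t cap,
                                const char *needle, size_t split)
{
    size_t n = strlen(needle);
    assert(split <= n && split <= CHUNK);
    assert(CHUNK - split + n <= cap);
    memset(buf, '.', cap);
    memcpy(buf + CHUNK - split, needle, n);
}
```

With a two-chunk buffer built by make_straddling_doc(doc, sizeof doc,
"</script>", 2), find_per_chunk() returns -1 while find_accumulated()
returns 4094. Shifting the tag by a few bytes so that it lies entirely
inside one chunk makes both searches succeed, which is why the test
file's length must stay untouched.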

Before the patch, libxml2-2.6.24 fails the following test with the
simple test HTML file:

./testHTML --push --sax --debug chunk-boundary-cdata.html

SAX.setDocumentLocator()
SAX.startDocument()
SAX.startElement(html)
SAX.startElement(body)
SAX.characters(.............................., 1000)
SAX.characters(...........................
.., 1000)
SAX.characters(.............................., 1000)
SAX.characters(...........................
.., 1000)
SAX.characters(.............................., 74)
SAX.startElement(script)
SAX.error: Invalid char in CDATA 0x0
SAX.cdata(&lt;/, 2)
SAX.error: htmlParseEndTag: '</' not found
SAX.cdata(cript&gt;
&lt;a href="test", 26)
SAX.error: Unexpected end tag : a
SAX.cdata(
, 1)
SAX.endElement(script)
SAX.endElement(body)
SAX.ignorableWhitespace(
, 1)
SAX.endElement(html)
SAX.ignorableWhitespace(
, 1)
SAX.endDocument()


After the patch, the result is correct.

Cyrill
_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
xmlgnome.org
http://mail.gnome.org/mailman/listinfo/xml
