On Thu, Jun 22, 2006 at 10:57:02AM +0200, Cyrill Osterwalder wrote:
>
> I suppose I found the reason why chunked CDATA parsing also fails
> without the special recovery mode:
>
> If the chunk actually ends with "</", then htmlParseTryOrFinish() calls
> htmlParseScript() to process it. In there, the normal break condition is
> coded as follows:
>
>     if ((cur == '<') && (NXT(1) == '/')) {
>         if (((NXT(2) >= 'A') && (NXT(2) <= 'Z')) ||
>             ((NXT(2) >= 'a') && (NXT(2) <= 'z')))
>         {
>             break; /* while */
>         }
>     }
>
> However, NXT(2) is not guaranteed to be available, so the loop does not
> break but consumes the "</", which leads to broken CDATA parsing in all
> cases, even without PARSE_HTML_RECOVER being set. This could be solved
> by not calling htmlParseScript() with a chunk ending in "</".
>
> The case with the CDATA recovery option is even more complicated.
>
> What do you think about checking in htmlParseTryOrFinish() that the
> last 8 characters of the chunk do not include "</" before calling
> htmlParseScript(), in order to solve both cases? Assuming we are in a
> CDATA block followed by at least one real end tag and other tags
> afterwards, this should be safe, shouldn't it?
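
The failure described in the quote comes from reading NXT(2) when only "</" is left in the input. As a minimal sketch in plain C (not libxml2 code; check_end_tag and its three-way result are hypothetical names), the same test can be made bounds-aware, so that a chunk ending in "</" reports "need more data" instead of letting the loop consume those two bytes:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical three-way result for the end-tag test. */
enum end_tag_check {
    END_TAG_NO = 0,         /* definitely not "</" + ASCII letter here */
    END_TAG_YES = 1,        /* "</" followed by an ASCII letter */
    END_TAG_NEED_MORE = -1  /* data ends before a decision can be made */
};

/*
 * Bounds-checked variant of the quoted htmlParseScript() test: instead of
 * reading NXT(1) and NXT(2) unconditionally, only bytes inside
 * buf[0..len) are examined.
 */
static enum end_tag_check check_end_tag(const char *buf, size_t len,
                                        size_t pos)
{
    assert(buf != NULL && pos <= len);
    if (pos >= len)
        return END_TAG_NEED_MORE;
    if (buf[pos] != '<')
        return END_TAG_NO;
    if (pos + 1 >= len)
        return END_TAG_NEED_MORE;   /* chunk ends with "<" */
    if (buf[pos + 1] != '/')
        return END_TAG_NO;
    if (pos + 2 >= len)
        return END_TAG_NEED_MORE;   /* chunk ends with "</": wait */
    char c = buf[pos + 2];
    /* Same explicit ASCII ranges as the quoted code (locale-independent). */
    if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z'))
        return END_TAG_YES;
    return END_TAG_NO;
}
```

A caller seeing END_TAG_NEED_MORE would stop and wait for the next chunk rather than consume the "</".
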

I think delaying the call to the parser whenever "</" is present in the
last 8 characters would be somewhat broken. You could perfectly well find
a number of other elements after the script/style block (actually I would
expect that), and those need to be closed.
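
A minimal sketch of the check proposed in the quote (plain C; tail_has_end_tag is a hypothetical helper, and the 8-byte window comes from the proposal) makes the objection concrete: an ordinary end tag of an element following the script block, such as "</p>", is enough to trigger it.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Sketch of the proposed test: does "</" start anywhere within the last
 * `window` bytes of the chunk buf[0..len)?
 */
static int tail_has_end_tag(const char *buf, size_t len, size_t window)
{
    assert(buf != NULL);
    size_t start = (len > window) ? len - window : 0;
    for (size_t i = start; i + 1 < len; i++) {
        if (buf[i] == '<' && buf[i + 1] == '/')
            return 1;
    }
    return 0;
}
```

For the chunk `<script>x</script><p>hi</p>` the check fires because of the trailing `</p>`, even though the script block is already complete.
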

What should probably be checked is that there are at least 8 characters
in the buffer available for consumption there (i.e. avail >= 8). That
should be safe:
  - it guarantees we can test for the tag name;
  - a style or script block is unlikely to be at the very end of an HTML
    document (and if it is, terminate will be set), plus it's not yet
    displayable content, so waiting for the next packet should not cause
    any degradation there.
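
The first point is simple arithmetic: "</script" is 2 + 6 = 8 bytes and "</style" is 2 + 5 = 7, so with at least 8 bytes available the name of either end tag can be compared in full. A minimal sketch, assuming a hypothetical helper match_end_tag_name() rather than the actual libxml2 code path, which may compare names differently:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* ASCII-only lower-casing, so the result does not depend on the locale. */
static char ascii_lower(char c)
{
    return (c >= 'A' && c <= 'Z') ? (char)(c - 'A' + 'a') : c;
}

/*
 * Hypothetical helper: returns 1 if buf starts with "</" followed by
 * `name` (ASCII case-insensitive), 0 if it does not, and -1 if fewer than
 * 2 + strlen(name) bytes are available to decide. `name` is assumed to be
 * lower-case, e.g. "script" or "style".
 */
static int match_end_tag_name(const char *buf, size_t avail, const char *name)
{
    assert(buf != NULL && name != NULL);
    size_t need = 2 + strlen(name);   /* "</script" = 8, "</style" = 7 */
    if (avail < need)
        return -1;
    if (buf[0] != '<' || buf[1] != '/')
        return 0;
    for (size_t i = 0; name[i] != '\0'; i++) {
        if (ascii_lower(buf[2 + i]) != name[i])
            return 0;
    }
    return 1;
}
```

Returning -1 rather than 0 when the data runs out is what lets a caller wait for the next packet instead of misparsing the block.
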

Can you test this by changing the condition in that "Handle SCRIPT/STYLE
separately" section to:

    if ((!terminate) &&
        ((htmlParseLookupSequence(ctxt, '<', '/', 0, 0) < 0) ||
         (avail < 8)))
        goto done;

and report back? If it works, please provide a contextual patch.
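
To reason about the proposed condition without the parser at hand, it can be modelled on a plain buffer. This is only a sketch: lookup_lt_slash() is a hypothetical stand-in for htmlParseLookupSequence(ctxt, '<', '/', 0, 0), assumed (as the condition implies) to return a negative value when "</" is not found, and the parser context is reduced to a pointer and an avail count.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for htmlParseLookupSequence(ctxt, '<', '/', 0, 0):
 * offset of the first "</" in buf[0..avail), or -1 if there is none.
 */
static long lookup_lt_slash(const char *buf, size_t avail)
{
    for (size_t i = 0; i + 1 < avail; i++) {
        if (buf[i] == '<' && buf[i + 1] == '/')
            return (long)i;
    }
    return -1;
}

/*
 * Mirrors the proposed condition: returns 1 when the push parser should
 * stop and wait for more data (the "goto done" branch), 0 when
 * htmlParseScript() may be called on the buffered data.
 */
static int script_should_wait(const char *buf, size_t avail, int terminate)
{
    assert(buf != NULL);
    return (!terminate) &&
           ((lookup_lt_slash(buf, avail) < 0) || (avail < 8));
}
```

With terminate set the guard never waits, so a script block at the very end of the document is still handed to htmlParseScript().
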

> PS: Please let me know if such detailed source code discussions are not
> supposed to be done on the list

That's fine, that's where the knowledge should be shared!

Daniel
--
Daniel Veillard      | Red Hat http://redhat.com/
veillard redhat.com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/
_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
xml gnome.org
http://mail.gnome.org/mailman/listinfo/xml