>Why do you use HTML_PARSE_RECOVER? The parser is already
>doing recovery to some extent without them
>(I mean the HTML parser).
Actually, the problem also seems to exist without HTML_PARSE_RECOVER;
otherwise the test with testHTML.c from the libxml2 package would not
show it, right? I will have to look at this again. I had the
impression that recovery mode was what triggered the problem in
htmlParseScript(). But my testHTML.c example is easy to reproduce, and
it does not use HTML_PARSE_RECOVER. With that example, parsing seems
to fail whenever the CDATA end tag straddles a chunk boundary. If that
holds even without HTML_PARSE_RECOVER, then whether chunked parsing of
HTML with CDATA succeeds is just a matter of luck.
>with '</ style' or '</foo>' and expect that to close the open tag, and
>'style "</" style' and expect it not to close it...
I see. I guess that is why the slash in "</" is supposed to be
escaped in CDATA content when it is not part of the real end tag.
However, there is a lot of HTML "in the wild" containing unescaped
"</" strings in CDATA blocks.
>You can try, but it's all very messy IMHO. I will take
>patches if they are not obviously broken.
I will look into it further and get back to the list if I manage to
produce anything useful.
Thanks,
Cyrill
_______________________________________________
xml mailing list, project page http://xmlsoft.org/
xml@gnome.org
http://mail.gnome.org/mailman/listinfo/xml