On Wed, Jun 21, 2006 at 04:29:56PM +0200, Cyrill Osterwalder wrote:
> Hi all
>
> After some more research I believe to have found the reason for the
> problem with the CDATA parsing. In case PARSE_HTML_RECOVER is true, the
> following criteria in htmlParseTryOrFinish() is not enough for calling
> htmlParseScript():
>
> /*
> * Handle SCRIPT/STYLE separately
> */
> if ((!terminate) &&
>     (htmlParseLookupSequence(ctxt, '<', '/', 0, 0) < 0))
>     goto done;
> htmlParseScript(ctxt);
>
>
> This code makes sure that there is an end tag starting somewhere in the
> buffer that is going to be processed by htmlParseScript(). However, in
> recovery mode, htmlParseScript() will consume the "</" characters if the
> real CDATA end tag is not fully inside the current chunk (like described
> in the problem report).
True. I was thinking about something like that. This is all due to
script and style having different parsing constraints.
Why do you use PARSE_HTML_RECOVER? The parser already does recovery
to some extent without it (I mean the HTML parser).
> I don't have a patch recommendation for the moment but I see two
> possibilities:
>
> a) htmlParseTryOrFinish() could guarantee that the buffer contains the
> desired close tag (or terminate is true). I guess that this could be
> done using multiple htmlParseLookupSequence() calls and checking for the
> tag name in a loop...?
Hum, well we could check for the current element and make 2 specific
tests in that case. This would be very hard anyway: people are going to
come with '</ style' or '</foo>' and expect that to close the open tag,
and 'style "</" style' and expect it not to close it...
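
To make option a) concrete, here is a minimal sketch of the kind of
check it implies: scan the buffered data for a complete close tag of
the current element before handing the chunk to the script/style
parser. It is standalone C and does not use libxml2;
hasCompleteCloseTag() is a hypothetical helper, not an existing
function, and it deliberately ignores the lenient cases mentioned
above ('</ style', '</foo>', or "</" inside a quoted string).

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of libxml2): returns 1 if buf[0..len)
 * contains "</name", optional whitespace, then '>', and 0 otherwise.
 * The name comparison is ASCII case-insensitive. A close tag that is
 * cut off by the end of the buffer counts as "not found".
 */
int
hasCompleteCloseTag(const char *buf, size_t len, const char *name)
{
    size_t nlen = strlen(name);
    size_t i, j, k;

    for (i = 0; i + 1 < len; i++) {
        if (buf[i] != '<' || buf[i + 1] != '/')
            continue;

        /* Compare the element name right after "</". */
        j = i + 2;
        k = 0;
        while (k < nlen && j < len &&
               tolower((unsigned char) buf[j]) ==
               tolower((unsigned char) name[k])) {
            j++;
            k++;
        }
        if (k < nlen)
            continue;   /* different name, or name truncated by the chunk */

        /* Allow whitespace before the closing '>'. */
        while (j < len && isspace((unsigned char) buf[j]))
            j++;
        if (j < len && buf[j] == '>')
            return 1;
    }
    return 0;
}
```

Under these assumptions, a caller in the position of
htmlParseTryOrFinish() would wait for more data (unless terminate is
set) whenever this returns 0 for the currently open script or style
element.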
> b) htmlParseScript would have to be more powerful in order to recognize
> that it is trying to do xmlStrncasecmp() on an incomplete tag string. In
> that case it should break and be called again by htmlParseTryOrFinish().
> That on the other hand would have to be more careful with the switch to
> the end tag processing after the call to htmlParseScript().
Not sure it's much better.
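
For comparison, option b) amounts to a three-way answer at each "</"
instead of a yes/no comparison: match, mismatch, or "not enough bytes
yet". The sketch below is standalone C under that assumption;
CloseTagStatus and classifyCloseTag() are hypothetical names, not
libxml2 API, and like a prefix comparison it does not look at what
follows the name.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result type for this sketch (not part of libxml2). */
typedef enum {
    CLOSE_TAG_MISMATCH = 0,   /* definitely not "</name"            */
    CLOSE_TAG_MATCH,          /* "</name" is fully present          */
    CLOSE_TAG_INCOMPLETE      /* chunk ends before we can decide    */
} CloseTagStatus;

/*
 * Hypothetical helper: p points at a candidate "</" and avail is the
 * number of bytes available from p in the current chunk. The name is
 * compared ASCII case-insensitively.
 */
CloseTagStatus
classifyCloseTag(const char *p, size_t avail, const char *name)
{
    size_t nlen = strlen(name);
    size_t i;

    if (avail >= 1 && p[0] != '<')
        return CLOSE_TAG_MISMATCH;
    if (avail >= 2 && p[1] != '/')
        return CLOSE_TAG_MISMATCH;
    if (avail < 2)
        return CLOSE_TAG_INCOMPLETE;

    for (i = 0; i < nlen; i++) {
        if (2 + i >= avail)
            return CLOSE_TAG_INCOMPLETE;   /* name cut off by the chunk */
        if (tolower((unsigned char) p[2 + i]) !=
            tolower((unsigned char) name[i]))
            return CLOSE_TAG_MISMATCH;
    }
    return CLOSE_TAG_MATCH;
}
```

On CLOSE_TAG_INCOMPLETE the script parser would have to stop without
consuming the "</", so that the caller can retry once the next chunk
arrives; that is the extra care in the caller that the quoted text
mentions.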
> Possibility a) looks better to me and might try to implement a patch
> example.
You can try, but it's all very messy IMHO. I will take patches if not
obviously broken (it could be a good idea to provide examples for the
test suite too).
thanks
Daniel
--
Daniel Veillard      | Red Hat http://redhat.com/
veillard redhat.com  | libxml GNOME XML XSLT toolkit http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/