Thread: HTML Parser problems with chunk parser if HTML keywords overlap chunk border

user name
2006-06-22 06:22:36
>Why do you use PARSE_HTML_RECOVER? The parser is already
>doing recovery to some extent without it
>(I mean the HTML parser).

Actually, the problem also seems to exist without PARSE_HTML_RECOVER;
otherwise the test with testHTML.c from the libxml2 package would not
show it, right? I will have to look at this again. I had the impression
that recovery mode is the trigger in htmlParseScript() that actually
produces the problem. But my testHTML.c example can be reproduced easily
and it does not use PARSE_HTML_RECOVER. With the testHTML.c example, it
seems that parsing fails if the CDATA end tag overlaps the chunk
boundary. If that's true even without PARSE_HTML_RECOVER, then it's just
a matter of luck whether chunked parsing of HTML with CDATA succeeds.
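
To illustrate what I mean by the end tag overlapping the chunk boundary, here is a minimal standalone sketch. It does not use libxml2 and is not how htmlParseScript() is written; the names naive_chunk_has_term, term_scanner, term_scanner_init and term_scanner_feed are made up for this example. It compares a search that looks at each chunk in isolation with one that remembers a partial match between chunks. The stateful version assumes a terminator whose first byte does not occur again inside it (true for "</style"), and it matches case-sensitively to keep things short.

```c
#include <stddef.h>
#include <string.h>

/* Naive approach (hypothetical helper): search for the terminator inside
 * one chunk only. Returns 1 only if the whole terminator lies in this chunk. */
static int naive_chunk_has_term(const char *chunk, size_t len, const char *term)
{
    size_t tlen = strlen(term);
    if (tlen == 0 || len < tlen)
        return 0;
    for (size_t i = 0; i + tlen <= len; i++) {
        if (memcmp(chunk + i, term, tlen) == 0)
            return 1;
    }
    return 0;
}

/* Hypothetical scanner state carried from one chunk to the next. */
typedef struct {
    const char *term;   /* terminator to look for, e.g. "</style" */
    size_t tlen;        /* length of term */
    size_t matched;     /* bytes of term matched so far, kept across chunks */
} term_scanner;

static void term_scanner_init(term_scanner *s, const char *term)
{
    s->term = term;
    s->tlen = strlen(term);
    s->matched = 0;
}

/* Feed one chunk. Returns 1 once the terminator has been seen in full,
 * even if it started in an earlier chunk; otherwise returns 0.
 * Assumption: term[0] does not occur again inside term, so on a mismatch
 * it is enough to restart at 0 (or at 1 if the current byte is term[0]). */
static int term_scanner_feed(term_scanner *s, const char *chunk, size_t len)
{
    if (s->tlen == 0)
        return 0;
    for (size_t i = 0; i < len; i++) {
        if (chunk[i] == s->term[s->matched]) {
            if (++s->matched == s->tlen) {
                s->matched = 0;
                return 1;
            }
        } else {
            s->matched = (chunk[i] == s->term[0]) ? 1 : 0;
        }
    }
    return 0;
}
```

Feeding the two chunks "<style>p{} </st" and "yle></body>" shows the difference: naive_chunk_has_term() finds nothing in either chunk, while term_scanner_feed() reports the terminator on the second chunk because the partial match "</st" was carried over.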

>with '</ style' or '</foo>' and expect that to close the open tag, and
>'style "</" style' and expect to not close it...

I see. I guess there's a reason why the slash in "</" should be quoted
in CDATA contents when it is not the real end tag. However, there's a
lot of HTML out "in the wild" containing unquoted "</" strings in CDATA
blocks.
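
One lenient way to cope with such content is to treat "</" as the end of the block only when it is followed by the name of the element that is currently open. The sketch below shows that idea in isolation. It is an assumption about a possible approach, loosely following the rule HTML5 uses for raw text elements, and not a description of what libxml2 does; is_end_tag_for and find_end_tag are hypothetical names.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the len bytes at p start with "</",
 * then `name` (compared case-insensitively), then '>', '/' or whitespace. */
static int is_end_tag_for(const char *p, size_t len, const char *name)
{
    size_t nlen = strlen(name);
    if (len < 2 + nlen + 1 || p[0] != '<' || p[1] != '/')
        return 0;
    for (size_t i = 0; i < nlen; i++) {
        if (tolower((unsigned char)p[2 + i]) != tolower((unsigned char)name[i]))
            return 0;
    }
    char next = p[2 + nlen];
    return next == '>' || next == '/' || isspace((unsigned char)next) != 0;
}

/* Hypothetical helper: offset of the first "</" in buf that really closes
 * `name`, or (size_t)-1 if there is none in the bytes seen so far. */
static size_t find_end_tag(const char *buf, size_t len, const char *name)
{
    for (size_t i = 0; i + 1 < len; i++) {
        if (buf[i] == '<' && buf[i + 1] == '/' &&
            is_end_tag_for(buf + i, len - i, name))
            return i;
    }
    return (size_t)-1;
}
```

With this check, the "</b>" inside a JavaScript string does not end a script block, but "</script>" does. Note that '</ style' (with a space after the slash) is not accepted here, and that a return value of (size_t)-1 can also mean the end tag is cut off at the end of the buffer, so a chunked caller would have to wait for more data before drawing a conclusion.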

>You can try, but it's all very messy IMHO, I will take 
>patches if not obviously broken

I will look at it further and get back to the list if I'm able to
produce anything useful.

Thanks,

Cyrill
_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
xml@gnome.org
http://mail.gnome.org/mailman/listinfo/xml