
Thread: replaceEntities replacing entities twice




replaceEntities replacing entities twice
user name
2006-07-27 02:57:22
I wanted to follow up with this bug:

http://bugzilla.gnome.org/show_bug.cgi?id=159219

WebKit was also running into this bug.  We inherited some work-around code from KDOM:

static xmlEntityPtr getEntityHandler(void *closure, const xmlChar *name)
{
    xmlParserCtxtPtr ctxt = static_cast<xmlParserCtxtPtr>(closure);
    xmlEntityPtr ent = xmlGetPredefinedEntity(name);
    if (ent)
        return ent;

    ent = xmlGetDocEntity(ctxt->myDoc, name);
    if (!ent && getTokenizer(closure)->isXHTMLDocument())
        ent = getXHTMLEntity(name);

    // Work around a libxml SAX2 bug that causes charactersHandler to be called twice.
    if (ent)
        ctxt->replaceEntities = (ctxt->instate == XML_PARSER_ATTRIBUTE_VALUE) || (ent->etype != XML_INTERNAL_GENERAL_ENTITY);
    
    return ent;
}
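
The rule the handler applies to ctxt->replaceEntities can be read in isolation. What follows is only a minimal sketch: ParserState, EntityType and shouldReplaceEntities are hypothetical stand-ins declared locally so that it compiles without libxml2. They are not libxml2 names and they model only the cases the handler distinguishes.

```cpp
// Hypothetical stand-ins for ctxt->instate and ent->etype (NOT libxml2 types).
enum class ParserState { Content, AttributeValue };
enum class EntityType { InternalGeneral, ExternalGeneralParsed };

// Mirrors the expression assigned to ctxt->replaceEntities in the handler:
// substitute when inside an attribute value, or when the entity is anything
// other than an internal general entity.
constexpr bool shouldReplaceEntities(ParserState state, EntityType type)
{
    return state == ParserState::AttributeValue
        || type != EntityType::InternalGeneral;
}
```

In the handler itself the assignment runs only when a non-predefined lookup succeeds: predefined entities return early and a failed lookup skips it, so in those cases the flag keeps whatever value the previous lookup left behind.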

Recently I've noticed that the above work-around code is not quite correct and is causing troubles of its own:


I'm looking for any ideas for a better workaround.

Thanks.

-eric




replaceEntities replacing entities twice
user name
2006-07-27 10:02:35
On Wed, Jul 26, 2006 at 10:57:22PM -0400, Eric Seidel wrote:
> I wanted to follow up with this bug:
>
> http://bugzilla.gnome.org/show_bug.cgi?id=159219
>
> WebKit was also running into this bug.  We inherited some work-around
> code from KDOM:
>
> static xmlEntityPtr getEntityHandler(void *closure, const xmlChar *name)
> {
>     xmlParserCtxtPtr ctxt = static_cast<xmlParserCtxtPtr>(closure);
>     xmlEntityPtr ent = xmlGetPredefinedEntity(name);
>     if (ent)
>         return ent;
>
>     ent = xmlGetDocEntity(ctxt->myDoc, name);
>     if (!ent && getTokenizer(closure)->isXHTMLDocument())
>         ent = getXHTMLEntity(name);
>
>     // Work around a libxml SAX2 bug that causes charactersHandler to be called twice.
>     if (ent)
>         ctxt->replaceEntities = (ctxt->instate == XML_PARSER_ATTRIBUTE_VALUE) || (ent->etype != XML_INTERNAL_GENERAL_ENTITY);
>
>     return ent;
> }
>
> Recently I've noticed that the above work-around code is not quite
> correct and is causing troubles of its own:

  There is a big warning in red in the libxml2 documentation about
entities and SAX:
    http://xmlsoft.org/entities.html

The libxml2 default SAX callbacks build the entity information associated
with the document. That document is also constructed by the SAX callbacks.
If you replace the SAX callbacks, you must also construct the document
and its associated entities.
The code you pasted can only work if a number of other things are in place
to build the entities from the SAX callbacks. I really can't guess
what is going on in your framework, and correct entity support can be
extremely hard to implement right. Handling of entities while in the
internal subset or in the external subset must be different from handling
in content; this can get incredibly complex.
In a nutshell, sorry, you got yourself into a very hard place. The xmlReader
is a much cleaner and simpler streaming API which will take care of those
issues.

Daniel

-- 
Daniel Veillard      | Red Hat http://redhat.com/
veillardredhat.com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/
_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
xmlgnome.org
http://mail.gnome.org/mailman/listinfo/xml