Thread: why is htmlNewParserCtxt static?




why is htmlNewParserCtxt static?
user name
2006-09-20 08:37:27
Hi,

The htmlNewParserCtxt() function is static, but xmlNewParserCtxt() is
not. Would it be possible to make the htmlNewParserCtxt() function globally
available so that it can be called from outside libxml2?

Best regards,

Michael

-- 
Print XML with Prince!
http://www.princexml.com
_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
xmlgnome.org
http://mail.gnome.org/mailman/listinfo/xml
why is htmlNewParserCtxt static?
user name
2006-09-20 13:16:02
On Wed, Sep 20, 2006 at 06:37:27PM +1000, Michael Day wrote:
> Hi,
> 
> The htmlNewParserCtxt() function is static, but xmlNewParserCtxt() is
> not. Would it be possible to make the htmlNewParserCtxt() function globally
> available so that it can be called from outside libxml2?

  But shouldn't you use one of the wrapper functions where you specify
what input will be used? Like htmlCreateMemoryParserCtxt() or one of the
htmlRead... functions? What is your use case?

Daniel

-- 
Red Hat Virtualization group http://redhat.com/virtualization/
Daniel Veillard      | virtualization library  http://libvirt.org/
veillardredhat.com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ |
Rpmfind RPM search engine  http://rpmfind.net/
why is htmlNewParserCtxt static?
user name
2006-09-21 00:15:41
Hi Daniel,

> But shouldn't you use one of the wrapper functions where you specify
> what input will be used? Like htmlCreateMemoryParserCtxt() or one of the
> htmlRead... functions? What is your use case?

The use case is replacing xmlDefaultExternalEntityLoader() with a
customised entity loader function.

The entity loader is passed an xmlParserCtxtPtr, which allows
document-specific options to be passed in using the _private field and
used by the entity loader (as long as you don't use xmlReader). So when
I parse a document I want to pass in some stuff in the _private field of
the context.

Normally I would use xmlReadFile(), but that creates its own context and
destroys it afterwards and doesn't give me a chance to put anything in
_private. However, xmlCtxtReadFile() looks like just what I want, as it
takes a context that I've already created with the appropriate stuff in
_private.

On the HTMLparser side of things I can't do this, as htmlNewParserCtxt()
is static and I can't call it. I want to call htmlCtxtReadFile(), but
first I need to create a context, set the _private field and pass it in.

htmlCreateFileParserCtxt() is the only way I can create a context,
but I can't use it, because it calls xmlLoadExternalEntity() itself,
before I have had a chance to set _private. Then when I pass the context
into htmlCtxtReadFile() it will call xmlLoadExternalEntity() again!

So I need a way to create an HTML parsing context that will let me set
_private before trying to load any external entities. Does this seem
reasonable? I can do it with the XML parsing API, so I just assume that
the HTML parsing API should work the same way.

Best regards,

Michael
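
Editorial sketch of the use case in the message above. This is a
dependency-free model in C, not libxml2 code: DemoCtxt, DemoLoader,
demo_set_loader(), demo_load() and DocOptions are hypothetical stand-ins
for xmlParserCtxt, xmlExternalEntityLoader, xmlSetExternalEntityLoader(),
the parser's internal load step and whatever per-document data an
application keeps. It only illustrates the idea that a single process-wide
loader can still act per document when each context carries a _private
pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-document options an application might attach. */
typedef struct {
    const char *base_dir;   /* directory this document may load from */
    int         allow_net;  /* non-zero if http:// URLs are permitted */
} DocOptions;

/* Stand-in for a parser context: only the user-data slot is modelled. */
typedef struct {
    void *_private;
} DemoCtxt;

/* Stand-in for the loader callback type; returns 1 if the load is allowed. */
typedef int (*DemoLoader)(const char *url, const char *id, DemoCtxt *ctxt);

/* One loader for the whole process, as with a global entity loader. */
static DemoLoader current_loader = NULL;

void demo_set_loader(DemoLoader f)
{
    current_loader = f;
}

/* Custom loader: the decision depends on the context's _private data. */
int policy_loader(const char *url, const char *id, DemoCtxt *ctxt)
{
    (void)id;
    assert(url != NULL);
    if (ctxt == NULL || ctxt->_private == NULL)
        return 0;                       /* no options attached: refuse */
    const DocOptions *opts = ctxt->_private;
    if (strncmp(url, "http://", 7) == 0)
        return opts->allow_net;
    return strncmp(url, opts->base_dir, strlen(opts->base_dir)) == 0;
}

/* What a parser would do when it meets an external reference. */
int demo_load(DemoCtxt *ctxt, const char *url)
{
    assert(current_loader != NULL);
    return current_loader(url, NULL, ctxt);
}
```

With the loader installed once, two contexts holding different DocOptions
get different answers for the same URL, which is the document-specific
behaviour the message asks for.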

why is htmlNewParserCtxt static?
user name
2006-09-21 06:54:32
On Thu, Sep 21, 2006 at 10:15:41AM +1000, Michael Day wrote:
> The entity loader is passed an xmlParserCtxtPtr, which allows
> document-specific options to be passed in using the _private field and
> used by the entity loader (as long as you don't use xmlReader). So when
> I parse a document I want to pass in some stuff in the _private field of
> the context.

  Okay,

> Normally I would use xmlReadFile(), but that creates its own context and
> destroys it afterwards and doesn't give me a chance to put anything in
> _private. However, xmlCtxtReadFile() looks like just what I want, as it
> takes a context that I've already created with the appropriate stuff in
> _private.
>
> On the HTMLparser side of things I can't do this, as htmlNewParserCtxt()
> is static and I can't call it. I want to call htmlCtxtReadFile(), but
> first I need to create a context, set the _private field and pass it in.

  Okay, makes sense, you need to be able to bootstrap htmlCtxtRead*().
Fixed in CVS!

  thanks 

Daniel
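
Editorial sketch of the ordering problem discussed in this exchange,
again as a dependency-free C model rather than libxml2 code. All names
(DemoCtxt, demo_loader(), demo_create_file_ctxt(), demo_new_ctxt(),
demo_ctxt_read_file()) are hypothetical; they play the roles of the
entity loader, htmlCreateFileParserCtxt(), htmlNewParserCtxt() and
htmlCtxtReadFile() only as far as the behaviour reported in the thread
goes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for a parser context: only the user-data slot is modelled. */
typedef struct {
    void *_private;
} DemoCtxt;

/* Records what the loader saw on its most recent call. */
static void *last_seen_private = NULL;
static int   loader_calls = 0;

/* Stand-in for an entity loader that inspects ctxt->_private. */
void demo_loader(DemoCtxt *ctxt)
{
    assert(ctxt != NULL);
    last_seen_private = ctxt->_private;
    loader_calls++;
}

/* One-shot constructor: creates the context and immediately goes through
 * the loader, so the caller has no chance to set _private first. */
DemoCtxt *demo_create_file_ctxt(const char *filename)
{
    (void)filename;
    DemoCtxt *ctxt = calloc(1, sizeof *ctxt);
    if (ctxt != NULL)
        demo_loader(ctxt);
    return ctxt;
}

/* Bare constructor: returns an empty context and loads nothing. */
DemoCtxt *demo_new_ctxt(void)
{
    return calloc(1, sizeof(DemoCtxt));
}

/* Read step: goes through the loader with whatever _private holds now. */
void demo_ctxt_read_file(DemoCtxt *ctxt, const char *filename)
{
    (void)filename;
    assert(ctxt != NULL);
    demo_loader(ctxt);
}
```

In this model the one-shot constructor makes the loader run once with a
NULL _private and then a second time during the read step, while the
bare constructor followed by the read step runs it exactly once with the
data in place.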

why is htmlNewParserCtxt static?
user name
2006-09-23 07:32:31
Hi Daniel,

>> The entity loader is passed an xmlParserCtxtPtr, which allows
>> document-specific options to be passed in using the _private field and
>> used by the entity loader (as long as you don't use xmlReader). So when
>> I parse a document I want to pass in some stuff in the _private field of
>> the context.

I've just realised that the XInclude mechanism does not support this, as
it creates its own XML parser context internally and doesn't provide any
mechanism to pass in user data via the _private field.

Specifically I am calling xmlXIncludeProcessFlags(), which in turn calls
xmlXIncludeParseFile(), which creates a new parser context to do the
parsing.

Is there some way that I could pass in my own xmlParserCtxtPtr to the
XInclude API?

Alternatively, if the XML_PARSE_XINCLUDE flag applied to the xmlRead*
functions then there would be no problem, but I understand that this
flag is only implemented by the xmlReader API at the moment. Would it
make sense to apply this flag to the regular SAX2 parser as well?

Best regards,

Michael

What is the _private field actually for?
user name
2006-09-25 08:11:22
> I've just realised that the XInclude mechanism does not support this, as
> it creates its own XML parser context internally and doesn't provide any
> mechanism to pass in user data via the _private field.

I've tried creating a new function, xmlXIncludeProcessFlagsData(), which
takes an additional void* argument that it passes in to the _private
field of the created XML parser context. This works fine.

However, I've discovered that when parsing an XML document that uses
external entities, the context used for parsing the external entities
does not preserve the _private field of the original parser context.

E.g. if I try to parse the following document using xmlCtxtReadFile():

<!DOCTYPE foo [
<!ENTITY bar SYSTEM "bar.xml">
]>
<foo>Hello &bar; world!</foo>

the context that I pass in (with _private field set) will *not* be used
to parse the external entity "bar.xml". Strangely enough, the context
that I pass in *will* be used to parse external DTDs, which seems a bit
inconsistent.

It seems that further patches to libxml2 will be necessary if I want to
be able to use the _private field of the XML parser context in this way.
Which leads me to ask what the _private field exists for in the first
place, if it cannot be relied upon to be there. What is the use case for
this application data field, and in what situations can it actually be
used reliably?

(On a slightly unrelated note, it seems that requirements for a future
libxml3 have been circulating since 2002 or before. Has there been any
progress since then?)

Cheers,

Michael
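
Editorial sketch of the inconsistency reported in the message above, as
a dependency-free C model. It does not reproduce libxml2 internals: the
names DemoCtxt, demo_loader(), demo_load_dtd() and demo_load_entity()
are hypothetical, and the `inherit` switch is only an assumption about
what a fix could look like (copying the parent's _private into the
context created for the external entity).

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a parser context: only the user-data slot is modelled. */
typedef struct {
    void *_private;
} DemoCtxt;

/* Stand-in for an entity loader: reports the user data it was handed. */
void *demo_loader(const DemoCtxt *ctxt)
{
    assert(ctxt != NULL);
    return ctxt->_private;
}

/* External DTD: loaded through the caller's own context, so the loader
 * sees the caller's _private. */
void *demo_load_dtd(const DemoCtxt *parent)
{
    return demo_loader(parent);
}

/* External entity: loaded through a fresh child context.
 * inherit == 0 models the reported behaviour (child starts empty);
 * inherit != 0 models a possible fix (child copies the parent's data). */
void *demo_load_entity(const DemoCtxt *parent, int inherit)
{
    DemoCtxt child = { NULL };
    assert(parent != NULL);
    if (inherit)
        child._private = parent->_private;
    return demo_loader(&child);
}
```

In the model, the loader receives the application data for the DTD but
NULL for the entity unless the child context inherits it, which matches
the asymmetry described for "bar.xml".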

why is htmlNewParserCtxt static?
user name
2006-09-28 09:35:36
On Sat, Sep 23, 2006 at 05:32:31PM +1000, Michael Day wrote:
> Hi Daniel,

  Hi, sorry for the delay, I was first away and then sick ...

> >> The entity loader is passed an xmlParserCtxtPtr, which allows
> >> document-specific options to be passed in using the _private field and
> >> used by the entity loader (as long as you don't use xmlReader). So when
> >> I parse a document I want to pass in some stuff in the _private field of
> >> the context.
>
> I've just realised that the XInclude mechanism does not support this, as
> it creates its own XML parser context internally and doesn't provide any
> mechanism to pass in user data via the _private field.

  Right, it really was designed in isolation.

> Specifically I am calling xmlXIncludeProcessFlags(), which in turn calls
> xmlXIncludeParseFile(), which creates a new parser context to do the
> parsing.
>
> Is there some way that I could pass in my own xmlParserCtxtPtr to the
> XInclude API?

  I don't think so, at this point.

> Alternatively, if the XML_PARSE_XINCLUDE flag applied to the xmlRead*
> functions then there would be no problem, but I understand that this
> flag is only implemented by the xmlReader API at the moment. Would it
> make sense to apply this flag to the regular SAX2 parser as well?

  The problem would be to implement XInclude at the SAX level; it's not
really trivial, it's like a reimplementation, and it would be hard
to provide the xpointer/xpath support too.

Daniel

What is the _private field actually for?
user name
2006-09-28 09:45:47
On Mon, Sep 25, 2006 at 06:11:22PM +1000, Michael Day wrote:
> > I've just realised that the XInclude mechanism does not support this, as
> > it creates its own XML parser context internally and doesn't provide any
> > mechanism to pass in user data via the _private field.
>
> I've tried creating a new function: xmlXIncludeProcessFlagsData(), which
> takes an additional void* argument that it passes in to the _private
> field of the created XML parser context. This works fine.

  Hum, yeah, I don't see how to do this except by adding yet another API
with yet another extra data, not nice but I don't see a workaround.

> However, I've discovered when parsing an XML document that uses external
> entities, the context used for parsing the external entities does not
> preserve the _private field of the original parser context.
>
> eg. If I try to parse the following document using xmlCtxtReadFile:
> 
> <!DOCTYPE foo [
> <!ENTITY bar SYSTEM "bar.xml">
> ]>
> <foo>Hello &bar; world!</foo>
> 
> the context that I pass in (with _private field set) will *not* be used
> to parse the external entity "bar.xml". Strangely enough, the context
> that I pass in *will* be used to parse external DTDs, which seems a bit
> inconsistent.

  Well, getting DTD and entity parsing right is unfortunately horribly complex
and it took a while to get conformant to the spec. One result is
that some inconsistencies like that sneaked in because the design has been
revamped at least 3 times. In a sense XInclude is yet another extra layer in
that already complex stack.

> It seems that further patches to libxml2 will be necessary if I want to
> be able to use the _private field of the XML parser context in this way.
> Which leads me to ask what the _private field exists for in the first
> place, if it cannot be relied upon to be there. What is the use case for
> this application data field, and in what situations can it actually be
> used reliably?

  No code can be considered reliable until it has been used over and over in
different ways. That has been reliable for other users, I guess; you're hitting
issues because you go a bit further. Sorry, no silver bullet around. But we
can and should fix issues found when they appear, based on needs and available
manpower.

> (On a slightly unrelated note, it seems that requirements for a future
> libxml3 have been circulating since 2002 or before. Has there been any
> progress since then?)

  Considering the amount of resources, and how painful the transition from
libxml version 1 has been (I only now managed to get rid of it from Fedora!),
I have no intent for anything like libxml3 in the foreseeable future, sorry
again!

Daniel

What is the _private field actually for?
user name
2006-09-28 12:25:44
Hi Daniel,

> Hum, yeah, I don't see how to do this except by adding yet another API
> with yet another extra data, not nice but I don't see a workaround.

Alright, I will submit a patch for this later then, as it's not a very
big change.

> Well, getting DTD and entity parsing right is unfortunately horribly complex
> and it took a while to get conformant to the spec. One result is
> that some inconsistencies like that sneaked in because the design has been
> revamped at least 3 times. In a sense XInclude is yet another extra layer in
> that already complex stack.

Okay, but then it should be feasible to fix this, right? Is it
worthwhile me taking a look at it to see what is going on? I just don't
know much about how the SAX2 parser handles entities and when the parser
contexts get switched around, but I should be able to trace through it,
if you're willing to accept the resulting patches.

> Considering the amount of resources, and how painful the transition from
> libxml version 1 has been (I only now managed to get rid of it from Fedora!),
> I have no intent for anything like libxml3 in the foreseeable future, sorry
> again!

That's a pity, but quite understandable; it's shocking how so many
different versions of the same libraries need to be installed to make
all the apps work. (Prince actually ends up getting linked to libxml2
and expat, which is used by Fontconfig, so two XML parsers!)

Cheers,

Michael

What is the _private field actually for?
user name
2006-09-28 12:42:45
Michael Day <mikedayyeslogic.com> writes:

> That's a pity, but quite understandable; it's shocking how so many
> different versions of the same libraries need to be installed to make
> all the apps work. (Prince actually ends up getting linked to libxml2
> and expat, which is used by Fontconfig, so two XML parsers!)

IMHO we don't need a new library. We just need simpler (but less
adaptable) layers on top of libxml2.

An expat compatibility layer would be cool, for example.

Lots of people I've worked with do seem to find libxml2 difficult.

-- 
Nic Ferrier
http://www.tapsellferrier.co.uk   for all your tapsell ferrier needs