Thread: can't use DOM methods in XML::LibXML when parsing from file...
|
|
| can't use DOM methods in XML::LibXML when parsing from file... |

|
2007-05-09 12:40:05 |
weird $doc->getElementById() doesn't work on XML::LibXML objects
parsed from a file.

Can't locate object method "getElementById" via package
"XML::LibXML::Document" at /UI/console/parts/help/list line 20

19: my $p = XML::LibXML->new();
20: my $doc = $p->parse_html_file('/UI/console/parts/help/console');
21: my $help = $doc->getElementById('help_speed');
22: my $xml_str = $help->toString;

any ideas on that one?

$doc->documentElement; returns the html object no problem. also,
getElementsByTagName does not work.
--
Anthony Ettinger
Ph: 408-656-2473
http://chovy.dyndns.org/resume.html
http://utuxia.com/consulting
_______________________________________________
Perl-XML mailing list
Perl-XML listserv.ActiveState.com
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs
|
|
| Re: can't use DOM methods in XML::LibXML when parsing from file... |
  Czech Republic |
2007-05-09 16:48:44 |
On Wednesday 09 May 2007, Anthony Ettinger wrote:
> It still however ($doc->getElementById) does not work on xml
> (specifically a docbook file I'm using).
>
> <?xml version="1.0" encoding="UTF-8"?>
> <book lang="en">
> <bookinfo>
> <title>Foobar Doc</title>
> </bookinfo>
> <chapter id="foobar">
> <title>Sub Title of Foobar</title>
> <para>Blah blah</para>
> </chapter>
> </book>
>
> I get an undefined object when I call $doc->getElementById('foobar');
>
> There is an explanation here, but I cannot make sense of it:
> http://cpan.uwinnipeg.ca/htdocs/XML-LibXML/Document.pm.html#strong_getElementById_strong

Pity, the explanation is correct.

> Also a thread here, which discusses an alternative method (that I am
> not wanting to use).
> http://www.perlmonks.org/?node_id=516988

The DOM spec says it explicitly:

  Note: Attributes with the name "ID" or "id" are not of type ID unless so
  defined.

To make the "hashing" work and to be able to look up nodes using
getElementById, you have to use a DTD which declares the type of this
particular attribute as ID type. For a DocBook document you would add
something like

<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
  "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd">

to the preamble of your document. See
http://www.docbook.org/tdg/en/html/appb.html#d0e286815 for details.

Alternatively, you can use xml:id attributes, which are reserved for this
purpose by http://www.w3.org/TR/xml-id/. They are fully supported by libxml2,
too.

For attributes which are not of type ID, you may use XPath lookups, as
suggested in the first of those threads:

$doc->findnodes(qq(//*[\@id="$id"]))

but that is not as fast as getElementById because it is not hashed.

> Why can't we just use standard DOM functions on valid XML?

And why can't you just formulate your questions in a civilized way? I mean
pragmatically, without this scent of blaming (whom actually?). This time the
problem is on your side, not in DOM, XML, or XML::LibXML.

-- Petr
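
The three options in the reply above (a DTD-declared ID attribute, xml:id, and an XPath fallback) can be compared side by side. The following is only a minimal sketch and not code from the thread: the inline documents are cut-down versions of the DocBook sample, and the internal DTD subset with a single ATTLIST declaration is a hypothetical stand-in for the real DocBook DTD. With libxml2 the internal subset is expected to be enough for the ID to be registered, but that is worth checking against your own version.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

my $parser = XML::LibXML->new();

# 1. Plain "id" attribute, no DTD: nothing tells the parser it is of type ID.
my $plain = $parser->parse_string(<<'XML');
<book lang="en">
<chapter id="foobar"><title>Sub Title of Foobar</title></chapter>
</book>
XML
my $by_id = $plain->getElementById('foobar');    # undef, as reported in the thread

# XPath fallback: works for any attribute, but searches the tree instead of
# using the ID hash. Note the escaped \@ inside the double-quoted string.
my $id = 'foobar';
my ($by_xpath) = $plain->findnodes(qq(//*[\@id="$id"]));

# 2. Internal DTD subset declaring chapter/@id as type ID
#    (hypothetical stand-in for the DocBook DTD).
my $declared = $parser->parse_string(<<'XML');
<!DOCTYPE book [
<!ATTLIST chapter id ID #IMPLIED>
]>
<book lang="en">
<chapter id="foobar"><title>Sub Title of Foobar</title></chapter>
</book>
XML
my $by_dtd = $declared->getElementById('foobar');

# 3. xml:id is an ID by definition and needs no DTD.
my $with_xml_id = $parser->parse_string(<<'XML');
<book lang="en">
<chapter xml:id="foobar"><title>Sub Title of Foobar</title></chapter>
</book>
XML
my $by_xml_id = $with_xml_id->getElementById('foobar');

# Report which lookups returned a node.
for my $result (
    [ 'plain id, getElementById' => $by_id ],
    [ 'plain id, XPath'          => $by_xpath ],
    [ 'DTD-declared id'          => $by_dtd ],
    [ 'xml:id'                   => $by_xml_id ],
) {
    my ( $label, $node ) = @$result;
    print "$label: ", ( defined $node ? $node->nodeName : 'undef' ), "\n";
}
```

The XPath variant is the only one that works on the unmodified document, at the cost of a tree search on every lookup.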
|
|
| Re: can't use DOM methods in XML::LibXML when parsing from file... |

|
2007-05-09 16:58:16 |
Understood. I read it a few more times after blaming the world.

xml:id did not work with a 4.5 DTD... so I started looking for a 5.0 beta
version that does support it. Then I realized I should use XSD or
perhaps RelaxNG, but I don't know if XML::LibXML will validate against
those, and I am unsure how to properly form the <!DOCTYPE...> declaration
to point to the various schemas.
--
Anthony Ettinger
Ph: 408-656-2473
http://chovy.dyndns.org/resume.html
http://utuxia.com/consulting
|
|
| Re: can't use DOM methods in XML::LibXML when parsing from file... |
  Czech Republic |
2007-05-09 18:15:46 |
On Thursday 10 May 2007, Anthony Ettinger wrote:
> On 5/9/07, A. Pagaltzis <pagaltzis gmx.de> wrote:
> > Hi Anthony,
> >
> > * Anthony Ettinger <anthony chovy.com> [2007-05-09 23:00]:
> > > It still however ($doc->getElementById) does not work on xml
> > > (specifically a docbook file I'm using).
> > >
> > > <?xml version="1.0" encoding="UTF-8"?>
> > > <book lang="en">
> > > <bookinfo>
> > > <title>Foobar Doc</title>
> > > </bookinfo>
> > > <chapter id="foobar">
> > > <title>Sub Title of Foobar</title>
> > > <para>Blah blah</para>
> > > </chapter>
> > > </book>
> > >
> > > I get an undefined object when I call $doc->getElementById('foobar');
> > >
> > > There is an explanation here, but I cannot make sense of it:
> > > http://cpan.uwinnipeg.ca/htdocs/XML-LibXML/Document.pm.html#strong_getElementById_strong
> >
> > The fact that the attribute is called `id` is irrelevant to
> > whether it is an ID. It must be defined as an `ID` attribute
> > in the DTD for the document. That means that without finding a
> > DTD declaration and parsing the DTD, the XML parser does not know
> > which attributes are IDs. So as far as the parser is concerned,
> > there is no element with the ID `foobar` in the above document,
> > and thus you get undef. Everything is working as expected.
> >
> > If using the DTD is not an option in your case for whatever
> > reason, you could duplicate the ID in a redundant `xml:id`
> > attribute:
> >
> > <chapter id="foobar" xml:id="foobar">
> >
> > Since `xml:id` is predefined as an ID, XML parsers that
> > conform to the xml:id spec do not need a DTD to know that the
> > element above has the ID `foobar`.
> >
> > However, I am not sure what reaction a parser will have if it has
> > read the DTD and sees that the same ID is declared on the same
> > element in two different ways.

If both a DTD-declared id and xml:id are used with the same value, libxml2 will
quite justly complain about a duplicated ID.

> Ok, so if I validate, I can use 'id="foobar"', if I don't validate, I
> have to use 'xml:id="foobar"'
>
> Now, supposing (and this may be OT) I want to use xml schema or
> relaxNG to validate instead of a DTD, what is the proper doctype to
> put in my xml docbook file?

Find and read the DocBook Definitive Guide. It's on the web and I suspect this
is discussed there.

But first of all, no DOCTYPE - those are only for DTDs.

For XML Schema see http://www.w3.org/TR/xmlschema-1/.

For RelaxNG there is no standard way to associate an XML document with an RNG
schema. There is a reason behind it - I think basically it's that many people
around RelaxNG consider it a bad design. Something along the lines of: I claim
I'm a pope, because I have a certificate for it that I wrote by myself, will
you trust me? But I'm probably oversimplifying the matter; if interested look
it up for yourself.

Finally, LibXML's APIs for validating against Schema and RelaxNG are
XML::LibXML::Schema and XML::LibXML::RelaxNG, and both require that you tell
them the location of your schema explicitly(!), so associating it with the
document won't help. See the docs.

-- Petr
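
To illustrate the last point, that the schema is handed to the validator explicitly rather than discovered from the document, here is a minimal sketch using XML::LibXML::RelaxNG. It is not from the thread: the tiny inline schema is a hypothetical toy that only describes the cut-down sample document, not the real DocBook RelaxNG schema, and the variable names are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Toy RelaxNG schema (hypothetical; far smaller than the real DocBook schema).
my $rng_source = <<'RNG';
<element name="book" xmlns="http://relaxng.org/ns/structure/1.0">
  <attribute name="lang"/>
  <element name="chapter">
    <attribute name="id"/>
    <element name="title"><text/></element>
  </element>
</element>
RNG

# The schema is given to the validator directly; the document carries
# no DOCTYPE or other pointer to it.
my $rng = XML::LibXML::RelaxNG->new( string => $rng_source );

my $parser = XML::LibXML->new();
my $good   = $parser->parse_string(
    '<book lang="en"><chapter id="foobar"><title>Sub Title of Foobar</title></chapter></book>'
);
my $bad = $parser->parse_string(
    '<book lang="en"><chapter id="foobar"/></book>'    # title is missing
);

# validate() dies when the document does not match, so wrap it in eval.
my $valid_ok   = eval { $rng->validate($good); 1 } ? 1 : 0;
my $invalid_ok = eval { $rng->validate($bad);  1 } ? 1 : 0;

print "good document: ", ( $valid_ok   ? "valid" : "invalid" ), "\n";
print "bad document:  ", ( $invalid_ok ? "valid" : "invalid" ), "\n";
```

For a schema stored on disk, the constructor also accepts location => $path instead of string; XML::LibXML::Schema follows the same pattern for XML Schema files.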
|
|
| Re: can't use DOM methods in XML::LibXML when parsing from file... |
  Czech Republic |
2007-05-09 13:54:43 |
On Wednesday 09 May 2007, Anthony Ettinger wrote:
> weird $doc->getElementById() doesn't work on XML::LibXML objects
> parsed from a file.
>
> Can't locate object method "getElementById" via package
> "XML::LibXML::Document" at /UI/console/parts/help/list line 20
>
> 19: my $p = XML::LibXML->new();
> 20: my $doc = $p->parse_html_file('/UI/console/parts/help/console');
> 21: my $help = $doc->getElementById('help_speed');
> 22: my $xml_str = $help->toString;
>
> any ideas on that one?
>
> $doc->documentElement; returns the html object no problem. also,
> getElementsByTagName does not work.

The same code works for me, XML::LibXML 1.63. My test:

perl html_id.pl http://www.perl.org description

where html_id.pl:

#!/usr/bin/perl
use XML::LibXML;
my $p = XML::LibXML->new();
my $doc = $p->parse_html_file(shift);
my $help = $doc->getElementById(shift);
my $xml_str = $help->toString;
print $xml_str,"\n";
__END__

Sadly, being too strict, the parser is unusable for real-world HTML unless
$parser->recover(1) is used. It took me some minutes to find a site whose
HTML libxml2 doesn't complain about for some reason or other :-(

-- Petr
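
The recover(1) setting mentioned above is not shown in the script, so here is a minimal sketch of how it might be combined with getElementById. It is an illustration only: the inline HTML, with an unclosed paragraph and a made-up <bogus> tag, is a hypothetical example of markup the HTML parser is likely to complain about, and in recover mode the parser may still print warnings to STDERR.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

my $parser = XML::LibXML->new();
$parser->recover(1);    # keep going after parse errors instead of dying

# Deliberately sloppy HTML (hypothetical sample input).
my $html = <<'HTML';
<html><body>
<div id="help_speed"><p>Unclosed paragraph
<bogus>not a real HTML tag</bogus>
</div>
</body></html>
HTML

my $doc = $parser->parse_html_string($html);

# For documents parsed with the HTML parser, id attributes are treated
# as IDs without any DTD declaration.
my $help = $doc->getElementById('help_speed');
print $help->toString, "\n" if defined $help;
```

Checking that $help is defined before calling toString avoids a fatal "Can't call method on an undefined value" error when the ID is not present.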
|
|
| Re: can't use DOM methods in XML::LibXML when parsing from file... |

|
2007-05-09 15:56:21 |
On 5/9/07, Anthony Ettinger <anthony chovy.com> wrote:
> On 5/9/07, Anthony Ettinger <anthony chovy.com> wrote:
> > I'm using 1.58 of XML::LibXML, trying to upgrade now, but that is
> > introducing a whole new set of problems.
>
> after much pain, crying, and anguish in #perl -- and upgrading to
> 1.63, it now works.

It still however ($doc->getElementById) does not work on xml
(specifically a docbook file I'm using).

<?xml version="1.0" encoding="UTF-8"?>
<book lang="en">
<bookinfo>
<title>Foobar Doc</title>
</bookinfo>
<chapter id="foobar">
<title>Sub Title of Foobar</title>
<para>Blah blah</para>
</chapter>
</book>

I get an undefined object when I call $doc->getElementById('foobar');

There is an explanation here, but I cannot make sense of it:
http://cpan.uwinnipeg.ca/htdocs/XML-LibXML/Document.pm.html#strong_getElementById_strong

Also a thread here, which discusses an alternative method (that I am
not wanting to use).
http://www.perlmonks.org/?node_id=516988

Why can't we just use standard DOM functions on valid XML?

Anthony Ettinger
Ph: 408-656-2473
http://chovy.dyndns.org/resume.html
http://utuxia.com/consulting