Thread: Re: Problems using XML::LibXML and XML::LibXSLT; getting, corrupted output.




Re: Problems using XML::LibXML and XML::LibXSLT; getting, corrupted output.
United States
2007-10-13 19:54:40
Richard,

Thanks for the suggestions, I really appreciate your help. I am aware of
the issue with using XSLT to remove the root. The documents I'm getting
should never have more than 1 element under the root, so that shouldn't
be a problem. The reason I want to use XSLT is so I can adjust to any
changes in the source format, which I have no control over, without
changing the code. It's not an idea that I'm married to, since the
complexity of XSLT makes this approach of questionable value, but I would
like to see if I can get it to work so I have the option.

I've been experimenting with some of the ideas you gave me and one I
came up with on my own. I don't really understand all of the results.

First I used basically the same script, but added an importNode
statement, as you suggested, replacing:
     $root->appendChild(get_doc($_));
with:
     my $node = get_doc($_);
     $doc->importNode($node);
     $root->appendChild($node);

It doesn't seem to make any difference; it still produces garbage
output. In reading the documentation, appendChild says "if the new node
is not part of the document, the node will be imported first", so I'm not
sure the importNode is really needed here. As I said in my original
post, if I eliminate the XSLT, it works fine. So I think the real reason
your script works and mine doesn't is the lack of XSLT in yours. I
tried removing the importNode statement from the script you provided,
and it still works fine.

I have found 2 approaches that do seem to work. The first is to change
the line above to the following:
     my $node = get_doc($_);
     my $import = $doc->importNode($node);
     $root->appendChild($import);

I'm not sure why this works. The documentation doesn't even indicate
what importNode returns, so I'm not sure I'd want to rely on this. The
same approach does not work with adoptNode.

The 2nd approach I found that also seems to work is to replace:
     return $element;
at the end of sub get_doc, with:
     return $element->cloneNode(1);

Although this is a little less mysterious, I still don't understand
why it works, but it doesn't work without the cloneNode.

So I have 2 solutions, but I'm still not sure I understand what's going
on. As the saying goes, I'm confused, but at a (slightly) higher level.

If anybody can offer any insight on any of this, I would appreciate it.
Maybe someday it will all make sense to me.

-Tim Fletcher

> Tim,
> 
> There are two issues with your current code:
> 
>    1. When taking XML nodes from one XML document and placing them in
>       another, the latter must explicitly either "import" or "adopt"
>       each node, depending on whether the nodes should be kept in or
>       removed from the original doc, respectively (see the documentation
>       of importNode and adoptNode for the XML::LibXML::Document object).
>    2. If a source file has more than one node under the root, it will
>       cause problems when you try to use XSLT to remove the root element
>       and append the results to the new document, because the results
>       from the transform will not be a well-formed XML document.
>       Therefore, it would be better to avoid XSLT altogether and stick
>       with DOM+XPath.
> 
> Revised code:
> 
> #!/usr/bin/perl -w
> use strict;
> use XML::LibXML;
> use XML::LibXSLT;
>
> my $parser = XML::LibXML->new();
> my $xslt = XML::LibXSLT->new();
> my $doc = XML::LibXML::Document->new();
> my $root = $doc->createElement('newroot');
> $doc->setDocumentElement($root);
> foreach (qw(file1.xml file2.xml)) {
>     foreach (get_toplevel_elems($_)) {
>         $doc->importNode($_);  # need to associate node w/ new doc
>         $root->appendChild($_);
>     }
> }
> print "** final result **\n", $doc->toString, "\n\n";
>
> sub get_toplevel_elems {
>     my $filename = shift;
>     my $doc = $parser->parse_file($filename);
>     # gets all child nodes, including comments, whitespace, etc.
>     my @elements = $doc->documentElement()->childNodes;
>     # gets just the elements and converts the resulting
>     # XML::LibXML::NodeList to a Perl list
>     #my @elements = $doc->documentElement()->find("*")->get_nodelist();
>     return @elements;
> }
> 
> HTH,
> Richard

> Tim Fletcher wrote:
>> > I'm trying to write a script using XML::LibXML and XML::LibXSLT and I'm
>> > getting corrupted data output. I've stripped my script down to the bare
>> > minimum to reproduce this problem. Basically the stripped down script
>> > will take 2 XML files, use XSLT to remove the root (and some other stuff
>> > in the final script), then merge them into a single document under a
>> > different root. The parsing and transformation will ultimately be done
>> > in a separate module, but not until I figure out what I'm doing wrong.
>> >
>> > So if document 1 looks like:
>> > <?xml version="1.0" encoding="UTF-8"?>
>> > <root>
>> >   <element1>
>> >     <element1a>text1a</element1a>
>> >     <element1b>text1b</element1b>
>> >   </element1>
>> > </root>
>> >
>> > And document 2 looks like:
>> > <?xml version="1.0" encoding="UTF-8"?>
>> > <root>
>> >   <element2>
>> >     <element2a>text2a</element2a>
>> >   </element2>
>> > </root>
>> >
>> > The result should be:
>> >
>> > <?xml version="1.0" encoding="UTF-8"?>
>> > <new-root>
>> >   <element1>
>> >     <element1a>text1a</element1a>
>> >     <element1b>text1b</element1b>
>> >   </element1>
>> >   <element2>
>> >     <element2a>text2a</element2a>
>> >   </element2>
>> > </new-root>
>> >
>> > Here's the script.
>> >
>> > #!/usr/bin/perl -w
>> > use strict;
>> > use XML::LibXML;
>> > use XML::LibXSLT;
>> >
>> > my $parser = XML::LibXML->new();
>> > my $xslt = XML::LibXSLT->new();
>> > my $doc = XML::LibXML::Document->new();
>> > my $root = $doc->createElement('newroot');
>> > $doc->setDocumentElement($root);
>> > foreach (qw(file1.xml file2.xml)) {
>> >     $root->appendChild(get_doc($_));
>> > }
>> > print "** final result **\n", $doc->toString, "\n";
>> >
>> > sub get_doc {
>> >     my $filename = shift;
>> >     my $template = $xslt->parse_stylesheet_file('template.xsl');
>> >     my $doc = $template->transform_file($filename);
>> >     my $element = $doc->documentElement();
>> >     print "** transform results **\n", $element->toString(), "\n";
>> >     return $element;
>> > }
>> >
>> > The result of the transform_file is what I expect, but the final
>> > document is often not even well formed XML, and sometimes even includes
>> > little smiley faces, and other non-text characters. In some cases I've
>> > seen bits of strings from the template file, which makes me think I'm
>> > doing something wrong with LibXSLT.
>> >
>> > If, instead of transforming the file with XSLT, I just parse it, get the
>> > root element and return that, then everything seems to work fine, which
>> > again points to LibXSLT.
>> >
>> > If anybody could tell me what I'm doing wrong, I would really appreciate
>> > it. Also since I'm relatively new to the XML modules, any specific
>> > suggestions for fixes would also be appreciated.
>> >
>> > TIA
>> >
>> > -Tim Fletcher
>> >
>> > Just in case it matters, here is the template.
>> > <?xml version="1.0" encoding="UTF-8"?>
>> > <xsl:stylesheet version="1.0"
>> >     xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
>> >   <xsl:output method="xml"/>
>> >   <xsl:template match="root">
>> >     <xsl:apply-templates select="*"/>
>> >   </xsl:template>
>> >   <xsl:template match="*">
>> >     <xsl:copy-of select="."/>
>> >   </xsl:template>
>> > </xsl:stylesheet>

_______________________________________________
Perl-XML mailing list
Perl-XML@listserv.ActiveState.com
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs

Re: Problems using XML::LibXML and XML::LibXSLT; getting, corrupted output.
Czech Republic
2007-10-14 03:35:06
On Sunday 14 October 2007 02:54:40 Tim Fletcher wrote:
> I have found 2 approaches that do seem to work. The first is to change
> the line above to the following:
>      my $node = get_doc($_);
>      my $import = $doc->importNode($node);
>      $root->appendChild($import);
>
> I'm not sure why this works. The documentation doesn't even indicate
> what importNode returns, so I'm not sure I'd want to rely on this. The
I'm pretty sure importNode returns the imported node - that is, a copy
of its argument whose document is the method's object. Certainly my
module uses importNode that way and seems to work OK. I vaguely remember
not understanding importNode semantics either - I think I had to look at
the W3C DOM spec to get how the XML::LibXML DOM works...
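
For what it's worth, here is a minimal sketch of that behaviour, assuming
XML::LibXML is installed. The inline XML string and the variable names
are made up for illustration and are not taken from Tim's script. No
XSLT is involved, so it only shows what importNode hands back, not why
the transformed nodes end up corrupted.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

my $parser = XML::LibXML->new();

# Source document standing in for a transform result (hypothetical data).
my $src = $parser->parse_string(
    '<root><element1><element1a>text1a</element1a></element1></root>'
);
my ($node) = $src->documentElement->findnodes('element1');

# Target document with its own root element.
my $doc  = XML::LibXML::Document->new('1.0', 'UTF-8');
my $root = $doc->createElement('newroot');
$doc->setDocumentElement($root);

# importNode does not modify $node; the node to append is the return value.
my $import = $doc->importNode($node);
$root->appendChild($import);

# Report which document each node belongs to.
print "copy is a different node\n" unless $import->isSameNode($node);
print "copy belongs to target\n"   if $import->ownerDocument->isSameNode($doc);
print "original still in source\n" if $node->ownerDocument->isSameNode($src);
print $doc->toString(1);
```

If that holds on your install, calling importNode and then appending the
original $node (as in the first attempt) throws the copy away, which
would explain why that version behaves just like plain appendChild.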

	Bye
		Vasek
--
Open Source consulting
http://www.mangrove.cz
