
Thread: Problems using XML::LibXML and XML::LibXSLT; getting corrupted output.




Problems using XML::LibXML and XML::LibXSLT; getting corrupted output.
user name
2007-10-12 17:33:54
I’m trying to write a script using XML::LibXML and XML::LibXSLT, and I’m
getting corrupted output. I’ve stripped my script down to the bare
minimum that reproduces the problem. The stripped-down script takes two
XML files, uses XSLT to remove the root (and some other stuff in the
final script), then merges them into a single document under a
different root. The parsing and transformation will ultimately be done
in a separate module, but not until I figure out what I’m doing wrong.

So if document 1 looks like:
<?xml version="1.0" encoding="UTF-8"?>
<root>
  <element1>
    <element1a>text1a</element1a>
    <element1b>text1b</element1b>
  </element1>
</root>

And document 2 looks like:
<?xml version="1.0" encoding="UTF-8"?>
<root>
  <element2>
    <element2a>text2a</element2a>
  </element2>
</root>

The result should be:

<?xml version="1.0" encoding="UTF-8"?>
<newroot>
  <element1>
    <element1a>text1a</element1a>
    <element1b>text1b</element1b>
  </element1>
  <element2>
    <element2a>text2a</element2a>
  </element2>
</newroot>

Here’s the script.

#!/usr/bin/perl -w
use strict;
use XML::LibXML;
use XML::LibXSLT;

my $parser = XML::LibXML->new();
my $xslt = XML::LibXSLT->new();
my $doc = XML::LibXML::Document->new();
my $root = $doc->createElement('newroot');
$doc->setDocumentElement($root);
foreach (qw(file1.xml file2.xml)) {
    $root->appendChild(get_doc($_));
}
print "** final result **\n", $doc->toString, "\n";

sub get_doc {
    my $filename = shift;
    my $template = $xslt->parse_stylesheet_file('template.xsl');
    my $doc = $template->transform_file($filename);
    my $element = $doc->documentElement();
    print "** transform results **\n", $element->toString(), "\n";
    return $element;
}

The result of transform_file is what I expect, but the final document
is often not even well-formed XML, and sometimes even includes little
smiley faces and other non-text characters. In some cases I’ve seen
bits of strings from the template file, which makes me think I'm doing
something wrong with LibXSLT.

If, instead of transforming the file with XSLT, I just parse it, get
the root element and return that, then everything seems to work fine,
which again points to LibXSLT.

If anybody could tell me what I’m doing wrong, I would really
appreciate it. Also, since I’m relatively new to the XML modules, any
specific suggestions for fixes would be appreciated.

TIA

-Tim Fletcher

Just in case it matters, here is the template.
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml"/>
  <xsl:template match="root">
    <xsl:apply-templates select="*"/>
  </xsl:template>
  <xsl:template match="*">
    <xsl:copy-of select="."/>
  </xsl:template>
</xsl:stylesheet>


_______________________________________________
Perl-XML mailing list
Perl-XML@listserv.ActiveState.com
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs

Re: Problems using XML::LibXML and XML::LibXSLT; getting corrupted output.
user name
2007-10-12 19:03:18
Tim,

There are two issues with your current code:
  1. When you take nodes from one XML document and place them in another, the target document must explicitly either "import" or "adopt" each node: importNode copies the node and leaves the original in its source document, while adoptNode moves the node out of the source document (see the documentation of importNode and adoptNode for the XML::LibXML::Document object).
  2. If a source file has more than one node under the root, using XSLT to remove the root element causes problems when you append the results to the new document, because the transform result is no longer a well-formed XML document. It would therefore be better to avoid XSLT altogether and stick with DOM+XPath.
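
To make the difference between importing and adopting concrete, here is a minimal sketch. It assumes an in-memory source document built with parse_string rather than your files, and the variable names ($source, $target, $first, $second) are made up for illustration only.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical in-memory source document standing in for file1.xml
my $parser = XML::LibXML->new();
my $source = $parser->parse_string('<root><element1/><element2/></root>');

# Target document with its own root element
my $target = XML::LibXML::Document->new('1.0', 'UTF-8');
my $root   = $target->createElement('newroot');
$target->setDocumentElement($root);

my ($first, $second) = $source->documentElement->findnodes('*');

# importNode returns a copy owned by $target; $first stays in $source
my $copy = $target->importNode($first);
$root->appendChild($copy);

# adoptNode unlinks $second from $source and hands it over to $target
$target->adoptNode($second);
$root->appendChild($second);

print "source: ", $source->documentElement->toString, "\n";
print "target: ", $root->toString, "\n";
```

After this runs, the source document should still contain element1 (it was copied) but no longer element2 (it was moved), while the target holds both.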
Revised code:

#!/usr/bin/perl -w
use strict;
use XML::LibXML;
use XML::LibXSLT;

my $parser = XML::LibXML->new();
my $xslt = XML::LibXSLT->new();
my $doc = XML::LibXML::Document->new();
my $root = $doc->createElement('newroot');
$doc->setDocumentElement($root);
foreach my $file (qw(file1.xml file2.xml)) {
    foreach my $node (get_toplevel_elems($file)) {
        # importNode returns a copy associated w/ the new doc; append that copy
        my $imported = $doc->importNode($node);
        $root->appendChild($imported);
    }
}
print "** final result **\n", $doc->toString, "\n\n";

sub get_toplevel_elems {
    my $filename = shift;
    my $doc = $parser->parse_file($filename);
    my @elements = $doc->documentElement()->childNodes;   # gets all child nodes, including comments, whitespace, etc.
    #my @elements = $doc->documentElement()->find("*")->get_nodelist();   # gets just the elements and converts the resulting XML::LibXML::NodeList to perl list
    return @elements;
}
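
The two ways of collecting the top-level nodes differ in what they return. The following sketch shows this on a small inline document. The XML string and variable names are assumptions for illustration, and it uses findnodes('*'), which returns a plain list in list context, in place of find("*")->get_nodelist().

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical input: two elements separated by whitespace and a comment
my $xml = "<root>\n<element1/>\n<!-- note -->\n<element2/>\n</root>";
my $doc = XML::LibXML->new()->parse_string($xml);
my $top = $doc->documentElement;

# childNodes returns every child: whitespace text nodes, the comment, and the elements
my @all_nodes = $top->childNodes;

# The XPath expression '*' selects only the child elements
my @elements = $top->findnodes('*');

printf "%d child nodes, %d elements\n", scalar @all_nodes, scalar @elements;
print join(', ', map { $_->nodeName } @elements), "\n";
```

If the merged document should contain only the elements, the XPath variant is probably the one you want. childNodes also carries the whitespace and comments across.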

HTH,
Richard




  
