Thread: lxml iterparse and comments

lxml iterparse and comments
user name
2008-03-23 22:56:59
Hello,

I am probably missing something elementary (I am new
to both XML and lxml), but I am having problems figuring
out how to get comments when using lxml's iterparse().
When I parse XML with parse() and iterate through the
result, I get the comments.  But when I try to do the
same thing (approximately I think) with iterparse,
I don't see any comments.  See example code below.
(lxml-2.02, Python-2.5.1)

(I was using the standard Python ElementTree but my 
understanding is that it doesn't save comments at all.  
If that's wrong I would go back to using it).

The real file is ~50MB and has about 1M nodes under the 
root so I have to use iterparse and I also have to process 
comments, so I would really appreciate a clue about how 
to do it.  Thanks.

Example code:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
import lxml.etree as ET
from cStringIO import StringIO

# XML data...
#=============================================
xmltxt = '''<?xml version="1.0" encoding="UTF-8"?>
<!-- Rev 1.06 -->
<!DOCTYPE Test [
<!ELEMENT Test (entry*)>
<!--                                                     
             -->
<!ELEMENT entry ANY>
	<!-- Description of <entry> element.
	-->
]>
<!-- File created: 2008-02-27 -->
<Test>
<!--  Chronosynclastic Infindibulum Listing -->
<entry>text 1</entry>
<!-- Deleted:  A1500477 -->
<entry>text 2</entry>
</Test>'''
#=============================================

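# Full parse: iterating the tree yields comments as well as elements.
print 'Parse:\n------'
et = ET.parse(StringIO(xmltxt))
for elem in et.iter():
    print elem

# Incremental parse: only element start/end events are reported.
print '\nIterparse:\n----------'
xx = ET.iterparse(StringIO(xmltxt), ("start", "end"))
for event, elem in xx:
    print event, elem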

_______________________________________________
XML-SIG maillist  -  XML-SIG@python.org
http://mail.python.org/mailman/listinfo/xml-sig

Re: lxml iterparse and comments
user name
2008-03-24 02:33:53
Hi,

Stuart McGraw wrote:
> I am probably missing something elementary (I am new
> to both XML and lxml), but I am having problems figuring
> out how to get comments when using lxml's iterparse().
> When I parse XML with parse() and iterate through the
> result, I get the comments.  But when I try to do the
> same thing (approximately I think) with iterparse,
> I don't see any comments.

While the comments end up in the tree that iterparse generates, they do not
show up in the events. Now that you mention it, I actually think that should
change. There should be events "comment" and "pi" that yield them if requested.

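A sketch of what requesting such events could look like. It assumes a parser
that accepts "comment" in the events tuple, which is not the lxml 2.0 series
discussed in this thread: later lxml releases and the ElementTree shipped with
Python 3.8+ do. The trimmed sample document and the helper name
collect_comments are made up for illustration.

```python
from io import BytesIO

try:
    from lxml import etree  # assumed: a release that knows the "comment" event
except ImportError:
    import xml.etree.ElementTree as etree  # stdlib fallback, needs Python 3.8+

# Trimmed-down version of the sample document from the original post.
SAMPLE = b"""<Test>
<!--  Chronosynclastic Infindibulum Listing -->
<entry>text 1</entry>
<!-- Deleted:  A1500477 -->
<entry>text 2</entry>
</Test>"""


def collect_comments(source):
    """Stream through source; return (comment texts, number of entries)."""
    comments = []
    entries = 0
    for event, elem in etree.iterparse(source, events=("end", "comment")):
        if event == "comment":
            # For comment events, elem is a comment node; its text is the body.
            comments.append(elem.text.strip())
        elif elem.tag == "entry":
            entries += 1
            elem.clear()  # drop the entry's content to keep memory use low
    return comments, entries


comments, entries = collect_comments(BytesIO(SAMPLE))
```

For a large input, source would be the file name or an open file object
instead of the in-memory BytesIO used here.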

> I was using the standard Python ElementTree but my 
> understanding is that it doesn't save comments at all.

ElementTree strips comments in the parser, that's right.


> The real file is ~50MB and has about 1M nodes under the
> root so I have to use iterparse and I also have to process
> comments, so I would really appreciate a clue about how
> to do it.  Thanks.

Have you tried the parser target interface? It's a SAX-like interface that
uses callbacks.

http://codespeak.net/lxml/parsing.html#the-target-parser-interface
http://effbot.org/elementtree/elementtree-xmlparser.htm#the-target-interface
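
A minimal sketch of the target interface, assuming the callback names start,
end, data, comment and close described on those pages. The class name
CommentCollector, its attributes, and the trimmed sample document are
hypothetical. No tree is built; the parser calls the target's methods as it
reads, and the parse call returns whatever close() returns.

```python
try:
    from lxml import etree
except ImportError:
    # The stdlib parser accepts the same kind of target object.
    import xml.etree.ElementTree as etree

# Trimmed-down version of the sample document from the original post.
SAMPLE = b"""<Test>
<!--  Chronosynclastic Infindibulum Listing -->
<entry>text 1</entry>
<!-- Deleted:  A1500477 -->
<entry>text 2</entry>
</Test>"""


class CommentCollector(object):
    """Parser target that records comments and the text of each <entry>."""

    def __init__(self):
        self.comments = []
        self.entry_texts = []
        self._buffer = None  # list of text chunks while inside an <entry>

    def start(self, tag, attrib):
        if tag == "entry":
            self._buffer = []

    def data(self, text):
        # Text may arrive in several chunks, so accumulate it.
        if self._buffer is not None:
            self._buffer.append(text)

    def end(self, tag):
        if tag == "entry":
            self.entry_texts.append("".join(self._buffer))
            self._buffer = None

    def comment(self, text):
        self.comments.append(text.strip())

    def close(self):
        # The return value becomes the result of the parse call.
        return self.comments, self.entry_texts


parser = etree.XMLParser(target=CommentCollector())
comments, entry_texts = etree.fromstring(SAMPLE, parser)
```

Because the target keeps only what it chooses to store, memory use stays flat
regardless of document size. For the ~50MB file, the data should be feedable
in chunks with parser.feed(chunk) followed by parser.close(), rather than
loaded as one string.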

Stefan