
Thread: xmlTextReader performance question




xmlTextReader performance question
user name
2007-01-17 12:21:19
I am switching to the reader API from XPath to improve performance.
A lot of time is spent on processing a huge list of similar elements.
The attributes of each element are what I am after. I notice a
significant performance gain when I replace xmlTextReaderGetAttribute
with a sequence of

MoveToAttribute
GetConstValue
MoveToElement

probably because there are very few possible values for the
attributes in all my test documents. I understand it is very
difficult to answer performance tradeoff questions and that the
answer could change in the future. I just hope someone can tell me a
bit more about what is happening here. Is there a hash/set for these
'const xmlChar *' strings so that allocation and deallocation are
minimized? Or is it block allocation? If I know every attribute value
is unique, would anyone recommend not using the MoveToAttribute
approach? My program is multi-threaded and I really want to minimize
allocation/deallocation.

I have a separate question. Can I save some allocation/deallocation
if I use xmlReaderForMemory instead of xmlReaderForFile? I am thinking
of memory mapping the whole file.

And thanks so much for libxml2.

Russell
_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
xmlgnome.org
http://mail.gnome.org/mailman/listinfo/xml

Re: xmlTextReader performance question
user name
2007-01-17 12:50:00
On Wed, Jan 17, 2007 at 01:21:19PM -0500, Russell Mok wrote:
> I am switching to the reader API from XPath to improve performance.

  Hum, you are comparing apples to oranges here; I hope you're
aware of that.

> A lot of time is spent on processing a huge list of similar
> elements. The attributes of each element are what I am after. I
> notice a significant performance gain when I replace
> xmlTextReaderGetAttribute with a sequence of
>
> MoveToAttribute
> GetConstValue
> MoveToElement
>
> probably because there are very few possible values for the
> attributes in all my test documents. I understand it is very
> difficult to answer performance tradeoff questions and that the
> answer could change in the future. I just hope someone can tell me
> a bit more about what is happening here. Is there a hash/set for
> these 'const xmlChar *' strings so that allocation and deallocation
> are minimized?

  It depends. For strings coming from markup, yes. For strings
coming from content, no, because content is not bounded
and we don't want to stick to constant size.
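
The "hash/set" idea asked about here is commonly called string
interning: equal strings share one stored copy, so a value seen a
thousand times is allocated once. libxml2 has its own dictionary type
for this (xmlDict); the sketch below is not that implementation. It is
a generic, self-contained illustration, and the names intern_table,
intern, and intern_free are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

#define INTERN_BUCKETS 64

struct intern_entry {
    struct intern_entry *next;
    char *text;
};

struct intern_table {
    struct intern_entry *buckets[INTERN_BUCKETS];
    size_t allocations;          /* strings currently stored */
};

/* Simple multiplicative string hash (djb2 style). */
static unsigned long intern_hash(const char *s)
{
    unsigned long h = 5381;

    while (*s != '\0')
        h = h * 33 + (unsigned char)*s++;
    return h;
}

/* Return a pointer shared by all equal strings. The string is copied
 * only the first time it is seen. Returns NULL if out of memory. */
static const char *intern(struct intern_table *t, const char *s)
{
    struct intern_entry **slot = &t->buckets[intern_hash(s) % INTERN_BUCKETS];
    struct intern_entry *e;
    size_t len;

    for (e = *slot; e != NULL; e = e->next)
        if (strcmp(e->text, s) == 0)
            return e->text;      /* already stored: no allocation */

    e = malloc(sizeof *e);
    if (e == NULL)
        return NULL;
    len = strlen(s) + 1;
    e->text = malloc(len);
    if (e->text == NULL) {
        free(e);
        return NULL;
    }
    memcpy(e->text, s, len);
    e->next = *slot;
    *slot = e;
    t->allocations++;
    return e->text;
}

/* Release every stored string; all interned pointers become invalid. */
static void intern_free(struct intern_table *t)
{
    size_t i;

    for (i = 0; i < INTERN_BUCKETS; i++) {
        struct intern_entry *e = t->buckets[i];

        while (e != NULL) {
            struct intern_entry *next = e->next;

            free(e->text);
            free(e);
            e = next;
        }
        t->buckets[i] = NULL;
    }
    t->allocations = 0;
}
```

A table like this pays off when the set of distinct strings is small
and bounded, as with element and attribute names. For unbounded
content it would simply keep growing, which matches the reason given
above for not interning content strings.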

> I have a separate question. Can I save some allocation/deallocation
> if I use xmlReaderForMemory instead of xmlReaderForFile? I am
> thinking of memory mapping the whole file.

  I guess that will be lost in the mass of existing allocations
needed for the reader.

Daniel

-- 
Red Hat Virtualization group http://redhat.com/virtualization/
Daniel Veillard      | virtualization library  http://libvirt.org/
veillardredhat.com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine  http://rpmfind.net/
