Hi Will,
See comments below.
"Will Sappington" <wsappington ndma.us> writes:
> We provide an archival/retrieval system for medical records and images
> and we use XML for attaching metadata to the files we store. We have
> some front-end UI components that make some use of XML, but currently
> most of the work is done in the transport layer and the backend
> database components. Due to the volume of data involved, efficiency and
> execution speed are a prime concern, though not necessarily an
> overriding one. Most of the XML work being done now is with
> roll-your-own string processing. Going forward we will need to be more
> sophisticated and standards-compliant.
>
> Of the packages that turned up when I did a search, Xerces and libxml
> are the leading candidates. I've downloaded, installed, built, and
> written test code for both and based on my findings, I'm leaning very
> heavily toward recommending libxml. The person I report to has a very
> strong bias toward Xerces in general, and the W3C DOM standard in
> particular, as the hammer with which to pound all nails, even if the
> problem isn't a nail.

If the Xerces guy wins, you may want to consider using a data binding
layer on top of Xerces that will hide all (or most) of the details of
dealing with XML. From the description above it appears that your
application is data-centric (as opposed to document-centric), so the
XML data binding approach should work nicely for you. One such data
binding tool is CodeSynthesis XSD (full disclosure: I am involved with
the project). It is open-source and supports a wide range of platforms
and compilers:

http://www.codesynthesis.com/products/xsd/

> * (I may be mistaken about this, but...) for character encodings
> libxml uses a standard library (iconv) that is distributed with most
> versions of Linux and Unix (and has been ported to Win32), whereas
> Xerces uses its own internal routines (?).

Yes, you are mistaken here. Xerces-C++ has built-in support for a small
set of essential encodings (UTF-8/16, UCS-4, etc.). It can also be
built to use an external library for encoding conversion; the supported
external libraries are iconv and ICU.

> And then this:
>
> "In cases where performance is critical, I think you'd be best off
> avoiding XPath altogether. (snip) An optimal Xerces SAX parser might
> well be more efficient than libxml parsing + XPath evaluation."

XPath is slow because it is an interpreted language: each query is
parsed and evaluated at run time. It is always more efficient to
hand-code critical queries in a compiled language such as C or C++. XML
data binding has a big advantage here, since you can implement your
queries using the standard C++ algorithms, which will allow you to
maintain both sanity and speed.
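To make the data binding point concrete, here is a minimal C++ sketch. The `Image` struct and the two query functions are hypothetical: they stand in for the kind of classes a data binding compiler might generate from a schema and are not the actual CodeSynthesis XSD API. The comments show roughly equivalent XPath expressions, but the queries themselves are ordinary compiled code built on `std::count_if` and `std::find_if`.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a class a data binding compiler might generate
// from a schema. This is NOT the actual CodeSynthesis XSD generated API.
struct Image {
    std::string patient_id;
    std::string modality;  // e.g. "CT", "MR"
};

// Plays the role of the XPath query count(//image[modality=$m]),
// but is compiled C++ rather than an expression interpreted at run time.
std::size_t count_modality(const std::vector<Image>& images, const std::string& m) {
    return static_cast<std::size_t>(std::count_if(
        images.begin(), images.end(),
        [&](const Image& img) { return img.modality == m; }));
}

// Plays the role of (//image[patient_id=$id])[1]; returns nullptr if no match.
const Image* find_first_for_patient(const std::vector<Image>& images,
                                    const std::string& id) {
    auto it = std::find_if(images.begin(), images.end(),
                           [&](const Image& img) { return img.patient_id == id; });
    return it == images.end() ? nullptr : &*it;
}
```

For example, given `{{"P001", "CT"}, {"P002", "MR"}, {"P001", "CT"}}`, `count_modality(images, "CT")` returns 2 and `find_first_for_patient(images, "P999")` returns `nullptr`.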

> I'm unsure of the importance of an XML Schema validator so I can't
> comment on this. I don't think I agree with the comment about speed
> vis-à-vis UTF-8/16. Encoding conversions using UTF-8 are more
> computationally intensive than UTF-16, so what you lose by moving
> around double the number of bytes would, I think, be offset by the
> greater CPU requirement for translating the data. Does Xerces' use of
> UTF-16 provide support for a wider range of encodings and local
> languages?

The speedup comes from the simple fact that when your XML instance is
UTF-8-encoded (as most XML instances are these days) and your parser
uses UTF-8 internally, you do not need to convert from one encoding to
the other. You can just use the strings as is. On the other hand, if
your parser uses UTF-16, then it has to convert every character in the
XML document from UTF-8 to UTF-16.
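To illustrate the per-character work such a conversion involves, here is a minimal, hand-written UTF-8 to UTF-16 converter. It is a simplified sketch, not the transcoder Xerces-C++ actually uses, and it only catches the obvious malformations. The point is that every input byte has to be examined, decoded into a code point, and re-encoded, with code points above U+FFFF turned into surrogate pairs. A parser that keeps UTF-8 internally can skip this step entirely for UTF-8 input.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Minimal UTF-8 -> UTF-16 converter, for illustration only (not Xerces-C++'s
// transcoder). It rejects bad lead/continuation bytes, truncated sequences and
// code points above U+10FFFF, but does not reject overlong forms or encoded
// surrogates the way a production transcoder should.
std::u16string utf8_to_utf16(const std::string& in) {
    std::u16string out;
    out.reserve(in.size());  // never more UTF-16 code units than UTF-8 bytes
    std::size_t i = 0;
    while (i < in.size()) {
        // Decode one code point: every byte of the input is inspected.
        unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t len;
        if (lead < 0x80)              { cp = lead;        len = 1; }
        else if ((lead >> 5) == 0x06) { cp = lead & 0x1F; len = 2; }
        else if ((lead >> 4) == 0x0E) { cp = lead & 0x0F; len = 3; }
        else if ((lead >> 3) == 0x1E) { cp = lead & 0x07; len = 4; }
        else throw std::runtime_error("invalid UTF-8 lead byte");
        if (len > in.size() - i) throw std::runtime_error("truncated UTF-8 sequence");
        for (std::size_t k = 1; k < len; ++k) {
            unsigned char cont = static_cast<unsigned char>(in[i + k]);
            if ((cont >> 6) != 0x02) throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (cont & 0x3F);
        }
        if (cp > 0x10FFFF) throw std::runtime_error("code point out of range");
        i += len;
        // Re-encode as UTF-16: one code unit, or a surrogate pair above U+FFFF.
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

For instance, the euro sign is three bytes in UTF-8 (`"\xE2\x82\xAC"`) and becomes the single UTF-16 code unit 0x20AC, while an emoji such as U+1F600 becomes a two-unit surrogate pair.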

If you are interested in XML parsing performance, you may want to read
the "XSDBench XML Schema Benchmark 1.0.0 released" thread on the
xmlschema-dev mailing list:

http://lists.w3.org/Archives/Public/xmlschema-dev/2006Oct/

In particular, this message:

http://lists.w3.org/Archives/Public/xmlschema-dev/2006Oct/0061.html

hth,
-boris
--
Boris Kolpackov
Code Synthesis Tools CC
http://www.codesynthesis.com
Open-Source, Cross-Platform C++ XML Data Binding
_______________________________________________
xml mailing list, project page http://xmlsoft.org/
xml gnome.org
http://mail.gnome.org/mailman/listinfo/xml