Thread: How to create a UTF-8 XML document (in-memory)




How to create a UTF-8 XML document (in-memory)
user name
2006-12-19 08:01:24
Hello World, hello Daniel,

I'm creating an XML document using

xmlProg = xmlNewDoc("1.0");

Then I add nodes and subnodes, and finally I dump the
complete document to a buffer:

xmlDocDumpMemory(xmlProg, &xmlStr, &xmlStrLen, "UTF-8");


My problem: I have node values containing umlauts (for
example: "Früchte"). Although I specify "UTF-8" as the
encoding and although I use
xmlEncodeSpecialChars(xmlProg, "Früchte"), the encoding is
not yet specified at the time I use it. If I write the
buffer to a file, the BOM is written, but the actual
encoding is cp1252 (I'm working on Windows). And if I try to
read the document again, libxml2 complains that the document
is not UTF-8, which is correct (the "ü" in "Früchte" has a
value with bit 8 set).

I know that the internal encoding is UTF-8, but how do I
tell that to my XML document, and how do I have to convert
the characters/strings to make it correct?

Used library: libxml2 2.6.27

Best Regards
	Andreas
-- 
Andreas Tscharner                      andreas.tscharnermetromec.ch
--------------------------------------------------------------------
"You take the blue pill and the story ends. You wake in your bed and
 believe whatever you want to believe. You take the red pill and you
 stay in Wonderland and I show you how deep the rabbit-hole goes."
                                               -- Morpheus in Matrix
_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
xmlgnome.org
http://mail.gnome.org/mailman/listinfo/xml
How to create a UTF-8 XML document (in-memory)
user name
2006-12-19 09:51:09
On Tue, Dec 19, 2006 at 09:01:24AM +0100, Andreas Tscharner wrote:
> xmlDocDumpMemory(xmlProg, &xmlStr, &xmlStrLen, "UTF-8");
>
>
> My problem: I have node values containing umlauts (for
> example: "Früchte"). Although I specify "UTF-8" as the
> encoding and although I use
> xmlEncodeSpecialChars(xmlProg, "Früchte"), the encoding is
> not yet specified at the time I use it. If I write the
> buffer to a file, the BOM is written, but the actual
> encoding is cp1252 (I'm working on Windows). And if I try to
> read the document again, libxml2 complains that the document
> is not UTF-8, which is correct (the "ü" in "Früchte" has a
> value with bit 8 set).

When you build the tree you *must* pass UTF-8 strings each time
there is an xmlChar * argument. libxml2 will not try to guess what
you passed to it, nor will it try to do on-the-fly checking or
conversion. If you pass a cp1252-encoded string where an xmlChar *
argument is required, you break the API, and this may lead to this
kind of problem. You must convert the strings passed through the
tree-building APIs yourself.
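
As an illustration of the conversion step described above (this is
not code from the thread), here is a minimal sketch in plain C. The
helper name latin1_to_utf8 is hypothetical and not part of libxml2.
It assumes the input is ISO-8859-1, which matches cp1252 for "ü"
(0xFC) but not for the cp1252-only bytes 0x80 to 0x9F (for example
the euro sign); those would need a lookup table or a platform
conversion routine.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not a libxml2 function.
 * Converts a NUL-terminated ISO-8859-1 string into a newly allocated
 * UTF-8 string. The caller releases the result with free().
 * Assumption: bytes 0x80-0x9F are treated as Latin-1 control codes,
 * so cp1252-specific characters in that range are NOT mapped. */
unsigned char *latin1_to_utf8(const char *in)
{
    assert(in != NULL);

    size_t len = strlen(in);
    /* Worst case: every input byte expands to two output bytes. */
    unsigned char *out = malloc(2 * len + 1);
    if (out == NULL)
        return NULL;

    unsigned char *p = out;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)in[i];
        if (c < 0x80) {
            /* ASCII is identical in UTF-8. */
            *p++ = c;
        } else {
            /* U+0080..U+00FF encode as two bytes: 110xxxxx 10xxxxxx. */
            *p++ = (unsigned char)(0xC0 | (c >> 6));
            *p++ = (unsigned char)(0x80 | (c & 0x3F));
        }
    }
    *p = '\0';
    return out;
}
```

Since xmlChar is an unsigned char, the returned buffer can be cast
to const xmlChar * and handed to the tree-building functions in
place of the raw cp1252 string. libxml2's encoding module also
provides isolat1ToUTF8(), which may be preferable to a hand-written
loop.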

Daniel

-- 
Red Hat Virtualization group http://redhat.com/virtualization/
Daniel Veillard      | virtualization library  http://libvirt.org/
veillardredhat.com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine  http://rpmfind.net/