Thread: '&#x10;' question

'&#x10;' question
user name
2006-09-06 13:50:46
... hi all,

just a question about the '&#x10;' character.

My application parses some XML files using the xmlParseFile() API.
This API gives an error if the file has the following content:
<content>Asl&#x10;URP</content>

What do I have to do to parse files like that?

TIA

-- Stefano
_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
xmlgnome.org
http://mail.gnome.org/mailman/listinfo/xml
'&#x10;' question
user name
2006-09-06 14:01:22
Marchese Stefano wrote:
> ... hi all,
> 
> just a question about the '&#x10;' character.
> 
> My application parses some XML files using the xmlParseFile() API.
> This API gives an error if the file has the following content:
> <content>Asl&#x10;URP</content>
> 
> What do I have to do to parse files like that?

The XML standard defines a character as

 Char ::= #x9 | #xA | #xD |
          [#x20-#xD7FF] | [#xE000-#xFFFD] |
          [#x10000-#x10FFFF]

(http://www.w3.org/TR/xml/#charsets)

As such, the character reference &#x10;, which refers to code point
0x10, does not denote a valid character according to the XML standard,
and a conforming parser will not allow it in a document.
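
To make the rule concrete, here is a minimal sketch of the Char
production as a C predicate. The function name is_xml10_char is made up
for illustration; it is not a libxml2 API, and it only tests a decoded
code point, not raw bytes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not part of libxml2): returns true if the
 * code point `cp` matches the XML 1.0 Char production:
 *   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
 */
bool is_xml10_char(uint32_t cp)
{
    return cp == 0x9 || cp == 0xA || cp == 0xD ||
           (cp >= 0x20 && cp <= 0xD7FF) ||
           (cp >= 0xE000 && cp <= 0xFFFD) ||
           (cp >= 0x10000 && cp <= 0x10FFFF);
}
```

With this, is_xml10_char(0x10) is false while is_xml10_char(0xA) is
true, which matches the parser's complaint about &#x10;.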

So it seems the content is binary, in which case it should either be
encoded in some way (base64 for example), or not be in XML at all
(XML is not a binary transport).
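
If the data really is binary, a base64 step before writing the element
content could look roughly like the sketch below. This is a plain
standard-alphabet encoder written from scratch for illustration; the
name base64_encode is hypothetical, and the reader of the file would
need a matching decoder.

```c
#include <stddef.h>
#include <stdlib.h>

static const char B64[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Hypothetical helper: base64-encodes `len` bytes from `in`.
 * Returns a malloc'd, NUL-terminated string that the caller must
 * free, or NULL if the allocation fails. */
char *base64_encode(const unsigned char *in, size_t len)
{
    size_t out_len = 4 * ((len + 2) / 3);
    char *out = malloc(out_len + 1);
    size_t i, o = 0;

    if (out == NULL)
        return NULL;

    /* Full 3-byte groups become 4 output characters each. */
    for (i = 0; i + 2 < len; i += 3) {
        out[o++] = B64[in[i] >> 2];
        out[o++] = B64[((in[i] & 0x03) << 4) | (in[i + 1] >> 4)];
        out[o++] = B64[((in[i + 1] & 0x0F) << 2) | (in[i + 2] >> 6)];
        out[o++] = B64[in[i + 2] & 0x3F];
    }

    /* One or two leftover bytes are padded with '='. */
    if (i < len) {
        out[o++] = B64[in[i] >> 2];
        if (i + 1 < len) {
            out[o++] = B64[((in[i] & 0x03) << 4) | (in[i + 1] >> 4)];
            out[o++] = B64[(in[i + 1] & 0x0F) << 2];
        } else {
            out[o++] = B64[(in[i] & 0x03) << 4];
            out[o++] = '=';
        }
        out[o++] = '=';
    }
    out[o] = '\0';
    return out;
}
```

The output uses only ASCII letters, digits, '+', '/' and '=', all of
which are legal XML characters, so the encoded text can go inside
<content> as it is.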

'&#x10;' question
user name
2006-09-06 17:43:07
On Wed, 2006-09-06 at 15:50 +0200, Marchese Stefano wrote:

> My application parses some xml files using the xmlParseFile() API.
> This API gives an error if the file has the following content:
> <content>Asl&#x10;URP</content>

As indeed it should: character 0x10 (hexadecimal, i.e. decimal 16,
i.e. ASCII DLE, Data Link Escape, control-P) is not legal in XML 1.0
documents.

You can use XML 1.1 if your tools support it, but it's more likely
an error in the data.  Maybe it's intended to be a newline, which
would be &#10; instead, or in hexadecimal &#xa;, or maybe you have
a character set problem and it's supposed to be some accented
character, in which case you need to convert to UTF-8 (for example)
*before* escaping non-ASCII characters.
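
As a rough sketch of the conversion step, and assuming purely for
illustration that the source data is ISO-8859-1 (nothing in the
original post says which encoding is involved), the bytes could be
turned into UTF-8 like this. The name latin1_to_utf8 is made up.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: converts `len` ISO-8859-1 bytes to UTF-8.
 * Returns a malloc'd, NUL-terminated string that the caller must
 * free, or NULL if the allocation fails. */
char *latin1_to_utf8(const unsigned char *in, size_t len)
{
    /* Each Latin-1 byte needs at most two UTF-8 bytes. */
    char *out = malloc(2 * len + 1);
    size_t i, o = 0;

    if (out == NULL)
        return NULL;

    for (i = 0; i < len; i++) {
        if (in[i] < 0x80) {
            /* ASCII is encoded identically in UTF-8. */
            out[o++] = (char)in[i];
        } else {
            /* 0x80-0xFF map to the two-byte form 110xxxxx 10xxxxxx. */
            out[o++] = (char)(0xC0 | (in[i] >> 6));
            out[o++] = (char)(0x80 | (in[i] & 0x3F));
        }
    }
    out[o] = '\0';
    return out;
}
```

Any escaping of non-ASCII characters should then work on the decoded
code points rather than on the individual bytes.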

Liam

-- 
Liam Quin - XML Activity Lead, W3C, http://www.w3.org/People/Quin/
Pictures from old books: http://fromoldbooks.org/
Liam on the Web: http://www.holoweb.net/~liam/
