|
List Info
Thread: problems reading xml file with com.sun.star.xml.dom.DocumentBuilder
|
|
| problems reading xml file with
com.sun.star.xml.dom.DocumentBuilder |

|
2006-09-21 13:42:55 |
I have a small problem. In StarBasic I am using (almost) the following
code to read and parse an XML document (there might be small mistakes,
since I am writing this from memory):

    oSFA = createUnoService("com.sun.star.ucb.SimpleFileAccess")
    oInpStream = oSFA.openFileRead(sUrl)
    oDB = createUnoService("com.sun.star.xml.dom.DocumentBuilder")
    domDoc = oDB.parse(oInpStream)
    oInpStream.closeInput
This works almost perfectly for me; I say almost because there are
some XML documents that it cannot read.

The problem is that some documents (generated by a third-party system
which I cannot change) do not declare that they are XML documents with
a line like this:

    <?xml version="1.0" encoding="utf-8" ?>

Instead they start directly with the XML tags, like this:

    <test>
      <test2>
        .....
      </test2>
    </test>
That in itself is fine: I have other XML documents that look like this,
and OpenOffice can read and parse them. However, the problematic
documents contain national characters (åæø) encoded in ISO-8859-1, and
that is the problem. If they were encoded in UTF-8, OpenOffice could
read the document without any encoding declaration, but with ISO-8859-1
the oDB.parse function just returns null. No errors, no exceptions,
just null.

If I manually add <?xml version="1.0" encoding="iso-8859-1" ?> at the
start of that file, OpenOffice can read it perfectly.
So, is there some way I can force the DOM parser to use ISO-8859-1
instead of UTF-8? It would be great if I could do

    domDoc = oDB.parse(oInpStream, "iso-8859-1")

and have it work, but as far as I can see there is no such function in
the DocumentBuilder, nor anything like it in the input stream object or
the SimpleFileAccess object.
I should be able to get around this problem by programmatically making
a copy of the file, inserting the <?... part at the start, and then
reading the XML from my modified file, but that is only a last-resort
solution.
--
Christian Andersson - ca ofs.no
Configuration and Collaboration for OpenOffice.org
Open Framework Systems AS http://www.ofs.no
------------------------------------------------------------
---------
To unsubscribe, e-mail: dev-unsubscribe api.openoffice.org
For additional commands, e-mail: dev-help api.openoffice.org
|
|
| problems reading xml file with
com.sun.star.xml.dom.DocumentBuilder |

|
2006-09-21 21:30:55 |
Kjære Christian,
for meg følgende code virker [Dear Christian, for me the following code works]:

    oSFA = createUnoService("com.sun.star.ucb.SimpleFileAccess")
    oInpStream = oSFA.openFileRead(sUrl)
    oTextInpStream = createUnoService("com.sun.star.io.TextInputStream")
    oTextInpStream.setInputStream(oInpStream)
    oTextInpStream.setEncoding("iso-8859-1")
    oDB = createUnoService("com.sun.star.xml.dom.DocumentBuilder")
    domDoc = oDB.parse(oTextInpStream)
    oInpStream.closeInput

Sorry for my bad Norwegian, but it is a long time since I was there.
About the code: you have to use a TextInputStream to be able to set the
encoding.
Hope it helps.
Ha det bra [Take care],
Christoph
|
|
| problems reading xml file with c |

|
2006-09-21 22:05:05 |
One thing I forgot to mention: be aware that your code uses the
com.sun.star.xml.dom.XDocumentBuilder interface, which is marked
'unpublished' in the IDL reference. That means you cannot rely on it in
future versions; it might be changed or replaced.
|
|
| problems reading xml file with
com.sun.star.xml.dom.DocumentBuilder |

|
2006-09-22 07:18:34 |
Thank you, I'll try that at once. And don't worry about the Norwegian,
I'm not good at it either.
--
Christian Andersson - ca ofs.no
Configuration and Collaboration for OpenOffice.org
Open Framework Systems AS http://www.ofs.no
|
|
| problems reading xml file with
com.sun.star.xml.dom.DocumentBuilder |

|
2006-09-25 11:58:17 |
Hmm, this is not working for me; I still get a null object from
oDB.parse.
Which system did you test this on?
I am running Windows 2003 Server and OpenOffice 2.0 (I know there is a
way to get the build number, but I keep forgetting it).
--
Christian Andersson - ca ofs.no
Configuration and Collaboration for OpenOffice.org
Open Framework Systems AS http://www.ofs.no
|
|
| problems reading xml file with
com.sun.star.xml.dom.DocumentBuilder |

|
2006-09-25 13:23:13 |
The system I tested it on was a Linux machine, so there might indeed be
a difference. To check it on a Windows (XP) machine I have to wait until
this evening.
But something I found in the IDL reference might help: it says that the
character encoding names follow this document:
http://www.iana.org/assignments/character-sets
So the name might need a different spelling, and you could try some of
the possibilities I found there:
Name: ISO_8859-1:1987 [RFC1345,KXS2]
MIBenum: 4
Source: ECMA registry
Alias: iso-ir-100
Alias: ISO_8859-1
Alias: ISO-8859-1 (preferred MIME name)
Alias: latin1
Alias: l1
Alias: IBM819
Alias: CP819
Alias: csISOLatin1
If any of these works, please tell me. Otherwise I'll check it this
evening on my Windows machine.
|
|
| problems reading xml file with
com.sun.star.xml.dom.DocumentBuilder |

|
2006-09-26 10:05:26 |
Sorry, it turns out that the problem does not depend on the platform.
I could reproduce it on the Linux machine as well; I just did not have
a genuinely ISO-8859-1 encoded XML test document at first.
In the code I sent, the TextInputStream does in fact provide the correct
character encoding, but the DocumentBuilder seems to look only inside
the stream for the encoding. So it does not help to give the stream the
correct character encoding; you must put the encoding declaration inside
the stream itself (here, in the first line of your XML document).
The only ways I can think of to bypass this problem are:
1. Write the declaration into your file (as you suggested).
2. Somehow write the declaration into your stream first (I don't know
yet how to do this).
3. Convert your stream's encoding (maybe by reading bytes from the
input stream and writing UTF-8 to the parser, but how?).
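Option 2 in the list above might be sketched roughly as follows. This is an untested sketch, not a confirmed solution: it assumes the com.sun.star.io.Pipe and com.sun.star.io.TextOutputStream services are available, that the file never carries its own XML declaration or a byte order mark, and that the whole file fits in memory. The function name ParseWithDeclaration and the 4096-byte chunk size are made up for illustration.

```basic
' Sketch only: hands the parser an XML declaration followed by the raw file bytes.
' ParseWithDeclaration is a hypothetical helper name, not part of the OOo API.
Function ParseWithDeclaration(sUrl As String, sEncoding As String) As Object
    Dim oSFA As Object, oInpStream As Object, oPipe As Object
    Dim oTextOut As Object, oDB As Object
    Dim aBytes()
    Dim nRead As Long

    oSFA = createUnoService("com.sun.star.ucb.SimpleFileAccess")
    oInpStream = oSFA.openFileRead(sUrl)

    ' A Pipe is both an output stream (we write to it) and an
    ' input stream (the parser reads from it).
    oPipe = createUnoService("com.sun.star.io.Pipe")

    ' Write the declaration first; it is pure ASCII.
    oTextOut = createUnoService("com.sun.star.io.TextOutputStream")
    oTextOut.setOutputStream(oPipe)
    oTextOut.setEncoding(sEncoding)
    oTextOut.writeString("<?xml version=""1.0"" encoding=""" & sEncoding & """?>")

    ' Append the original bytes unchanged, one chunk at a time.
    ' A short read means the end of the file was reached.
    Do
        nRead = oInpStream.readBytes(aBytes, 4096)
        If nRead > 0 Then oPipe.writeBytes(aBytes)
    Loop While nRead = 4096
    oInpStream.closeInput
    oPipe.closeOutput   ' tells the reading side that no more data will come

    oDB = createUnoService("com.sun.star.xml.dom.DocumentBuilder")
    ParseWithDeclaration = oDB.parse(oPipe)
    oPipe.closeInput
End Function
```

It would be called as domDoc = ParseWithDeclaration(sUrl, "ISO-8859-1"). The original file is left untouched, which is the main difference from option 1.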
Sorry again for not really helping you.
Maybe somebody else?
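Option 3 from the list above (converting the encoding before parsing) could look roughly like the following untested sketch. It assumes that a TextInputStream decodes the file with the given encoding, that a TextOutputStream can re-encode the text as UTF-8 into a com.sun.star.io.Pipe, and that normalising line ends to a single line feed is acceptable for the documents in question. ParseAsUtf8 is a hypothetical name chosen for illustration.

```basic
' Sketch only: re-encodes the file as UTF-8 in memory, then parses that copy.
' ParseAsUtf8 is a hypothetical helper name, not part of the OOo API.
Function ParseAsUtf8(sUrl As String, sSourceEncoding As String) As Object
    Dim oSFA As Object, oInpStream As Object, oTextIn As Object
    Dim oPipe As Object, oTextOut As Object, oDB As Object

    oSFA = createUnoService("com.sun.star.ucb.SimpleFileAccess")
    oInpStream = oSFA.openFileRead(sUrl)

    ' Decode the file using the encoding it was actually written in.
    oTextIn = createUnoService("com.sun.star.io.TextInputStream")
    oTextIn.setInputStream(oInpStream)
    oTextIn.setEncoding(sSourceEncoding)

    ' Encode the text again as UTF-8 into an in-memory pipe.
    oPipe = createUnoService("com.sun.star.io.Pipe")
    oTextOut = createUnoService("com.sun.star.io.TextOutputStream")
    oTextOut.setOutputStream(oPipe)
    oTextOut.setEncoding("UTF-8")

    ' Copy line by line; every line end becomes a single line feed.
    Do While Not oTextIn.isEOF()
        oTextOut.writeString(oTextIn.readLine() & Chr(10))
    Loop
    oTextIn.closeInput
    oPipe.closeOutput   ' tells the reading side that no more data will come

    ' With no declaration present, the parser falls back to UTF-8,
    ' which now matches the bytes in the pipe.
    oDB = createUnoService("com.sun.star.xml.dom.DocumentBuilder")
    ParseAsUtf8 = oDB.parse(oPipe)
    oPipe.closeInput
End Function
```

A call would be domDoc = ParseAsUtf8(sUrl, "ISO-8859-1"). Unlike prepending a declaration, this relies on the observation from the start of the thread that undeclared UTF-8 documents parse without trouble.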
Btw: to get the build number without writing code, open the About box
from the Help menu and type s, d, t while holding down the Control key
for all three letters.