Thread: Problems with the charset of items returned from DBpedia




Problems with the charset of items returned from DBpedia
user name
2007-08-13 14:34:50
When I retrieve text items from DBpedia with Spanish characters in them, they
are incorrect. The open-uri open() call in the sparql.rb file doesn't set the
charset (the default is "iso-8859-1"), and I tried setting it to 'utf-8' in
the header() method but it didn't make any difference. Has anyone had similar
problems?

Here's a sample piece of text where the 'e' in tambien should have an acute
accent. It looks to me as though a two-byte UTF-8 character has wrongly been
interpreted as two Latin-1 characters somewhere:

<literal xml:lang="es">William Jefferson Clinton (tambiÃ©n conocido como
&quot;Bill Clinton&quot;)

-- Richard
_______________________________________________
ActiveRDF mailing list
ActiveRDFlists.deri.org
http://lists.deri.org/mailman/listinfo/activerdf
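
The symptom described above can be reproduced in isolation. The following is a minimal sketch, assuming Ruby 1.9 or later (where strings carry an encoding); the variable names are illustrative and nothing here is taken from sparql.rb. It reinterprets the UTF-8 bytes of the word as ISO-8859-1, which turns the accented "e" into two characters, and then reverses the mistake.

```ruby
# Assumes Ruby 1.9+; names are illustrative, not ActiveRDF code.
original = "tambi\u00E9n"   # the Spanish word as a UTF-8 string

# Treat the UTF-8 bytes as ISO-8859-1 and convert the result to UTF-8:
# each byte of the two-byte accented character becomes its own character.
garbled = original.dup.force_encoding("ISO-8859-1").encode("UTF-8")
puts garbled

# Reverse the mistake: map the characters back to single bytes,
# then label those bytes as the UTF-8 they always were.
repaired = garbled.encode("ISO-8859-1").force_encoding("UTF-8")
puts repaired
```

If the garbled text seen in a client matches the first line of output, the bytes on the wire were probably valid UTF-8 and the wrong charset was applied when reading them.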

Re: Problems with the charset of items returned from DBpedia
user name
2007-08-13 15:32:06
On 08/13/07/08/07 20:34 +0100, Richard Dale wrote:
>When I retrieve text items from DBpedia with Spanish characters in them, they
>are incorrect. The open-uri open() call in the sparql.rb file doesn't set the
>charset (the default is "iso-8859-1"), and I tried setting it to 'utf-8' in
>the header() method but it didn't make any difference. Has anyone had similar
>problems?

I had problems with DBpedia's language-typed literals as well, but thought
it'd be a Virtuoso problem. But it could indeed also be an ActiveRDF
problem. Have you been able to run the same query with curl and see what
kind of stuff Virtuoso returns (you can find the encoded SPARQL query by
running ActiveRDF in debug mode (export ACTIVE_RDF_LOG_LEVEL=0))?

  -eyal
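
The same check can be done from Ruby instead of curl. This is only a sketch: the helper names sparql_url and fetch_raw are hypothetical, the query and its property URI are made up for illustration, and the endpoint address is assumed to be http://dbpedia.org/sparql. It builds the encoded GET URL and, when the last lines are uncommented, prints the Content-Type header and the first raw bytes of the response so the charset can be inspected before any client-side decoding.

```ruby
require "net/http"
require "uri"

# Build the GET URL; "query" is the parameter name used by the SPARQL protocol.
def sparql_url(endpoint, query)
  uri = URI(endpoint)
  uri.query = URI.encode_www_form("query" => query)
  uri
end

# Fetch the response and return the Content-Type header with the raw body.
def fetch_raw(uri, accept = "application/sparql-results+xml")
  request = Net::HTTP::Get.new(uri)
  request["Accept"] = accept
  response = Net::HTTP.start(uri.hostname, uri.port,
                             use_ssl: uri.scheme == "https") do |http|
    http.request(request)
  end
  [response["Content-Type"], response.body]
end

# Hypothetical query, for illustration only.
query = "SELECT ?abstract WHERE { " \
        "<http://dbpedia.org/resource/Bill_Clinton> " \
        "<http://dbpedia.org/property/abstract> ?abstract } LIMIT 5"
uri = sparql_url("http://dbpedia.org/sparql", query)
puts uri

# Uncomment to contact the endpoint:
# content_type, body = fetch_raw(uri)
# puts content_type
# puts body.byteslice(0, 300).inspect
```

Passing "application/sparql-results+json" as the second argument to fetch_raw requests the JSON result format instead of XML.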

Re: Problems with the charset of items returned from DBpedia
user name
2007-08-14 08:22:18
On Monday 13 August 2007, Eyal Oren wrote:
> On 08/13/07/08/07 20:34 +0100, Richard Dale wrote:
> >When I retrieve text items from DBpedia with Spanish characters in them,
> > they are incorrect. The open-uri open() call in the sparql.rb file
> > doesn't set the charset (the default is "iso-8859-1"), and I tried
> > setting it to 'utf-8' in the header() method but it didn't make any
> > difference. Has anyone had similar problems?
>
> I had problems with DBpedia's language-typed literals as well, but thought
> it'd be a Virtuoso problem. But it could indeed also be an ActiveRDF
> problem. Have you been able to run the same query with curl and see what
> kind of stuff Virtuoso returns (you can find the encoded SPARQL query by
> running ActiveRDF in debug mode (export ACTIVE_RDF_LOG_LEVEL=0))?

I tried the query with curl, and the result was exactly the same. But the
snorql explorer worked fine, and I looked at the JavaScript and found it was
using JSON as a result format. So I tried a JSON result format with curl and
the Spanish word tambien was returned as 'tambi\u00E9n', which is the correct
Unicode escape sequence. When I tried ActiveRDF with JSON it worked fine too,
and the Ruby string in the results has the correct Spanish characters as
UTF-8.

-- Richard
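
The reason the JSON route is robust can be shown with a small sketch. The document below is a hand-written sample in the SPARQL JSON results layout, not actual DBpedia output, and the variable name "abstract" is an assumption. Because the accented character travels as the ASCII escape \u00E9, no charset guess is involved, and Ruby's standard JSON parser turns it into a UTF-8 string.

```ruby
require "json"

# Hand-written sample in the SPARQL JSON results layout (not real output).
# The single-quoted heredoc keeps the backslash escapes exactly as written.
sample = <<~'JSON'
  {
    "head": { "vars": ["abstract"] },
    "results": {
      "bindings": [
        { "abstract": {
            "type": "literal",
            "xml:lang": "es",
            "value": "William Jefferson Clinton (tambi\u00E9n conocido como \"Bill Clinton\")"
        } }
      ]
    }
  }
JSON

# The parser decodes the \u00E9 escape into a real character.
value = JSON.parse(sample)["results"]["bindings"].first["abstract"]["value"]
puts value
puts value.encoding
```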