---- Original message ----
>Date: Tue, 27 Nov 2007 14:56:56 -0500
>From: Bob Duncan <duncanr lafayette.edu>
>Subject: [Web4lib] RSS and diacritics
>To: web4lib webjunction.org
>
>
>Greetings,
>
>I'm getting ready to offer RSS feeds for our library's recent
>acquisitions lists and have run into a little snag: characters with
>diacritics. I understand why I can't use HTML character entity
>references and expect all feed readers to play nicely, so I tried
>encoding the ampersand in the HTML entity reference (a suggested fix
>that I can no longer document). While this works great for some feed
>readers, other readers and the two major browsers display the raw
>code instead of the character with diacritical mark.
>
>Other than displaying plain letters without diacritics, is there a
>way to code feeds so that all (or at least most) feed readers will
>display the character with the mark? (I'd like to be able to do this in
>item titles and descriptions.)
>
>Thanks,
>
I guess I'm a little confused. This could be any of several
problems, and there's a lot more we need to know. Where does
the data with diacritics come from? What encoding is it in?
Are you sure the data isn't being converted or corrupted
when you query the source?

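RSS feeds are XML. XML predefines only five named entities,
so HTML names like &eacute; are not valid in a feed, while
numeric references like &#233; are. If you're pulling Unicode
data and putting it directly into the RSS feed, and the feed's
declared encoding matches the bytes you actually write, you
shouldn't have an issue. The diacritics will be there.
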
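As a rough illustration of the point above, here is a minimal
Python sketch that writes a feed whose declared encoding
matches its bytes. The channel title, link, and item titles
are made up for the example, and build_feed is a hypothetical
helper, not part of any feed library. It also shows an
ASCII-only variant, where ElementTree falls back to numeric
character references.

```python
import xml.etree.ElementTree as ET


def build_feed(titles, encoding="utf-8"):
    """Return a small RSS 2.0 document as bytes in the given encoding."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # Placeholder channel metadata (assumed values for illustration).
    ET.SubElement(channel, "title").text = "Recent Acquisitions"
    ET.SubElement(channel, "link").text = "http://library.example.edu/new/"
    ET.SubElement(channel, "description").text = "Newly added titles"
    for title in titles:
        item = ET.SubElement(channel, "item")
        # Plain Unicode text; the serializer escapes "&" and "<" itself.
        ET.SubElement(item, "title").text = title
    # The XML declaration names the same encoding used for the bytes.
    return ET.tostring(rss, encoding=encoding, xml_declaration=True)


titles = ["Les Misérables", "Größe & Gewicht"]
feed_bytes = build_feed(titles)                 # raw UTF-8 characters
ascii_bytes = build_feed(titles, "us-ascii")    # numeric references instead
```

In the ASCII variant the é comes out as &#233;, which any XML
parser should accept, so it may be a safer fallback if you
can't control the encoding the web server announces.
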
That being said, Unicode isn't very well supported yet. A
lot of software and fonts have incomplete character sets;
Arial Unicode MS has the most complete coverage that I know
of. People reading the feed in a browser need a font that
covers the characters in order to see them correctly. On top
of that, a lot of software mishandles combining diacritics
(IE 6 is one example, if I recall correctly) and will never
display them correctly.

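One possible workaround for the combining-diacritics problem,
assuming your source data stores a base letter followed by a
combining mark, is to normalize to the precomposed form (NFC)
before writing the feed. This is only a sketch; to_precomposed
is a hypothetical helper name, and NFC only helps where a
precomposed character exists.

```python
import unicodedata


def to_precomposed(text):
    """Normalize to NFC so base letter + combining mark become one character
    wherever Unicode defines a precomposed form."""
    return unicodedata.normalize("NFC", text)


decomposed = "Zu\u0308rich"            # "u" followed by COMBINING DIAERESIS
composed = to_precomposed(decomposed)  # single character U+00FC in place of both
```
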
Other issues, like bidirectional text, are still not clearly
settled. For example, if you have Hebrew and English in one
document, it's not clear which layer of the software is
responsible for making sure each is read in the right
direction.

Unicode issues can run through several layers of software,
including the server-side software commonly used to generate
things like RSS feeds. Unicode support is often feasible, but
it has to be done deliberately, and frequently it isn't.

Unicode issues can be tricky, but you should be able to
trace the data through the system and make sure it's
Unicode at every step.

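A simple way to do that tracing, sketched here in Python, is
to grab the raw bytes at each stage and see whether they
decode as UTF-8. The stage labels and sample data are invented
for the example, and check_utf8 is a hypothetical helper.

```python
def check_utf8(label, raw):
    """Return (ok, message) saying whether raw bytes decode as UTF-8."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the offset of the first byte that failed to decode.
        return False, f"{label}: not valid UTF-8 at byte {err.start}"
    return True, f"{label}: OK ({len(text)} characters)"


# The same title as it might look at two stages of a pipeline.
good = check_utf8("database row", "Dvořák".encode("utf-8"))
bad = check_utf8("export file", "Dvořák".encode("iso-8859-2"))
```

The first stage that reports a failure is a good place to
start looking. Note that passing this check does not prove
the text is right, only that the bytes are well-formed UTF-8.
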
Of course, if the source data isn't in Unicode to begin
with, that's another issue.

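If that turns out to be the case and you know what the source
encoding is, converting is usually straightforward. The sketch
below assumes ISO-8859-1 purely for illustration; your system
may well use something else, and transcode_to_utf8 is a
hypothetical helper.

```python
def transcode_to_utf8(raw, source_encoding):
    """Decode bytes from a known legacy encoding and re-encode as UTF-8."""
    return raw.decode(source_encoding).encode("utf-8")


legacy = b"Caf\xe9"                             # "Café" in ISO-8859-1
utf8 = transcode_to_utf8(legacy, "iso-8859-1")  # é becomes two bytes in UTF-8
```
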
Jon Gorman
_______________________________________________
Web4lib mailing list
Web4lib webjunction.org
http://lists.webjunction.org/web4lib/