Thread: Re: RSS and diacritics




Re: RSS and diacritics
United States
2007-11-27 14:56:45

Apologies. In rereading, I realized I misinterpreted what
you were saying.  I thought you had two distinct problems:
using HTML character entities, and issues with diacritics.

The answer as far as the entities?  RSS can be a mess ;).
RSS feeds are XML.  Sadly, it has become a widespread practice
to put "escaped HTML" in fields of RSS feeds, and there's no
way to ensure that every reader will parse these escaping
nightmares correctly.

HTML defines many named character entities, but XML itself
predefines only five (&amp;, &lt;, &gt;, &quot; and &apos;), and
RSS doesn't add the rest.  You can attempt to add these
characters to the RSS feed by declaring them in a DOCTYPE
declaration at the beginning of the feed.  This Wikipedia page
looks like it has some examples of that:
http://en.wikipedia.org/wiki/XML.

The best solution?  Not really sure.  I'd lean towards not
using "escaped HTML" in my RSS feed.  Instead, use plain RSS
with numeric character references, which should display
cleanly assuming that the feed reader isn't junk.

(And by character reference, I mean use &#x..; where ..
is the appropriate hexadecimal code point.)

See http://en.wikipedia.org/wiki/Character_entity_reference for a bit more information.
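
A minimal sketch of the numeric character reference approach, for anyone
generating feed text from a script.  The helper name to_hex_ncr and the
sample title are made up for illustration; they are not from any
particular feed library.

```python
from xml.sax.saxutils import escape


def to_hex_ncr(text: str) -> str:
    """Escape XML markup characters, then replace every non-ASCII
    character with a hexadecimal numeric character reference."""
    # Escape &, < and > first, so the ampersands introduced by the
    # references below are not escaped a second time.
    escaped = escape(text)
    return "".join(
        ch if ord(ch) < 128 else f"&#x{ord(ch):X};" for ch in escaped
    )


# Hypothetical item title containing diacritics and an ampersand.
title = "Élégie & Co"
print(to_hex_ncr(title))

# Python's built-in error handler does the same job with decimal references.
print(title.encode("ascii", "xmlcharrefreplace").decode("ascii"))
```

The first call keeps the output pure ASCII, so it should survive whatever
encoding the feed is served in.  The built-in xmlcharrefreplace handler
does not escape the ampersand, so it would still need escape() applied
first.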

Jon Gorman

---- Original message ----
>Date: Tue, 27 Nov 2007 14:56:56 -0500
>From: Bob Duncan <duncanrlafayette.edu>  
>Subject: [Web4lib] RSS and diacritics  
>To: web4libwebjunction.org
>
>
>Greetings,
>
>I'm getting ready to offer RSS feeds for our library's recent
>acquisitions lists and have run into a little snag: characters with
>diacritics.  I understand why I can't use HTML character entity
>references and expect all feed readers to play nicely, so I tried
>encoding the ampersand in the HTML entity reference (a suggested fix
>that I can no longer document).  While this works great for some feed
>readers, other readers and the two major browsers display the raw
>code instead of the character with diacritical mark.
>
>Other than displaying plain letters without diacritics, is there a
>way to code feeds so that all (or at least most) feed readers will
>display the character with the mark?  (I'd like to be able to do this in
>item titles and descriptions.)
>
>Thanks,
>
>Bob Duncan
>
>
>~!~!~!~!~!~!~!~!~!~!~!~!~
>Robert E. Duncan
>Systems Librarian
>Editor of IT Communications
>Lafayette College
>Easton, PA  18042
>duncanrlafayette.edu
>http://www.library.lafayette.edu/
>
>
>_______________________________________________
>Web4lib mailing list
>Web4libwebjunction.org
>http://lists.webjunction.org/web4lib/

Re: RSS and diacritics
United States
2007-11-27 16:58:47
At 03:56 PM 11/27/2007, Jonathan Gorman wrote:
>Apologies. In rereading, I realized I misinterpreted what you were
>saying.  I thought you had two distinct problems: using HTML
>character entities, and issues with diacritics.

Phew!  I thought I was going to have to attempt a reply to your first
response. ;o)

>The answer as far as the entities?  RSS can be a mess ;).  RSS feeds
>are XML.  Sadly, a widespread practice has occurred of using
>"escaped html" in fields of the RSS feeds.  There's no way to ensure
>that these escaping nightmares will be parsed correctly.
>
>HTML defines some character entities, but RSS doesn't have all of
>them.  You can attempt to add these characters to the RSS feed via
>including them in a Doctype declaration at the beginning of the
>feed.  This wikipedia page looks like it has some examples of that:
>http://en.wikipedia.org/wiki/XML.
>
>The best solution?  Not really sure.  I'd lean towards not using
>"escaped html" in my RSS feed.  Instead use just rss and the
>character references, which should display cleanly assuming that the
>rss feeder isn't junk.
>
>(And by character reference, I mean use &#x..; where .. is the
>appropriate code point).

Thanks.  I think that will do it.  I was using name-based references
(Egrave, etc.) and escaping the ampersand, which worked in most feed
readers but not in everything capable of displaying a feed.  The
numeric character references work fine in all apps tested so far.

One other question:  which numeric reference is preferable?  For
example, both &#xC9; and &#201; (xC9 and 201) produce a Latin capital
E acute.  Are there good reasons to use one over the other?  (And is
either more likely than the other to be correctly rendered by
browsers in non-RSS situations?)

Thanks,

Bob Duncan


~!~!~!~!~!~!~!~!~!~!~!~!~
Robert E. Duncan
Systems Librarian
Editor of IT Communications
Lafayette College
Easton, PA  18042
duncanrlafayette.edu
http://www.library.lafayette.edu/



Re: RSS and diacritics
Australia
2007-11-27 17:54:48
Which version of RSS are you using, and does its schema/DTD define the
entities you want to use?

Re NCRs (numeric character references), have a look at
http://www.w3.org/International/questions/qa-escapes



Bob Duncan wrote:
> At 03:56 PM 11/27/2007, Jonathan Gorman wrote:
>> Apologies. In rereading, I realized I misinterpreted what you were
>> saying.  I thought you had two distinct problems: using HTML character
>> entities, and issues with diacritics.
>
> Phew!  I thought I was going to have to attempt a reply to your first
> response. ;o)
>
>> The answer as far as the entities?  RSS can be a mess ;).  RSS feeds
>> are XML.  Sadly, a widespread practice has occurred of using "escaped
>> html" in fields of the RSS feeds.  There's no way to ensure that these
>> escaping nightmares will be parsed correctly.

Named entities need to be defined.  XML by default supports only a small
handful.  Most of the named entities in HTML don't exist in XML unless
the schema or DTD in question defines them.
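
A small sketch of what that means in practice, using Python's standard
xml.etree.ElementTree parser as a stand-in for a strict feed reader.  The
parses helper and the sample <title> snippets are hypothetical, chosen
only to show the three cases.

```python
import xml.etree.ElementTree as ET


def parses(xml_text: str) -> bool:
    """Return True if a strict XML parser accepts the text as well-formed."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Named HTML entity with no DTD declaring it: rejected as undefined.
print(parses("<title>Caf&eacute;</title>"))

# Numeric character reference: needs no declaration.
print(parses("<title>Caf&#xE9;</title>"))

# One of the five entities that XML predefines.
print(parses("<title>Fish &amp; Chips</title>"))
```

Lenient feed readers may recover from the first case, but a conforming
XML parser is entitled to stop at the undefined entity.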

For XML documents, it's best to use an appropriate encoding that supports
all your character requirements rather than using entities or NCRs.

>> HTML defines some character entities, but RSS doesn't have all of
>> them.  You can attempt to add these characters to the RSS feed via
>> including them in a Doctype declaration at the beginning of the feed.
>> This wikipedia page looks like it has some examples of that:
>> http://en.wikipedia.org/wiki/XML.

yep

>> The best solution?  Not really sure.  I'd lean towards not using
>> "escaped html" in my RSS feed.  Instead use just rss and the character
>> references, which should display cleanly assuming that the rss feeder
>> isn't junk.

Best solution: choose an appropriate encoding for your data and declare
that encoding.
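
As a rough illustration of the encoding approach, the sketch below writes
the characters directly as UTF-8 and states that encoding in the XML
declaration.  It assumes Python 3.8 or later (for the xml_declaration
argument); the element names and title are invented for the example and
it is not a complete RSS document.

```python
import xml.etree.ElementTree as ET

# Build a single hypothetical feed item with a title containing diacritics.
item = ET.Element("item")
ET.SubElement(item, "title").text = "Élégie pour piano"

# Serialize as UTF-8 bytes and ask for an explicit XML declaration,
# so the declared encoding matches the bytes actually written.
data = ET.tostring(item, encoding="utf-8", xml_declaration=True)
print(data.decode("utf-8"))

# Reading the bytes back recovers the original characters.
print(ET.fromstring(data).find("title").text)
```

The declaration only helps if the bytes really are in that encoding, and
an HTTP Content-Type header from the server can override it, so the two
should agree.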

>> (And by character reference, I mean use &#x..; where .. is the
>> appropriate code point).
>
>
> One other question:  which numeric reference is preferable?  For
> example, both &#xC9; and &#201; (xC9 and 201) produce a Latin capital E
> acute.  Are there good reasons to use one over the other?  (And is
> either more likely than the other to be correctly rendered by browsers
> in non-RSS situations?)

Decimal is more likely to work with older browsers; either should work
with modern browsers; and hexadecimal is easier to work with when editing.
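
A quick check that the two forms are interchangeable, sketched with
Python's standard html.unescape (used here only as a convenient decoder
for numeric references; it is not part of any feed reader):

```python
import html

# Decode the hexadecimal and decimal references from the question above.
hex_form = html.unescape("&#xC9;")
dec_form = html.unescape("&#201;")

# Both resolve to the same single character, U+00C9.
print(hex_form == dec_form)
print(f"U+{ord(hex_form):04X}", ord(dec_form))

# The two notations are the same number in different bases.
print(int("C9", 16) == 201)
```

Since Unicode code charts list code points in hexadecimal, the hex form
can be copied from a chart without conversion.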

-- 
Andrew Cunningham
Research and Development Coordinator (Vicnet)
State Library of Victoria
328 Swanston Street
Melbourne VIC 3000
Australia

Email: andrewc@vicnet.net.au
Alt. email: lang.support@gmail.com

Ph: +613-8664-7430                    Fax:+613-9639-2175
Mob: 0421-450-816

http://www.slv.vic.gov.au/
http://www.vicnet.net.au/
http://www.openroad.net.au/
http://www.mylanguage.gov.au/
http://home.vicnet.net.au/~andrewc/
