Thread: Re: out of range unicode escapes




Re: out of range unicode escapes
France
2007-04-16 11:55:50
The CSS WG decided as follows on Björn Höhrmann's comment[1] about
Unicode (numerical) escapes outside the legal Unicode range:

  - Add this text to 4.1.3:

    If the number is outside the range allowed by Unicode (e.g.,
    "110000" is above the maximum 10FFFF allowed in current Unicode),
    the UA may replace the escape with the "replacement character"
    (U+FFFD). If the character is to be displayed, the UA should show a
    visible symbol, such as a "missing character" glyph (cf. 15.2, point
    5).
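
As a rough illustration of the text proposed for 4.1.3, the behaviour
could be sketched as below. This is only a sketch under assumptions: the
function name decode_hex_escapes and the regular expression are
hypothetical and not taken from the specification or from any UA, and
the sketch checks only the upper bound 10FFFF (it does not treat zero or
surrogate code points specially).

```python
import re

MAX_CODE_POINT = 0x10FFFF   # highest code point Unicode allows
REPLACEMENT = "\uFFFD"      # U+FFFD, the "replacement character"

# Assumed escape shape: a backslash, one to six hex digits, then an
# optional single whitespace terminator (CRLF counts as one).
_HEX_ESCAPE = re.compile(r"\\([0-9a-fA-F]{1,6})(?:\r\n|[ \t\r\n\f])?")


def decode_hex_escapes(text: str) -> str:
    """Decode numerical escapes, replacing out-of-range ones with U+FFFD."""

    def _decode(match: re.Match) -> str:
        code_point = int(match.group(1), 16)
        if code_point > MAX_CODE_POINT:
            # Out of range: substitute the replacement character
            # rather than failing (the "may replace" option).
            return REPLACEMENT
        return chr(code_point)

    return _HEX_ESCAPE.sub(_decode, text)
```

With this sketch, an escape for 110000 decodes to U+FFFD, while an
escape for 10FFFF decodes to that code point unchanged.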

[1] http://lists.w3.org/Archives/Public/www-style/2007Jan/0062.html

(For reference, this is issue 19 in the forthcoming "Disposition of
Comments.")



For the CSS WG,

Bert
-- 
  Bert Bos                                ( W 3 C ) http://www.w3.org/
  http://www.w3.org/people/bos                               W3C/ERCIM
  bertw3.org                             2004 Rt des Lucioles / BP 93
  +33 (0)4 92 38 76 92            06902 Sophia Antipolis Cedex, France


Re: out of range unicode escapes
Finland
2007-04-16 13:00:29
On Mon, 16 Apr 2007, Bert Bos wrote:

> The CSS WG decided as follows on Björn Höhrmann's comment[1] about
> Unicode (numerical) escapes outside the legal Unicode range:
>
>  - Add this text to 4.1.3:
>
>    If the number is outside the range allowed by Unicode (e.g.,
>    "110000" is above the maximum 10FFFF allowed in current Unicode),
>    the UA may replace the escape with the "replacement character"
>    (U+FFFD). If the character is to be displayed, the UA should show a
>    visible symbol, such as a "missing character" glyph (cf. 15.2, point
>    5).

The wording "current Unicode" sounds odd, since the Unicode Consortium
has agreed that no characters will ever be assigned past 10FFFF. If they
change this decision, it will be a different Unicode then.

I don't see why 110000 would be treated as anything but a malformed
value, to be ignored, if you specify some fixed error processing for it.

Specifically, using U+FFFD is not suitable, since it's the replacement
character to be used when data has been converted from some other
character code and a particular character has no Unicode counterpart.
This is quite different from having an out of range reference. If there
has actually been some code conversion (so that U+FFFD might be
adequate), then the data should of course be ufffd and not something
like 110000.

In practical terms, 110000 probably results from a typo (e.g., some
digit repeated too many times), so I'd compare it with e.g. the string
#fffffff appearing where a color value is expected.
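
The alternative argued for above, treating an out-of-range escape as
malformed in the same way as an over-long color value, could be sketched
roughly as follows. The function names are hypothetical, and the color
check assumes only the three-digit and six-digit hexadecimal color
notations; what the caller then does with a malformed value (e.g.,
ignoring it) is left open here.

```python
import re

MAX_CODE_POINT = 0x10FFFF  # highest code point Unicode allows

# Assumed color syntax: "#" followed by exactly 3 or exactly 6 hex digits.
_HEX_COLOR = re.compile(r"#(?:[0-9a-fA-F]{3}|[0-9a-fA-F]{6})")


def escape_is_in_range(hex_digits: str) -> bool:
    """Return False for an escape number above 10FFFF (malformed)."""
    return int(hex_digits, 16) <= MAX_CODE_POINT


def hex_color_is_well_formed(value: str) -> bool:
    """Return False for strings such as "#fffffff" (seven digits)."""
    return _HEX_COLOR.fullmatch(value) is not None
```

Under these assumptions both "110000" as an escape number and
"#fffffff" as a color are reported as malformed rather than being
replaced by some other value.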

-- 
Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/


