On Mon, 16 Apr 2007, Bert Bos wrote:
> The CSS WG decided as follows on Björn Höhrmann's comment[1] about
> Unicode (numerical) escapes outside the legal Unicode range:
>
> - Add this text to 4.1.3:
>
>     If the number is outside the range allowed by Unicode (e.g.,
>     "110000" is above the maximum 10FFFF allowed in current Unicode),
>     the UA may replace the escape with the "replacement character"
>     (U+FFFD). If the character is to be displayed, the UA should show a
>     visible symbol, such as a "missing character" glyph (cf. 15.2, point
>     5).

The wording "current Unicode" sounds odd, since the Unicode Consortium
has agreed that no characters will ever be assigned past 10FFFF. If they
change this decision, it will be a different Unicode then.

I don't see why 110000 would be treated as anything but a malformed
value, to be ignored, if you specify some fixed error processing for it.

Specifically, using U+FFFD is not suitable, since it's the replacement
character to be used when data has been converted from some other
character code and a particular character has no Unicode counterpart.
This is quite different from having an out of range reference. If there
has actually been some code conversion (so that U+FFFD might be
adequate), then the data should of course be ufffd and not something
like 110000.

In practical terms, 110000 probably results from a typo (e.g., some
digit repeated too many times), so I'd compare it with e.g. the string
#fffffff appearing where a color value is expected.
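
To make the two behaviours concrete, here is a minimal Python sketch.
The names decode_escape and policy are hypothetical, chosen only for
illustration; they come neither from the CSS draft nor from any UA. The
sketch checks only the upper bound 10FFFF and does not deal with
surrogates, zero, or the rest of the escape syntax.

```python
import string

MAX_CODE_POINT = 0x10FFFF   # highest code point Unicode will ever assign
REPLACEMENT = "\ufffd"      # U+FFFD REPLACEMENT CHARACTER


def decode_escape(hex_digits, policy="ignore"):
    """Decode the hex digits of a numeric escape, e.g. "26" or "110000".

    policy="replace": out-of-range values become U+FFFD, which the
                      proposed text for 4.1.3 permits.
    policy="ignore":  out-of-range values are treated as malformed and
                      None is returned, so the caller can drop the
                      construct containing the escape.
    Both policy names are assumptions made for this sketch.
    """
    # A CSS 2.1 escape carries between one and six hexadecimal digits.
    if not 1 <= len(hex_digits) <= 6:
        raise ValueError("expected 1 to 6 hex digits")
    if any(c not in string.hexdigits for c in hex_digits):
        raise ValueError("not a hexadecimal number")

    value = int(hex_digits, 16)
    if value > MAX_CODE_POINT:
        return REPLACEMENT if policy == "replace" else None
    return chr(value)
```

Under the "replace" policy, decode_escape("110000", policy="replace")
yields the same character as a genuine decode_escape("fffd"), so later
processing cannot tell a typo from real conversion loss. Under the
"ignore" policy the caller gets None and can apply ordinary error
handling.
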
--
Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/