Marc Lehmann wrote 2007-03-31 2:12 (+0200):
> Yes, and the exact same is true for unicode (both have a 1-1 mapping
> between 0..255 and octets), trivially, of course, as unicode explicitly is
> a superset of latin1.

Unicode is a character set, not a character encoding. For 8-bit
character sets the character set and its encoding are effectively the
same thing, but once you get past the 8-bit boundary, the difference
begins to matter.

A Unicode string is a sequence of codepoints, not octets. Codepoints
don't map 1:1 to octets either. To express a Unicode string in octets,
you need to encode it. For this, there are several possibilities,
including UTF-8, UTF-16, ...

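To make the codepoint/octet distinction concrete, here is a minimal
sketch in Python (chosen only for illustration; the sample string and
the variable names are my own assumptions, not anything from this
thread). It encodes one string two ways and compares the lengths.

```python
# A string is a sequence of codepoints; octets only appear once it is encoded.
# Sample text is an assumption for illustration: "naive" with U+00EF, then U+20AC.
text = "na\u00efve \u20ac"

code_points = [ord(ch) for ch in text]
utf8_octets = text.encode("utf-8")
utf16_octets = text.encode("utf-16-be")  # big-endian, no byte order mark

print(len(text))          # number of codepoints
print(len(utf8_octets))   # number of octets when encoded as UTF-8
print(len(utf16_octets))  # number of octets when encoded as UTF-16
```

The three counts differ: the string stays the same while its octet
representation depends entirely on the encoding chosen.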
Unicode is a superset of the latin1 character set, not the latin1
character encoding. We'd need bigger bytes for the latter.

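A rough illustration of the character set versus encoding point, again
as a Python sketch with names of my own choosing: every latin1 octet
decodes to the codepoint with the same number, yet the octets for a
given codepoint depend on the encoding, and a codepoint above 255 has
no latin1 octet at all.

```python
# Character set: each latin1 octet maps to the codepoint with the same number.
all_octets = bytes(range(256))
decoded = all_octets.decode("latin-1")
same_numbers = [ord(ch) for ch in decoded] == list(range(256))

# Encoding: the same codepoint becomes different octets in different encodings.
e_acute = "\u00e9"
as_latin1 = e_acute.encode("latin-1")  # one octet
as_utf8 = e_acute.encode("utf-8")      # two octets

# A codepoint above 255 cannot be expressed in the latin1 encoding.
try:
    "\u20ac".encode("latin-1")
    euro_fits = True
except UnicodeEncodeError:
    euro_fits = False

print(same_numbers, as_latin1, as_utf8, euro_fits)
```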
--
kind regards,
juerd waalboer: perl hacker <juerd juerd.nl> <http://juerd.nl/sig>
convolution: ict solutions and consultancy
<sales convolution.nl>
I do not trust voting computers.
See <http://www.wijvertrouwenstemcomputersniet.nl/>.