On Sat, Mar 31, 2007 at 04:08:30AM -0600, Ben Carter wrote:
>
> Now consider the case of
>
> $y = chr(1000);
>
> Clearly whatever is in $y cannot be a single octet. The way Perl
> currently works (and this is my limited understanding here - someone
> with more knowledge can feel free to step in and correct my errors)
> is that now $y is considered to be a string of Unicode codepoints. So
> $y contains a single codepoint, U+03E8. The internal flag is used to
> indicate that the internal data pointer points to something that is a
> "Unicode codepoint string".

No.

"ABCD" also contains 4 Unicode code points.
Perl strings only contain Unicode code points. Always.

The issue is not whether a string is a "Unicode" string; the point
is the *encoding* of the Unicode code points. That can be UTF-8
(variable number of bytes/code point) or Latin-1 (one byte/code point).

Unicode does not imply UTF-8.

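A small sketch to illustrate the point, not a definitive description of
the internals. It assumes a reasonably modern perl (5.8 or later); the
variable names are mine, and bytes::length is used purely as a debugging
aid to peek at the internal storage, not something to rely on in real
code. The same code point is held once in the one-byte form and once in
the UTF-8 form, and the two strings still compare equal.

```perl
use strict;
use warnings;
use bytes ();   # loads bytes::length without changing string semantics

my $x = "\xE9";      # U+00E9, stored internally as a single byte
my $y = $x;
utf8::upgrade($y);   # same code point, internal storage switched to UTF-8

my $z = chr(1000);   # U+03E8, too large for one byte, so stored as UTF-8

# Report code point, length in characters, length of the internal
# byte buffer, and the state of the internal UTF-8 flag.
for my $s ($x, $y, $z) {
    printf "U+%04X chars=%d bytes=%d utf8_flag=%d\n",
        ord($s), length($s), bytes::length($s), utf8::is_utf8($s) ? 1 : 0;
}

# The encoding differs, the string value does not.
my $same = ($x eq $y) ? 1 : 0;
print "x eq y: $same\n";
```

$x and $y differ only in how the code point is stored; length() and eq
see the same one-character string in both cases. The flag says which
encoding the buffer uses, not whether the string "is Unicode".
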
Abigail