
Thread: Re: the utf8 flag (was Re: decode_utf8 sets utf8 flag on plain ascii strings)




user name
2007-03-31 05:29:55
On Sat, Mar 31, 2007 at 04:08:30AM -0600, Ben Carter wrote:
> 
> Now consider the case of
> 
>   $y = chr(1000);
> 
> Clearly whatever is in $y cannot be a single octet.  The way Perl
> currently works (and this is my limited understanding here - someone
> with more knowledge can feel free to step in and correct my errors)
> is that now $y is considered to be a string of Unicode codepoints.  So
> $y contains a single codepoint, U+03E8.  The internal flag is used to
> indicate that the internal data pointer points to something that is a
> "Unicode codepoint string".

No.

"ABCD" also contains 4 Unicode code points.

Perl strings only contain Unicode code points. Always.

The issue is not whether a string is a "Unicode" string; the point is
the *encoding* of the Unicode code points. That can be UTF-8 (a variable
number of bytes per code point) or Latin-1 (one byte per character).

Unicode does not imply UTF-8.
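
A minimal sketch of that point, assuming a reasonably recent perl: the
same four code points are stored once as Latin-1 and once as UTF-8, using
the core utf8::downgrade and utf8::upgrade functions. The variable names
are made up for illustration. As far as Perl-level string semantics go,
the two copies should be indistinguishable; only the internal flag
differs.

```perl
use strict;
use warnings;

# Four code points, all below 256, so either internal encoding can hold them.
my $s = "caf\x{E9}";
my $t = $s;

utf8::downgrade($s);   # store internally as one byte per character (Latin-1)
utf8::upgrade($t);     # store internally as UTF-8 (U+00E9 now takes two bytes)

# The strings still compare equal and have the same length in characters.
my $same    = ($s eq $t) ? 1 : 0;
my @lengths = (length $s, length $t);

# Only the internal-encoding flag differs between the two copies.
my @flags = (utf8::is_utf8($s) ? 1 : 0, utf8::is_utf8($t) ? 1 : 0);

print "equal: $same\n";               # equal: 1
print "lengths: @lengths\n";          # lengths: 4 4
print "utf8 flags: @flags\n";         # utf8 flags: 0 1
```

So the flag says which encoding the internal buffer uses, not whether the
string "is Unicode".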



Abigail
