Tels wrote 2007-03-30 23:17 (+0000):
> > If it is so deadly to collide byte-oriented data with character data,
> > it should not be so easy to do so accidentally.
> It can happen every time you concatenate two strings. Maybe we could
> add a new warning?
Eh, no, because Perl does not have any metadata telling you whether a
given non-UTF8 string is a latin1 text string, or just a random byte
string. There is no way to tell Perl how you intended your string to be
used, and there is no way for Perl to tell you the same thing about a
string it returned.
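
To make the point above concrete, here is a minimal sketch. The
variable names and the sample value 0xE9 are only illustrative; it
assumes nothing beyond core Perl and the built-in utf8:: functions.

```perl
use strict;
use warnings;

# Illustrative data: the same octet 0xE9, created with two different intents.
my $text  = "\xE9";            # meant as text: LATIN SMALL LETTER E WITH ACUTE
my $bytes = pack("C", 0xE9);   # meant as binary data: one raw byte

# Perl keeps no record of the intent, so both scalars look the same.
my $same_value     = ($text eq $bytes) ? 1 : 0;
my $both_unflagged = (!utf8::is_utf8($text) && !utf8::is_utf8($bytes)) ? 1 : 0;

print "same value: $same_value\n";          # same value: 1
print "both non-UTF8: $both_unflagged\n";   # both non-UTF8: 1
```

Nothing you can inspect on either scalar says "text" or "bytes"; that
knowledge exists only in the programmer's head.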
> use warnings 'upgrade';
This already exists on CPAN, authored by Audrey Tang, as
encoding::warnings:

    use encoding::warnings;

But it will warn when Perl upgrades latin1 to utf-8, without knowing if
that is a bug or a feature, because it doesn't know if the "latin1"
string was meant as a text string or a byte string.
It's a useful debugging tool for finding unintended upgrades, but you
shouldn't try to avoid upgrading altogether. That just hurts, because
upgrading is part of the way the Perl Unicode model was intended to
work.
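
A rough sketch of what such an upgrade looks like in practice, and why
the same upgrade can be a bug in one program and perfectly fine in
another. The sample strings are made up for illustration, and the
example uses the core Encode module rather than encoding::warnings
itself.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Illustrative data: five octets holding the UTF-8 encoding of "cafe"
# with an acute accent, as they might arrive from a file or socket.
my $octets = "caf\xC3\xA9";
my $chars  = "\x{263A}";      # a character string (WHITE SMILING FACE)

# Accidental mix: Perl upgrades $octets as if it were latin1 text,
# so the two octets C3 A9 become two separate characters.
my $mixed = $octets . $chars;
print length($mixed), "\n";   # 6

# Intended: decode the octets to characters first, then concatenate.
my $fixed = decode('UTF-8', $octets) . $chars;
print length($fixed), "\n";   # 5
```

If $octets had really been latin1 text, the first concatenation would
have been exactly right, which is why a warning alone cannot tell a bug
from a feature.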
> * the length in bytes
> * the length in characters (not always set, e.g. can be unknown)
> * the storage buffer (containing the data, plus some optional padding)
> * the encoding
Hey, cool, Perl has almost the same thing, only it supports just two
encodings: latin1 and utf8. It uses a single bit to indicate the
encoding, the UTF8 flag, which can be on or off. When it's off, the
string is latin1; when it's on, the string is UTF-8.
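
The flag can be observed with the built-in utf8:: functions. This is
only a small sketch with a made-up sample value; in normal code you
rarely need to look at the flag at all.

```perl
use strict;
use warnings;

my $s = "\xE9";                                # one character, code point 0xE9
my $flag_before = utf8::is_utf8($s) ? 1 : 0;   # flag off: stored as latin1

my $u = $s;
utf8::upgrade($u);                             # switch internal storage to UTF-8
my $flag_after = utf8::is_utf8($u) ? 1 : 0;    # flag on: stored as UTF-8

# Only the internal encoding changed; the string value is the same.
my $still_equal = ($s eq $u) ? 1 : 0;

print "before: $flag_before, after: $flag_after\n";   # before: 0, after: 1
print "equal: $still_equal, length: ", length($u), "\n";   # equal: 1, length: 1
```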
Maybe you should try Perl; you'll like the way it's built, because it
very closely matches your own design!
The same type of string can be used for binary data, because in the
Unicode encoding "latin1", all 256 code points map to the same byte
values.
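
As a quick check of that claim, the following sketch (illustrative
variable names, core Perl only) pushes every possible byte value through
an upgrade and back again without losing anything.

```perl
use strict;
use warnings;

# Binary data containing every possible byte value exactly once.
my $binary = pack("C*", 0 .. 255);

my $copy = $binary;
utf8::upgrade($copy);                 # internal storage becomes UTF-8

# Still 256 characters, with the ordinals 0..255 in order.
my $ords_ok = (join(',', map { ord($_) } split(//, $copy))
               eq join(',', 0 .. 255)) ? 1 : 0;

# Downgrading succeeds because no character is above 255.
my $downgraded = utf8::downgrade($copy) ? 1 : 0;
my $round_trip = ($copy eq $binary) ? 1 : 0;

print "ordinals preserved: $ords_ok\n";   # ordinals preserved: 1
print "downgrade ok: $downgraded\n";      # downgrade ok: 1
print "round trip: $round_trip\n";        # round trip: 1
```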
> In short, it becomes a mess.
Yes, with strong typing, especially with string subtypes for arbitrary
encodings, it would be cleaner. But it would also not look like Perl 5.
--
Kind regards,
juerd waalboer: perl hacker <juerd juerd.nl> <http://juerd.nl/sig>
convolution: ict solutions and consultancy <sales convolution.nl>
I do not trust voting computers.
See <http://www.wijvertrouwenstemcomputersniet.nl/>.