On Sat, Mar 31, 2007 at 03:03:21AM +0200, Juerd Waalboer
<juerd convolution.nl> wrote:
> JSON is pretty big to just quickly examine. I have nothing set up for
> testing it.
Not my problem. Your coding style cannot handle it, though, so in your own
interest you should try to examine it some day.
> > > I'm constantly very explicitly and verbosely telling people to NOT look
> > > at the flag, NOT set it manually, etcetera.
> > So why do you propose that people have to make sure that they never put a
> > binary string with the UTF-X flag set into unpack?
>
> Not unpack in general, but unpack "C".
>
> Because "C" is explicitly catered for byte data, which strings with the
> UTF8 flag aren't.
Well, you are not talking about Perl here.
> It won't always catch mistakes, because indeed lack of
> the flag says nothing, but it can help catch some of them.
Having the flag means nothing, either.
> Perl already has a similar warning in many places, for example when you
> print such a "wide character" on a filehandle that has no encoding or
> utf8 layer. Some modules, like MIME::Base64, provide the same
> functionality.
It is similar, but it works completely differently: it only warns if you pass
something into a function/filehandle that knows that it is expecting binary
data.

Unlike with unpack, the UTF-X flag has nothing to do with the warning: the
warning tells you that the data you pass in is not binary data because it
contains at least one character >255. That's completely fine. But when I pass
in a string consisting only of octets (at the Perl level), then it gets passed
into the function as binary, as one would expect.

And that, again, has nothing to do with the UTF-X flag. Data passed into
such a function gets properly downgraded (that process is what actually
generates the warning, btw).
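
The behaviour described above can be sketched in a few lines of Perl. This is only an illustration under stated assumptions: the helper name `bytes_written` is hypothetical, and an in-memory filehandle stands in for "a filehandle that has no encoding or utf8 layer". It compares an octet string, the same string after `utf8::upgrade` (UTF-X flag set), and a string that really contains a character >255.

```perl
use strict;
use warnings;

# Collect warnings so they can be inspected instead of going to STDERR.
my @warnings;
$SIG{__WARN__} = sub { push @warnings, @_ };

# Hypothetical helper: print $string to an in-memory filehandle that has
# no encoding layer and return the raw bytes that were written.
sub bytes_written {
    my ($string) = @_;
    my $buffer = '';
    open my $fh, '>', \$buffer or die "cannot open in-memory handle: $!";
    print {$fh} $string;
    close $fh;
    return $buffer;
}

my $octets   = "\xe4\xf6\xfc";   # three characters, all <= 255
my $upgraded = $octets;
utf8::upgrade($upgraded);        # same string, UTF-X flag now set internally

my $out_plain           = bytes_written($octets);
my $out_upgraded        = bytes_written($upgraded);
my $warnings_for_octets = scalar @warnings;

# A string with a character > 255 cannot be downgraded, so print warns.
my $out_wide       = bytes_written("\x{263a}");
my $warnings_total = scalar @warnings;

printf "octets: %d bytes, upgraded: %d bytes, warnings so far: %d\n",
    length $out_plain, length $out_upgraded, $warnings_for_octets;
printf "warnings after the wide character: %d\n", $warnings_total;
```

Both octet strings should come out as the same three bytes without any warning, whatever their internal encoding; only the string that actually contains a wide character should trigger "Wide character in print".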
> > How are users supposed to do that, unless they know about the flag in the
> > first place?
>
> By keeping byte strings and text string separate. Please either accept
> this, or stop asking me questions that will lead to this answer.
I am asking about how users do that, I am not asking what you think they
should do. I am asking specifically _how_ your idea should be put into
practice. I gave you an example where the only currently known way to do that
is by knowing and manipulating the internal UTF-X flag.

And since you have not given an answer to that question, it stays a valid
question.

The problem is that your coding style cannot resolve this situation, as
the module in question (JSON::XS) does not know whether the given piece of
data is binary or text. Only the user knows, but by then it is already
upgraded.
> > Right, and then you want perl functions to die depending on the setting of
> > that flag, even though you also claim Perl users should not need to know
> > about it.
>
> The warning would not be a new feature, but an existing feature applied
> in more places. "die" is probably too harsh indeed.
No part of perl acts like that; see above. The parts that generate that
warning all downgrade properly, ensuring that perl's promises about string
handling are kept.
> When they get the error message, they can read the following in
> perldiag:
>
>     Wide character in %s
>         (W utf8) Perl met a wide character (>255) when it wasn't
>         expecting one. This warning is by default on for I/O (like
>         print). The easiest way to quiet this warning is simply to add
>         the ":utf8" layer to the output, e.g. "binmode STDOUT, ':utf8'".
>         Another way to turn off the warning is to add "no warnings
>         'utf8';" but that is often closer to cheating. In general, you
>         are supposed to explicitly mark the filehandle with an encoding,
>         see open and "binmode" in perlfunc.
>
> Changing the order of these sentences is on my to-do list.
You are completely confused. I am talking about octet strings (or byte
strings in your parlance). Such a string _never_ triggers that warning,
regardless of how it is encoded internally, because octet strings never
contain wide characters.

That's how the abstraction should work.

Your change of warning when the UTF-X bit is set would break that
abstraction, because users would suddenly get that warning for strings that
do not contain wide characters *at all*.

That I can only call very misleading to users.
> Note how this clear explanation doesn't mention the UTF8 flag!
Exactly: you didn't understand the mechanics of that warning, because it
doesn't do what you claim it does, namely warn if the UTF-X flag is set.
Instead it does the right thing and warns when there *is* a wide
character in the string, regardless of how it was encoded.

Do you finally understand? Please!
> > You want perl functions to behave differently depending on whether that
> > flag is set or not. I want perl functions to behave the same, regardless
> > of the fact.
>
> I want Perl to warn about certain mistakes when it can.
No, you want Perl to warn even when no mistakes happened because you
equate UTF-X flag with "contains no (binary) octets/bytes".

But that's not how Perl works. That's where you misunderstand how the UTF-X
flag works. Perl warns on real problems (and probably should die), not
because the UTF-X flag happens to be set, which is misleading.

Do you finally understand how Perl works?
> > > That's not what I said, nor what I meant. In fact, quite the opposite.
> > So then unpack should not croak when it sees the UTF-X flag?
>
> No, it should warn instead. From now on, I no longer think it should die. It
> should warn, and people who want it to die can do so with "use warnings FATAL".
Of course it should not warn. That *exposes* the UTF-X flag to the
user. And the warning you quote would simply be wrong, because users would
get that warning even when no wide character is in the string at all.
> I don't usually read bug reports, and never claimed to have done so.
>
> But in this special case, I will make an exception, and read the Unicode
> related bug reports that you have submitted.
Maybe you will learn what the UTF-X flag does, and why it shouldn't be
exposed in the way you think it should be or is currently exposed.

The UTF-X flag is *no* indication of a wide character whatsoever. In Perl.
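
A small sketch of that last point, assuming only core Perl: `utf8::upgrade` sets the flag on a string whose characters are all <=255, and the string still compares equal to the original. The helper `has_wide_char` is a hypothetical name introduced here for illustration; it looks at the characters, not at the flag.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the string contains at least one
# character above 255, judged by its characters rather than by the flag.
sub has_wide_char {
    my ($string) = @_;
    return $string =~ /[^\x00-\xff]/ ? 1 : 0;
}

my $octets   = "\x00\x7f\xff";   # octet string, no character > 255
my $upgraded = $octets;
utf8::upgrade($upgraded);        # only the internal encoding changes

my $flag_plain    = utf8::is_utf8($octets)   ? 1 : 0;
my $flag_upgraded = utf8::is_utf8($upgraded) ? 1 : 0;
my $same_string   = $octets eq $upgraded     ? 1 : 0;
my $wide_upgraded = has_wide_char($upgraded);
my $wide_smiley   = has_wide_char("\x{263a}");

print "flag: plain=$flag_plain upgraded=$flag_upgraded\n";
print "equal at the Perl level: $same_string\n";
print "wide characters: upgraded=$wide_upgraded smiley=$wide_smiley\n";
```

The flag differs between the two copies, yet they are the same string at the Perl level and neither contains a wide character.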
I think it's obvious by now that you do not know very much about
Unicode handling vs. the UTF-X flag in Perl. At least your knowledge is
mostly wrong, it seems.
And that's sad, because it could be very simple, and for the most part
already is very simple: often-used modules will simply be improved to use
SvPVbyte explicitly, even if there is no default typemap support for it. And
modules requiring binary data will eventually be fixed to use "U" instead of
"C" for decoding single octets. And the rest of perl works relatively fine,
and the remaining issues will be fixed, too.
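
For code written in Perl rather than XS, a rough analogue of the SvPVbyte approach might look like the sketch below. It is an assumption-laden illustration, not what any particular module does: `octets_of` is a hypothetical helper, and it relies on `utf8::downgrade` with a true second argument, which returns false instead of dying when the string cannot be represented as octets.

```perl
use strict;
use warnings;

# Hypothetical helper: return the octet values of a binary string,
# independent of how perl happens to store it internally.
sub octets_of {
    my ($string) = @_;            # work on a copy, leave the caller's data alone
    # Downgrade first; this only fails if a character > 255 is present.
    utf8::downgrade($string, 1)
        or die "wide character in binary data\n";
    return unpack "C*", $string;
}

my $octets   = "\xe4\xf6";
my $upgraded = $octets;
utf8::upgrade($upgraded);         # UTF-X flag set, same characters

my @from_plain    = octets_of($octets);
my @from_upgraded = octets_of($upgraded);

# A string with a real wide character is rejected, regardless of flags.
my $wide_accepted = eval { octets_of("\x{263a}"); 1 } ? 1 : 0;

print "plain:    @from_plain\n";
print "upgraded: @from_upgraded\n";
print "wide character accepted: $wide_accepted\n";
```

The point of the design is that the result depends only on the characters in the string: both copies yield the same octet values, and only genuinely wide data is refused.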
I just think it would be much better for Perl if those changes were not
required and things would just continue to work by providing backwards
compatibility.
--
The choice of a
-----==- _GNU_
----==-- _ generation Marc Lehmann
---==---(_)__ __ ____ __ pcg goof.com
--==---/ / _ / // / / / http://schmorp.de/
-=====/_/_//_/_,_/ /_/_ XX11-RIPE