OK, last mail, because this is a different topic.
On Sat, Mar 31, 2007 at 01:08:21AM +0200, Juerd Waalboer
<juerd convolution.nl> wrote:
> Marc Lehmann skribis 2007-03-31 0:25 (+0200):
> > If you send a compressed string over the network using JSON and
> > decompress it, you need to know that.
>
> Does JSON compress arbitrary data?
No.
> If so, then the user must do the decoding and encoding,
No, compression is something completely orthogonal to encoding. Neither
forces me to do the other.
> because arbitrary data only exists in byte form
That seems completely wrong to me.
> Once you dictate any specific encoding, it's no longer arbitrary.
JSON dictates Unicode for the JSON text, and strongly hints at the use
of UTF-8 for interchange purposes.
> On the other hand, if JSON does text data only,
No, it does support binary data just as well. It is used a lot, too.
It works just like perl without the bugs: You have a string type that
can store bytes. It is up to the user to interpret them as she wants.
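
A minimal sketch of that point, not taken from the original report. It
assumes the core module JSON::PP as a stand-in for JSON::XS (its
interface is documented as compatible), and the variable names are
made up for illustration. Every octet value 0..255 is put into one
string, encoded as pure-ASCII JSON, and decoded again:

```perl
use strict;
use warnings;
use JSON::PP;   # assumption: stand-in for JSON::XS, same OO interface

# One string holding every possible octet value, as characters 0..255.
my $octets = join '', map { chr } 0 .. 255;

# ascii: everything outside ASCII is written as a \uXXXX escape.
my $json = JSON::PP->new->ascii;
my $text = $json->encode([$octets]);

# Decode again and compare with what went in.
my ($back) = @{ $json->decode($text) };
print $back eq $octets ? "round trip ok\n" : "round trip FAILED\n";
```

What the 256 values mean (compressed data, an image, text in some
encoding) is not recorded in the JSON text; both sides have to agree on
that separately.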
> it can just use any UTF encoding on both sides, and document it like
> that.
It is a bit complicated, but you can safely assume that 99% of all JSON
is UTF-8 encoded. In fact, you can recode all JSON documents into ASCII,
too. JSON::XS offers that, and JSON::XS by default encodes to/decodes
from UTF-8, but allows the user to decode/encode himself. JSON text is
composed of Unicode characters, and in Perl some JSON modules store them
as a simple Perl string.
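
To make the three variants concrete, here is a small sketch. It uses
the core module JSON::PP as a stand-in (an assumption; the option names
utf8 and ascii are the same as in JSON::XS), and it sets every option
explicitly instead of relying on any default:

```perl
use strict;
use warnings;
use JSON::PP;   # assumption: stand-in for JSON::XS

my $data = ["\x{e9}"];   # a single character, U+00E9

# utf8: the result is a string of octets, the UTF-8 encoded JSON text.
my $as_utf8  = JSON::PP->new->utf8->encode($data);

# ascii: the result contains only ASCII, using \uXXXX escapes.
my $as_ascii = JSON::PP->new->ascii->encode($data);

# neither: the result is a Perl character string; the caller has to
# encode it before writing it to a file or socket.
my $as_chars = JSON::PP->new->encode($data);

printf "utf8=%d ascii=%d chars=%d\n",
    length $as_utf8, length $as_ascii, length $as_chars;
```

All three are the same JSON text; only the way it is serialised for
transport differs.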
All that is not well supported by most JSON modules, though; for
example, JSON::XS is the only module for Perl that correctly decodes
escaped surrogate pairs.
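
For reference, a sketch of what correct decoding of an escaped
surrogate pair means. The arithmetic is the standard UTF-16 rule; the
decoder used is the core module JSON::PP, chosen here only as an
assumption so the example runs without extra modules:

```perl
use strict;
use warnings;
use JSON::PP;   # assumption: any decoder that handles pairs would do

# U+1D11E lies outside the BMP, so JSON escapes it as two \uXXXX units.
my ($hi, $lo) = (0xD834, 0xDD1E);
my $expected  = 0x10000 + (($hi - 0xD800) << 10) + ($lo - 0xDC00);

# Single quotes keep the backslashes literal, so this is JSON text.
my ($str) = @{ JSON::PP->new->decode('["\ud834\udd1e"]') };

# A correct decoder yields ONE character, not two surrogates.
printf "length=%d codepoint=U+%04X\n", length $str, ord $str;
```

A decoder that gets this wrong typically returns a two-character
string containing the raw surrogates.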
> Unless both sides are exactly the same platform (e.g. both Perl), you
> need to establish a protocol for sending data anyway. And that protocol
> should also describe encoding. If sender and receiver don't agree, you
> have a problem.
No, it doesn't have anything to do with the platform. Even when both
sides use Perl I need to decide on a common encoding. That's strictly
outside the JSON definition, though.
> > I am really frustrated at that. It makes perl as a whole rather
> > questionable for unicode use, as you constantly have to think about
> > the internals. And yes, that simply shouldn't be the case.
>
> I maintain that it isn't the case, for almost any programming job,
> unless you're indeed doing things with internals.
Well, the JSON::XS module certainly does things with the internals: it
has to flag some strings as UTF-X, and in fact flags all strings that
way unless you enable the shrink option, which is documented to try to
shrink the memory used in various ways (one way is to try to downgrade
the scalar).
Certainly, the user who reported the bug also didn't look at the
internals. Compress::Zlib called unpack "CCCV" or somesuch, though, which
unfortunately treats "V" very differently from "C": it looks at the
internals with "C", but not with "V", where it treats the string as an
octet string.
The user suggested that JSON::XS corrupts binary data
because it happens to
be returned upgraded unless you set the shrink option.
However, Perl does not expose the internals elsewhere: the upgraded
version is semantically equivalent to the downgraded one unless you use
an XS module using SvPV directly or indirectly (considered a bug in
Perl, if I understood Nick correctly), or when using unpack "C", as that
has a different meaning in perl 5.6 than in perl 5.005, and has
confusing documentation.
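
A small sketch of the equivalence, using only core functions. The
variable names are made up, and unpack is deliberately left out because
its behaviour here depends on the perl version:

```perl
use strict;
use warnings;

my $down = "\x{e9}\x{ff}";   # two characters, both below 256
my $up   = $down;
utf8::upgrade($up);          # changes the internal representation only

# On the Perl level the two strings are indistinguishable.
my $same_string = ($up eq $down) ? 1 : 0;
my $same_length = (length($up) == length($down)) ? 1 : 0;

# Only code that peeks at the internals sees a difference.
my $flag_up     = utf8::is_utf8($up)   ? 1 : 0;
my $flag_down   = utf8::is_utf8($down) ? 1 : 0;
my $raw_len_up  = do { use bytes; length $up };   # internal octets

print "eq=$same_string len=$same_length ",
      "flags=$flag_up/$flag_down raw=$raw_len_up\n";

# Downgrading gives back the compact representation.
utf8::downgrade($up);
```

Code that reads the raw buffer without checking the flag sees four
octets for the upgraded copy and two for the downgraded one, although
both are the same two-character string.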
The right thing for Compress::Zlib is not to use unpack "CCCV" but unpack
"UUUV", which seems completely weird to me, as no Unicode was ever
involved *on the perl level*.
--
The choice of a
-----==- _GNU_
----==-- _ generation Marc Lehmann
---==---(_)__ __ ____ __ pcg goof.com
--==---/ / _ / // / / / http://schmorp.de/
-=====/_/_//_/_,_/ /_/_ XX11-RIPE