-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Moin,
On Saturday 31 March 2007 16:09:18 Juerd Waalboer wrote:
> Tels skribis 2007-03-31 12:23 (+0000):
> > #!/usr/bin/perl -w
> > use Encode qw/decode/;
> > my $random = "\xc3\xc3"; # some random bytes
> > my $ascii = "a"; # some 7bit data
> >
> > # Somebody "helpful" decodes the ascii string:
> > # The encoding doesn't actually matter, since it is 7bit anyway.
> > # This step happens out of my control (e.g. in third party code)
> > $string = decode('ISO-8859-1', $ascii);
>
> $string is a text string, now. Remember, decoding is going from byte
> string to text string.
Yes, but my point was that I:
* might not be the one who "decoded" $string, or even the one who
  produced it.
* do not know whether I am passed a "text" string, as there is only the
  flag-you-should-not-know-about to distinguish the two.
> Using unpack "C" on a text string makes no sense if you consider that
> this "C" doesn't stand for "character" in the sense that the
> documentation for chr, ord, length, split, etcetera use. It stands for
> "char", which is a C datatype that contains one byte.
>
> As such, unpack "C" is a byte operation and makes sense on byte strings
> only. $string is a text string, and you can tell by looking at the
> decode() step.
>
> > # now take our random binary data and a 7bit ascii string and do:
> > print join (" ", unpack("CCC", "$random$string")), "\n";
>
> Dangerous, and that's why I suggested adding a "wide character in..."
> warning earlier in this thread.
>
> > Now explain to me why this prints different things even tho $random is
> > the same string in both cases, and $string and $ascii should be the
> > same, too. Bonus points if you manage to not mention the uhh -- ut -
> > utf -- uhm -- er The Flag[tm].
>
> I get the bonus points! Hurrah!
Not really, as you didn't explain the difference; you merely told me
"there is a difference" (where I personally don't expect there to be
a difference).
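
To make the difference visible, here is a minimal sketch. It assumes that utf8::upgrade() is a fair stand-in for what the third-party decode() does to the 7bit string, and it uses bytes::length() purely for inspection; the variable names $plain and $mixed are mine, not from the script above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use bytes ();   # load only, no import; gives access to bytes::length()

my $random = "\xc3\xc3";   # two random bytes, as in the quoted script
my $ascii  = "a";          # plain 7bit data

# Assumption: stand-in for the third-party decode(). utf8::upgrade()
# changes only the internal representation, never the characters.
my $string = $ascii;
utf8::upgrade($string);

my $plain = "$random$ascii";    # bytes joined with the untouched string
my $mixed = "$random$string";   # bytes joined with the upgraded string

printf "equal as strings: %s\n", ($plain eq $mixed ? "yes" : "no");
printf "length in chars:  %d vs %d\n", length($plain), length($mixed);
printf "internal bytes:   %d vs %d\n",
    bytes::length($plain), bytes::length($mixed);
```

The two strings compare equal and both have three characters, yet internally they occupy 3 and 5 bytes respectively. Any operation that looks at the internal bytes instead of the characters can therefore see two different things.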
> The only explanation that I used is the separation between text strings
> and binary strings. It's also the only thing you need to know. You'll
> benefit from knowing more, certainly, but I see red flags in your code.
Ok, and how am I supposed to know whether, in:

    sub dosomething {
        my $a = shift;
    }

$a is a text string or a binary string?
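
The sketch below does not answer that question, since the value alone does not say what the caller meant. It only shows one defensive option, under the assumption that dosomething() is supposed to work on bytes: force a byte representation on entry and refuse anything that cannot be one. The helper as_bytes() is hypothetical and not part of any module mentioned in this thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return a copy of the argument that is guaranteed
# to be stored as bytes, or die if it holds characters above 0xFF.
sub as_bytes {
    my ($data) = @_;
    my $copy = $data;             # leave the caller's scalar untouched
    utf8::downgrade($copy, 1)     # true 2nd arg: return false, don't die
        or die "as_bytes: string contains characters above 0xFF\n";
    return $copy;
}

sub dosomething {
    my $bytes = as_bytes(shift);
    # Byte-level operations are now applied to a byte string.
    return join " ", unpack("C*", $bytes);
}

print dosomething("\xc3\xc3" . "a"), "\n";
```

This still relies on every byte-oriented routine doing the check itself, so it is a workaround for code I control and not a fix for the general problem.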
> > So far, I can see the ways to handle this are:
> > (..)
> > * never mix fire and water er dogs and cats er I mean text and bytes,
> >   and pray that every piece of code out there adheres to this, too.
>
> Exactly.
This is not a working strategy.
> > I think the Pray and Hope[tm] strategy doesn't really work, tho.
>
> It doesn't always work, because people can't be trusted to do the right
> thing, but it can always be fixed.
Only if you consider your own code. But data is sometimes processed by
other code (Perl itself, some module etc.).
All the best,
Tels
- --
Signed on Sat Mar 31 18:33:51 2007 with key 0x93B84C15.
Get one of my photo posters: http://bloodgate.com/posters
PGP key on http://bloodgate.com/tels.asc or per email.
"We're looking at a future where only the very largest companies will be
able to implement software, and it will technically be illegal for other
people to do so."
-- Bruce Perens, 2004-01-23
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2 (GNU/Linux)
iQEVAwUBRg6qqXcLPEOTuEwVAQINCAf/QWq653liE6ZUnR5sUrO8YFVXU0Gi5s/m
wm4teby4dypHRuyjKov7a2XeheRCZU+iYXnlNFk8Tioqd3ZOwlZC5uGbufX1QnpO
H9lYRtDTG14BHH2D+QsMgSrPcAXwsnvSdlePAmy4m9TJ3xQTtzcPLTWt2p8tgiul
URl0lgMHv7I9ASJusYwPa00YRFDexpdVuYpclTtnzzVPoGkuMxAKIDhhAuKp9uSl
gWJXGiha9hvGEZOh2k6mGZ/bkstEMhp3vrqU1ccp11jfahsaAwvU9EVS7254t22R
KqXh3Ca4/lMxs+2+1xW0j518Asq0sB/L6gkyGr0tHdFgQwX7S71yoA==
=K82l
-----END PGP SIGNATURE-----