Thread: Re: the utf8 flag (was Re: decode_utf8 sets utf8 flag on plain ascii strings)




Re: the utf8 flag (was Re: decode_utf8 sets utf8 flag on plain ascii strings)
user name
2007-03-30 20:46:54

Moin,

On Friday 30 March 2007 23:06:47 Marvin Humphrey wrote:
> On Mar 30, 2007, at 2:25 PM, Juerd Waalboer wrote:
> >> That so many users, including those as expert as Marc, possess a
> >> "broken" understanding of Perl's Unicode model suggests a flawed
> >> design.
> > I think the design is solid, but the implementation (see regex)
> > slightly broken and documentation wildly misleading.
>
> I strongly disagree with this assessment.  In particular, I think
> insisting that the user be responsible for manually segregating
> character and byte-oriented data without any help from Perl is
> totally unreasonable.
>
> Look at how easily Marc made the "mistake" of commingling the two
> types of data.  It's debatable whether the fact that Perl allowed him
> to do that without complaint is a flaw with the design or the
> implementation, but it's one or the other and it's serious.
>
> Additionally, as Marc points out, there are lots of broken XS modules
> out there -- including one of mine. (KinoSearch 0.15 -- Unicode
> support is fixed as of 0.20_01, which breaks backwards
> compatibility.)  Few or none of them would be broken if Perl made it
> more difficult to move between character data and byte-oriented data
> -- errors would be flying right and left and the broken modules would
> get fixed right away.
>
> Of course I understand why that cannot be the case, but it's
> astonishing to me that you see this as a problem which can be solved
> via documentation.

I think just documenting isn't enough. We do have things like "strict", so
if the current Perl model doesn't even let you detect when you mix the
wrong kinds of data, then we need a module or pragma that catches these
errors.

Of course encoding::warnings exists, but it seems unable to distinguish
between "untagged" data and real ISO-8859-1 strings, as Perl itself
doesn't make this distinction.
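
To make the ambiguity concrete, here is a minimal sketch (not part of the original message; the variable names are made up for illustration). It assumes only core Perl and the Encode module, and shows a byte string being silently treated as ISO-8859-1 when it meets character data:

```perl
use strict;
use warnings;
use Encode qw(encode);

# One octet, no UTF8 flag. Nothing in the scalar records whether this is
# the Latin-1 character "é" or a raw byte from, say, a packed binary record.
my $bytes = "\xE9";

# A character string with a code point above 255 (WHITE SMILING FACE).
my $chars = "\x{263A}";

# Concatenation succeeds without any warning: the octet is interpreted as
# the character U+00E9 and the result is a two-character string.
my $mixed = $bytes . $chars;

# Encoding the result turns the single octet 0xE9 into the pair 0xC3 0xA9.
# That is correct if $bytes was Latin-1 text and corruption if it was binary.
my $utf8_out = encode('UTF-8', $mixed);

printf "chars=%d utf8_flag=%d octets_out=%d\n",
    length($mixed), (utf8::is_utf8($mixed) ? 1 : 0), length($utf8_out);
```

Run as written, this should print "chars=2 utf8_flag=1 octets_out=5". Both readings of $bytes go through exactly the same code path, so a warning issued at the point of upgrade cannot tell the harmless case from the harmful one.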

> How about encouraging the use of encoding::warnings in perlunitut?
>
> How about adding it to core and having 'use 5.10;' turn it on?

If I understand correctly, that would not be enough due to the "is this
binary or really ISO-8859-1 encoded data" problem mentioned above.

all the best,

tels

- -- 
 Signed on Sat Mar 31 01:42:47 2007 with key 0x93B84C15.
 View my photo gallery: http://bloodgate.com/photos
 PGP key on http://bloodgate.com/tels.asc or per email.

 "In 1988, Jack Thompson ran against Janet Reno for DA of Dade County:
 Thompson's unique campaign message was that Reno was unfit for the job
 because, as a closeted lesbian with a drinking problem, she was great
 candidate for blackmail by the criminal element. Jack never explained
 why this remained a threat even after he exposed her 'secret'. Reno
 cruised at the polls."


Re: the utf8 flag (was Re: decode_utf8 sets utf8 flag on plain ascii strings)
user name
2007-03-30 19:14:36
Tels wrote 2007-03-31  1:46 (+0000):
> > How about encouraging the use of encoding::warnings in perlunitut?
> > How about adding it to core and having 'use 5.10;' turn it on?
> If I understand correctly, that would not be enough due to the "is this
> binary or really iso-8859-1 encoded data" problem mentioned above.

You understand correctly.

Thanks.
-- 
Kind regards,

  juerd waalboer:  perl hacker  <juerdjuerd.nl>  <http://juerd.nl/sig>
  convolution:     ict solutions and consultancy <salesconvolution.nl>

I do not trust voting computers.
See <http://www.wijvertrouwenstemcomputersniet.nl/>.
