Thread: Re: parsing in eval() varies with UTF8ness




Re: parsing in eval() varies with UTF8ness
user name
2007-09-24 03:42:37
On 23/09/2007, Tels <nospam-abusebloodgate.com> wrote:
> When you don't do "use utf8;" your script is expected to be in latin1
> (ISO-8859-1). (we leave "use locale" out of this for now). Under use utf8,
> it can contain any UTF-8.
>
> However, it seems eval() (or require?) doesn't know about this.

Right, there can be double encoding. That will need to be
fixed.
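
For readers unfamiliar with the term, here is a minimal sketch of what "double encoding" means in general. It does not reproduce the eval() code path discussed above; it only assumes the usual mechanism, where bytes that are already UTF-8 get treated as Latin-1 characters and encoded a second time. The helper name hex_bytes is made up for this illustration.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: render a byte string as space-separated hex.
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf('%02X', $_) } unpack('C*', $bytes);
}

# One character: U+00E0, LATIN SMALL LETTER A WITH GRAVE.
my $char = "\x{E0}";

# Encoding once gives the two-byte UTF-8 sequence C3 A0.
my $once = encode('UTF-8', $char);

# Encoding the result again treats each byte as a Latin-1 character
# (U+00C3 and U+00A0), so each one is expanded to two bytes.
my $twice = encode('UTF-8', $once);

print hex_bytes($once),  "\n";   # C3 A0
print hex_bytes($twice), "\n";   # C3 83 C2 A0
```

The four-byte result is why double-encoded text shows up as two garbage characters where one accented letter was expected.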

> Plus, I am
> not entirely sure how much Unicode you can use in identifiers as something
> like this:
>
>         #!perl
>         use utf8;
>         my $€ = 1;
>
> still fails to compile with:
>
>         Unrecognized character \x82 at t.pl line 5.
>
> perldoc perlsyn (in 5.8.8) doesn't seem to say anything about identifiers.

Identifiers must start with letters; € isn't one.

[rafaelstcosmo ~]$ bleadperl -Mutf8 -le '$à=42;print $à'
42
[rafaelstcosmo ~]$ bleadperl -le '$à=42;print $à'
Unrecognized character \xA0 in column 3 at -e line 1.
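
The "must start with a letter" rule can be approximated from Perl itself with the Unicode property \p{L}. This is only a sketch: the tokenizer's real identifier-start test is not spelled out in this thread, and the helper name is_letter is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: 1 if the single character is a Unicode letter
# (general category L*), 0 otherwise. An approximation of the rule,
# not the tokenizer's own check.
sub is_letter {
    my ($ch) = @_;
    return $ch =~ /\A\p{L}\z/ ? 1 : 0;
}

# U+00E0 (a with grave) is a lowercase letter.
printf "U+00E0 letter? %d\n", is_letter("\x{E0}");     # 1

# U+20AC (euro sign) is a currency symbol, not a letter.
printf "U+20AC letter? %d\n", is_letter("\x{20AC}");   # 0
```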

Re: parsing in eval() varies with UTF8ness
user name
2007-09-24 07:58:01
Moin,

On Monday 24 September 2007 10:42:37 Rafael Garcia-Suarez wrote:
> On 23/09/2007, Tels <nospam-abusebloodgate.com> wrote:
> > When you don't do "use utf8;" your script is expected to be in latin1
> > (ISO-8859-1). (we leave "use locale" out of this for now). Under use
> > utf8, it can contain any UTF-8.
> >
> > However, it seems eval() (or require?) doesn't know about this.
>
> Right, there can be double encoding. That will need to be fixed.

OK.

> > Plus, I am
> > not entirely sure how much Unicode you can use in identifiers as
> > something like this:
> >
> >         #!perl
> >         use utf8;
> >         my $€ = 1;
> >
> > still fails to compile with:
> >
> >         Unrecognized character \x82 at t.pl line 5.
> >
> > perldoc perlsyn (in 5.8.8) doesn't seem to say anything about
> > identifiers.
>
> Identifiers must start with letters; € isn't one.

Wouldn't perlsyn be a good place to document this tidbit, then?

And, of course, I tried that with "$a€", too, see below :P

> [rafaelstcosmo ~]$ bleadperl -Mutf8 -le '$à=42;print $à'
> 42
> [rafaelstcosmo ~]$ bleadperl -le '$à=42;print $à'
> Unrecognized character \xA0 in column 3 at -e line 1.

v5.8.8:

	# perl -Mutf8 -le '$à=42;print $à'
	42
	# perl -Mutf8 -le '$aà=42;print $aà'
	42
	# perl -Mutf8 -le '$a€=42;print $a€'
	Unrecognized character \xE2 at -e line 1.
	# perl -Mutf8 -le '$€=42;print $€'
	Unrecognized character \x82 at -e line 1.

That mighty Euro seems to be special, it is not allowed even after a letter,
and it's sometimes recognized as \x82 and sometimes as \xE2. Huh?

All the best,

Tels


-- 
 Signed on Mon Sep 24 14:54:04 2007 with key 0x93B84C15.
 View my photo gallery: http://bloodgate.com/photos
 PGP key on http://bloodgate.com/tels.asc or per email.

 "Not king yet."

Re: parsing in eval() varies with UTF8ness
user name
2007-09-24 08:50:02
Rafael Garcia-Suarez wrote 2007-09-24 10:42 (+0200):
> >         use utf8;
> >         my $€ = 1;
> > still fails to compile with:
> >         Unrecognized character \x82 at t.pl line 5.
> Identifiers must start with letters; € isn't one.

Still, the character is not \x82 but \x{20AC}, so the error message is
incorrect.

\x82 isn't even the first byte of the UTF-8 encoding of \x{20AC}. It's
the second. Perhaps the first byte (\xe2) is accepted as latin1,
even though utf8.pm is in effect.
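
The byte positions can be checked with a short sketch using the core Encode module. It only shows that the observation is consistent with the hypothesis above; it does not confirm what the tokenizer actually does.

```perl
use strict;
use warnings;
use Encode qw(encode);

# UTF-8 encoding of the euro sign, U+20AC.
my $bytes = encode('UTF-8', "\x{20AC}");
my @octets = unpack('C*', $bytes);
my @hex = map { sprintf('%02X', $_) } @octets;
print "@hex\n";   # E2 82 AC

# Read as Latin-1, the first byte 0xE2 is U+00E2 (a with circumflex),
# which is a letter; the second byte 0x82 is a C1 control, which is not.
my $first_is_letter  = chr($octets[0]) =~ /\A\p{L}\z/ ? 1 : 0;
my $second_is_letter = chr($octets[1]) =~ /\A\p{L}\z/ ? 1 : 0;
print "first byte letter in Latin-1?  $first_is_letter\n";    # 1
print "second byte letter in Latin-1? $second_is_letter\n";   # 0
```

If the first byte were taken as a Latin-1 letter, the first offending byte would be 0x82, which matches the reported error.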
-- 
Met vriendelijke groet,  Kind regards,  Korajn salutojn,

  Juerd Waalboer:  Perl hacker  <#####juerd.nl>  <http://juerd.nl/sig>
  Convolution:     ICT solutions and consultancy <salesconvolution.nl>
