
Thread: Re: Regexp failure with utf8-flagged string and byte-flagged pattern




Re: Regexp failure with utf8-flagged string and byte-flagged pattern
user name
2007-09-21 09:55:59
> Moin,
>
> On Thursday 20 September 2007 23:44:46 sreziccpan.org wrote:
>> # New Ticket Created by  sreziccpan.org
>> # Please include the string:  [perl #45605]
>> # in the subject line of all future correspondence about this issue.
>> # <URL: http://rt.perl.org/rt3/Ticket/Display.html?id=45605 >
>>
>>
>> This is a bug report for perl from sreziccpan.org,
>> generated with the help of perlbug 1.36 running under perl 5.10.0.
>>
>>
>>
>> -----------------------------------------------------------------
>> The script below works as expected until perl 5.8.8 (i.e. it prints
>> "1").
>> With perl5.10.0 the pattern does not match anymore.
>>
>> Regards,
>>     Slaven
>>
>> #!perl
>> $string = 'Öschel';
>> utf8::upgrade($string);
>> warn $string =~ m{(?:Ö|&Ouml;)schel};
>> __END__
>
> I don't see "use utf8;" in your example, so, in what encoding is the
> script?

If "use utf8" is missing, then the script encoding is usually iso-8859-1.
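A minimal Perl sketch of this point, assuming the script file was saved as ISO-8859-1: the "Ö" is then the single byte 0xD6, and reading each source byte as one character (what perl does with string literals when `use utf8` is absent) gives the same text as decoding the bytes explicitly as ISO-8859-1 with the core Encode module. The variable names are illustrative only.

```perl
use strict;
use warnings;
use Encode qw(decode);

# The bytes of 'Öschel' in an ISO-8859-1 encoded file: 'Ö' is the single byte 0xD6.
my $file_bytes = "\xD6schel";

# Without "use utf8", each source byte becomes one character, which yields the
# same text as decoding the bytes explicitly as ISO-8859-1.
my $as_latin1 = decode('iso-8859-1', $file_bytes);

printf "bytes: %d, characters: %d\n", length($file_bytes), length($as_latin1);
print "same text: ", ($as_latin1 eq $file_bytes ? 1 : 0), "\n";
```

Both lengths are 6, because ISO-8859-1 maps every byte to exactly one character.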

> Likewise, that means, in what encoding is the $string and in what is the
> regexp?

The "Ö" is in both cases the byte 0xd6.
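The test case can be restated so that it no longer depends on the file's encoding, by writing the 0xd6 byte as an escape. This is a sketch assuming only core perl: both strings below hold the same six characters and differ only in their internal representation, so the alternation from the report should match both. The report says it is the upgraded string that stopped matching under 5.10.0.

```perl
use strict;
use warnings;

# U+00D6 (the byte 0xd6 in ISO-8859-1) written as an escape, so the result
# does not depend on how this file is encoded.
my $native   = "\x{D6}schel";
my $upgraded = $native;
utf8::upgrade($upgraded);    # same characters, UTF-8 internal representation

# The alternation from the bug report, with the literal 'Ö' replaced by \x{D6}.
my $re = qr/(?:\x{D6}|&Ouml;)schel/;

print "equal strings:  ", ($native eq $upgraded ? 1 : 0), "\n";
print "utf8 flag set:  ", (utf8::is_utf8($upgraded) ? 1 : 0), "\n";
print "native match:   ", ($native   =~ $re ? 1 : 0), "\n";
print "upgraded match: ", ($upgraded =~ $re ? 1 : 0), "\n";
```

Comparing the "native" and "upgraded" lines isolates the internal representation as the only difference between the two cases.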

Regards,
    Slaven



Re: Regexp failure with utf8-flagged string and byte-flagged pattern
user name
2007-09-21 10:19:04
Moin,

On Friday 21 September 2007 16:55:59 slavenrezic.de wrote:
> > Moin,
[snip]
> > I don't see "use utf8;" in your example, so, in what encoding is the
> > script?
>
> If "use utf8" is missing, then the script encoding is usually iso-8859-1.

Ah. (Doh!)

> > Likewise, that means, in what encoding is the $string and in what is
> > the regexp?
>
> The "Ö" is in both cases the byte 0xd6.

Ah. But your email says:

 Return-Path: <perl5-porters-return-128928-nospam-abuse=bloodgate.comperl.org>
 [snip]
 MIME-Version: 1.0
 Content-Type: text/plain;
   charset="utf-8"
 Content-Transfer-Encoding: 8bit
 X-RT-Original-Encoding: utf-8

Note the mail encodings. If one just copy&pastes your example, one ends up
with a UTF-8 encoded script.
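Tels' point can be sketched the same way, assuming the pasted script was saved as UTF-8 and compiled without `use utf8`: the literal "Ö" then arrives as the two bytes 0xC3 0x96, so the string no longer contains U+00D6 at all. Decoding the bytes as UTF-8 recovers the intended text. The variable names are illustrative only.

```perl
use strict;
use warnings;

# What a copy&pasted, UTF-8 saved 'Öschel' looks like to perl without "use utf8".
my $pasted   = "\xC3\x96schel";
my $intended = "\x{D6}schel";

printf "pasted length: %d, intended length: %d\n", length($pasted), length($intended);

# Decode the UTF-8 bytes into characters (utf8::decode works in place).
my $decoded = $pasted;
utf8::decode($decoded) or die "not valid UTF-8";

print "decoded eq intended: ", ($decoded eq $intended ? 1 : 0), "\n";
```

The pasted string is 7 characters long instead of 6. Saving the file as ISO-8859-1 instead gives back the single 0xd6 byte in both the string and the pattern.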

All the best,

Tels

-- 
 Signed on Fri Sep 21 17:16:24 2007 with key 0x93B84C15.
 Get one of my photo posters: http://bloodgate.com/posters
 PGP key on http://bloodgate.com/tels.asc or per email.

 "A thaum is the basic unit of magical strength. It has been universally
 established as the amount of magic needed to create one small white
 pigeon or three normal-sized billiard balls."

  -- Terry Pratchett