Thread: Re: Regexp failure with utf8-flagged string and byte-flagged pattern




Re: Regexp failure with utf8-flagged string and byte-flagged pattern
user name
2007-09-21 09:59:07
> On 9/20/07, via RT srezic  cpan. org <perlbug-followupperl.org> wrote:
>> # New Ticket Created by  sreziccpan.org
>> # Please include the string:  [perl #45605]
>> # in the subject line of all future correspondence about this issue.
>> # <URL: http://rt.perl.org/rt3/Ticket/Display.html?id=45605 >
>>
>>
>> This is a bug report for perl from sreziccpan.org,
>> generated with the help of perlbug 1.36 running under perl 5.10.0.
>>
>>
>> -----------------------------------------------------------------
>> The script below works as expected until perl 5.8.8 (i.e. it prints
>> "1").
>> With perl5.10.0 the pattern does not match anymore.
>>
>> Regards,
>>     Slaven
>>
>> #!perl
>> $string = 'Öschel';
>> utf8::upgrade($string);
>> warn $string =~ m{(?:Ö|&Ouml;)schel};
>> __END__
>
> I don't have a blead handy right now to test with, could someone please
> send me the output of this with a
>
> use re Debug=>'ALL';
>
> right before the warn statement.
>

See the attachment.

Regards,
    Slaven

  
Re: Regexp failure with utf8-flagged string and byte-flagged pattern
user name
2007-09-21 16:35:18
On 9/21/07, slavenrezic.de <slavenrezic.de> wrote:
> > On 9/20/07, via RT srezic  cpan. org <perlbug-followupperl.org> wrote:
> >> # New Ticket Created by  sreziccpan.org
> >> # Please include the string:  [perl #45605]
> >> # in the subject line of all future correspondence about this issue.
> >> # <URL: http://rt.perl.org/rt3/Ticket/Display.html?id=45605 >
> >>
> >>
> >> This is a bug report for perl from sreziccpan.org,
> >> generated with the help of perlbug 1.36 running under perl 5.10.0.
> >>
> >>
> >> -----------------------------------------------------------------
> >> The script below works as expected until perl 5.8.8 (i.e. it prints
> >> "1").
> >> With perl5.10.0 the pattern does not match anymore.
> >>
> >> Regards,
> >>     Slaven
> >>
> >> #!perl
> >> $string = 'Öschel';
> >> utf8::upgrade($string);
> >> warn $string =~ m{(?:Ö|&Ouml;)schel};
> >> __END__
> >
> > I don't have a blead handy right now to test with, could someone please
> > send me the output of this with a
> >
> > use re Debug=>'ALL';
> >
> > right before the warn statement.
> >
>
> See the attachment.

Thanks to you and Merijn I can say with pretty good certainty what the
problem is.

The trie code builds a char class during its construction phase, and
is not storing the first byte of the UTF-8 representation of
codepoints between 128 and 255.

The fix should be fairly straightforward, but I don't have access to
the tools to do it myself right now.

But we need to make sure this is fixed before 5.10 is released.
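
To illustrate why the first byte matters, here is a minimal sketch. It is
not the regex engine's trie code; first_bytes() is a hypothetical helper
written only for this example. It collects the first byte of each
alternative from the reported pattern, once as a byte string and once
encoded to UTF-8, which is how a utf8-flagged target string stores Ö.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of perl's regex engine: return the sorted
# first bytes (as hex strings) that a list of literal alternatives can
# start with. If $as_utf8 is true, each alternative is encoded to UTF-8
# first, mirroring how a utf8-flagged string stores its characters.
sub first_bytes {
    my ($as_utf8, @alternatives) = @_;
    my %seen;
    for my $alt (@alternatives) {
        my $bytes = $alt;
        utf8::encode($bytes) if $as_utf8;    # characters -> UTF-8 octets
        $seen{ sprintf '%02X', ord substr($bytes, 0, 1) } = 1;
    }
    return sort keys %seen;
}

# The two alternatives from the bug report; "\xD6" is U+00D6 (Ö).
my @alternatives = ("\xD6", '&Ouml;');

print "byte string: @{[ first_bytes(0, @alternatives) ]}\n";
print "utf8 string: @{[ first_bytes(1, @alternatives) ]}\n";
```

This prints "26 D6" for the byte string and "26 C3" for the UTF-8 form.
A start class that only knows about D6 (or only about '&') would reject
an upgraded string beginning with the bytes C3 96 before the alternation
is ever tried, which would be consistent with the failure described above.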

Yves
-- 
perl -Mre=debug -e "/just|another|perl|hacker/"
