Thread: Regexp failure with utf8-flagged string and byte-flagged pattern




Regexp failure with utf8-flagged string and byte-flagged pattern
user name
2007-09-20 16:44:46
# New Ticket Created by  sreziccpan.org
# Please include the string:  [perl #45605]
# in the subject line of all future correspondence about this issue.
# <URL: http://rt.perl.org/rt3/Ticket/Display.html?id=45605 >


This is a bug report for perl from sreziccpan.org,
generated with the help of perlbug 1.36 running under perl
5.10.0.


-----------------------------------------------------------------
The script below works as expected until perl 5.8.8 (i.e. it
prints "1").
With perl5.10.0 the pattern does not match anymore.

Regards,
    Slaven

#!perl
$string = 'Öschel';
utf8::upgrade($string);
warn $string =~ m{(?:Ö|&Ouml;)schel};
__END__
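
For context on why a match is expected here: utf8::upgrade changes only the internal storage of a scalar, not the characters it holds, so an upgraded copy should compare equal to the original and match the same patterns. The following is a minimal sketch, not part of the report. It assumes the Ö is the single Latin-1 character U+00D6 and writes it as \xD6 so that the encoding of the file does not matter; the variable names are illustrative only.

```perl
use strict;
use warnings;

# Assumption: the Ö in the report is the single character U+00D6.
my $bytes    = "\xD6schel";   # stored internally as bytes
my $upgraded = $bytes;
utf8::upgrade($upgraded);     # same characters, UTF-8 internal storage

# The internal flag differs between the two copies ...
my $flag_bytes    = utf8::is_utf8($bytes)    ? 1 : 0;
my $flag_upgraded = utf8::is_utf8($upgraded) ? 1 : 0;

# ... but they are still equal character for character.
my $same = $bytes eq $upgraded ? 1 : 0;

# Both copies are tested against the same alternation as in the report.
my $match_bytes    = $bytes    =~ m{(?:\xD6|&Ouml;)schel} ? 1 : 0;
my $match_upgraded = $upgraded =~ m{(?:\xD6|&Ouml;)schel} ? 1 : 0;

print "flags: $flag_bytes/$flag_upgraded same: $same ",
      "match: $match_bytes/$match_upgraded\n";
```

On a perl without the regression both match values should be 1. Whether this escaped variant triggers the same failure on the affected 5.10.0 build has not been checked.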

-----------------------------------------------------------------
---
Flags:
    category=core
    severity=high
---
Site configuration information for perl 5.10.0:

Configured by eserte at Wed Sep 19 23:41:00 CEST 2007.

Summary of my perl5 (revision 5 version 10 subversion 0 patch 31894) configuration:
  Platform:
    osname=freebsd, osvers=6.2-release, archname=amd64-freebsd
    uname='freebsd biokovo-amd64.herceg.de 6.2-release freebsd 6.2-release #0: fri jan 12 08:32:24 utc 2007 rootportnoy.cse.buffalo.edu:usrobjusrsrcsysgeneric amd64 '
    config_args='-Dprefix=/usr/perl5.10.0 -D cc=ccache cc -Dgccansipedantic -de'
    hint=recommended, useposix=true, d_sigaction=define
    useithreads=undef, usemultiplicity=undef
    useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
    use64bitint=define, use64bitall=define, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='ccache cc', ccflags ='-DHAS_FPSETMASK -DHAS_FLOATINGPOINT_H -fno-strict-aliasing -pipe -I/usr/local/include',
    optimize='-O2 -pipe',
    cppflags='-DHAS_FPSETMASK -DHAS_FLOATINGPOINT_H -fno-strict-aliasing -pipe -I/usr/local/include'
    ccversion='', gccversion='3.4.6 [FreeBSD] 20060305', gccosandvers=''
    intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
    ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='ccache cc', ldflags ='-Wl,-E  -L/usr/local/lib'
    libpth=/usr/lib /usr/local/lib
    libs=-lgdbm -lm -lcrypt -lutil -lc
    perllibs=-lm -lcrypt -lutil -lc
    libc=, so=so, useshrplib=false, libperl=libperl.a
    gnulibc_version=''
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags=' '
    cccdlflags='-DPIC -fPIC', lddlflags='-shared -L/usr/local/lib'

Locally applied patches:
    DEVEL

---
@INC for perl 5.10.0:
    /usr/perl5.10.0/lib/5.10.0/amd64-freebsd
    /usr/perl5.10.0/lib/5.10.0
    /usr/perl5.10.0/lib/site_perl/5.10.0/amd64-freebsd
    /usr/perl5.10.0/lib/site_perl/5.10.0
    .

---
Environment for perl 5.10.0:
    HOME=/home/e/eserte
    LANG (unset)
    LANGUAGE (unset)
    LC_ALL=de_DE.ISO8859-1
    LC_CTYPE=de_DE.ISO8859-1
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/usr/X11R6/bin:/usr/X11/bin:/usr/local/bin:/usr/bin:/bin:/usr/gnu/bin:/usr/TeX/bin:/usr/local/sbin:/usr/sbin:/sbin:/usr/local/pilot/bin:/home/e/eserte/bin/FreeBSD:/home/e/eserte/bin/sh:/home/e/eserte/bin:/usr/X386/bin:/usr/games:/home/e/eserte/devel
    PERL_BADLANG (unset)
    PERL_HTML_DISPLAY_CLASS=HTML::Display::Mozilla
    SHELL=/bin/tcsh


Re: Regexp failure with utf8-flagged string and byte-flagged pattern
user name
2007-09-21 05:26:07
On 9/20/07, via RT srezic  cpan. org <perlbug-followupperl.org> wrote:
> # New Ticket Created by  sreziccpan.org
> # Please include the string:  [perl #45605]
> # in the subject line of all future correspondence about this issue.
> # <URL: http://rt.perl.org/rt3/Ticket/Display.html?id=45605 >
>
>
> This is a bug report for perl from sreziccpan.org,
> generated with the help of perlbug 1.36 running under perl 5.10.0.
>
>
> -----------------------------------------------------------------
> The script below works as expected until perl 5.8.8 (i.e. it prints "1").
> With perl5.10.0 the pattern does not match anymore.
>
> Regards,
>     Slaven
>
> #!perl
> $string = 'Öschel';
> utf8::upgrade($string);
> warn $string =~ m{(?:Ö|&Ouml;)schel};
> __END__

I don't have a blead handy right now to test with. Could someone
please send me the output of this script with a

use re Debug=>'ALL';

right before the warn statement.
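
As a sketch of where that line would go (an illustration, not output from an affected build): the pragma is lexically scoped, so placing it in the same block as the match limits the trace to that one pattern. The Ö is written as \xD6 here on the assumption that the original script was Latin-1; the $result variable and the enclosing block are additions for illustration. The trace itself is written to STDERR.

```perl
#!perl
use strict;
use warnings;

# Assumption: Ö is the single character U+00D6, written as an escape
# so the encoding of this file does not matter.
my $string = "\xD6schel";
utf8::upgrade($string);

my $result;
{
    # Lexically scoped: only patterns compiled inside this block are
    # traced. The compile and execute trace goes to STDERR.
    use re Debug => 'ALL';
    $result = $string =~ m{(?:\xD6|&Ouml;)schel};
}

# Report the outcome separately from the trace.
warn $result ? "matched\n" : "no match\n";
```

Redirecting STDERR to a file (for example with 2> in the shell) is a convenient way to capture the trace for a reply to the ticket.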

Cheers,
Yves


-- 
perl -Mre=debug -e "/just|another|perl|hacker/"

Re: Regexp failure with utf8-flagged string and byte-flagged pattern
user name
2007-09-21 05:26:22
Moin,

On Thursday 20 September 2007 23:44:46 sreziccpan.org wrote:
> # New Ticket Created by  sreziccpan.org
> # Please include the string:  [perl #45605]
> # in the subject line of all future correspondence about this issue.
> # <URL: http://rt.perl.org/rt3/Ticket/Display.html?id=45605 >
>
>
> This is a bug report for perl from sreziccpan.org,
> generated with the help of perlbug 1.36 running under perl 5.10.0.
>
>
> -----------------------------------------------------------------
> The script below works as expected until perl 5.8.8 (i.e. it prints "1").
> With perl5.10.0 the pattern does not match anymore.
>
> Regards,
>     Slaven
>
> #!perl
> $string = 'Öschel';
> utf8::upgrade($string);
> warn $string =~ m{(?:Ö|&Ouml;)schel};
> __END__

I don't see "use utf8;" in your example, so, in what encoding is the
script? Likewise, that means, in what encoding is the $string and in
what is the regexp?
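
One way to answer that question empirically is to dump what perl actually holds in the scalar. The sketch below is an illustration only: describe() is a hypothetical helper, not something from the thread, and the two test strings stand in for the two likely cases (script saved as ISO-8859-1, or saved as UTF-8 without "use utf8"). Which case applies to the original script is not stated in the report.

```perl
use strict;
use warnings;

# Hypothetical helper: summarise a scalar's internal utf8 flag,
# its length in characters, and its code points.
sub describe {
    my ($s) = @_;
    my $flag  = utf8::is_utf8($s) ? 1 : 0;
    my $chars = join ' ', map { sprintf('U+%04X', ord($_)) } split //, $s;
    return sprintf('utf8=%d len=%d %s', $flag, length($s), $chars);
}

# Case 1: script saved as ISO-8859-1, so Ö is the single byte 0xD6.
my $latin1 = "\xD6schel";

# Case 2: script saved as UTF-8 without "use utf8", so perl sees the
# two bytes 0xC3 0x96 as two separate characters.
my $utf8_bytes = "\xC3\x96schel";

my $d_latin1 = describe($latin1);
my $d_utf8   = describe($utf8_bytes);

# Upgrading changes the flag but not the length or the code points.
utf8::upgrade($latin1);
my $d_latin1_up = describe($latin1);

print "$_\n" for $d_latin1, $d_utf8, $d_latin1_up;
```

A length of 6 with U+00D6 in first position points to a Latin-1 source; a length of 7 starting with U+00C3 U+0096 points to UTF-8 bytes read without "use utf8".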

All the best,

Tels

-- 
 Signed on Fri Sep 21 12:24:58 2007 with key 0x93B84C15.
 View my photo gallery: http://bloodgate.com/photos
 PGP key on http://bloodgate.com/tels.asc or per email.

 "Most people, I think, don't even know what a rootkit is, so why should
 they care about it?"

  -- Thomas Hesse, President of Sony BMG's global digital business division,
2005.