Thread: g suffix on string search (/.../g) can cause string corruption




g suffix on string search (/.../g) can cause string corruption
user name
2007-10-19 18:15:05
# New Ticket Created by  owlbarnowl.research.intel-research.net 
# Please include the string:  [perl #46563]
# in the subject line of all future correspondence about this issue.
# <URL: http://rt.perl.org/rt3/Ticket/Display.html?id=46563 >



This is a bug report for perl from dgay42gmail.com,
generated with the help of perlbug 1.35 running under perl
v5.8.8.


-----------------------------------------------------------------
[Please enter your report here]

The following code prints "a" rather than "z":
$z = "z";
$z = sprintf "aaa" if $z =~ /(.)/g;
printf "$1\n";

The problem does not occur if the g suffix is removed from
the search.

The problem is that $1 is aliased with $z, and $z gets subsequently
overwritten. The following *might* be a fix, to the extent that I've
understood pp_match (...):

--- pp_hot.c    2006-09-29 17:28:06.000000000 -0700
+++ pp_hot_fix.c        2007-10-19 16:11:28.000000000 -0700
@@ -1303,7 +1303,7 @@
            }
        }
     }
-    if ((!global && rx->nparens)
+    if ((rx->nparens)
            || SvTEMP(TARG) || PL_sawampersand)
        r_flags |= REXEC_COPY_STR;
     if (SvSCREAM(TARG))

One "minor" drawback is that making this fix causes SPEC2006's perl
test to consume very large amounts of memory (and fail when used with
a 32-bit address space, at least...).


[Please do not change anything below this line]
-----------------------------------------------------------------
---
Flags:
    category=core
    severity=medium
---
Site configuration information for perl v5.8.8:

Configured by owl at Tue Sep 25 23:53:19 PDT 2007.

Summary of my perl5 (revision 5 version 8 subversion 8) configuration:
  Platform:
    osname=darwin, osvers=8.10.1, archname=darwin-2level
    uname='darwin barnowl.research.intel-research.net 8.10.1
darwin kernel version 8.10.1: wed may 23 16:33:00 pdt 2007;
rootnu-79
2.22.5~1release_i386 i386 i386 '
    config_args='-des -Dprefix=/opt/local -Dccflags=-I'/opt/local/include' -Dldflags=-L/opt/local/lib -Dvendorprefix=/opt/local -Dcc=/usr/bin/gcc-4.0'
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef
    useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=undef use64bitall=undef uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='/usr/bin/gcc-4.0', ccflags ='-I/opt/local/include -fno-common -DPERL_DARWIN -no-cpp-precomp -fno-strict-aliasing -pipe -Wdeclaration-after-statement -I/opt/local/include',
    optimize='-O3',
    cppflags='-no-cpp-precomp -I/opt/local/include -fno-common -DPERL_DARWIN -no-cpp-precomp -fno-strict-aliasing -pipe -Wdeclaration-after-statement -I/opt/local/include'
    ccversion='', gccversion='4.0.1 (Apple Computer, Inc. build 5363)', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='env MACOSX_DEPLOYMENT_TARGET=10.3 cc', ldflags ='-L/opt/local/lib -L/usr/local/lib'
    libpth=/usr/local/lib /opt/local/lib /usr/lib
    libs=-ldbm -ldl -lm -lc
    perllibs=-ldl -lm -lc
    libc=/usr/lib/libc.dylib, so=dylib, useshrplib=false, libperl=libperl.a
    gnulibc_version=''
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=bundle, d_dlsymun=undef, ccdlflags=' '
    cccdlflags=' ', lddlflags='-L/opt/local/lib -bundle -undefined dynamic_lookup -L/usr/local/lib'

Locally applied patches:
    

---
@INC for perl v5.8.8:
    /opt/local/lib/perl5/5.8.8/darwin-2level
    /opt/local/lib/perl5/5.8.8
    /opt/local/lib/perl5/site_perl/5.8.8/darwin-2level
    /opt/local/lib/perl5/site_perl/5.8.8
    /opt/local/lib/perl5/site_perl
    /opt/local/lib/perl5/vendor_perl/5.8.8/darwin-2level
    /opt/local/lib/perl5/vendor_perl/5.8.8
    /opt/local/lib/perl5/vendor_perl
    .

---
Environment for perl v5.8.8:
    DYLD_LIBRARY_PATH (unset)
    HOME=/Users/owl
    LANG (unset)
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/Users/owl/bin:/usr/X11R6/bin:/usr/local/bin:/opt/local/bin:/opt/local/sbin:/bin:/sbin:/usr/bin:/usr/sbin
    PERL_BADLANG (unset)
    SHELL=/bin/bash


Re: g suffix on string search (/.../g) can cause string corruption
user name
2007-10-20 06:30:00
On 10/20/07, via RT owl barnowl.research.intel-research.net
<perlbug-followupperl.org> wrote:
> # New Ticket Created by  owlbarnowl.research.intel-research.net
> # Please include the string:  [perl #46563]
> # in the subject line of all future correspondence about this issue.
> # <URL: http://rt.perl.org/rt3/Ticket/Display.html?id=46563 >
>
> This is a bug report for perl from dgay42gmail.com,
> generated with the help of perlbug 1.35 running under perl v5.8.8.
>
> -----------------------------------------------------------------
> [Please enter your report here]
>
> The following code prints "a" rather than "z":
> $z = "z";
> $z = sprintf "aaa" if $z =~ /(.)/g;
> printf "$1\n";
>
> The problem does not occur if the g suffix is removed from the search.
>
> The problem is that $1 is aliased with $z, and $z gets subsequently
> overwritten. The following *might* be a fix, to the extent that I've
> understood pp_match (...):
>
> --- pp_hot.c    2006-09-29 17:28:06.000000000 -0700
> +++ pp_hot_fix.c        2007-10-19 16:11:28.000000000 -0700
> @@ -1303,7 +1303,7 @@
>             }
>         }
>      }
> -    if ((!global && rx->nparens)
> +    if ((rx->nparens)
>             || SvTEMP(TARG) || PL_sawampersand)
>         r_flags |= REXEC_COPY_STR;
>      if (SvSCREAM(TARG))
>
> One "minor" drawback is that making this fix causes SPEC2006's perl
> test to consume very large amounts of memory (and fail when used with
> a 32-bit address space, at least...).

This is a known, "won't fix" (at least for now) bug in the regex engine.

I actually made the same change as your patch over a year ago and then
retracted it, because it makes scalar /g matches in a loop go quadratic:
the target string gets copied on every successful match. That can have
punishing consequences for code that uses /g on long strings in such
places as while (/.../g) {...}. The workaround is to ensure that you do
not modify the target of a /g match between a successful match and
accessing the magic variables. Since this is considered to be a rare
situation compared with using /g on long strings, we have decided to
accept the lesser of two weevils (bugs).
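
A minimal sketch of that workaround, assuming the capture is all that is needed from the match: copy $1 into an ordinary variable before the target is reassigned. The variable name $first is hypothetical and only serves the illustration.

```perl
use strict;
use warnings;

my $z = "z";
my $first;                  # hypothetical holder for the capture

if ($z =~ /(.)/g) {
    $first = $1;            # copy the capture while $z is still intact
    $z = sprintf "aaa";     # only now overwrite the target of the /g match
}

print "$first\n";
```

Because $first is a plain copy rather than a magic variable, it holds "z" no matter what happens to $z afterwards.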

A proper solution will hopefully be forthcoming in 5.12, where I hope
to find the time to completely redesign the way regex match results
are stored, how regex magic is applied to SVs, and how target string
copying occurs.

Basically, what we need is a flag on the SV that says "I've been copied
for /g", which is reset if the SV is modified. We then need to change
the logic for scalar /g matches so that it checks the flag, makes the
copy if the flag is not set, and reuses that copy from then on. I tried
to get something along these lines working, but my attempts were crude
and met with failure, and lack of tuits prevented me from going the
long and hard route of dealing with magic on the SV and so on.
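
The flag scheme described above can be modelled in plain Perl. This is only a toy sketch of the idea, not perl's C internals: the hash fields (value, copy, copied_for_g, pos, start) and the helper subs are all hypothetical names invented for the illustration.

```perl
use strict;
use warnings;

my $copies = 0;    # how many times a target string has been copied

# Toy stand-in for an SV: live value, cached copy, and the flag.
sub new_sv {
    my ($str) = @_;
    return { value => $str, copy => undef, copied_for_g => 0,
             pos => 0, start => 0 };
}

# Any modification replaces the value and resets the flag.
sub modify_sv {
    my ($sv, $new) = @_;
    $sv->{value}        = $new;
    $sv->{copied_for_g} = 0;
    $sv->{pos}          = 0;
}

# One scalar-context /(.)/g step: copy only if the flag is not set,
# then record the match offset against the cached copy.
sub match_g {
    my ($sv) = @_;
    unless ($sv->{copied_for_g}) {
        $sv->{copy}         = $sv->{value};
        $sv->{copied_for_g} = 1;
        $copies++;
    }
    return 0 if $sv->{pos} >= length $sv->{copy};
    $sv->{start} = $sv->{pos}++;
    return 1;
}

# The "$1" of the model: always read from the cached copy.
sub capture {
    my ($sv) = @_;
    return substr($sv->{copy}, $sv->{start}, 1);
}

# A loop of matches over an unmodified string costs a single copy.
my $sv = new_sv("xyz");
my @seen;
push @seen, capture($sv) while match_g($sv);

# Overwriting the target after a match does not disturb the capture.
my $t = new_sv("z");
match_g($t);
modify_sv($t, "aaa");
my $after = capture($t);

print "seen=@seen after=$after copies=$copies\n";
```

In this model the loop over "xyz" triggers one copy rather than one per match, and the capture read after modify_sv still comes from the cached copy.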

So for the time being the solution to this problem is "don't do that".

However, we probably should document this issue with scalar-context /g
matches. Currently I don't think it is documented anywhere.

For now, and for older perls, this bug is firmly in the "won't fix"
category. Sorry.

Thanks for your report, however.

cheers,
Yves

-- 
perl -Mre=debug -e "/just|another|perl|hacker/"
