List Info

Thread: $1 suddenly tainted after regexp on utf-8 string




$1 suddenly tainted after regexp on utf-8 string
user name
2007-11-06 09:38:36
# New Ticket Created by  dstheise.de 
# Please include the string:  [perl #47195]
# in the subject line of all future correspondence about
this issue. 
# <URL: h
ttp://rt.perl.org/rt3/Ticket/Display.html?id=47195 >



This is a bug report for perl from dstheise.de,
generated with the help of perlbug 1.35 running under perl
v5.8.8.


------------------------------------------------------------
-----

After applying a regex using S on a string which is flagged
as utf-8
the submatch is unexpectedly tainted.  You can reproduce the
behaviour
with this code:

#!/usr/bin/perl -T

use warnings;
use strict;
use Scalar::Util qw(tainted);

my $line = 'A';
utf8::upgrade($line);
$line =~ /(S*)/;
my $first = $1;
print tainted($first) . "n";

__END__

$first is unexpectedly tainted.  If you comment out the line
which
flags $line as utf-8, $first is not tainted any longer.

The behaviour is identical on a self-built perl5.9.5.

------------------------------------------------------------
-----
---
Flags:
    category=core
    severity=high
---
Site configuration information for perl v5.8.8:

Configured by Debian Project at Wed Dec  6 23:17:41 UTC
2006.

Summary of my perl5 (revision 5 version 8 subversion 8)
configuration:
  Platform:
    osname=linux, osvers=2.6.18.3,
archname=i486-linux-gnu-thread-multi
    uname='linux saens 2.6.18.3 #1 smp sat nov 25 13:39:52
est 2006 i686 gnulinux '
    config_args='-Dusethreads -Duselargefiles
-Dccflags=-DDEBIAN -Dcccdlflags=-fPIC
-Darchname=i486-linux-gnu -Dprefix=/usr
-Dprivlib=/usr/share/perl/5.8 -Darchlib=/usr/lib/perl/5.8
-Dvendorprefix=/usr -Dvendorlib=/usr/share/perl5
-Dvendorarch=/usr/lib/perl5 -Dsiteprefix=/usr/local
-Dsitelib=/usr/local/share/perl/5.8.8
-Dsitearch=/usr/local/lib/perl/5.8.8
-Dman1dir=/usr/share/man/man1 -Dman3dir=/usr/share/man/man3
-Dsiteman1dir=/usr/local/man/man1
-Dsiteman3dir=/usr/local/man/man3 -Dman1ext=1
-Dman3ext=3perl -Dpager=/usr/bin/sensible-pager -Uafs
-Ud_csh -Uusesfio -Uusenm -Duseshrplib
-Dlibperl=libperl.so.5.8.8 -Dd_dosuid -des'
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=define use5005threads=undef
useithreads=define usemultiplicity=define
    useperlio=define d_sfio=undef uselargefiles=define
usesocks=undef
    use64bitint=undef use64bitall=undef uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE
-DTHREADS_HAVE_PIDS -DDEBIAN -fno-strict-aliasing -pipe
-I/usr/local/include -D_LARGEFILE_SOURCE
-D_FILE_OFFSET_BITS=64',
    optimize='-O2',
    cppflags='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS
-DDEBIAN -fno-strict-aliasing -pipe -I/usr/local/include'
    ccversion='', gccversion='4.1.2 20061115 (prerelease)
(Debian 4.1.1-20)', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8,
byteorder=1234
    d_longlong=define, longlongsize=8, d_longdbl=define,
longdblsize=12
    ivtype='long', ivsize=4, nvtype='double', nvsize=8,
Off_t='off_t', lseeksize=8
    alignbytes=4, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib
    libs=-lgdbm -lgdbm_compat -ldb -ldl -lm -lpthread -lc
-lcrypt
    perllibs=-ldl -lm -lpthread -lc -lcrypt
    libc=/lib/libc-2.3.6.so, so=so, useshrplib=true,
libperl=libperl.so.5.8.8
    gnulibc_version='2.3.6'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef,
ccdlflags='-Wl,-E'
    cccdlflags='-fPIC', lddlflags='-shared
-L/usr/local/lib'

Locally applied patches:
    

---
INC
for perl v5.8.8:
    /etc/perl
    /usr/local/lib/perl/5.8.8
    /usr/local/share/perl/5.8.8
    /usr/lib/perl5
    /usr/share/perl5
    /usr/lib/perl/5.8
    /usr/share/perl/5.8
    /usr/local/lib/site_perl
    .

---
Environment for perl v5.8.8:
    HOME=/net/juan01.heise.de/opt/home/dst
    LANG=en_GB
    LANGUAGE=en_DE:en_US:en_GB:en
    LC_COLLATE=POSIX
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
   
PATH=/net/juan01.heise.de/opt/home/dst/lib/git/bin:/net/juan
01.heise.de/opt/home/dst/bin:/net/juan01.heise.de/opt/home/d
st/lib/git/bin:/net/juan01.heise.de/opt/home/dst/bin:/usr/lo
cal/bin:/usr/bin:/bin:/usr/bin/X11:/usr/games
    PERL_BADLANG (unset)
    SHELL=/bin/bash


Re: $1 suddenly tainted after regexp on utf-8 string
user name
2007-11-06 09:59:01
On Tue, Nov 06, 2007 at 07:38:36AM -0800, dst  heise.
de wrote:

> After applying a regex using S on a string which is
flagged as utf-8
> the submatch is unexpectedly tainted.  You can
reproduce the behaviour
> with this code:
> 
> #!/usr/bin/perl -T
> 
> use warnings;
> use strict;
> use Scalar::Util qw(tainted);
> 
> my $line = 'A';
> utf8::upgrade($line);
> $line =~ /(S*)/;
> my $first = $1;
> print tainted($first) . "n";
> 
> __END__
> 
> $first is unexpectedly tainted.  If you comment out the
line which
> flags $line as utf-8, $first is not tainted any
longer.

It's in still in blead. I thought that it was something to
do with the swash
loading code:

$ cat 47195.pl
my $line = 'A1' . substr ($^X, 0, 0) . chr shift;
chop $line;
$line =~ /(AS*)/;
print eval { join("",$1), kill 0; 1 } ?
"untaintedn" : "taintedn";

my $line = 'A1' . substr ($^X, 0, 0) . chr shift;
chop $line;
$line =~ /(AS*)/;
print eval { join("",$1), kill 0; 1 } ?
"untaintedn" : "taintedn";

__END__
$ ./perl -Ilib -T 47195.pl 256
tainted
untainted


But it doesn't seem to be, because if I change the program
to be a loop:

$ cat 47195.pl
for (0..1) {
    my $line = 'A1' . substr ($^X, 0, 0) . chr $ARGV[0];
    chop $line;
    $line =~ /(AS*)/;
    print eval { join("",$1), kill 0; 1 } ?
"untaintedn" : "taintedn";
}

__END__
$ ./perl -Ilib -T 47195.pl 256
tainted
tainted


the results change. (Correct output would be
"untainted" both times)

Certainly, the solution isn't this simple:

$ p4 diff
==== //depot/perl/regcomp.c#618 -
/home/nick/p4perl/perl/regcomp.c ====
--- /tmp/tmp.59321.0    Tue Nov  6 15:55:47 2007
+++ /home/nick/p4perl/perl/regcomp.c    Tue Nov  6 15:54:11
2007
 -9627,7
+9627,10  Perl_save_re_context(pTHX)
     struct re_save_state *state;
 
     SAVEVPTR(PL_curcop);
-    SSGROW(SAVESTACK_ALLOC_FOR_RE_SAVE_STATE + 1);
+    /* Add 3 for the save bool, and do the check in one to
avoid a possible
+       second reallocation.  */
+    SSGROW(SAVESTACK_ALLOC_FOR_RE_SAVE_STATE + 1 + 3);
+    SAVEBOOL(PL_tainted);
 
     state = (struct re_save_state *)(PL_savestack +
PL_savestack_ix);
     PL_savestack_ix += SAVESTACK_ALLOC_FOR_RE_SAVE_STATE;


Nicholas Clark

$1 suddenly tainted after regexp on utf-8 string
user name
2007-11-06 18:18:45
On Nov 06 2007, dstheise.de wrote:
> 
> After applying a regex using S on a string which is
flagged as utf-8
> the submatch is unexpectedly tainted.  You can
reproduce the behaviour
> with this code:
> 
> #!/usr/bin/perl -T
> 
> use warnings;
> use strict;
> use Scalar::Util qw(tainted);
> 
> my $line = 'A';
> utf8::upgrade($line);
> $line =~ /(S*)/;
> my $first = $1;
> print tainted($first) . "n";
> 
> __END__

Attached patch fixes this.  I had a look through the rest of
the switch
statement to see if there were any other mis(sing|placed)
breaks and I'm
pretty sure there should be one at the end of case NDIGIT:

    case NDIGIT:
        if (do_utf8) {
            loceol = PL_regeol;
            LOAD_UTF8_CHARCLASS_DIGIT();
            while (hardcount < max && scan <
loceol &&
                   !swash_fetch(PL_utf8_digit, (U8*)scan,
do_utf8)) {
                scan += UTF8SKIP(scan);
                hardcount++;
            }
        } else {
            while (scan < loceol &&
!isDIGIT(*scan))
                scan++;
        }
    case LNBREAK:

Case LNBREAK was added with change 31026 which looks like it
may have
missed adding break to a few switch statements.  I haven't
looked at it
closely, though, just browsed the change.

    http://public.activestate.com/cgi-bin/perlbrowse/p/31026


-- 
Rick Delaney
rickbort.ca

  
[1-3]

about | contact  Other archives ( Real Estate discussion Medical topics )