(Previously posted at http://www.p
erlmonks.org/?node_id=636266 )
I went to demerphq's 5.10 regex talk at YAPC::Europe and
I've been
overwhelmed by the number of new features in the regex
syntax.
However, unless I'm missing something in the talk and perlre
there
still seems no simple way to use Perl code as an assertion.
When I say
"as an assertion" I mean that I assert that some
condition must be met
if the (sub)pattern is to match.
The way I would assert in a 5.8 regex that some expression,
say
whatever(), must be true would be.
/(?(?{ whatever() })|(?!))/
The latter can, in 5.10, be written...
/(?(?{ whatever() })|(*FAIL))/
... which almost reads naturally as whatever or fail but
this still
seems unduly complex for what I would have thought would be
a common
desire. Indeed I've always felt that this should be the
semantics of
simple code constructs. (After all the documentation does
call them
assertions).
/(?{ whatever() })/
In previous discussion Ilya Zakharevich suggested that we
could add
flags between the closing brackets so perhaps we could
use...
/(?{ whatever() }?)/
...so I've written a patch to do that.
BTW: I'm posting this via Google Groups. I hope that's
acceptable
etiquette and I'm hoping it doesn't break the patch.
Here's the patch.
diff -u --recursive origperl-5.9.5/pod/perlre.pod
perl-5.9.5/pod/
perlre.pod
--- origperl-5.9.5podperlre.pod Sat Jul 07 15:40:28 2007
+++ perl-5.9.5podperlre.pod Mon Sep 03 09:59:35 2007
 -880,6
+880,8 
may be used instead of C<< k<NAME> >> in
Perl 5.10 or later.
=item C<(?)>
+
+=item C<(??)>
X<(?{})> X<regex, code in> X<regexp, code
in> X<regular expression,
code in>
B<WARNING>: This extended regular expression feature
is considered
 -887,18
+889,24 
has side effects may not perform identically from version
to version
due to the effect of future optimisations in the regex
engine.
-This zero-width assertion evaluates any embedded Perl code.
It
-always succeeds, and its C<code> is not interpolated.
Currently,
-the rules to determine where the C<code> ends are
somewhat
convoluted.
+These zero-width assertions evaluate any embedded Perl
code. In the
+form without the terminal C<?> the assertion always
succeeds. The
+C<(??)> form the assertion only succeeds if
C<code> returns
+true. In either case its C<code> is not interpolated.
Currently, the
+rules to determine where the C<code> ends are
somewhat convoluted.
This feature can be used together with the special variable
C<$^N> to
-capture the results of submatches in variables without
having to keep
+restrict or save the results of submatches without having
to keep
track of the number of nested parentheses. For example:
$_ = "The brown fox jumps over the lazy dog";
/the (S+)(?{ $color = $^N }) (S+)(?{ $animal = $^N
})/i;
print "color = $color, animal = $animaln";
+ known_animal{ qw( cat dog fox horse rabbit rat ) } =
();
+ animals = /b(w++)(?{ exists $known_animal{$^N}
}?)/g;
+ print " animalsn";
+
Inside the C<(?{...})> block, C<$_> refers to
the string the regular
expression is matching against. You can also use
C<pos()> to know
what is
the current position of matching within this string.
 -925,11
+933,12 
introduced value, because the scopes that restrict
C<local> operators
are unwound.
-This assertion may be used as a
C<(?(condition)yes-pattern|no-
pattern)>
-switch. If I<not> used in this way, the result of
evaluation of
-C<code> is put into the special variable
C<$^R>. This happens
-immediately, so C<$^R> can be used from other
C<(?)>
assertions
-inside the same regular expression.
+These assertions may be used as a
+C<(?(condition)yes-pattern|no-pattern)> switch and
the two forms are
+equivalent when used in this way. If I<not> used in
this way, the
+result of evaluation of C<code> is put into the
special variable
+C<$^R>. This happens immediately, so C<$^R>
can be used from other
+C<(?)> assertions inside the same regular
expression.
The assignment to C<$^R> above is properly localized,
so the old
value of C<$^R> is restored if the assertion is
backtracked; compare
diff -u --recursive origperl-5.9.5/regcomp.c
perl-5.9.5/regcomp.c
--- origperl-5.9.5/regcomp.c Sat Jul 07 15:40:26 2007
+++ perl-5.9.5/regcomp.c Sun Sep 02 22:48:48 2007
 -5690,6
+5690,7 
U32 n = 0;
char c;
char *s = RExC_parse;
+ bool is_failable = 0;
RExC_seen_zerolen++;
RExC_seen |= REG_SEEN_EVAL;
 -5704,6
+5705,12 
count--;
RExC_parse++;
}
+
+ if (!is_logical && *RExC_parse == '?') {
+ /* (?{...}?) but not (??{...}?) */
+ is_failable = 1;
+ RExC_parse++;
+ }
if (*RExC_parse != ')') {
RExC_parse = s;
vFAIL("Sequence (?{...}) not terminated or not
{}-balanced");
 -5711,7
+5718,7 
if (!SIZE_ONLY) {
PAD *pad;
OP_4tree *sop, *rop;
- SV * const sv = newSVpvn(s, RExC_parse - 1 - s);
+ SV * const sv = newSVpvn(s, RExC_parse - 1 -
is_failable - s);
ENTER;
Perl_save_re_context(aTHX);
 -5742,10
+5749,14 
}
nextchar(pRExC_state);
- if (is_logical) {
+ if (is_logical || is_failable ) {
+ /* (??{...}} or (?{...}?) but, if we
could
manage,
+ not (?(?{ ... }?)...). That's rather
difficult
+ so instead work round it in regmatch.
+ --
nobull */
ret = reg_node(pRExC_state, LOGICAL);
if (!SIZE_ONLY)
- ret->flags = 2;
+ ret->flags = is_logical ? 2 : 3;
REGTAIL(pRExC_state, ret,
reganode(pRExC_state,
EVAL, n));
/* deal with the length of this later -
MJD */
return ret;
 -5767,6
+5778,9 
ret = reg_node(pRExC_state, LOGICAL);
if (!SIZE_ONLY)
ret->flags = 1;
+ /* Need somehow to tell /(?{...)?)/ that it
+ should behave like /(?{...})/ as
part of
+ if construct
-- nobull
*/
REGTAIL(pRExC_state, ret,
reg(pRExC_state, 1,
&flag,depth+1));
goto insert_if;
}
diff -u --recursive origperl-5.9.5/regexec.c
perl-5.9.5/regexec.c
--- origperl-5.9.5/regexec.c Sat Jul 07 15:40:26 2007
+++ perl-5.9.5/regexec.c Sun Sep 02 21:19:53 2007
 -2713,6
+2713,7 
0: (?{...})
1: (?(?{...})X|Y)
2: (??{...})
+ 3: (?{...}?)
or the following IFMATCH/UNLESSM is:
false: plain (?=foo)
true: used as a condition: (?(?=foo))
 -3669,9
+3670,13 
PL_op = oop;
PAD_RESTORE_LOCAL(old_comppad);
PL_curcop = ocurcop;
- if (!logical) {
- /* /(?{...})/ */
+ if (!logical || logical == 3 ) {
+ /* /(?{...})/ or /(?{...}?) */
sv_setsv(save_scalar(PL_replgv), ret);
+ if ( logical && !SvTRUE(ret) ) {
+ /* (?{...}?/ assertion failed */
+ sayNO;
+ }
break;
}
}
 -3875,7
+3880,12 
}
break;
case LOGICAL:
- logical = scan->flags;
+ /* (?(?{...}?)...) will insert a second redundant
LOGICAL
+ op which we here ignore. This is messy, sorry
+ --nobull
*/
+ if ( !logical ) {
+ logical = scan->flags;
+ }
break;
/**********************************************************
*********
diff -u --recursive origperl-5.9.5/t/op/re_tests
perl-5.9.5/t/op/
re_tests
--- origperl-5.9.5/t/op/re_tests Sat Jul 07 15:40:24 2007
+++ perl-5.9.5/t/op/re_tests Mon Sep 03 08:57:01 2007
 -1338,3
+1338,10 
.*z foon y - -
^(?:(d)x)?d$ 1 y ${(defined($1)?1:0)} 0
.*?(?:(w)|(w))x abx y $1-$2 b-
+x(???) x c - Sequence (?{...}) not terminated
+x(??) x y $& x
+x(??) x n - -
+(?(??)x|y) x y $& x
+(?(??)x|y) x n - -
+(?(??)y|x) x n - -
+(?(??)y|x) x y $& x
No newline at end of file
|
On 9/14/07, nobull67 gmail.com <nobull67 gmail.com> wrote:
> (Previously posted at http://www.p
erlmonks.org/?node_id=636266 )
>
> I went to demerphq's 5.10 regex talk at YAPC::Europe
and I've been
> overwhelmed by the number of new features in the regex
syntax.
> However, unless I'm missing something in the talk and
perlre there
> still seems no simple way to use Perl code as an
assertion. When I say
> "as an assertion" I mean that I assert that
some condition must be met
> if the (sub)pattern is to match.
>
> The way I would assert in a 5.8 regex that some
expression, say
> whatever(), must be true would be.
>
> /(?(?{ whatever() })|(?!))/
>
> The latter can, in 5.10, be written...
>
> /(?(?{ whatever() })|(*FAIL))/
>
> ... which almost reads naturally as whatever or fail
but this still
> seems unduly complex for what I would have thought
would be a common
> desire. Indeed I've always felt that this should be the
semantics of
> simple code constructs. (After all the documentation
does call them
> assertions).
>
> /(?{ whatever() })/
>
> In previous discussion Ilya Zakharevich suggested that
we could add
> flags between the closing brackets so perhaps we could
use...
>
> /(?{ whatever() }?)/
>
> ...so I've written a patch to do that.
>
> BTW: I'm posting this via Google Groups. I hope that's
acceptable
> etiquette and I'm hoping it doesn't break the patch.
When in doubt attach the patch instead of inlining it. But I
havent
tried it to see.
I like this patch, although i have some minor concerns:
1. It adds yet another meaning to '?', which is currently
probably the
most overloaded of all of the regex special characters. Im
inclined to
suggest that the syntax should be
(?{...}!)
instead, although the bang character is maybe not the most
noticable
character. But in my head it matches the purpose better than
?. IOW:
ThisMustBeTrueOrElse!
2. Maybe this is totally wrong, but i would have expected
this patch
to include changes to toke.c as some of the handling of
these
constructs occurs there. However, I havent looked into it in
detail so
maybe there is no need to change that file.
Cheers,
Yves
--
perl -Mre=debug -e "/just|another|perl|hacker/"
|