Thread: Named-capture regex syntax
|
|
| Named-capture regex syntax |

|
2006-12-12 16:02:02 |
Hi, all. I have a concern about the new named-capture regex syntax. I
discussed it with Yves at LPW last weekend, and he encouraged me to post
here and seek comments.

In Perl 5.9.4 and earlier, there's a trivial guarantee that (?...) in a
regex never captures, regardless of what "..." might be. The new named
captures in 5.9.5 break that guarantee: (?<name>pattern) and
(?'name'pattern) have that syntactic form, but they capture anyway.

I think it would be nice to preserve that property if possible. To that
end, I have a concrete syntax proposal:

    (+<name>pattern)
    (+'name'pattern)

That is: just the same as the current syntax, but with + instead of ?.
The + is conveniently reminiscent of the %+ used to find the text
captured by named groups.

Sequences of open-paren followed by a quantifier should currently be
errors (except for the existing (?...) syntaxes), and therefore should
be available for use in extensions. So I believe this wouldn't break
any code that works with a non-blead release of Perl. There's also room
for other (+...) syntaxes to be used by future extensions.

I understand that the current syntax was borrowed from .NET; that does
sound like an argument in favour of keeping it, to provide familiarity
for people who know .NET regexes. However, I think consistency with the
rest of Perl's regex syntax should win here -- especially given that we
don't provide semantic parity with .NET's named captures.

Thoughts?

Oh, and while I'm here: huge thanks to all those who've worked on the
excellent new regex features in bleadperl.

--
Aaron Crane
|
2006-12-12 17:28:21 |
On 12/12/06, Aaron Crane <perl aaroncrane.co.uk> wrote:
> Hi, all. I have a concern about the new named-capture regex syntax. I
> discussed it with Yves at LPW last weekend, and he encouraged me to post
> here and seek comments.
>
> In Perl 5.9.4 and earlier, there's a trivial guarantee that (?...) in a
> regex never captures, regardless of what "..." might be. The new named
> captures in 5.9.5 break that guarantee: (?<name>pattern) and
> (?'name'pattern) have that syntactic form, but they capture anyway.

Well no, that's (?:...). I don't think there's ever been that
guarantee you're speaking of. I can't find it in the pod anyway. All I
find are references that "(?" means "Whoa there! Question what's going
on."

Perhaps you picked this up from some non-pod/* part of perl culture?
If that's so...

I prefer that we retain the current syntax, FWIW.

Josh
|
2006-12-12 17:37:57 |
On Tue, Dec 12, 2006 at 09:28:21AM -0800, Joshua ben Jore wrote:
> On 12/12/06, Aaron Crane <perl aaroncrane.co.uk> wrote:
> >Hi, all. I have a concern about the new named-capture regex syntax. I
> >discussed it with Yves at LPW last weekend, and he encouraged me to post
> >here and seek comments.
> >
> >In Perl 5.9.4 and earlier, there's a trivial guarantee that (?...) in a
> >regex never captures, regardless of what "..." might be. The new named
> >captures in 5.9.5 break that guarantee: (?<name>pattern) and
> >(?'name'pattern) have that syntactic form, but they capture anyway.
>
> Well no, that's (?:...). I don't think there's ever been that
> guarantee you're speaking of. I can't find it in the pod anyway. All I
> find are references that "(?" means "Whoa there! Question what's going
> on."

(?:...) is specifically a non-capturing group. However, Aaron is correct
that none of the existing (?...) syntaxes capture. We have comments,
embedded modifiers, various assertions, postponed subexpressions, and
conditional expressions, none of which capture.

Ronald
|
2006-12-13 08:17:37 |
On 2006–12–12, at 17:02, Aaron Crane wrote:
> In Perl 5.9.4 and earlier, there's a trivial guarantee that (?...)
> in a regex never captures, regardless of what "..." might be. The new
> named captures in 5.9.5 break that guarantee: (?<name>pattern) and
> (?'name'pattern) have that syntactic form, but they capture anyway.
>
> I think it would nice to preserve that property if possible. To that
> end, I have a concrete syntax proposal:
>
> (+<name>pattern)
> (+'name'pattern)

From the release notes for pcre at
<http://sourceforge.net/project/shownotes.php?release_id=469341&group_id=10194>:

> Release 4.0 17-Feb-03
> ---------------------
>
> 6. Support for named subpatterns. The Python syntax (?P<name>...)
> is used to name a group.

(Although it doesn't answer Aaron's plea) I think that Perl should
follow (or at least support) the same syntax. I just hate having to
ask myself "how does this tool express that re construct?" --
although I've become pretty good at answering over the years.

--
Dominic Dunlop
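
For readers who haven't met the Python/PCRE spelling quoted above, here
is a minimal sketch of (?P<name>...) in Python's own re module. The
pattern, the variable names and the sample date string are illustrative
assumptions and do not come from the thread or from the PCRE release
notes.

```python
import re

# (?P<name>...) names a capturing group; it is still numbered as usual.
date_re = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")

m = date_re.match("2006-12-13")

print(m.group("year"))   # 2006
print(m.group(1))        # 2006 (same group, addressed by number)
print(m.groupdict())     # {'year': '2006', 'month': '12', 'day': '13'}
```

Note that the named groups also count as ordinary numbered groups,
which is exactly the behaviour the thread is debating for (?...) forms.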
|
2006-12-13 09:18:43 |
Aaron Crane <perl aaroncrane.co.uk> wrote:
:In Perl 5.9.4 and earlier, there's a trivial guarantee that (?...) in a
:regex never captures, regardless of what "..." might be. The new named
:captures in 5.9.5 break that guarantee: (?<name>pattern) and
:(?'name'pattern) have that syntactic form, but they capture anyway.

What about /(?=(x))/? Or do you mean only that the outer parens don't
themselves form a capturing group?

Hugo
|
2006-12-13 15:28:59 |
On Wed, Dec 13, 2006 at 09:18:43AM +0000, hv crypt.org wrote:
> Aaron Crane <perl aaroncrane.co.uk> wrote:
> :In Perl 5.9.4 and earlier, there's a trivial guarantee that (?...) in a
> :regex never captures, regardless of what "..." might be. The new named
> :captures in 5.9.5 break that guarantee: (?<name>pattern) and
> :(?'name'pattern) have that syntactic form, but they capture anyway.
>
> What about /(?=(x))/? Or do you mean only that the outer parens don't
> themselves form a capturing group?

Yes, as Aaron said, the (?=) in that regex does not capture. That a
non-capturing group can contain a capturing group doesn't invalidate his
point.

Look at it this way. You're looking at a regex, wondering how many
capturing groups there are. You simply scan through the regex, counting
left parentheses, but skipping any that are followed by a question mark.

I think that's a useful convention, although matching the syntax other
languages use for named captures is also useful.

Ronald
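
The counting convention described above can be sketched in a few lines.
This is only an illustration under stated assumptions: it is written in
Python rather than Perl, count_by_convention is a hypothetical helper
name, and the scan is deliberately simple (it skips backslash escapes
and bracketed character classes, but ignores edge cases such as a
literal "]" at the start of a class). Python's (?P<name>...) stands in
for a named capture that begins with "(?".

```python
import re

def count_by_convention(pattern):
    """Count '(' not followed by '?', skipping escapes and [...] classes."""
    count = 0
    i = 0
    in_class = False
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\":
            i += 2              # skip the escaped character as well
            continue
        if in_class:
            if ch == "]":
                in_class = False
        elif ch == "[":
            in_class = True
        elif ch == "(" and pattern[i + 1:i + 2] != "?":
            count += 1
        i += 1
    return count

# Compare the convention with what the regex engine actually reports.
for p in [r"(?=(x))", r"(a)(?:b)(c)", r"(?P<name>a)(b)"]:
    print(p, count_by_convention(p), re.compile(p).groups)
# (?=(x)) 1 1
# (a)(?:b)(c) 2 2
# (?P<name>a)(b) 1 2
```

The first two patterns agree with the engine; the third shows how a
named capture spelled with a leading "(?" makes the simple count come
up short.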
|
2006-12-13 15:45:27 |
Ronald J Kimball writes:
> You're looking at a regex, wondering how many capturing groups there
> are. You simply scan through the regex, counting left parentheses,
> but skipping any that are followed by a question mark. I think that's
> a useful convention,

Yes. (I did come on a bit strong describing it as a "guarantee"; it's
more like a property that someone could reasonably derive from Perl
<5.9.5.)

> although matching the syntax other languages use for named captures is
> also useful.

I also agree with that. In this case, though, there are already two
syntaxes in use elsewhere -- (?<name>pattern) in .NET, and
(?P<name>pattern) in PCRE/Python.

That's not in itself an argument for introducing a third syntax, but it
does weaken the argument that consistency with other systems is
important. And I think consistency with the (derivable) properties of
Perl regexes in <5.9.5 is a more useful goal for Perl 5.10 than
consistency with either .NET or Python.

--
Aaron Crane
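
To make the difference between the two existing spellings concrete,
here is a rough sketch that rewrites the .NET-style forms (?<name>...)
and (?'name'...) into the Python-style (?P<name>...). The function name
dotnet_to_python is hypothetical, and the rewrite is purely textual: it
assumes the "(?" is not itself preceded by a backslash or sitting inside
a character class, so treat it as an illustration rather than a robust
converter.

```python
import re

# Matches "(?<name>" or "(?'name'"; lookbehinds such as (?<=...) and
# (?<!...) are left alone because '=' and '!' cannot start a name.
_DOTNET_NAMED = re.compile(r"\(\?(?:<([A-Za-z_]\w*)>|'([A-Za-z_]\w*)')")

def dotnet_to_python(pattern):
    """Rewrite .NET-style named groups into Python's (?P<name>...) form."""
    def repl(m):
        name = m.group(1) or m.group(2)
        return "(?P<" + name + ">"
    return _DOTNET_NAMED.sub(repl, pattern)

converted = dotnet_to_python(r"(?<year>\d{4})-(?'month'\d\d)")
print(converted)                                  # (?P<year>\d{4})-(?P<month>\d\d)
print(re.match(converted, "2006-12").group("month"))  # 12
```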
|
2006-12-13 17:45:25 |
On 2006–12–13, at 16:45, Aaron Crane wrote:
> I also agree with that. In this case, though, there are already two
> syntaxes in use elsewhere -- (?<name>pattern) in .NET, and
> (?P<name>pattern) in PCRE/Python.

Patches speaking louder than words, I've attached one that adds,
documents and tests PCRE/Python-compatible named-capture support to
bleadperl while retaining support for the .NET style as originally
implemented. As a late arrival at this particular party, I think
Perl's best policy is to try to please everybody. There's more than
one way to do it.

The patch is ugly. My excuse is that it matches the style of the
source... It doesn't break any tests for me, but there may be
compilers that object to me sticking a label where I've stuck one --
even though, AFAICT, it's valid C89.

(No offence to recent patchers: regcomp.c was already ugly in 5.6.2,
which is the earliest version I happen to have around.)

--
Dominic Dunlop
|
2006-12-13 18:03:47 |
Dominic Dunlop wrote:
> Patches speaking louder than words, I've attached one that adds,
> documents and tests PCRE/Python-compatible named-capture support to
> bleadperl while retaining support for the .NET style as originally
> implemented. As a late arrival at this particular party, I think Perl's
> best policy is to try to please everybody. There's more than one way to
> do it.

If my vote counted, I'd be +1 on maintaining compatibility with
PCRE/Python first and foremost, with .NET style secondary in importance.

I am not happy about the heuristic of (?...) always meaning a
non-capturing syntax, since that drastically reduces the ability of
future implementations to use that prefix to mean other more useful
things.

My 2 cents...

John
--
John Peacock
Director of Information Research and Technology
Rowman & Littlefield Publishing Group
4501 Forbes Boulevard
Suite H
Lanham, MD 20706
301-459-3366 x.5010
fax 301-429-5748
|
2006-12-13 18:38:04 |
On 12/13/06, Aaron Crane <perl aaroncrane.co.uk> wrote:
> Ronald J Kimball writes:
> > You're looking at a regex, wondering how many capturing groups there
> > are. You simply scan through the regex, counting left parentheses,
> > but skipping any that are followed by a question mark. I think that's
> > a useful convention,
>
> Yes. (I did come on a bit strong describing it as a "guarantee"; it's
> more like a property that someone could reasonably derive from Perl
> <5.9.5.)

Huh. I wouldn't have even thought "reasonably derive" because if the
construct were different, it's merely different and there's no knowing
what it might do. If you saw "(?^" ... ")" you wouldn't know what it
did, period. You can't reasonably infer that there's no capturing
involved or what the heck is going on.

Maybe that extension means "match the innards only approximately" or
maybe it means "capture the stuff in here but do it in rot13." I can
see uses for both of those.

> I also agree with that. In this case, though, there are already two
> syntaxes in use elsewhere -- (?<name>pattern) in .NET, and
> (?P<name>pattern) in PCRE/Python.

I originally implemented (?<...>...) and (?'...'...) in
Regexp::NamedCaptures because that was the syntax I remembered from
Friedl's book on regular expressions. It was the only concrete syntax
I'd seen up to that point so I ported it to perl as an overload => qr
extension and then suggested to Yves that he follow it for the same
reason.

> That's not in itself an argument for introducing a third syntax, but it
> does weaken the argument that consistency with other systems is
> important. And I think consistency with the (derivable) properties of
> Perl regexes in <5.9.5 is a more useful goal for Perl 5.10 than
> consistency with either .NET or Python.

If (?P<...>...) is syntax, isn't (?k<...>...) better because it is
consistent with \k<...>?

I happen to like the <...> part of the syntax because it's
bracketing, it's part of the current syntax, and it is kinda like
what Perl 6 rules look like.

Josh
|