FAQ

On Mon Jan 11 07:56:52 2010, ph10@hermes.cam.ac.uk wrote:
Consider the pattern ^(xa|=?\1a){2}$ when matched against the string
"xa=xaaa".
The first time the group matches "xa", so $1 is set to that value, and
the
remaining subject string is "=xaaa". The next time round the loop, the
first
alternative fails, but =?\1a matches the next three characters "=xaa",
so the
group ends, $1 is set to "=xaa", but there is still one character "a"
in the
subject string. Now we get to $ and that fails because we are not at
the end of
the string.

Perl (apparently) backtracks into the second iteration of the group,
to the
position where "=" is optional, and tries without matching the "="
character.
However, it does NOT (it seems) reset the value of $1, so it still set
to
"=xaa". This does match the next three characters in the string, and
then the
final "a" matches, and we are now at the end of the string. I have
tested this
with Perl 5.8.8, 5.10.1 and Perl 5.11.3.

So, the whole match succeeds with $1 set to "=xaaa". However, it
should fail.

One way of thinking about this is to consider the pattern without the
question
mark. ^(xa|=\1a){2}$ does fail when matched against "xa=xaaa". Since
the =
character is actually present in the subject, making it optional
should not
affect the outcome.

I discovered this bug because somebody reported exactly the same
effect in
PCRE. It seems to have been there for a very long time. In PCRE, the
only way I
have been able to fix the bug is to force any group that contains a
reference
to itself to be an atomic group. PCRE has treated recursive and
"subroutine"
group calls as atomic for some time, because of a similar issue - the
inability
to exactly replicate the environment when backtracking into them.

Philip
1. Do you get the same problems if you use the '\g1' or '\g{1}' notation
introduced in Perl 5.10 (see 'perlre')?

2. Have we ever claimed (in our documentation) that "recursive
backreferences" will, in some sense, "work"? (I.e., are we dealing with
a bug or a feature request?)

Thank you very much.
Jim Keenan



---
via perlbug: queue: perl5 status: new
https://rt.perl.org:443/rt3/Ticket/Display.html?id=72020

Search Discussions

  • Hv at Feb 24, 2013 at 8:46 am
    "James E Keenan via RT" wrote:
    :On Mon Jan 11 07:56:52 2010, ph10@hermes.cam.ac.uk wrote:
    :> Consider the pattern ^(xa|=?\1a){2}$ when matched against the string
    :> "xa=xaaa".
    [...]
    :2. Have we ever claimed (in our documentation) that "recursive
    :backreferences" will, in some sense, "work"? (I.e., are we dealing with
    :a bug or a feature request?)

    If the string were "xa=xaa" then it would make sense to look to the docs
    to decide if we were talking about a bug or a feature request; but given
    the string is "xa=xaaa" I'd say it's a clear bug.

    % perl -wle 'print "not ok" if "xa=xaaa" =~ /^(xa|=?\1a){2}$/'
    not ok
    %

    Hugo
  • Dr.Ruud at Feb 24, 2013 at 12:20 pm

    On 2013-02-24 08:37, hv@crypt.org wrote:
    "James E Keenan via RT" wrote:
    :On Mon Jan 11 07:56:52 2010, ph10@hermes.cam.ac.uk wrote:
    :> Consider the pattern ^(xa|=?\1a){2}$ when matched against the string
    :> "xa=xaaa".
    [...]
    :2. Have we ever claimed (in our documentation) that "recursive
    :backreferences" will, in some sense, "work"? (I.e., are we dealing with
    :a bug or a feature request?)

    If the string were "xa=xaa" then it would make sense to look to the docs
    to decide if we were talking about a bug or a feature request; but given
    the string is "xa=xaaa" I'd say it's a clear bug.

    % perl -wle 'print "not ok" if "xa=xaaa" =~ /^(xa|=?\1a){2}$/'
    not ok
    %
    I think this should compile \1 as \x{01},
    because there isn't a first caption yet.

    --
    Ruud
  • Demerphq at Feb 25, 2013 at 10:37 am

    On 24 February 2013 08:37, wrote:
    "James E Keenan via RT" wrote:
    :On Mon Jan 11 07:56:52 2010, ph10@hermes.cam.ac.uk wrote:
    :> Consider the pattern ^(xa|=?\1a){2}$ when matched against the string
    :> "xa=xaaa".
    [...]
    :2. Have we ever claimed (in our documentation) that "recursive
    :backreferences" will, in some sense, "work"? (I.e., are we dealing with
    :a bug or a feature request?)

    If the string were "xa=xaa" then it would make sense to look to the docs
    to decide if we were talking about a bug or a feature request; but given
    the string is "xa=xaaa" I'd say it's a clear bug.

    % perl -wle 'print "not ok" if "xa=xaaa" =~ /^(xa|=?\1a){2}$/'
    not ok
    %
    Agreed. I consider this a bug, and time permitting will look into it.

    Yves



    --
    perl -Mre=debug -e "/just|another|perl|hacker/"
  • Ph10 at Feb 24, 2013 at 6:08 pm

    On Sat, 23 Feb 2013, James E Keenan via RT wrote:

    1. Do you get the same problems if you use the '\g1' or '\g{1}' notation
    introduced in Perl 5.10 (see 'perlre')?
    Yes in both cases, with Perl 5.016002. Exactly the same behaviour in all
    three cases.
    2. Have we ever claimed (in our documentation) that "recursive
    backreferences" will, in some sense, "work"? (I.e., are we dealing with
    a bug or a feature request?)
    I have no idea on your documentation question, I'm afraid.

    Regards,
    Philip

    --
    Philip Hazel

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
groupperl5-porters @
categoriesperl
postedFeb 24, '13 at 1:50a
activeFeb 25, '13 at 10:37a
posts5
users5
websiteperl.org

People

Translate

site design / logo © 2021 Grokbase