With a bleedperl as of this morning, this fails:

print "abc" =~ m/.*(?<=b)c/

I'm currently in the brain-not-working phase of a cold, so maybe I'm crazy
expecting that it should match. Need more NyQuil....

FWIW, the -Dr output is appended. From what I can tell, it looks like
the IFMATCH[-1] offset is being applied twice....

Jeffrey

jfriedl@fummy> perl5.7.2 -Dr -e 'print "abc" =~ m/.*(?<=b)c/'
Compiling REx `.*(?<=b)c'
size 11 Got 92 bytes for offset annotations.
first at 2
rarest char c at 0
1: STAR(3)
2: REG_ANY(0)
3: IFMATCH[-1](9)
5: EXACT <b>(7)
7: SUCCEED(0)
8: TAIL(9)
9: EXACT <c>(11)
11: END(0)
floating `c' at 0..2147483647 (checking floating) anchored(MBOL) implicit minlen 1
Offsets: [11]
2[1] 1[1] 7[1] 0[0] 7[1] 0[0] 7[0] 7[0] 9[1] 0[0] 10[0]
Omitting $` $& $' support.

EXECUTING...

Guessing start of match, REx `.*(?<=b)c' against `abc'...
Found floating substr `c' at offset 2...
Position at offset 0 does not contradict /^/m...
Guessed: match at offset 0
Matching REx `.*(?<=b)c' against `abc'
Setting an EVAL scope, savestack=3
0 <> <abc> | 1: STAR
REG_ANY can match 3 times out of 32767...
Setting an EVAL scope, savestack=3
1 <a> <bc> | 3: IFMATCH[-1]
0 <> <abc> | 5: EXACT <b>
failed...
ODD------^^^^^ failed...
failed...
Guessing start of match, REx `.*(?<=b)c' against `bc'...
Found floating substr `c' at offset 1...
Position at offset 0 does not contradict /^/m...
Guessed: match at offset 0
Setting an EVAL scope, savestack=3
1 <a> <bc> | 1: STAR
REG_ANY can match 2 times out of 32767...
Setting an EVAL scope, savestack=3
1 <a> <bc> | 3: IFMATCH[-1]
0 <> <abc> | 5: EXACT <b>
failed...
failed...
failed...
Guessing start of match, REx `.*(?<=b)c' against `c'...
Found floating substr `c' at offset 0...
Position at offset 0 does not contradict /^/m...
Guessed: match at offset 0
Setting an EVAL scope, savestack=3
2 <ab> <c> | 1: STAR
REG_ANY can match 1 times out of 32767...
Setting an EVAL scope, savestack=3
failed...
Match failed
Freeing REx: `.*(?<=b)c'


  • Hugo van der Sanden at Jan 13, 2002 at 6:04 pm
    Jeffrey Friedl wrote:
    :
    :With a bleedperl as of this morning, this fails:
    :
    : print "abc" =~ m/.*(?<=b)c/

    IFMATCH nodes encode both lookahead and lookbehind.

    Japhy's recent optimisation patch treated IFMATCH as 'skippable', which
    meant that we assumed any EXACT immediately within the IFMATCH had to
    match immediately following the preceding '.*'. The patch below changes
    it to act as before for a lookahead IFMATCH (when node->flags == 0), but
    to look instead for an EXACT after the close of the IFMATCH for a
    lookbehind.

    New tests have exposed an additional unrelated problem in intuit_start(),
    that I don't understand yet. The four problem cases are marked 'B' as
    known test failures for now.

    Hugo
    --- regexec.c.old Wed Jan 9 20:15:51 2002
    +++ regexec.c Sun Jan 13 18:04:12 2002
    @@ -140,13 +140,18 @@
         PL_regkind[(U8)OP(rn)] == EXACT || PL_regkind[(U8)OP(rn)] == REF \
     )
     
    +/*
    +  Search for mandatory following text node; for lookahead, the text must
    +  follow but for lookbehind (rn->flags != 0) we skip to the next step.
    +*/
     #define FIND_NEXT_IMPT(rn) STMT_START { \
         while (JUMPABLE(rn)) \
    -        if (OP(rn) == SUSPEND || OP(rn) == IFMATCH || \
    -            PL_regkind[(U8)OP(rn)] == CURLY) \
    +        if (OP(rn) == SUSPEND || PL_regkind[(U8)OP(rn)] == CURLY) \
                 rn = NEXTOPER(NEXTOPER(rn)); \
             else if (OP(rn) == PLUS) \
                 rn = NEXTOPER(rn); \
    +        else if (OP(rn) == IFMATCH) \
    +            rn = (rn->flags == 0) ? NEXTOPER(NEXTOPER(rn)) : rn + ARG(rn); \
             else rn += NEXT_OFF(rn); \
     } STMT_END

    --- t/op/re_tests.old Wed Jan 9 14:52:09 2002
    +++ t/op/re_tests Sun Jan 13 17:59:05 2002
    @@ -799,3 +799,37 @@
    '^(o)(?!.*\1)'i Oo n - -
    (.*)\d+\1 abc12bc y $1 bc
    (?m:(foo\s*$)) foo\n bar y $1 foo
    +(.*)c abcd y $1 ab
    +(.*)(?=c) abcd y $1 ab
    +(.*)(?=c)c abcd yB $1 ab
    +(.*)(?=b|c) abcd y $1 ab
    +(.*)(?=b|c)c abcd y $1 ab
    +(.*)(?=c|b) abcd y $1 ab
    +(.*)(?=c|b)c abcd y $1 ab
    +(.*)(?=[bc]) abcd y $1 ab
    +(.*)(?=[bc])c abcd yB $1 ab
    +(.*)(?<=b) abcd y $1 ab
    +(.*)(?<=b)c abcd y $1 ab
    +(.*)(?<=b|c) abcd y $1 abc
    +(.*)(?<=b|c)c abcd y $1 ab
    +(.*)(?<=c|b) abcd y $1 abc
    +(.*)(?<=c|b)c abcd y $1 ab
    +(.*)(?<=[bc]) abcd y $1 abc
    +(.*)(?<=[bc])c abcd y $1 ab
    +(.*?)c abcd y $1 ab
    +(.*?)(?=c) abcd y $1 ab
    +(.*?)(?=c)c abcd yB $1 ab
    +(.*?)(?=b|c) abcd y $1 a
    +(.*?)(?=b|c)c abcd y $1 ab
    +(.*?)(?=c|b) abcd y $1 a
    +(.*?)(?=c|b)c abcd y $1 ab
    +(.*?)(?=[bc]) abcd y $1 a
    +(.*?)(?=[bc])c abcd yB $1 ab
    +(.*?)(?<=b) abcd y $1 ab
    +(.*?)(?<=b)c abcd y $1 ab
    +(.*?)(?<=b|c) abcd y $1 ab
    +(.*?)(?<=b|c)c abcd y $1 ab
    +(.*?)(?<=c|b) abcd y $1 ab
    +(.*?)(?<=c|b)c abcd y $1 ab
    +(.*?)(?<=[bc]) abcd y $1 ab
    +(.*?)(?<=[bc])c abcd y $1 ab
  • Jarkko Hietaniemi at Jan 13, 2002 at 6:39 pm

    On Sun, Jan 13, 2002 at 06:06:22PM +0000, Hugo van der Sanden wrote:
    > Jeffrey Friedl wrote:
    > :
    > :With a bleedperl as of this morning, this fails:
    > :
    > : print "abc" =~ m/.*(?<=b)c/
    >
    > IFMATCH nodes encode both lookahead and lookbehind.
    >
    > Japhy's recent optimisation patch treated IFMATCH as 'skippable', which
    > meant that we assumed any EXACT immediately within the IFMATCH had to
    > match immediately following the preceding '.*'. The patch below changes
    > it to act as before for a lookahead IFMATCH (when node->flags == 0), but
    > to look instead for an EXACT after the close of the IFMATCH for a
    > lookbehind.

    Thanks, applied.

    > New tests have exposed an additional unrelated problem in intuit_start(),

    Just *one* problem in intuit_start()? :-)

    > that I don't understand yet. The four problem cases are marked 'B' as
    > known test failures for now.

    --
    $jhi++; # http://www.iki.fi/jhi/
    # There is this special biologist word we use for 'stable'.
    # It is 'dead'. -- Jack Cohen
  • Hugo van der Sanden at Jan 14, 2002 at 3:18 am
    Jarkko Hietaniemi wrote:
    :> New tests have exposed an additional unrelated problem in intuit_start(),
    :
    :Just *one* problem in intuit_start()? :-)

    Not even a problem in intuit_start - it seems that with:
    /(.*)(?=c)c/
    ... regcomp.c:study_chunk() is setting the stclass (which I think is
    short for 'start class', a manufactured class that the first character
    matched must be in) to [c], which is wrong - it should be [^\n] OR [c],
    resolved to [^\n].

    As a result intuit_start() finds the floating substr "c" at offset 2,
    and decides therefore it should match starting from offset 0; then
    the stclass check is done, and it 'improves' the starting point to
    offset 2.

    The same bug exists in 5.6.1, but it is apparently masked in this case
    by the effects of the implicit MBOL. Using Japhy's .{0} hack to avoid
    that, we can see it:
    crypt% perl -wle '"abcd" =~ /.{0}(.*)(?=c)c/ && print "<$1>"'
    <>
    crypt%

    Observing variants shows that it is ORing the first character of
    the lookahead with that of the following string (when they should
    be ANDed), and ignoring the optional preceding '.' (when it should
    be ORed). Replacing '(.*)' with '(b*)' appears to do something
    completely different, and still wrong. :(

    I've never really understood the stclass stuff, but I guess now is
    as good a time as any. Well, maybe tomorrow is.

    Hugo

Discussion Overview
group: perl5-porters @
categories: perl
posted: Jan 11, '02 at 11:06p
active: Jan 14, '02 at 3:18a
posts: 4
users: 3
website: perl.org
