[P5P] [PATCH] New failing test for RT#45667 (m/[#]/x treated inconsistently)

Chris Dolan
Oct 28, 2007 at 2:59 pm
Hello,

RT#45667 points out that the Perl parser and the Regexp parser treat
m/[#]/x differently. The former treats the # as a comment character
and the latter treats it as a literal. I propose that we adopt the
Perl parser's behavior because 1) doing otherwise would require us to
add more regexp-specific knowledge to the Perl parser and 2) other
Regexp engines may not want to treat [#] the same as the current
default engine.
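For context, the construct in question, as a minimal sketch; the [#] match shown assumes a perl on which the two parsers agree that # inside [] is a literal, which is the regex engine's reading described above:

```perl
use strict;
use warnings;

# Under /x, '#' normally starts a comment running to end of line.
# The question in RT#45667 is whether '#' inside [...] still does.
my $class   = ("foo#" =~ m/foo [#]/x) ? "y" : "n";  # literal-# reading
my $escaped = ("foo#" =~ m/foo \#/x)  ? "y" : "n";  # unambiguous on any perl
print "$class $escaped\n";
```

Escaping the # (or writing \x23) sidesteps the ambiguity entirely, whichever way the parsers resolve it.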

The patch below adds a (currently failing) test to the regexp parser
to enforce this decision. The patch is against rsync 5.10.0.32199.

Chris

(P.S. my first post to P5P)

--- t/op/re_tests.orig 2007-10-28 09:26:00.000000000 -0500
+++ t/op/re_tests 2007-10-28 09:47:49.000000000 -0500
@@ -1338,3 +1338,5 @@
.*\z foo\n y - -
^(?:(\d)x)?\d$ 1 y ${\(defined($1)?1:0)} 0
.*?(?:(\w)|(\w))x abx y $1-$2 b-
+
+'foo [#]'x foo# c - Unmatched [

7 responses

  • Rafael Garcia-Suarez at Oct 30, 2007 at 12:53 pm

    On 28/10/2007, Chris Dolan wrote:
    Hello,

    RT#45667 points out that the Perl parser and the Regexp parser treat
    m/[#]/x differently. The former treats the # as a comment character
    and the latter treats it as a literal. I propose that we adopt the
    Perl parser's behavior because 1) doing otherwise would require us to
    add more regexp-specific knowledge to the Perl parser and 2) other
    Regexp engines may not want to treat [#] the same as the current
    default engine.

    The patch below adds a (currently failing) test to the regexp parser
    to enforce this decision. The patch is against rsync 5.10.0.32199.

    it doesn't seem that op/regexp.t supports TODO tests. That would
    probably be a worthwhile addition...
  • Demerphq at Oct 30, 2007 at 2:04 pm

    On 10/30/07, Rafael Garcia-Suarez wrote:
    On 28/10/2007, Chris Dolan wrote:
    Hello,

    RT#45667 points out that the Perl parser and the Regexp parser treat
    m/[#]/x differently. The former treats the # as a comment character
    and the latter treats it as a literal. I propose that we adopt the
    Perl parser's behavior because 1) doing otherwise would require us to
    add more regexp-specific knowledge to the Perl parser and 2) other
    Regexp engines may not want to treat [#] the same as the current
    default engine.

    The patch below adds a (currently failing) test to the regexp parser
    to enforce this decision. The patch is against rsync 5.10.0.32199
    it doesn't seem that op/regexp.t supports TODO tests. That would
    probably be a worthwhile addition...

    Probably is. But I'm not entirely comfortable with the original patch.
    It's not clear to me that we can change this behaviour any more, at
    least not in the regex engine. It's actually probably easier to change
    this in the perl parser, which we intend to change long-term anyway in
    order to resolve the whole qr//-as-closure stuff as well as variable
    bindings, among other issues.

    Probably a good idea if Dave Mitchell speaks up on this one, as to my
    knowledge he has the most developed idea of the future of this.

    Yves


    --
    perl -Mre=debug -e "/just|another|perl|hacker/"
  • Dave Mitchell at Oct 30, 2007 at 2:15 pm

    On Tue, Oct 30, 2007 at 03:04:28PM +0100, demerphq wrote:
    Probably is. But I'm not entirely comfortable with the original patch.
    It's not clear to me that we can change this behaviour any more, at
    least not in the regex engine. It's actually probably easier to change
    this in the perl parser, which we intend to change long-term anyway in
    order to resolve the whole qr//-as-closure stuff as well as variable
    bindings, among other issues.

    Probably a good idea if Dave Mitchell speaks up on this one as to my
    knowledge he has the most developed idea of the future of this.

    The bug's definitely in the perl parser, and I still intend to fix it post
    5.10.0. The fix is kinda orthogonal to the qr// and closure stuff.
    Actually, there's a bit of uncertainty as to whether I can still do that
    qr// stuff, now that an API has been added to the regex parser: I
    don't think it can handle what I would like to pass to it (a mixture
    of const strings and snippets of opcode trees), or, if I could pass it,
    whether other plugged-in engines (such as PCRE) wouldn't choke on it.
    Anyway, I haven't looked closely at it yet.

    --
    Gravity is just a theory; teach Intelligent Falling in our schools!
    http://www.theonion.com/content/node/39512
  • Demerphq at Oct 30, 2007 at 3:30 pm

    On 10/30/07, Dave Mitchell wrote:
    On Tue, Oct 30, 2007 at 03:04:28PM +0100, demerphq wrote:
    Probably is. But I'm not entirely comfortable with the original patch.
    It's not clear to me that we can change this behaviour any more, at
    least not in the regex engine. It's actually probably easier to change
    this in the perl parser, which we intend to change long-term anyway in
    order to resolve the whole qr//-as-closure stuff as well as variable
    bindings, among other issues.

    Probably a good idea if Dave Mitchell speaks up on this one as to my
    knowledge he has the most developed idea of the future of this.
    The bug's definitely in the perl parser, and I still intend to fix it post
    5.10.0. The fix is kinda orthogonal to the qr// and closure stuff.

    Ok. I thought they were related.
    Actually, there's a bit of uncertainty as to whether I can still do that
    qr// stuff, now that an API has been added to the regex parser: I
    don't think it can handle what I would like to pass to it (a mixture
    of const strings and snippets of opcode trees), or, if I could pass it,
    whether other plugged-in engines (such as PCRE) wouldn't choke on it.
    Anyway, I haven't looked closely at it yet.

    I don't think you should worry about that. I think fixing the general
    behaviour takes precedence over supporting experimental regex engine
    plug-ins. I believe the API has been documented as a work in
    progress anyway, and if it hasn't, we should make sure to do so before
    5.10 is released. Of course, if what you design has an API that makes
    it easy for the engine plug-ins to adapt, it would be a good thing. :-)

    Yves



    --
    perl -Mre=debug -e "/just|another|perl|hacker/"
  • Dave Mitchell at Oct 30, 2007 at 4:10 pm

    On Tue, Oct 30, 2007 at 04:30:46PM +0100, demerphq wrote:
    On 10/30/07, Dave Mitchell wrote:
    On Tue, Oct 30, 2007 at 03:04:28PM +0100, demerphq wrote:
    Probably is. But I'm not entirely comfortable with the original patch.
    It's not clear to me that we can change this behaviour any more, at
    least not in the regex engine. It's actually probably easier to change
    this in the perl parser, which we intend to change long-term anyway in
    order to resolve the whole qr//-as-closure stuff as well as variable
    bindings, among other issues.

    Probably a good idea if Dave Mitchell speaks up on this one as to my
    knowledge he has the most developed idea of the future of this.
    The bug's definitely in the perl parser, and I still intend to fix it post
    5.10.0. The fix is kinda orthogonal to the qr// and closure stuff.
    Ok. I thought they were related.

    Well, they're slightly related, but the two fixes can be developed
    independently of each other. They both involve making the tokeniser a
    bit more clever when processing literal strings. There's a function that
    scans through the characters of a literal string until it finds something
    "interesting". For example, in the double-quoted string "abc$d", it stops
    at the $ and returns "abc" as a token. One fix involves stopping that
    function from thinking that # is interesting when within [], the other
    involves making it think that '(?{' is interesting.
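The scan Dave describes can be caricatured in Perl; everything below is an illustrative sketch of the idea, not the real implementation (the real code is C, in toke.c):

```perl
use strict;
use warnings;

# Sketch: walk a double-quotish string and return the literal prefix
# up to the first "interesting" character.
sub scan_literal {
    my ($s) = @_;
    my $depth = 0;                       # inside [...] when > 0
    for my $i (0 .. length($s) - 1) {
        my $c = substr $s, $i, 1;
        $depth++ if $c eq '[';
        $depth-- if $c eq ']' && $depth;
        # '$' and '@' introduce interpolation: always interesting.
        return substr($s, 0, $i) if $c eq '$' || $c eq '@';
        # The proposed fix: a '#' (a /x comment opener) is interesting
        # only when we are NOT inside a bracketed character class.
        return substr($s, 0, $i) if $c eq '#' && !$depth;
    }
    return $s;                           # nothing interesting found
}

print scan_literal('abc$d'), "\n";       # abc
print scan_literal('foo [#] bar'), "\n"; # whole string: [#] is literal
```

The fix amounts to the `!$depth` guard: # stays uninteresting while the scanner is inside a bracketed character class.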
    Actually, there's a bit of uncertainty as to whether I can still do that
    qr// stuff, now that an API has been added to the regex parser: I
    don't think it can handle what I would like to pass to it (a mixture
    of const strings and snippets of opcode trees), or, if I could pass it,
    whether other plugged-in engines (such as PCRE) wouldn't choke on it.
    Anyway, I haven't looked closely at it yet.
    I don't think you should worry about that. I think fixing the general
    behaviour takes precedence over supporting experimental regex engine
    plug-ins. I believe the API has been documented as a work in
    progress anyway, and if it hasn't, we should make sure to do so before
    5.10 is released. Of course, if what you design has an API that makes
    it easy for the engine plug-ins to adapt, it would be a good thing. :-)

    Okay :-)

    --
    The warp engines start playing up a bit, but seem to sort themselves out
    after a while without any intervention from boy genius Wesley Crusher.
    -- Things That Never Happen in "Star Trek" #17
  • Chris Dolan at Oct 31, 2007 at 11:09 am

    On Oct 30, 2007, at 11:08 AM, Dave Mitchell wrote:

    Well, they're slightly related, but the two fixes can be developed
    independently of each other. They both involve making the tokeniser a
    bit more clever when processing literal strings. There's a function that
    scans through the characters of a literal string until it finds something
    "interesting". For example, in the double-quoted string "abc$d", it stops
    at the $ and returns "abc" as a token. One fix involves stopping that
    function from thinking that # is interesting when within [], the other
    involves making it think that '(?{' is interesting.

    OK, fair enough. Consistency is what I seek. :-) In that light,
    here's an alternative proposed test patch. As of 32199, this patch
    yields (abbreviated somewhat):

    ok 1342
    ok 1343
    ok 1344
    not ok 1345 () 'foo \[#\]'x:foo[#]:y:$&:foo[#] => `foo[', match=1
    not ok 1346 () 'foo [#]':foo#:y:$&:foo# => `'', match=
    ok 1347
    ok 1348
    not ok 1349 () 'foo \[#\]':foo[#]:y:$&:foo[#] => `'', match=

    Chris


    --- t/op/re_tests.orig 2007-10-28 09:26:00.000000000 -0500
    +++ t/op/re_tests 2007-10-30 19:55:16.000000000 -0500
    @@ -1338,3 +1338,12 @@
    .*\z foo\n y - -
    ^(?:(\d)x)?\d$ 1 y ${\(defined($1)?1:0)} 0
    .*?(?:(\w)|(\w))x abx y $1-$2 b-
    +
    +'foo [#]'x foo# y $& foo#
    +'foo [#'x foo# c - Unmatched [
    +'foo \[#]'x foo# n - -
    +'foo \[#\]'x foo[#] y $& foo[#]
    +'foo [#]' foo# y $& foo#
    +'foo [#' foo# c - Unmatched [
    +'foo \[#]' foo# n - -
    +'foo \[#\]' foo[#] y $& foo[#]
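For readers unfamiliar with the file these patches touch: each t/op/re_tests line is tab-separated. A minimal reader for one of the proposed lines, as an illustrative sketch only; the real driver, t/op/regexp.t, also handles quoted patterns, embedded escapes, and other result codes:

```perl
use strict;
use warnings;

# Five tab-separated fields per t/op/re_tests line:
#   pattern ('pattern'x applies the /x modifier), target string,
#   expected result (y = match, n = no match, c = compile error),
#   an expression to eval after matching (e.g. $&), and the value
#   that expression should produce.
my $line = "'foo [#]'x\tfoo#\ty\t\$&\tfoo#";
my ($pat, $subject, $result, $expr, $expect) = split /\t/, $line;
print "pattern=$pat subject=$subject result=$result expect=$expect\n";
```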
  • Demerphq at Oct 31, 2007 at 11:30 am

    On 10/31/07, Chris Dolan wrote:
    On Oct 30, 2007, at 11:08 AM, Dave Mitchell wrote:

    Well, they're slightly related, but the two fixes can be developed
    independently of each other. They both involve making the tokeniser a
    bit more clever when processing literal strings. There's a function that
    scans through the characters of a literal string until it finds something
    "interesting". For example, in the double-quoted string "abc$d", it stops
    at the $ and returns "abc" as a token. One fix involves stopping that
    function from thinking that # is interesting when within [], the other
    involves making it think that '(?{' is interesting.

    OK, fair enough. Consistency is what I seek. :-) In that light,
    here's an alternative proposed test patch. As of 32199, this patch
    yields (abbreviated somewhat):

    ok 1342
    ok 1343
    ok 1344
    not ok 1345 () 'foo \[#\]'x:foo[#]:y:$&:foo[#] => `foo[', match=1
    not ok 1346 () 'foo [#]':foo#:y:$&:foo# => `'', match=
    ok 1347
    ok 1348
    not ok 1349 () 'foo \[#\]':foo[#]:y:$&:foo[#] => `'', match=

    Chris


    --- t/op/re_tests.orig 2007-10-28 09:26:00.000000000 -0500
    +++ t/op/re_tests 2007-10-30 19:55:16.000000000 -0500
    @@ -1338,3 +1338,12 @@
    .*\z foo\n y - -
    ^(?:(\d)x)?\d$ 1 y ${\(defined($1)?1:0)} 0
    .*?(?:(\w)|(\w))x abx y $1-$2 b-
    +
    +'foo [#]'x foo# y $& foo#
    +'foo [#'x foo# c - Unmatched [
    +'foo \[#]'x foo# n - -
    +'foo \[#\]'x foo[#] y $& foo[#]
    +'foo [#]' foo# y $& foo#
    +'foo [#' foo# c - Unmatched [
    +'foo \[#]' foo# n - -
    +'foo \[#\]' foo[#] y $& foo[#]

    I think there is a way to mark the failing ones as TODO tests; you'd
    have to poke around in the re_tests file to see how, and if not, just
    modify regexp.t to do it somehow. Probably it's just as simple as
    saying 'TODO' in the test name; if not, well, that's how I would do it.
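In Test::More terms, the marking Yves suggests looks like the block below. This is a sketch only; regexp.t emits its own ok/not ok lines rather than using Test::More, so wiring TODO into it would differ in detail:

```perl
use strict;
use warnings;
use Test::More tests => 1;

# Tests inside a TODO block are reported but never fail the run, so a
# known-broken case like RT#45667 can be committed ahead of its fix.
TODO: {
    our $TODO;
    local $TODO = "RT#45667: perl parser mishandles m/[#]/x";
    like("foo#", qr/foo [#]/x, "'#' inside [] is literal under /x");
}
```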

    I'm hoping to find some perl-related tuits soon.

    yves

    --
    perl -Mre=debug -e "/just|another|perl|hacker/"
