[perl #56690] Some bugs in Perl regexp (core Perl issues)

1) Kevin Wolf
On Tue Jul 08 14:49:43 2008, ab...@abigail.be wrote:
> Here are some tests for this bug:
>
>
>
> --- t/op/re_tests.orig 2008-04-11 14:20:20.000000000 +0200
> +++ t/op/re_tests 2008-07-08 18:43:39.000000000 +0200
> @@ -1344,4 +1344,7 @@
>  .*?(?:(\w)|(\w))x abx y $1-$2 b-
>  
> 0{50} 000000000000000000000000000000000000000000000000000 y - -
> +# Bug #56690
> +^a?(?=b)b ab y $& ab
> +^a*(?=b)b ab y $& ab


This is caused by a failure of the start_class optimization in the case
of lookahead, as per the attached comment.

In more detail: at the point study_chunk() attempts to deal with the
start_class discovered for the lookahead chunk, we have
SCF_DO_STCLASS_OR set, and_withp has the starting value of ANYOF_EOS |
ANYOF_UNICODE_ALL, and data->start_class has [a] | ANYOF_EOS.

So given:
  start = ANYOF_EOS | ANYOF_UNICODE_ALL
  pre = [a] | ANYOF_EOS
  lookahead = [b]
  post = [b]
what we should be getting is:
  start_class = start & (pre | (lookahead & post))
      = start & (pre | [b])
      = start & [ab]
      = [ab]
but what we are getting is:
  start_class = start & ((pre & lookahead) | post)
      = start & (ANYOF_EOS | post)
      = start & [b]
      = [b]

In other words, we need to stack an alternation of ANDs and ORs to cope
with this situation, and we don't have a mechanism to do that except to
recurse into study_chunk() some more.
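
The two derivations above can be replayed with plain bit operations. What follows is a minimal C sketch, not code from regcomp.c: it assumes a toy start class that is only a bitmask over 'a'..'z', it leaves ANYOF_EOS out entirely, and every name in it (toy_class, toy_of, toy_has, toy_expected, toy_observed) is made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy start class: bit i set means the letter 'a' + i may start a match.
 * This is NOT Perl's real ANYOF bitmap, and the EOS flag is left out. */
typedef uint32_t toy_class;

#define TOY_ALL ((toy_class)0x03FFFFFFu)   /* every letter 'a'..'z' */

/* Class containing exactly one lowercase letter. */
static toy_class toy_of(char c)
{
    return (toy_class)1u << (c - 'a');
}

/* True if lowercase letter c is a member of cls. */
static bool toy_has(toy_class cls, char c)
{
    return c >= 'a' && c <= 'z' && (cls & toy_of(c)) != 0;
}

/* Intended combination: the lookahead constrains only what follows it. */
static toy_class toy_expected(toy_class start, toy_class pre,
                              toy_class lookahead, toy_class post)
{
    return start & (pre | (lookahead & post));
}

/* Combination described in the report: the lookahead is ANDed into
 * what came before it, and what follows is then ORed in. */
static toy_class toy_observed(toy_class start, toy_class pre,
                              toy_class lookahead, toy_class post)
{
    return start & ((pre & lookahead) | post);
}
```

With pre = toy_of('a') and lookahead = post = toy_of('b'), toy_expected() contains both 'a' and 'b', while toy_observed() contains only 'b'. Since "ab" starts with 'a', an optimizer trusting the second class would presumably rule out position 0 before the regex engine ever runs.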

A simpler short-term fix is instead to throw up our hands in this
situation, and just nullify start_class. I'm not sure exactly how to do
that, but it seems the more likely to be achievable for 5.10.1.

Hugo

--- regcomp.c.old 2009-06-18 10:21:11.000000000 +0100
+++ regcomp.c 2009-06-26 11:55:47.000000000 +0100
@@ -3729,6 +3729,12 @@
                 if (f & SCF_DO_STCLASS_AND) {
                     const int was = (data->start_class->flags & ANYOF_EOS);

+                    /* [perl #56690] When (flags & SCF_DO_STCLASS_OR) this does
+                       the wrong thing: this lookahead stclass should be ANDed
+                       with what follows but not with what comes before.
+                       We see this with /^a?(?=b)b/: here we combine [a]|$ with
+                       [b] to give []|$, and end up with wrong stclass [b].
+                    */
                     cl_and(data->start_class, &intrnl);
                     if (was)
                         data->start_class->flags |= ANYOF_EOS;
2) h...@crypt.org
"Hugo van der Sanden via RT" <perlbug-followup@perl.org> wrote:
:This is caused by a failure of the start_class optimization in the case
:of lookahead, as per the attached comment.
:
:In more detail: at the point study_chunk() attempts to deal with the
:start_class discovered for the lookahead chunk, we have
:SCF_DO_STCLASS_OR set, and_withp has the starting value of ANYOF_EOS |
:ANYOF_UNICODE_ALL, and data->start_class has [a] | ANYOF_EOS.
[...]
:In other words, we need to stack an alternation of ANDs and ORs to cope
:with this situation, and we don't have a mechanism to do that except to
:recurse into study_chunk() some more.
:
:A simpler short-term fix is instead to throw up our hands in this
:situation, and just nullify start_class. I'm not sure exactly how to do
:that, but it seems the more likely to be achievable for 5.10.1.

This patch implements the simple fix, and passes all tests including
Abigail's test cases for the bug.

Yves: note that I've preserved the 'was' code in this chunk, introduced
by you in the patch [1], discussed in the thread [2]. As far as I can
see the 3 lines propagating ANYOF_EOS via 'was' (and the copy of those
3 lines a little later) are simply doing the wrong thing - they seem
to be saying "when we combine two start classes using SCF_DO_STCLASS_AND,
claim that end-of-string is valid if the first class says it would be
even though the second says it wouldn't be". Removing those lines doesn't
cause any test failures - can you remember why you introduced those lines,
and maybe add a test case that fails without them?
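
To make the question about 'was' concrete, here is a minimal C sketch of the two behaviours being contrasted. It is a toy model, not the real cl_and(): it assumes a start class is a letter bitmask plus one end-of-string flag, and that a plain AND keeps that flag only when both sides have it. The names toy_stclass, toy_and_plain and toy_and_keep_eos are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy start class (an assumption, not Perl's regnode_charclass_class). */
typedef struct {
    uint32_t chars;  /* bit i set: the letter 'a' + i may start a match */
    bool eos;        /* true: a match may start at end of string */
} toy_stclass;

/* Plain intersection: EOS survives only if both sides allow it. */
static toy_stclass toy_and_plain(toy_stclass a, toy_stclass b)
{
    toy_stclass r = { a.chars & b.chars, a.eos && b.eos };
    return r;
}

/* Same shape as the three 'was' lines: remember the left side's EOS,
 * intersect, then switch EOS back on if the left side had it. */
static toy_stclass toy_and_keep_eos(toy_stclass a, toy_stclass b)
{
    const bool was = a.eos;
    toy_stclass r = toy_and_plain(a, b);

    if (was)
        r.eos = true;
    return r;
}
```

Combining [a]|EOS with [b] gives an empty letter set either way; the plain version reports that end-of-string is not allowed, while the 'was' version reports that it is, which is the behaviour being queried here.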

Hugo

[1] http://perl5.git.perl.org/perl.git/commit/b515a41db88584b4fd1c30cf890c92d3f9697760
[2] http://groups.google.co.uk/group/perl.perl5.porters/browse_thread/thread/436187077ef96918/f11c3268394abf89

--- regcomp.c.old 2009-06-18 10:21:11.000000000 +0100
+++ regcomp.c 2009-07-02 11:16:29.000000000 +0100
@@ -3727,11 +3727,22 @@
                     data->whilem_c = data_fake.whilem_c;
                 }
                 if (f & SCF_DO_STCLASS_AND) {
-                    const int was = (data->start_class->flags & ANYOF_EOS);
-
-                    cl_and(data->start_class, &intrnl);
-                    if (was)
-                        data->start_class->flags |= ANYOF_EOS;
+                    if (flags & SCF_DO_STCLASS_OR) {
+                        /* OR before, AND after: ideally we would recurse with
+                         * data_fake to get the AND applied by study of the
+                         * remainder of the pattern, and then derecurse;
+                         * *** HACK *** for now just treat as "no information".
+                         * See [perl #56690].
+                         */
+                        cl_init(pRExC_state, data->start_class);
+                    }  else {
+                        /* AND before and after: combine and continue */
+                        const int was = (data->start_class->flags & ANYOF_EOS);
+
+                        cl_and(data->start_class, &intrnl);
+                        if (was)
+                            data->start_class->flags |= ANYOF_EOS;
+                    }
                 }
      }
#if PERL_ENABLE_POSITIVE_ASSERTION_STUDY
--- t/op/re_tests.old 2009-06-18 10:21:11.000000000 +0100
+++ t/op/re_tests 2009-07-02 11:21:31.000000000 +0100
@@ -1365,8 +1365,8 @@
.*?(?:(\w)|(\w))x abx y $1-$2 b-

0{50} 000000000000000000000000000000000000000000000000000 y - -
-^a?(?=b)b ab B $& ab # Bug #56690
-^a*(?=b)b ab B $& ab # Bug #56690
+^a?(?=b)b ab y $& ab # Bug #56690
+^a*(?=b)b ab y $& ab # Bug #56690
/>\d+$ \n/ix >10\n y $& >10
/>\d+$ \n/ix >1\n y $& >1
/\d+$ \n/ix >10\n y $& 10
3) Craig Berry
On Thu, Jul 2, 2009 at 5:36 AM, <hv@crypt.org> wrote:

> This patch implements the simple fix, and passes all tests including
> Abigail's test cases for the bug.

Thanks, applied here:

<http://perl5.git.perl.org/perl.git/commitdiff/906cdd2>