Hi, this patch should resolve the bugs that Slaven found in the regex
engine with case-insensitive matches. There were actually several bugs,
one of which is quite old, and they were interacting with each other to
produce strange results.

The original bug was that

$ord=181; $u=$c=chr($ord); utf8::upgrade $u;
$u=~/$c|xyz/i;

was not working. This then led to the discovery that

$u=~/$c/i

was not working for $ord>127.
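A minimal regression check for the two cases above, written as a standalone script rather than as a test from the perl core suite (the sub name failing_ords is made up for this sketch). It loops over every code point from 128 to 255, upgrades a copy of the character to UTF-8 as in the report, and records any code point for which either /$c/i or /$c|xyz/i fails to match. Since a character always matches itself case-insensitively, the list should be empty on a perl with these fixes.

```perl
use strict;
use warnings;

# Return every code point in 128..255 for which an upgraded (UTF-8) copy of
# the character fails to match a case-insensitive pattern built from the
# native (non-UTF-8) character. Both forms from the report are checked.
sub failing_ords {
    my @bad;
    for my $ord (128 .. 255) {
        my $c = chr $ord;      # native one-character string
        my $u = $c;
        utf8::upgrade($u);     # same character, stored internally as UTF-8
        push @bad, $ord unless $u =~ /$c/i && $u =~ /$c|xyz/i;
    }
    return @bad;
}

my @bad = failing_ords();
print @bad ? "no match for: @bad\n" : "all code points 128..255 match\n";
```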

This in turn led to the discovery that when ibcmp_utf8() is called
without a defined endpoint for each string, it automatically and
silently returns no match, and the regex engine was calling it that
way. I added asserts to catch this case under DEBUGGING.
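The failure mode here is a function that treats a missing endpoint as "no match" instead of as a caller error. The following is a Perl model of that contract, not perl's C ibcmp_utf8(): the helper fold_eq_n and its four-argument interface are hypothetical, and it uses fc (perl 5.16+) for case folding. A missing length dies, playing the role of the DEBUGGING assert, so a bad call cannot masquerade as a mismatch.

```perl
use strict;
use warnings;
use feature qw(fc unicode_strings);
use Carp qw(croak);

# Hypothetical helper (not a perl API): compare the first $len1 characters
# of $s1 with the first $len2 characters of $s2, case-insensitively.
# A missing length is treated as a caller bug and dies loudly, the way a
# DEBUGGING assert would, instead of quietly reporting "no match".
sub fold_eq_n {
    my ($s1, $len1, $s2, $len2) = @_;
    croak 'fold_eq_n: no endpoint for first string'  unless defined $len1;
    croak 'fold_eq_n: no endpoint for second string' unless defined $len2;
    return fc(substr $s1, 0, $len1) eq fc(substr $s2, 0, $len2) ? 1 : 0;
}

# Usage: MICRO SIGN (chr 181, the character from the report) vs GREEK CAPITAL MU.
print 'equal: ', fold_eq_n("\x{B5}", 1, "\x{39C}", 1), "\n";    # prints "equal: 1"
my $ok = eval { fold_eq_n("abc", undef, "ABC", 3); 1 };
print $ok ? "no error\n" : "caught: $@";
```

Dying in every build is a simplification for the sketch; the patch as described only asserts under DEBUGGING.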

Following this I found that the FOLDCHAR regop was not being produced
when a codepoint it is supposed to match is escaped. This was also
fixed.
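FOLDCHAR is an internal regop, and later perls may compile these patterns differently, so this sketch only checks the user-visible property behind the fix: writing a code point as an escape should match the same strings as writing the character itself. U+00DF (LATIN SMALL LETTER SHARP S) is used as an example of a character with a multi-character fold; whether it is one of the code points FOLDCHAR covered in 5.10 is an assumption here. The /u modifier needs perl 5.14 or later. Running the script under perl -Mre=debug shows the compiled programs, though the regops listed vary by version.

```perl
use strict;
use warnings;

# U+00DF (LATIN SMALL LETTER SHARP S) case-folds to "ss" under Unicode rules.
# Build the same case-insensitive pattern three ways; /u forces Unicode rules
# regardless of how the strings are stored internally.
my $sharp_s = chr 0xDF;
my %pattern = (
    literal => qr/$sharp_s/iu,      # the character itself, interpolated
    hex     => qr/\x{DF}/iu,        # hex escape
    named   => qr/\N{U+00DF}/iu,    # \N{U+...} escape
);

# Targets are [label, string] pairs so nothing non-ASCII is printed.
my @targets = (['ss', 'ss'], ['SS', 'SS'], ['Ss', 'Ss'], ['U+00DF', chr 0xDF]);

# Map each pattern form to the labels of the targets it matches.
sub matched_labels {
    my %seen;
    for my $form (sort keys %pattern) {
        $seen{$form} = [ map { $_->[0] } grep { $_->[1] =~ $pattern{$form} } @targets ];
    }
    return \%seen;
}

my $seen = matched_labels();
printf "%-7s matches: %s\n", $_, join(', ', @{ $seen->{$_} }) for sort keys %$seen;
```

If the escaped and literal spellings were compiled differently, the three rows would disagree.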

Lastly, I found some unnecessary code and added some comments about
strange parts of the code.

The file 3030-up.patch contains all the changes as a single patch. The
others each correspond to one stage of the process.

The only thing I'm not confident about is whether I was too aggressive
with the asserts. It's possible I have set them up to fire in a
legitimate use case. I'm still waiting to see.

I've included a single patch that has all the changes, and a tar.gz of
each of my local commits.

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"


  • Rafael Garcia-Suarez at Dec 17, 2007 at 4:05 pm

    On 17/12/2007, demerphq wrote:
    [...]
    Thanks, applied as change #32628, except your patch 3032, where I
    commented out the asserts.

    Well, so, that's the last showstopper going off, right ?

    Andreas, if you could kick a BBC smoke, that would probably be helpful...
  • Dr.Ruud at Dec 17, 2007 at 6:27 pm

    "Rafael Garcia-Suarez" wrote:
    demerphq wrote:
    Hi, this patch should resolve the bugs that Slaven found in the regex
    engine with case insensitive matches. There were actually several
    bugs all interacting with each other, one of which is quite old.
    These were all interacting to produce strange results. [...]
    Thanks, applied as change #32628, except your patch 3032, where I
    commented out the asserts.

    Well, so, that's the last showstopper going off, right ?
    Ah, so maybe "Perl is 5.10<<2" can still be tomorrow?

    --
    Anyway, Ruud

    "Ordinary is a tiger."
  • Richard Foley at Dec 18, 2007 at 7:38 am

    On Monday 17 December 2007 19:26, Dr.Ruud wrote:

    Ah, so maybe "Perl is 5.10<<2" can still be tomorrow?
    What, no RC3...?

    --
    Richard Foley
    Ciao - shorter than aufwiedersehen

    http://www.rfi.net/
  • Andreas J. Koenig at Dec 18, 2007 at 8:01 am

    On Mon, 17 Dec 2007 17:05:43 +0100, "Rafael Garcia-Suarez" <rgarciasuarez@gmail.com> said:
    Andreas, if you could kick a BBC smoke, that would probably be helpful...
    My usual smoke of 1500 CPAN distributions, with stock build options (no
    threads, no debugging, no 64bitint), completed successfully without
    surprises.

    I have a few BBC articles on the back burner that I could not finish,
    but as far as I can see, there are no showstoppers.

    RT #31556 seems a bit hot at the moment, but it can probably wait.

    Happy birthday, everybody,
    --
    andreas
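Andreas's smoke used a stock perl (no threads, no debugging, no 64bitint). Because the asserts described in the original message are compiled only into DEBUGGING builds, a smoke of that kind would not exercise them. A small script using the core Config module shows which of these options a given perl was built with; treating -DDEBUGGING in ccflags as the DEBUGGING marker is a heuristic assumed here, and perl -V reports the same compile-time options.

```perl
use strict;
use warnings;
use Config;

# Summarise the three build options named in the smoke report.
# Detecting DEBUGGING by looking for -DDEBUGGING in ccflags is a heuristic.
sub build_options {
    my $ccflags = $Config{ccflags} // '';
    return {
        threads    => $Config{usethreads}  ? 1 : 0,
        debugging  => $ccflags =~ /(?<!\S)-DDEBUGGING\b/ ? 1 : 0,
        '64bitint' => $Config{use64bitint} ? 1 : 0,
    };
}

my $opts = build_options();
printf "%-9s %s\n", $_, $opts->{$_} ? 'yes' : 'no' for sort keys %$opts;
```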

Discussion Overview
group: perl5-porters @
categories: perl
posted: Dec 17, '07 at 2:22p
active: Dec 18, '07 at 8:01a
posts: 5
users: 5
website: perl.org
