FAQ
Take 2.

substr() must accept both UVs and IVs for its $pos and $len arguments in
order to ensure any part of the string can be extracted, and to allow
negative indexing. (Even then, they are limits on the range of negative
indexing.) This patches provides that range.

It also fixes:

$ perl -wle'$[=2; print substr("abcdefghij", 1);'
panic: sv_setpvn called with negative strlen at -e line 1.

Search Discussions

  • Nicholas Clark at Feb 13, 2010 at 6:55 pm

    On Sat, Feb 13, 2010 at 12:15:56AM -0500, Eric Brine wrote:
    Take 2.

    substr() must accept both UVs and IVs for its $pos and $len arguments in
    order to ensure any part of the string can be extracted, and to allow
    negative indexing. (Even then, they are limits on the range of negative
    indexing.) This patches provides that range.

    It also fixes:

    $ perl -wle'$[=2; print substr("abcdefghij", 1);'
    panic: sv_setpvn called with negative strlen at -e line 1.
    Whilst I've looked at the patch, and don't see anything wrong with it, for
    some reason I'm uneasy about applying it today. (Tomorrow should be fine).

    Whilst this *isn't* technically the same bug, it is related to I32 abuse and
    the UTF-8 caching code. It's rather embarrassing that something as simple as
    length can return the wrong result, or panic:

    $ time ./perl -lwe 'open A, shift or die $!; read A, $a, (1<<32) + 4; chop $a; warn length $a; substr $a, 0x100000000, 1, chr 256; print ord substr $a, 0x100000000; print ord substr $a, 0xFFFFFFF0; warn length $a; warn length $a' /dev/zero
    Hexadecimal number > 0xffffffff non-portable at -e line 1.
    Hexadecimal number > 0xffffffff non-portable at -e line 1.
    4294967299 at -e line 1.
    256
    panic: sv_len_utf8 cache 3 real 4294967299 for Ä at -e line 1.

    real 37m17.458s
    user 2m56.791s
    sys 1m11.734s


    I propose the following work around, which disables storing a bad value in
    the cache for the length in Unicode characters, if that value has wrapped:

    diff --git a/sv.c b/sv.c
    index 02be580..87fc348 100644
    --- a/sv.c
    +++ b/sv.c
    @@ -6072,6 +6072,10 @@ Perl_sv_len_utf8(pTHX_ register SV *const sv)
    }
    assert(mg);
    mg->mg_len = ulen;
    + /* For now, treat "overflowed" as "still unknown".
    + See RT #72784. */
    + if (ulen != (STRLEN) mg->mg_len)
    + mg->mg_len = -1;
    }
    }
    return ulen;


    It avoids the panic, and produces the right results (albeit slowly, but 10
    times faster than the panic takes to end):

    warn length $a; substr $a, 0x100000000, 1, chr 256; print ord substr $a, 0x100000000; print ord substr $a, 0xFFFFFFF0; warn length $a; warn length $a' /dev/zero
    Hexadecimal number > 0xffffffff non-portable at -e line 1.
    Hexadecimal number > 0xffffffff non-portable at -e line 1.
    4294967299 at -e line 1.
    256

    4294967299 at -e line 1.
    4294967299 at -e line 1.

    real 3m41.042s
    user 3m36.332s
    sys 0m4.668s

    [I've compiled with -DDEBUGGING so I get debug mode enabled. If I turn it off:

    $ time ./perl -lwe '${^UTF8CACHE}=1; open A, shift or die $!; read A, $a, (1<<32) + 4; chop $a; warn length $a; substr $a, 0x100000000, 1, chr 256; print ord substr $a, 0x100000000; print ord substr $a, 0xFFFFFFF0; warn length $a; warn length $a' /dev/zero
    Hexadecimal number > 0xffffffff non-portable at -e line 1.
    Hexadecimal number > 0xffffffff non-portable at -e line 1.
    4294967299 at -e line 1.
    256

    4294967299 at -e line 1.
    4294967299 at -e line 1.

    real 2m21.951s
    user 2m17.326s
    sys 0m4.579s

    I don't know how many fewer linear scans of the string that time drop equates
    to. One dromedary has the extra RAM installed it will become a lot less
    painful to start to investigate these things. 6GB is not enough :-) ]

    I've created a meta-ticket to group all the tickets related to I32 abuse,
    and collated the remaining uses of Perl_sv_pos_b2u() and Perl_sv_pos_u2b()
    as tickets under it:

    http://rt.perl.org/rt3/Ticket/Display.html?id=72784

    Nicholas Clark
  • Nicholas Clark at Feb 14, 2010 at 4:50 pm

    On Sat, Feb 13, 2010 at 06:54:33PM +0000, Nicholas Clark wrote:
    On Sat, Feb 13, 2010 at 12:15:56AM -0500, Eric Brine wrote:
    Take 2.

    substr() must accept both UVs and IVs for its $pos and $len arguments in
    order to ensure any part of the string can be extracted, and to allow
    negative indexing. (Even then, they are limits on the range of negative
    indexing.) This patches provides that range.

    It also fixes:

    $ perl -wle'$[=2; print substr("abcdefghij", 1);'
    panic: sv_setpvn called with negative strlen at -e line 1.
    Whilst I've looked at the patch, and don't see anything wrong with it, for
    some reason I'm uneasy about applying it today. (Tomorrow should be fine).
    Thanks, applied as 777f7c561610dee641c77666e5a4a0d9ac1d4230.
    Tweaked slightly with 1c90055717b05b3f652bf543a46419001314c53e.

    I think I worked out what I don't really like, and it's actually the API
    to sv_pos_u2b(), which has two value/return pointers, but returns void.
    (sv_pos_b2u() is worse. It uses pointers when it could just use values):

    So I renamed "_proper" to "_flags", and changed it to return the byte
    conversion of uoffset:

    -void
    -Perl_sv_pos_u2b_proper(pTHX_ register SV *const sv, STRLEN *const offsetp, STRLEN *const lenp)

    +STRLEN
    +Perl_sv_pos_u2b_flags(pTHX_ SV *const sv, STRLEN uoffset, STRLEN *const lenp,
    + U32 flags)


    I'd still like to address this related embarrassment:
    Whilst this *isn't* technically the same bug, it is related to I32 abuse and
    the UTF-8 caching code. It's rather embarrassing that something as simple as
    length can return the wrong result, or panic:

    $ time ./perl -lwe 'open A, shift or die $!; read A, $a, (1<<32) + 4; chop $a; warn length $a; substr $a, 0x100000000, 1, chr 256; print ord substr $a, 0x100000000; print ord substr $a, 0xFFFFFFF0; warn length $a; warn length $a' /dev/zero
    Hexadecimal number > 0xffffffff non-portable at -e line 1.
    Hexadecimal number > 0xffffffff non-portable at -e line 1.
    4294967299 at -e line 1.
    256
    panic: sv_len_utf8 cache 3 real 4294967299 for Ä at -e line 1.

    real 37m17.458s
    user 2m56.791s
    sys 1m11.734s


    I propose the following work around, which disables storing a bad value in
    the cache for the length in Unicode characters, if that value has wrapped:

    diff --git a/sv.c b/sv.c
    index 02be580..87fc348 100644
    --- a/sv.c
    +++ b/sv.c
    @@ -6072,6 +6072,10 @@ Perl_sv_len_utf8(pTHX_ register SV *const sv)
    }
    assert(mg);
    mg->mg_len = ulen;
    + /* For now, treat "overflowed" as "still unknown".
    + See RT #72784. */
    + if (ulen != (STRLEN) mg->mg_len)
    + mg->mg_len = -1;
    }
    }
    return ulen;


    It avoids the panic, and produces the right results (albeit slowly, but 10
    times faster than the panic takes to end):
    Nicholas Clark
  • Eric Brine at Feb 14, 2010 at 7:23 pm

    On Sun, Feb 14, 2010 at 11:50 AM, Nicholas Clark wrote:

    So I renamed "_proper" to "_flags", and changed it to return the byte
    conversion of uoffset:

    -void
    -Perl_sv_pos_u2b_proper(pTHX_ register SV *const sv, STRLEN *const offsetp,
    STRLEN *const lenp)

    +STRLEN
    +Perl_sv_pos_u2b_flags(pTHX_ SV *const sv, STRLEN uoffset, STRLEN *const
    lenp,
    + U32 flags)
    It looks like you haven't committed this yet. Before you do, can you please
    move PERL_ARGS_ASSERT_SV_POS_U2B to Perl_sv_pos_u2b() and add
    PERL_ARGS_ASSERT_SV_POS_U2B_FLAGS to Perl_sv_pos_u2b_flags()? Thanks
  • Nicholas Clark at Feb 16, 2010 at 10:40 am

    On Sun, Feb 14, 2010 at 02:23:37PM -0500, Eric Brine wrote:
    On Sun, Feb 14, 2010 at 11:50 AM, Nicholas Clark wrote:

    So I renamed "_proper" to "_flags", and changed it to return the byte
    conversion of uoffset:

    -void
    -Perl_sv_pos_u2b_proper(pTHX_ register SV *const sv, STRLEN *const offsetp,
    STRLEN *const lenp)

    +STRLEN
    +Perl_sv_pos_u2b_flags(pTHX_ SV *const sv, STRLEN uoffset, STRLEN *const
    lenp,
    + U32 flags)
    It looks like you haven't committed this yet. Before you do, can you please
    move PERL_ARGS_ASSERT_SV_POS_U2B to Perl_sv_pos_u2b() and add
    PERL_ARGS_ASSERT_SV_POS_U2B_FLAGS to Perl_sv_pos_u2b_flags()? Thanks
    I had committed it, and fixed that part, and found a bug in the test that's
    supposed to catch those: http://rt.perl.org/rt3/Ticket/Display.html?id=72800

    Nicholas Clark

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
groupperl5-porters @
categoriesperl
postedFeb 13, '10 at 5:16a
activeFeb 16, '10 at 10:40a
posts5
users2
websiteperl.org

2 users in discussion

Nicholas Clark: 3 posts Eric Brine: 2 posts

People

Translate

site design / logo © 2022 Grokbase