Folks, I'm hoping someone can clarify for me the limitations that PHP
5.2/5.3/6.0 is expected to put on the size of strings on 64 bit linux.

The php manual documentation for the string type makes the following
note about string size:

"Note: It is no problem for a string to become very large. PHP imposes
no boundary on the size of a string; the only limit is the available
memory of the computer on which PHP is running."

However, the behavior of PHP 5.2 and 5.3 on 64-bit systems with more
than 2GB of RAM makes it clear that PHP string functions do not behave
properly with strings that exceed 2^31 bytes. I filed the now-resolved
bug #50207 on this matter because of a segfault during in-place
concatenation, but the fix implicitly limits that operation to strings
shorter than 2^31 bytes. Additionally, when strings grow this large,
the behavior of strlen, substr, concatenation, etc. is unreliable.

For example, php tries to allocate an impossible amount of memory when
concatenating two strings of 2^30 bytes, I presume because the
overflowed length of the new string is cast to size_t for allocation:
---
Code:
<?php
$s = str_repeat('A', pow(2,30));
$t = $s.str_repeat('B', pow(2,30)); // fails here with the fatal error shown under Result
printf("strlen: %u last-char: %s", strlen($s), substr($s, pow(2,30)-1));
?>
---
Result:
./sapi/cli/php -d memory_limit=-1 a2.php

Fatal error: Out of memory (allocated 2148270080) (tried to allocate
18446744071562067969 bytes) in /home/matt/tmp/php-src-5.2/a2.php on
line 3
----

So should strings be limited to 2GB on 64 bit systems, is PHP not 64
bit compatible, or are these behaviors that should have bugs filed for
them?

Thanks,
-matt


  • Jvlad at Nov 19, 2009 at 10:52 pm

    Hmm, 18446744071562067969 is 0xFFFFFFFF80000001.
    It seems a 32-bit variable was used somewhere in the calculation and
    was then assigned to a 64-bit signed int.
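
To make that arithmetic concrete, here is a minimal C sketch of the suspected path. It is an illustration only, not code from the PHP source: the function name widen_wrapped_length is hypothetical, and the signed reinterpretation assumes a two's-complement platform, as on 64-bit Linux.

```c
#include <stdint.h>

/*
 * Model of the suspected failure: two lengths are summed in 32-bit
 * arithmetic, one byte is added for the trailing NUL, the result is
 * viewed as a signed 32-bit int and then widened to 64 bits.
 * (Hypothetical helper, not part of PHP.)
 */
static uint64_t widen_wrapped_length(uint32_t len1, uint32_t len2)
{
    /* Unsigned arithmetic wraps modulo 2^32, which is well defined in C. */
    uint32_t wrapped = len1 + len2 + 1u;

    /* Reinterpret as signed; on two's-complement platforms the bit
     * pattern 0x80000001 reads as -2147483647. */
    int32_t as_signed = (int32_t)wrapped;

    /* Widening a negative value sign-extends it, so the upper 32 bits
     * of the 64-bit result are all ones. */
    return (uint64_t)(int64_t)as_signed;
}
```

With both lengths set to 2^30, the function returns 18446744071562067969 (0xFFFFFFFF80000001), which is the allocation size reported in the error message.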

    what particular version of php did you use?
    Did you try 5.3.1RC4? 5.2.12RC1?

    I'd try myself if I had 4GB of RAM.

    -jv
  • Matt Wirges at Nov 20, 2009 at 2:56 am

    I've tried using PHP 5.2.11, 5.3.0, and PHP 5.2 svn branch as of this
    morning (when verifying the bug fix).

    Perhaps I'm looking at this naively, but from what I can tell in the
    source, the length of a string is stored as a signed int in the
    zvalue_value union. It seems that the string operations within PHP
    expect pointers and size_t to be 32 bits wide (and of course
    unsigned). However, on a 64-bit system they are 64 bits wide (still
    unsigned), while int stays at 32 bits on 64-bit Linux.

    Focusing for the moment again on the concat_function in Zend/zend_operators.c:

    1203     if (result==op1) {  /* special case, perform operations on result */
    1204         uint res_len = op1->value.str.len + op2->value.str.len;
    1205
    1206         if (Z_STRLEN_P(result) < 0) {
    1207             efree(Z_STRVAL_P(result));
    1208             ZVAL_EMPTY_STRING(result);
    1209             zend_error(E_ERROR, "String size overflow");
    1210         }
    1211
    1212         result->value.str.val = erealloc(result->value.str.val, res_len+1);
    1213
    1214         memcpy(result->value.str.val+result->value.str.len, op2->value.str.val, op2->value.str.len);
    1215         result->value.str.val[res_len]=0;
    1216         result->value.str.len = res_len;
    1217     } else {

    The problem with the segfault in memcpy from bug 50207 was that the
    pointer result->value.str.val is 64 bits wide, while
    result->value.str.len is a signed 32-bit integer. Once the stored
    length has wrapped to a negative value, it is widened to 64 bits for
    the pointer arithmetic, so the offset added to the original string
    pointer is wildly wrong (a multi-exabyte offset if read as unsigned)
    and the access segfaults. The bug fix (lines 1206-1210) prevents
    this now, but it also means we cannot concatenate in place onto a
    string whose length is already 2^31 bytes or greater.

    If you look at the other half of the concat operation:

    1217     } else {
    1218         result->value.str.len = op1->value.str.len + op2->value.str.len;
    1219         result->value.str.val = (char *) emalloc(result->value.str.len + 1);
    1220         memcpy(result->value.str.val, op1->value.str.val, op1->value.str.len);
    1221         memcpy(result->value.str.val+op1->value.str.len, op2->value.str.val, op2->value.str.len);
    1222         result->value.str.val[result->value.str.len] = 0;
    1223         result->type = IS_STRING;
    1224     }

    On line 1219 we pass result->value.str.len + 1, which again is
    computed from a 32-bit signed integer, to emalloc, which expects a
    size_t. The value is implicitly converted to an unsigned 64-bit
    integer. In the example in my previous email, the length of the new
    string overflows the 32-bit signed int and goes negative, so the
    conversion produces a huge value for the amount to attempt to
    allocate for the new string.
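
One way to catch this before the allocation is to test the two lengths against INT_MAX before adding them. The sketch below is only an illustration of that idea: checked_concat_len is a hypothetical helper, not a function in the Zend source, and it is not the fix that was committed for bug 50207.

```c
#include <limits.h>

/*
 * Hypothetical helper (not part of PHP): computes the length of the
 * concatenation of two strings whose lengths are stored as int.
 * Returns 1 and stores the combined length (without the trailing NUL)
 * in *out, or returns 0 if len1 + len2 + 1 would not fit in an int.
 */
static int checked_concat_len(int len1, int len2, int *out)
{
    /* A negative length means an earlier overflow already happened. */
    if (len1 < 0 || len2 < 0) {
        return 0;
    }

    /* Rearranged so that no intermediate value can overflow:
     * len1 + len2 + 1 <= INT_MAX  <=>  len1 <= INT_MAX - 1 - len2. */
    if (len1 > INT_MAX - 1 - len2) {
        return 0;
    }

    *out = len1 + len2;
    return 1;
}
```

For the two 2^30-byte strings from the example, the check fails and the caller can raise an error instead of handing a wrapped value to the allocator.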


    -m
  • Jvlad at Nov 20, 2009 at 12:31 pm

    You're right, len is declared as int, and it is indeed 32 bits under
    64-bit Linux.
    It must be changed to long in order to get proper arithmetic with
    strings longer than 0x7fffffff bytes.

    I don't think it can be changed in either 5.2 or 5.3, or it will
    break binary compatibility.
    Perhaps the change should be submitted to the 5.4 and 6.0 branches.
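
As a rough sketch of what a wider length field buys, the C fragment below appends to a string in place with a pointer-width length. Everything in it is an assumption for illustration: struct wide_str and wide_str_append are hypothetical names, plain realloc stands in for PHP's allocator, and size_t is used here instead of long because long is 64 bits on 64-bit Linux but not on every 64-bit platform. It says nothing about the binary compatibility concern.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical string type whose length is as wide as a pointer. */
struct wide_str {
    char  *val;
    size_t len;
};

/*
 * Appends src_len bytes from src to s in place.
 * Returns 0 on success; returns -1 on length overflow or allocation
 * failure, leaving s unchanged.
 */
static int wide_str_append(struct wide_str *s, const char *src, size_t src_len)
{
    /* Reject any combination where len + src_len + 1 exceeds SIZE_MAX. */
    if (s->len >= SIZE_MAX || src_len > SIZE_MAX - 1 - s->len) {
        return -1;
    }

    size_t new_len = s->len + src_len;

    /* realloc(NULL, n) behaves like malloc(n), so an empty string works. */
    char *p = realloc(s->val, new_len + 1);
    if (p == NULL) {
        return -1;
    }

    /* The offset is an unsigned size_t, so it cannot go negative. */
    memcpy(p + s->len, src, src_len);
    p[new_len] = '\0';

    s->val = p;
    s->len = new_len;
    return 0;
}
```

Starting from an empty struct wide_str, appending "abc" and then "de" leaves val pointing at "abcde" with len equal to 5; the caller frees val when done.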

    -jv

Posted to php-internals (php.net) on Nov 19, 2009 at 8:18 pm; last
active Nov 20, 2009 at 12:31 pm; 4 posts by 2 users.
