FAQ
Hi all,

I noticed awhile ago how most every use of zend[_u]_hash_init has nSize as
0. Of course it isn't always known how many elements will be added to the
array (nTableSize), but there are places where an accurate value could be
used instead of getting the minimum default (8). For anyone who doesn't
know, HashTable sizes are a power of 2, and when there are more than that
many elements, zend_hash_do_resize realloc's more memory to double the table
size, and then zend_hash_rehash basically goes through and "reinserts" all
the elements again.

I ran some tests to see the speed improvement from specifying the *actual*
number of elements in hash_init, thus saving that extra work (_rehash
mostly). The "n+1" is with one more element (9, 17, 33...) that triggers
another rehash when the actual number wasn't specified (most extreme). (I
used add_next_index_zval, so the direct zend_hash* functions would give
slightly higher % I guess, being less overhead, right?) Results with
5.2.0-RC4:

Elements
n | n n+1
---------------------
8 | 0% 14.2%
16 | 11.0% 20.9%
32 | 13.5% 22.1%
64 | 12.6% 21.3%
128 | 11.7% 18.5%
256 | 9.3% 16.4%
512 | 8.6% 17.0%
1024 | 7.9% 15.8%
2048 | 4.8% 33.3%
4096 | 7.8% 28.3%
8192 | 10.2% 58.4%
16384 | 24.1% 70.5%
32768 | 34.5% 80.4%
65536 | 34.8% 68.6%

I haven't looked thoroughly, but the only place I've seen that uses an
non-zero nSize in hash_init (besides some in the core engine) is the
unserialize() function. :-/ It seems the most simple and obvious place that
could be changed is in Zend/zend_variables.c:_zval_copy_ctor_func? Just
replace

zend[_u]_hash_init(tmp_ht, 0, ...
with
zend[_u]_hash_init(tmp_ht, original_ht->nNumOfElements, ...

Other than that, some of PHP's array functions and array-returning functions
(database fetch row functions, etc.) are ones I thought of that could be
optimized like this. Maybe create an array_init_size() macro to be used in
places where the number of elements can be easily determined? Thoughts?
I'd volunteer to look for places that could be updated and modify them. :-)


Thanks,
Matt

P.S. I guess the array stuff applies for objects also? But I don't know
much about their internals...

Search Discussions

  • Nuno Lopes at Sep 27, 2006 at 6:35 pm
    Aaahh nice catch ;)

    From a quick lookup in PHP sources I also found these functions were the
    same optimization can be applied:
    zend_default_exception_new_ex() - trivial
    convert_to_array() - not as easy as others
    zend_object_std_init() - possibly
    clone_wrapper_hash() - trivial (php-src/main/streams/streams.c)
    compiler_globals_ctor()
    ...


    Nuno

    ----- Original Message -----
    Hi all,

    I noticed awhile ago how most every use of zend[_u]_hash_init has nSize as
    0. Of course it isn't always known how many elements will be added to the
    array (nTableSize), but there are places where an accurate value could be
    used instead of getting the minimum default (8). For anyone who doesn't
    know, HashTable sizes are a power of 2, and when there are more than that
    many elements, zend_hash_do_resize realloc's more memory to double the
    table
    size, and then zend_hash_rehash basically goes through and "reinserts" all
    the elements again.

    I ran some tests to see the speed improvement from specifying the *actual*
    number of elements in hash_init, thus saving that extra work (_rehash
    mostly). The "n+1" is with one more element (9, 17, 33...) that triggers
    another rehash when the actual number wasn't specified (most extreme). (I
    used add_next_index_zval, so the direct zend_hash* functions would give
    slightly higher % I guess, being less overhead, right?) Results with
    5.2.0-RC4:

    Elements
    n | n n+1
    ---------------------
    8 | 0% 14.2%
    16 | 11.0% 20.9%
    32 | 13.5% 22.1%
    64 | 12.6% 21.3%
    128 | 11.7% 18.5%
    256 | 9.3% 16.4%
    512 | 8.6% 17.0%
    1024 | 7.9% 15.8%
    2048 | 4.8% 33.3%
    4096 | 7.8% 28.3%
    8192 | 10.2% 58.4%
    16384 | 24.1% 70.5%
    32768 | 34.5% 80.4%
    65536 | 34.8% 68.6%

    I haven't looked thoroughly, but the only place I've seen that uses an
    non-zero nSize in hash_init (besides some in the core engine) is the
    unserialize() function. :-/ It seems the most simple and obvious place
    that
    could be changed is in Zend/zend_variables.c:_zval_copy_ctor_func? Just
    replace

    zend[_u]_hash_init(tmp_ht, 0, ...
    with
    zend[_u]_hash_init(tmp_ht, original_ht->nNumOfElements, ...

    Other than that, some of PHP's array functions and array-returning
    functions
    (database fetch row functions, etc.) are ones I thought of that could be
    optimized like this. Maybe create an array_init_size() macro to be used
    in
    places where the number of elements can be easily determined? Thoughts?
    I'd volunteer to look for places that could be updated and modify them.
    :-)


    Thanks,
    Matt

    P.S. I guess the array stuff applies for objects also? But I don't know
    much about their internals...
  • Matt Wilmas at Nov 5, 2006 at 12:30 pm
    Hi Nuno,

    Late reply, but I'm glad the idea was able to be used. :-)


    To the other hash people: While checking how Ilia made implode() faster in
    5.2, I think I found another optimization that can be done. In
    zend_hash_copy/merge, it seems the zend_hash_quick_* functions can be used
    instead for associative entries, thus saving the key from being hashed
    again, right? I just checked, and it's definitely faster, how much so being
    dependent on the key length of course...

    Patches for this simple additional optimization (hopefully):
    http://realplain.com/php/hash_quick_copy.diff
    http://realplain.com/php/hash_quick_copy_5_2.diff


    Thanks,
    Matt


    ----- Original Message -----
    From: "Nuno Lopes"
    Sent: Wednesday, September 27, 2006
    Aaahh nice catch ;)
    From a quick lookup in PHP sources I also found these functions were the
    same optimization can be applied:
    zend_default_exception_new_ex() - trivial
    convert_to_array() - not as easy as others
    zend_object_std_init() - possibly
    clone_wrapper_hash() - trivial (php-src/main/streams/streams.c)
    compiler_globals_ctor()
    ...


    Nuno

    ----- Original Message -----
    Hi all,

    I noticed awhile ago how most every use of zend[_u]_hash_init has nSize
    as
    0. Of course it isn't always known how many elements will be added to
    the
    array (nTableSize), but there are places where an accurate value could
    be
    used instead of getting the minimum default (8). For anyone who doesn't
    know, HashTable sizes are a power of 2, and when there are more than
    that
    many elements, zend_hash_do_resize realloc's more memory to double the
    table
    size, and then zend_hash_rehash basically goes through and "reinserts"
    all
    the elements again.

    I ran some tests to see the speed improvement from specifying the
    *actual*
    number of elements in hash_init, thus saving that extra work (_rehash
    mostly). The "n+1" is with one more element (9, 17, 33...) that
    triggers
    another rehash when the actual number wasn't specified (most extreme).
    (I
    used add_next_index_zval, so the direct zend_hash* functions would give
    slightly higher % I guess, being less overhead, right?) Results with
    5.2.0-RC4:

    Elements
    n | n n+1
    ---------------------
    8 | 0% 14.2%
    16 | 11.0% 20.9%
    32 | 13.5% 22.1%
    64 | 12.6% 21.3%
    128 | 11.7% 18.5%
    256 | 9.3% 16.4%
    512 | 8.6% 17.0%
    1024 | 7.9% 15.8%
    2048 | 4.8% 33.3%
    4096 | 7.8% 28.3%
    8192 | 10.2% 58.4%
    16384 | 24.1% 70.5%
    32768 | 34.5% 80.4%
    65536 | 34.8% 68.6%

    I haven't looked thoroughly, but the only place I've seen that uses an
    non-zero nSize in hash_init (besides some in the core engine) is the
    unserialize() function. :-/ It seems the most simple and obvious place
    that
    could be changed is in Zend/zend_variables.c:_zval_copy_ctor_func? Just
    replace

    zend[_u]_hash_init(tmp_ht, 0, ...
    with
    zend[_u]_hash_init(tmp_ht, original_ht->nNumOfElements, ...

    Other than that, some of PHP's array functions and array-returning
    functions
    (database fetch row functions, etc.) are ones I thought of that could be
    optimized like this. Maybe create an array_init_size() macro to be used
    in
    places where the number of elements can be easily determined? Thoughts?
    I'd volunteer to look for places that could be updated and modify them.
    :-)


    Thanks,
    Matt

    P.S. I guess the array stuff applies for objects also? But I don't
    know
    much about their internals...
  • Ilia Alshanetsky at Nov 7, 2006 at 8:35 pm
    Looks like a good optimization to me.

    On 5-Nov-06, at 7:30 AM, Matt Wilmas wrote:

    Hi Nuno,

    Late reply, but I'm glad the idea was able to be used. :-)


    To the other hash people: While checking how Ilia made implode()
    faster in
    5.2, I think I found another optimization that can be done. In
    zend_hash_copy/merge, it seems the zend_hash_quick_* functions can
    be used
    instead for associative entries, thus saving the key from being hashed
    again, right? I just checked, and it's definitely faster, how much
    so being
    dependent on the key length of course...

    Patches for this simple additional optimization (hopefully):
    http://realplain.com/php/hash_quick_copy.diff
    http://realplain.com/php/hash_quick_copy_5_2.diff


    Thanks,
    Matt


    ----- Original Message -----
    From: "Nuno Lopes"
    Sent: Wednesday, September 27, 2006
    Aaahh nice catch ;)
    From a quick lookup in PHP sources I also found these functions
    were the
    same optimization can be applied:
    zend_default_exception_new_ex() - trivial
    convert_to_array() - not as easy as others
    zend_object_std_init() - possibly
    clone_wrapper_hash() - trivial (php-src/main/streams/streams.c)
    compiler_globals_ctor()
    ...


    Nuno

    ----- Original Message -----
    Hi all,

    I noticed awhile ago how most every use of zend[_u]_hash_init has
    nSize
    as
    0. Of course it isn't always known how many elements will be
    added to
    the
    array (nTableSize), but there are places where an accurate value
    could
    be
    used instead of getting the minimum default (8). For anyone who
    doesn't
    know, HashTable sizes are a power of 2, and when there are more than
    that
    many elements, zend_hash_do_resize realloc's more memory to
    double the
    table
    size, and then zend_hash_rehash basically goes through and
    "reinserts"
    all
    the elements again.

    I ran some tests to see the speed improvement from specifying the
    *actual*
    number of elements in hash_init, thus saving that extra work
    (_rehash
    mostly). The "n+1" is with one more element (9, 17, 33...) that
    triggers
    another rehash when the actual number wasn't specified (most
    extreme).
    (I
    used add_next_index_zval, so the direct zend_hash* functions
    would give
    slightly higher % I guess, being less overhead, right?) Results
    with
    5.2.0-RC4:

    Elements
    n | n n+1
    ---------------------
    8 | 0% 14.2%
    16 | 11.0% 20.9%
    32 | 13.5% 22.1%
    64 | 12.6% 21.3%
    128 | 11.7% 18.5%
    256 | 9.3% 16.4%
    512 | 8.6% 17.0%
    1024 | 7.9% 15.8%
    2048 | 4.8% 33.3%
    4096 | 7.8% 28.3%
    8192 | 10.2% 58.4%
    16384 | 24.1% 70.5%
    32768 | 34.5% 80.4%
    65536 | 34.8% 68.6%

    I haven't looked thoroughly, but the only place I've seen that
    uses an
    non-zero nSize in hash_init (besides some in the core engine) is the
    unserialize() function. :-/ It seems the most simple and obvious
    place
    that
    could be changed is in Zend/
    zend_variables.c:_zval_copy_ctor_func? Just
    replace

    zend[_u]_hash_init(tmp_ht, 0, ...
    with
    zend[_u]_hash_init(tmp_ht, original_ht->nNumOfElements, ...

    Other than that, some of PHP's array functions and array-returning
    functions
    (database fetch row functions, etc.) are ones I thought of that
    could be
    optimized like this. Maybe create an array_init_size() macro to
    be used
    in
    places where the number of elements can be easily determined?
    Thoughts?
    I'd volunteer to look for places that could be updated and modify
    them.
    :-)


    Thanks,
    Matt

    P.S. I guess the array stuff applies for objects also? But I don't
    know
    much about their internals...
    --
    PHP Internals - PHP Runtime Development Mailing List
    To unsubscribe, visit: http://www.php.net/unsub.php
    Ilia Alshanetsky
  • Dmitry Stogov at Nov 8, 2006 at 4:02 pm
    The patch is fine.
    I applied it.

    Dmitry.
    -----Original Message-----
    From: Ilia Alshanetsky On Behalf Of
    Ilia Alshanetsky
    Sent: Tuesday, November 07, 2006 11:35 PM
    To: Matt Wilmas
    Cc: internals@lists.php.net; Dmitry Stogov
    Subject: Re: [PHP-DEV] array/HashTable filling optimization


    Looks like a good optimization to me.

    On 5-Nov-06, at 7:30 AM, Matt Wilmas wrote:

    Hi Nuno,

    Late reply, but I'm glad the idea was able to be used. :-)


    To the other hash people: While checking how Ilia made implode()
    faster in
    5.2, I think I found another optimization that can be done. In
    zend_hash_copy/merge, it seems the zend_hash_quick_* functions can
    be used
    instead for associative entries, thus saving the key from
    being hashed
    again, right? I just checked, and it's definitely faster, how much
    so being
    dependent on the key length of course...

    Patches for this simple additional optimization (hopefully):
    http://realplain.com/php/hash_quick_copy.diff
    http://realplain.com/php/hash_quick_copy_5_2.diff


    Thanks,
    Matt


    ----- Original Message -----
    From: "Nuno Lopes"
    Sent: Wednesday, September 27, 2006
    Aaahh nice catch ;)
    From a quick lookup in PHP sources I also found these functions
    were the
    same optimization can be applied:
    zend_default_exception_new_ex() - trivial
    convert_to_array() - not as easy as others
    zend_object_std_init() - possibly
    clone_wrapper_hash() - trivial (php-src/main/streams/streams.c)
    compiler_globals_ctor()
    ...


    Nuno

    ----- Original Message -----
    Hi all,

    I noticed awhile ago how most every use of zend[_u]_hash_init has
    nSize
    as
    0. Of course it isn't always known how many elements will be
    added to
    the
    array (nTableSize), but there are places where an accurate value
    could
    be
    used instead of getting the minimum default (8). For anyone who
    doesn't
    know, HashTable sizes are a power of 2, and when there
    are more than
    that
    many elements, zend_hash_do_resize realloc's more memory to
    double the
    table
    size, and then zend_hash_rehash basically goes through and
    "reinserts"
    all
    the elements again.

    I ran some tests to see the speed improvement from specifying the
    *actual*
    number of elements in hash_init, thus saving that extra work
    (_rehash
    mostly). The "n+1" is with one more element (9, 17, 33...) that
    triggers
    another rehash when the actual number wasn't specified (most
    extreme).
    (I
    used add_next_index_zval, so the direct zend_hash* functions
    would give
    slightly higher % I guess, being less overhead, right?) Results
    with
    5.2.0-RC4:

    Elements
    n | n n+1
    ---------------------
    8 | 0% 14.2%
    16 | 11.0% 20.9%
    32 | 13.5% 22.1%
    64 | 12.6% 21.3%
    128 | 11.7% 18.5%
    256 | 9.3% 16.4%
    512 | 8.6% 17.0%
    1024 | 7.9% 15.8%
    2048 | 4.8% 33.3%
    4096 | 7.8% 28.3%
    8192 | 10.2% 58.4%
    16384 | 24.1% 70.5%
    32768 | 34.5% 80.4%
    65536 | 34.8% 68.6%

    I haven't looked thoroughly, but the only place I've seen that
    uses an
    non-zero nSize in hash_init (besides some in the core
    engine) is the
    unserialize() function. :-/ It seems the most simple and
    obvious
    place
    that
    could be changed is in Zend/
    zend_variables.c:_zval_copy_ctor_func? Just
    replace

    zend[_u]_hash_init(tmp_ht, 0, ...
    with
    zend[_u]_hash_init(tmp_ht, original_ht->nNumOfElements, ...

    Other than that, some of PHP's array functions and
    array-returning
    functions (database fetch row functions, etc.) are ones I
    thought of
    that
    could be
    optimized like this. Maybe create an array_init_size() macro to
    be used
    in
    places where the number of elements can be easily determined?
    Thoughts?
    I'd volunteer to look for places that could be updated
    and modify
    them.
    :-)


    Thanks,
    Matt

    P.S. I guess the array stuff applies for objects also?
    But I don't
    know
    much about their internals...
    --
    PHP Internals - PHP Runtime Development Mailing List
    To unsubscribe, visit: http://www.php.net/unsub.php
    Ilia Alshanetsky

    --
    PHP Internals - PHP Runtime Development Mailing List
    To unsubscribe, visit: http://www.php.net/unsub.php

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
groupphp-internals @
categoriesphp
postedSep 27, '06 at 12:34p
activeNov 8, '06 at 4:02p
posts5
users4
websitephp.net

People

Translate

site design / logo © 2022 Grokbase