Hi,

Recently I took a look at the internals of jemalloc and tcmalloc and tried to
borrow some ideas. You may check the result at
https://github.com/dstogov/php-src/tree/xx_malloc. It's a rough proof of
concept implementation of a new memory manager for PHP. It has been tested only
on Linux, with a release, non-ZTS build. It does not support debug mode or ZTS
yet. The main advantage is a small but consistent speed improvement on
real-life applications.

I would appreciate it if you benchmarked it against vanilla PHP-5.6 on your
applications, reviewed the code from the performance and security points of
view, and came back with comments, ideas and criticism. (For example: maybe
someone can suggest how to avoid checking for USE_ZEND_ALLOC=0, which allows
system malloc() usage, on each emalloc() call? Or how to reduce the cost of
statistics collection?)

Currently, I'm not sure whether a 5% speed improvement is worth the effort.

The results of my benchmarks follow.

Thanks. Dmitry.

PHP-5.6 32-bit   zend_alloc   xx_malloc   Improvement
blog                  105.6       109.7         3.88%
drupal               1625.0      1667.6         2.62%
fw                    231.6       286.4        23.66%
hello               12048.4     11865.9        -1.51%
qdig                  464.4       495.3         6.65%
typo3                 563.8       584.9         3.74%
wordpress             188.9       196.8         4.19%
xoops                 132.7       140.0         5.50%
scrum                 181.6       192.7         6.11%
ZF1 Hello            1153.2      1228.4         6.52%
ZF2 Test              263.0       275.5         4.75%

PHP-5.6 64-bit   zend_alloc   xx_malloc   Improvement
blog                   99.0       102.3         3.33%
drupal               1531.1      1604.2         4.78%
fw                    197.6       206.5         4.50%
hello               11702.0     11875.2         1.48%
qdig                  451.1       476.8         5.70%
typo3                 541.8       568.7         4.96%
wordpress             176.5       185.8         5.27%
xoops                 126.1       134.5         6.66%
scrum                 169.3       174.9         3.31%
ZF1 Hello            1042.9      1119.6         7.35%
ZF2 Test              238.4       242.5         1.72%
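
The Improvement column appears to be the relative change of the xx_malloc score over the zend_alloc score. The thread does not state the unit, so the minimal sketch below assumes that higher scores are better (the positive percentages line up with higher xx_malloc numbers). The function name is hypothetical.

```c
#include <assert.h>

/* Relative improvement of `candidate` over `baseline`, in percent.
 * Assumption: higher scores are better; the thread does not say so explicitly. */
double improvement_percent(double baseline, double candidate)
{
    assert(baseline > 0.0);
    return (candidate - baseline) / baseline * 100.0;
}
```

For the 32-bit "fw" row, improvement_percent(231.6, 286.4) gives roughly 23.66, matching the reported figure.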


  • Julien Pauli at Jan 14, 2014 at 2:51 pm

    Great, Dmitry!

    Joe and I worked on something similar a few months ago, mainly adding a
    new ZendMM handler which binds jemalloc().

    Anyway, I'm going to try your code and run it against several Symfony2
    applications.
    I'll come back to you at the end of the week with some results ;-)

    Julien
  • Dmitry Stogov at Jan 14, 2014 at 4:45 pm
    Of course I tried plugging in jemalloc and tcmalloc, but they cause a
    slowdown instead of a speedup, mainly because zend_alloc was designed
    specifically for PHP, and also because they carry the overhead of
    multi-threading support. On the other hand, when profiling PHP with
    oprofile I saw a lot of cache misses in zend_alloc.c, mostly because of
    linked list handling. So I tried to combine the best of all approaches and
    then spent a couple of weeks tuning it.

    Thanks. Dmitry.


  • Julien Pauli at Jan 14, 2014 at 6:22 pm

    Yes, I was reading through the great job you've done so far!
    I'm looking forward to testing this myself and, why not, fixing bugs or
    contributing some more ideas :-)

    Anyway, the different pool sizes are nice.
    We already have a similar idea in ZendMM with the "small free block"
    vs. "free block" linked lists, but the implementation you've done so
    far is a pretty nice evolution.

    I think we can improve things by studying more precise caches for
    frequently used C objects such as zvals or zend_object structures.

    So many ideas, glad to see you're having fun with them ;-)

    Julien.P
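
The "different pool sizes" idea can be illustrated with a minimal sketch of segregated size classes, where each class keeps its own free list so that a free or an allocation touches only one list head. This is not the xx_malloc code: the power-of-two classes, the sized pool_free() interface, and all names below are assumptions chosen to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MIN_SHIFT   3   /* smallest class is 8 bytes */
#define NUM_CLASSES 8   /* classes: 8, 16, 32, ..., 1024 bytes */
#define MAX_SMALL   ((size_t)1 << (MIN_SHIFT + NUM_CLASSES - 1))

/* A freed slot stores the link to the next free slot in its own memory. */
typedef struct free_slot { struct free_slot *next; } free_slot;

static free_slot *free_lists[NUM_CLASSES];

/* Index of the smallest size class that can hold `size` bytes. */
size_t size_class(size_t size)
{
    size_t idx = 0, cap = (size_t)1 << MIN_SHIFT;
    assert(size > 0 && size <= MAX_SMALL);
    while (cap < size) { cap <<= 1; idx++; }
    return idx;
}

void *pool_alloc(size_t size)
{
    if (size == 0) size = 1;
    if (size > MAX_SMALL) return malloc(size);      /* large: bypass the pools */
    size_t idx = size_class(size);
    free_slot *slot = free_lists[idx];
    if (slot != NULL) {                             /* reuse a cached slot */
        free_lists[idx] = slot->next;
        return slot;
    }
    return malloc((size_t)1 << (MIN_SHIFT + idx));  /* allocate a full-class slot */
}

/* The caller passes the original request size so no per-block header is needed. */
void pool_free(void *ptr, size_t size)
{
    if (ptr == NULL) return;
    if (size == 0) size = 1;
    if (size > MAX_SMALL) { free(ptr); return; }
    size_t idx = size_class(size);
    free_slot *slot = ptr;
    slot->next = free_lists[idx];                   /* push onto the class list */
    free_lists[idx] = slot;
}
```

A 24-byte block that is freed and followed by a 30-byte request is served from the same 32-byte class, so the second pool_alloc() returns the slot that was just released.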
  • Terry Ellison at Jan 14, 2014 at 11:48 pm

    One of the most common macro uses for both emalloc and efree primitives
    is in the CTOR and DTOR of zval structures. This is HOT code, yet the
    code involved is split across three main modules totalling less than 10K
    lines. As a comparison, zend_vm_execute.h is over 40K lines to allow the
    CC optimizer to optimize across the entire source code.

    Wouldn't it make a lot more sense to combine zend_variables.c,
    zend_hash.c and zend_alloc.c into a single module (by #include
    directive, as zend_execute.c incorporates zend_vm_execute.h) so that the
    CC optimizer can properly optimize these CTORs and DTORs? The DTOR
    for a ZVAL which is itself a simple ZVAL hierarchy, such as an array,
    should be executed as a dense code sequence that can comfortably run out
    of the L1 instruction cache, with minimal cache misses and BLT failure
    stalls.

    Regards Terry
  • Dmitry Stogov at Jan 15, 2014 at 7:06 am
    Hi Terry,

    Maybe I misunderstood you.

    Macros are inlined at compile time anyway.
    Inlining the "slow paths" of zval_copy_ctor/zval_dtor would cause a "code
    explosion" and increase cache misses.
    However, on Linux it should be possible to put "hot" functions in one code
    section and reduce cache misses.

    Anyway, it's unrelated to the Memory Manager.

    Thanks. Dmitry.
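
One way to group "hot" functions, sketched here under the assumption of a GCC-compatible compiler, is the hot and cold function attributes: GCC documents that on many targets hot functions are placed in a dedicated text subsection so they sit close together. The value type and the function names below are hypothetical stand-ins, not the Zend Engine definitions.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__GNUC__)
# define SKETCH_HOT  __attribute__((hot))
# define SKETCH_COLD __attribute__((cold, noinline))
#else
# define SKETCH_HOT   /* attributes unavailable: plain functions */
# define SKETCH_COLD
#endif

enum { SKETCH_LONG = 1, SKETCH_ARRAY = 2 };

/* Hypothetical value type standing in for a zval. */
typedef struct { int type; long value; } sketch_val;

int sketch_slow_path_calls = 0;

/* Rarely taken path: kept out of line and marked cold. */
SKETCH_COLD void sketch_dtor_slow(sketch_val *v)
{
    sketch_slow_path_calls++;
    v->value = 0;   /* stands in for releasing a compound value */
}

/* Frequently called path: a cheap type check, then the out-of-line call. */
SKETCH_HOT void sketch_dtor(sketch_val *v)
{
    assert(v != NULL);
    if (v->type == SKETCH_LONG) {
        return;     /* scalars need no cleanup */
    }
    sketch_dtor_slow(v);
}
```

Whether this layout actually reduces instruction cache misses depends on the compiler, linker and workload, so it would need to be measured.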


  • Christian Seiler at Jan 14, 2014 at 7:38 pm
    Hi there,
    (For example: may be someone would
    suggest how to avoid check for USE_ZEND_ALLOC=0 to allow system malloc()
    usage on each emalloc() call?
    Rename emalloc() -> real_emalloc(), then:

    .h:
    extern void *(*emalloc)(size_t n);

    .c:
    void *(*emalloc)(size_t n) = real_emalloc;

    startup code:

    if (USE_ZEND_ALLOC is 0) {
        emalloc = malloc;
    }

    Some adjustments are probably needed (esp. for potentially different
    calling conventions, so maybe malloc() will need a small wrapper); this is
    just off the top of my head.

    Note: this breaks ABI compatibility on most archs (because it changes
    the symbol type). But could be done independently of anything else.

    Note 2: Depending on system architecture, each call to the function may
    incur an additional (small) penalty. No idea how this compares to the
    penalty of the if() at the start of emalloc().

    Regards,
    Christian
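
The function-pointer approach described in this message can be sketched as a small compilable example. Everything prefixed with sketch_ is hypothetical: the stand-in allocator simply wraps malloc() and counts calls, and the startup function takes the value of USE_ZEND_ALLOC as a parameter (a caller would typically pass getenv("USE_ZEND_ALLOC")). The pointer returns void * so that malloc can be assigned to it directly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

size_t sketch_real_calls = 0;

/* Hypothetical stand-in for the engine's own allocator. */
void *sketch_real_emalloc(size_t n)
{
    sketch_real_calls++;
    return malloc(n);
}

/* Every allocation goes through this pointer; no per-call environment check. */
void *(*sketch_emalloc)(size_t n) = sketch_real_emalloc;

/* Run once at startup. A value that parses as 0 selects the system malloc().
 * Note: atoi() also returns 0 for non-numeric strings. */
void sketch_alloc_startup(const char *use_zend_alloc)
{
    if (use_zend_alloc != NULL && atoi(use_zend_alloc) == 0) {
        sketch_emalloc = malloc;
    } else {
        sketch_emalloc = sketch_real_emalloc;
    }
}
```

The decision is made once, and each later call pays for an indirect call instead of a branch; which of the two is cheaper has to be benchmarked.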
  • Dmitry Stogov at Jan 15, 2014 at 6:58 am
    Hi Christian,

    It's a clear solution, but an indirect call may cause a lot of branch
    mispredictions, so it needs to be tested whether it actually improves
    performance. Binary compatibility is going to be broken anyway, so that's
    not a problem.

    Thanks. Dmitry.

  • Anatol Belski at Jan 15, 2014 at 8:44 am
    Hi Dmitry,
    I've just given it a try on Windows; the compilation breaks with this error:

    zend\xx_malloc.c(41) : fatal error C1083: Cannot open include file:
    'sys/mman.h': No such file or directory

    Google gave me this link:
    https://code.google.com/p/mman-win32/source/browse/trunk/ . I can go for a
    fix using this lib, or maybe look for another equivalent, after the
    str_size_and_int64 RFC is finished.

    Regards

    Anatol
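
A possible alternative to an mmap emulation library is a thin platform wrapper around the chunk allocation itself. The sketch below assumes that xx_malloc includes sys/mman.h in order to call mmap() for anonymous memory, which the thread does not confirm; the function names are hypothetical. It uses mmap()/munmap() on POSIX systems and VirtualAlloc()/VirtualFree() on Windows.

```c
#ifndef _WIN32
# ifndef _DEFAULT_SOURCE
#  define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS under strict -std= modes */
# endif
#endif

#include <assert.h>
#include <stddef.h>

#ifdef _WIN32
# include <windows.h>
#else
# include <sys/mman.h>
#endif

/* Obtain `size` bytes of zero-filled, page-aligned memory; NULL on failure. */
void *sketch_chunk_alloc(size_t size)
{
    assert(size > 0);
#ifdef _WIN32
    return VirtualAlloc(NULL, size, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
#else
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
#endif
}

/* Release a chunk from sketch_chunk_alloc(); returns 0 on success, -1 on failure. */
int sketch_chunk_free(void *ptr, size_t size)
{
#ifdef _WIN32
    (void)size;   /* MEM_RELEASE requires a size of 0 */
    return VirtualFree(ptr, 0, MEM_RELEASE) ? 0 : -1;
#else
    return munmap(ptr, size);
#endif
}
```

Keeping the platform split inside two functions means the rest of the allocator would not need to include sys/mman.h at all.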
  • Dmitry Stogov at Jan 15, 2014 at 8:56 am
    Hi Anatol,

    It's a proof of concept implementation.
    I published it to decide whether it makes sense to integrate it into PHP
    and implement support for the missing pieces, or just forget it.
    As I wrote, at the moment it supports only Linux, non-ZTS, release builds.

    Thanks. Dmitry.



  • Anatol Belski at Jan 15, 2014 at 9:11 am
    Hi Dmitry,
    Yep, I was aware of it. For exactly this reason I took the five-minute
    risk of git clone and make :) Now we at least know where it'll need some
    more hacking.

    Regards

    anatol
  • Julien Pauli at Jan 18, 2014 at 8:14 pm


    I could not see any improvement on a basic hello world under Symfony2.

    I will try more complex apps ;-)

    Julien
  • Nikita Popov at Jan 18, 2014 at 10:05 pm

    I tested your patch on some parsing code (which is very heavy on object
    creation for syntax trees) and saw a ~12% performance improvement and a
    ~15% memory usage improvement there. So it looks like the new allocator
    works particularly well if a lot of object allocation is involved.

    Nikita
  • Dmitry Stogov at Jan 20, 2014 at 6:42 am
    Hi Nikita,

    A 12% improvement on a real task looks amazing :)
    Was it on 32-bit or 64-bit PHP?

    Thanks. Dmitry.



Discussion Overview
group: php-internals
categories: php
posted: Jan 14, '14 at 2:22p
active: Jan 20, '14 at 6:42a
posts: 14
users: 6
website: php.net
