Hi all,

As some of you may be aware, we (OpenParallel) have already successfully
shown that adding access to threading primitives (Intel's GPL TBB
library) can be a performance benefit to WordPress in the context of
PHP/HipHop¹.

We are now experimenting with adding similar functionality to PHP/Zend.
The two are quite different, though, because HipHop is a compiler and
Zend is a virtual machine/interpreter.

So, the first obstacle we must overcome is being able to create working
slave executors. Once this is shown to work, then the executors can be
set to run in their own threads. Then we have the fun job of fixing all
the leaks :-). In the context of ZTS this is essentially:

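/* Allocate a fresh TSRM context; the current one stays active. */
void *newinterp = tsrm_new_interpreter_context();
/* Switch this thread over to the new context. */
tsrm_set_interpreter_context(newinterp);
/* Initialise the executor state for the new context. */
init_executor(newinterp);

// ... do something with newinterp, eg zend_call_function ...

/* Switch back to the original context. */
tsrm_set_interpreter_context(tsrm_ls);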

(full code is available at http://github.com/openparallel/php-src.git )

This is somewhat similar to the work that a SAPI would have to do to
make an executor for use. In fact one of the possible tidy approaches to
implementing threading might be to treat threads as an internal SAPI.

However the call to init_executor() does not seem to completely set up
the executor to a working state, and trying to use it results in
crashes. Using gdb I can track these down, but they involve executor fields
which init_executor() at least tried to set up, and I haven't yet worked out
what is going on.

What I'm after is:

1) any hints or clues from people familiar with the Zend subsystems -
such as memory management, and the various stacks, to provide hints as
to how to set them up "correctly"

2) interested internals experts for continuing and/or providing ongoing
mentoring assistance (funding may be available for this)

3) pointers to any hardcore internals documentation which may be useful
in training up new internals experts. Right now I'm mostly just reading
source and making educated guesses; the online internals manuals seem
more focused on things like opcodes and nothing relevant to this.

Thanks in advance for any help!

Cheers,
Sam

1 -
http://openparallel.wordpress.com/2010/11/01/tbb-in-wordpress-–-white-paper/

  • Sam Vilain at Jan 18, 2011 at 8:54 am

    On 18/01/11 17:21, Sam Vilain wrote:
    > (full code is available at http://github.com/openparallel/php-src.git )
    *ahem* that should be http://github.com/openparallel/php-src

    In fact to skip straight to the function, try

    http://github.com/openparallel/php-src/blob/9205db3/ext/tbb/tbb.c#L208

    While I'm here, I may as well provide a version of the white paper link
    which doesn't get mangled on the way through to the http gateway...

    http://xrl.us/hiphoptbbwordpress

    Sam
  • Stas Malyshev at Jan 18, 2011 at 9:17 am
    Hi!
    > 1) any hints or clues from people familiar with the Zend subsystems -
    > such as memory management, and the various stacks, to provide hints as
    > to how to set them up "correctly"
    Zend Engine keeps all state (including memory manager state, etc.)
    separate in each thread, which means once you've created a new thread it
    has to run initializations for the data structures. It should happen
    automatically when you build the engine in threaded mode
    (--enable-maintainer-zts).
    You can not share any data between the engine threads - unless you
    communicate it through some channel external to the engine - and even in
    this case you should use a copy, never the original pointer.
    This also means you can not use PHP functions, classes, etc. from one
    thread in another one.
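
    The copy-never-share rule can be sketched in plain C. This is only an
    illustration, assuming a hypothetical `message` type that holds a list
    of strings; it is not how the Zend Engine represents values. The point
    is that the receiver gets its own struct, array and strings, so no
    pointer into the sender's memory crosses the thread boundary.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical message type: a list of strings owned by one thread. */
typedef struct {
    size_t count;
    char **items;
} message;

/* Free a message and every string it owns. Safe to call on NULL. */
void message_free(message *m)
{
    if (m == NULL) return;
    for (size_t i = 0; i < m->count; i++) free(m->items[i]);
    free(m->items);
    free(m);
}

/* Return a deep copy: new struct, new array, new strings. NULL on failure. */
message *message_deep_copy(const message *src)
{
    message *dst = calloc(1, sizeof *dst);
    if (dst == NULL) return NULL;

    /* Allocate at least one slot so an empty list still gets a valid array. */
    dst->items = calloc(src->count ? src->count : 1, sizeof *dst->items);
    if (dst->items == NULL) { free(dst); return NULL; }

    for (size_t i = 0; i < src->count; i++) {
        size_t len = strlen(src->items[i]) + 1;
        dst->items[i] = malloc(len);
        if (dst->items[i] == NULL) { message_free(dst); return NULL; }
        memcpy(dst->items[i], src->items[i], len);
        dst->count = i + 1;  /* keep count in step so cleanup frees only what exists */
    }
    return dst;
}
```

    The sender would call message_deep_copy() and hand only the copy to the
    other thread, which then owns it and releases it with message_free().
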
    I'm not sure what you tried to do in your code, so hard to say what
    exactly went wrong there.
    Another caveat: while Zend Engine makes a lot of effort to keep the
    state localized and thus be thread-safe, not all libraries PHP is using
    do so, so running multithreaded PHP with these libraries may cause
    various trouble.
    --
    Stanislav Malyshev, Software Architect
    SugarCRM: http://www.sugarcrm.com/
    (408)454-6900 ext. 227
  • Sam Vilain at Jan 18, 2011 at 9:16 pm

    On 18/01/11 22:17, Stas Malyshev wrote:
    > > 1) any hints or clues from people familiar with the Zend subsystems -
    > > such as memory management, and the various stacks, to provide hints as
    > > to how to set them up "correctly"
    > Zend Engine keeps all state (including memory manager state, etc.)
    > separate in each thread, which means once you've created a new thread
    > it has to run initializations for the data structures. It should
    > happen automatically when you build the engine in threaded mode
    > (--enable-maintainer-zts).
    Yes, I expected the two functions, tsrm_new_interpreter_context() and
    init_executor(), to do that, as the latter is the function called in
    php_request_startup() in main/main.c

    It seems to do a lot of the work, and as far as I could tell there is no
    TSRM function to reap an individual thread etc.

    There is also zend_startup() - which seems to do a bit more. If anyone
    knowledgeable would care to give or point to an overview, that would be
    very useful.
    > You can not share any data between the engine threads - unless you
    > communicate it through some channel external to the engine - and even
    > in this case you should use a copy, never the original pointer.
    Sure, I'm expecting to have to pass in all data as deep copies as well
    as the return value from the function. This is useful for
    array_map-like functions. The parallel_for API, while it worked in the
    context of HipHop, is unlikely to work with Zend; there doesn't seem to
    be an interpreter under the sun which has successfully pulled off
    threading with shared data.

    Another possible application would be a parallel_include() type call,
    which would call a given PHP file for each member of an array (or a PDO
    result set), buffering the output from each, and inserting into the
    output stream in sequence once each fragment is done (hopefully
    interacting well with normal output buffering, if you didn't want the
    results sent yet). This would allow a large number of results to be
    rendered in parallel on multicore systems.
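
    The buffering-and-ordering part of that idea can be sketched in C with
    processes and pipes rather than interpreter threads. Everything here is
    an assumption made for illustration: render_fragment() is a hypothetical
    stand-in for "include a PHP file for one item", and parallel_render()
    and MAX_ITEMS are invented names. All workers are started first; the
    parent then drains their pipes in item order, so the combined output is
    in sequence no matter which worker finishes first.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_ITEMS 16

/* Hypothetical per-item renderer, standing in for "include a PHP file". */
static void render_fragment(const char *item, int out_fd)
{
    char buf[128];
    int n = snprintf(buf, sizeof buf, "<li>%s</li>\n", item);
    if (n < 0) _exit(1);
    size_t len = (size_t)n < sizeof buf ? (size_t)n : sizeof buf - 1;
    if (write(out_fd, buf, len) != (ssize_t)len) _exit(1);
}

/* Render every item in its own process and append the fragments to `out`
 * in item order. Returns the number of bytes stored, or -1 on error. */
ssize_t parallel_render(const char *const *items, size_t count,
                        char *out, size_t out_size)
{
    int fds[MAX_ITEMS];
    pid_t pids[MAX_ITEMS];
    size_t used = 0;
    size_t started = 0;
    int failed = 0;

    if (count > MAX_ITEMS || out_size == 0) return -1;

    /* Start all workers first so they run concurrently. */
    for (; started < count; started++) {
        int p[2];
        if (pipe(p) < 0) { failed = 1; break; }
        pids[started] = fork();
        if (pids[started] < 0) { close(p[0]); close(p[1]); failed = 1; break; }
        if (pids[started] == 0) {            /* child: render and exit */
            close(p[0]);
            for (size_t j = 0; j < started; j++) close(fds[j]);
            render_fragment(items[started], p[1]);
            _exit(0);
        }
        close(p[1]);                         /* parent keeps the read end */
        fds[started] = p[0];
    }

    /* Drain the pipes in item order so the output order is deterministic. */
    for (size_t i = 0; i < started; i++) {
        ssize_t n;
        while ((n = read(fds[i], out + used, out_size - 1 - used)) > 0)
            used += (size_t)n;
        if (n < 0) failed = 1;
        close(fds[i]);
        waitpid(pids[i], NULL, 0);
    }
    out[used] = '\0';
    return failed ? -1 : (ssize_t)used;
}
```

    A worker whose fragment is larger than the pipe buffer simply blocks
    until the parent reaches its pipe, which costs some parallelism but
    keeps the ordering guarantee.
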
    > This also means you can not use PHP functions, classes, etc. from one
    > thread in another one.
    I hope it will be possible to share already compiled code between
    threads; this may mean disabling "eval" inside the thread or otherwise
    hobbling the compiler to avoid separate threads trying to modify the
    optree at once. If a shared optree cannot be achieved, then I guess it
    would have to go back to the APC, but it would be good to avoid
    overheads where possible to keep the thread startup cost low.

    Even extremely restricted parallelism can help speed up some types of
    work, so limitations I am happy to accept.
    > I'm not sure what you tried to do in your code, so hard to say what
    > exactly went wrong there.
    > Another caveat: while Zend Engine makes a lot of effort to keep the
    > state localized and thus be thread-safe, not all libraries PHP is
    > using do so, so running multithreaded PHP with these libraries may
    > cause various trouble.
    Yes, currently I am not looking at calling individual module startup
    functions to avoid this problem (and save time on thread startup). It
    seems that there is a facility for limiting the available functions
    visible to the created executor, too, which may make this easy to make
    "safe".

    Thanks for your feedback,
    Sam
  • Stefan Marr at Jan 18, 2011 at 9:51 pm
    Hi Sam:

    I am following the discussion with great interest, but I have a question for clarification:
    On 18 Jan 2011, at 22:16, Sam Vilain wrote:
    > there doesn't seem to
    > be an interpreter under the sun which has successfully pulled off
    > threading with shared data.
    Could you explain what you mean by that statement?

    Sorry, but that's my topic, and the best-known interpreters that 'pulled off' threading with shared data are for Java. The interpreter I am working on is for manycore systems (running on a 64-core Tilera chip) and executes Smalltalk (https://github.com/smarr/RoarVM).

    Best regards
    Stefan


    --
    Stefan Marr
    Software Languages Lab
    Vrije Universiteit Brussel
    Pleinlaan 2 / B-1050 Brussels / Belgium
    http://soft.vub.ac.be/~smarr
    Phone: +32 2 629 2974
    Fax: +32 2 629 3525
  • Stas Malyshev at Jan 18, 2011 at 10:10 pm
    Hi!
    > Sorry, but that's my topic, and the best-known interpreters that
    > 'pulled off' threading with shared data are for Java. The interpreter
    Given the lengths Java programmers have to go to in order to make their
    threaded code work, I have a lot of doubt that 95% of PHP users would be
    able to write correct threaded programs. Reasoning about threaded
    programs is very hard, and IMHO putting it into a beginners' language
    would be a mistake.
    --
    Stanislav Malyshev, Software Architect
    SugarCRM: http://www.sugarcrm.com/
    (408)454-6900 ext. 227
  • Hannes Landeholm at Jan 18, 2011 at 10:36 pm
    Hello,

    I don't think a language becomes a "beginners language" just because many
    new programmers use it. And it's still not a good argument for not including
    new features.

    As long as the new thread doesn't share any memory/variables with the
    spawning context, no "reasoning" is required at all. It's when you start
    sharing objects that things get complex. Just a simple threading
    implementation with a strictly defined way to do IPC would be very helpful.
    It's not super useful in web application programming, as the work is
    already packaged into small units: web requests. So in
    that sense a web application is already "multi threaded". However it's
    interesting for CGI scripts. The other week I wrote a PHP CGI proxy for
    example. Because PHP didn't have threading, I had to bother with select
    polling.

    Hannes
  • Ben Schmidt at Jan 19, 2011 at 2:54 am
    Strongly second this. PHP is not a toy language restricted to beginners. If it has
    advanced features, beginners simply don't need to use them.

    If anything, I would argue that PHP is a language unsuited to beginners (and other
    scripting languages), as it is so flexible it doesn't enforce good programming
    practice. Java is much more a 'beginner language' because it has much stricter
    syntax, type checking, exception handling, etc., which force and even teach people
    to program well in some regards (or at least do something to raise their awareness
    that they're programming sloppily!). Mind you, it's pretty easy to write bad code
    in any language....

    Ben.


  • Stas Malyshev at Jan 19, 2011 at 4:51 am
    Hi!
    > If anything, I would argue that PHP is a language unsuited to beginners (and other
    > scripting languages), as it is so flexible it doesn't enforce good programming
    > practice. Java is much more a 'beginner language' because it has much stricter
    Contrary to popular belief, people usually don't start with programming
    to be taught good practices and become enlightened in the ways of Art.
    They usually start because they need their computers to do something for
    them. And scripting languages are often the easiest way to make that
    happen.
    Java, on the other hand, forces you to deal with exceptions, patterns,
    interfaces, generics, covariants and contravariants, locking, etc. which
    you neither want nor need to know, only because somebody somewhere
    decided that it's right for you.
    --
    Stanislav Malyshev, Software Architect
    SugarCRM: http://www.sugarcrm.com/
    (408)454-6900 ext. 227
  • Ben Schmidt at Jan 19, 2011 at 5:12 am

    On 19/01/11 3:51 PM, Stas Malyshev wrote:
    > Hi!
    > > If anything, I would argue that PHP is a language unsuited to beginners (and other
    > > scripting languages), as it is so flexible it doesn't enforce good programming
    > > practice. Java is much more a 'beginner language' because it has much stricter
    > Contrary to popular belief, people usually don't start with programming to be
    > taught good practices and become enlightened in the ways of Art. They usually
    > start because they need their computers to do something for them. And scripting
    > languages are often the easiest way to make that happen.
    > Java, on the other hand, forces you to deal with exceptions, patterns, interfaces,
    > generics, covariants and contravariants, locking, etc. which you neither want nor
    > need to know, only because somebody somewhere decided that it's right for you.
    Yeah, well, I was playing Devil's advocate and went a bit far (as you
    have too--arguing is fun, isn't it?).

    On a more serious note, I think what is much more helpful for beginners
    is a good teacher, or good materials to learn from. Almost any language
    can be a good one for beginners if taught well. Including PHP; indeed I
    have recommended and taught people PHP as beginners without much
    trouble. And Java.

    At any rate, the important thing is that beginners shouldn't hold a good
    language back, particularly if the innovations are not obligatory for
    them to use.

    Smiles,

    Ben.
  • Arnaud Le Blanc at Jan 22, 2011 at 10:18 pm
    Hi,

    On Tuesday 18 January 2011 at 23:36 +0100, Hannes Landeholm wrote:
    > Just a simple threading
    > implementation with a strictly defined way to IPC would be very helpful.
    If you just want to throw some executors and pass messages between them
    you can already fork processes with pcntl [1] and pass messages in a
    variety of ways with [2][3][4], or just plain files :-)

    This is often enough for speeding up batch scripts or creating some
    simple servers.

    [1] http://php.net/manual/en/function.pcntl-fork.php
    [2] http://php.net/manual/en/book.shmop.php
    [3] http://php.net/manual/en/book.sem.php
    [4] http://php.net/stream_socket_pair
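
    The same fork-and-message pattern looks like this at the C level, using
    the POSIX calls fork() and socketpair(). It is a minimal sketch, not a
    transcription of any PHP code: square_in_worker() is a made-up example
    task, and the "message" is just a long sent by value in each direction.

```c
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Ask a forked worker to square `value`; the request and the reply travel
 * as copies over a socket pair. Returns 0 on success, -1 on error. */
int square_in_worker(long value, long *result)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) return -1;

    pid_t pid = fork();
    if (pid < 0) { close(sv[0]); close(sv[1]); return -1; }

    if (pid == 0) {                          /* worker process */
        long request;
        close(sv[0]);
        if (read(sv[1], &request, sizeof request) != (ssize_t)sizeof request)
            _exit(1);
        long reply = request * request;
        if (write(sv[1], &reply, sizeof reply) != (ssize_t)sizeof reply)
            _exit(1);
        _exit(0);
    }

    /* parent process: send the request, wait for the reply, reap the child */
    int ok = 0;
    close(sv[1]);
    if (write(sv[0], &value, sizeof value) == (ssize_t)sizeof value &&
        read(sv[0], result, sizeof *result) == (ssize_t)sizeof *result)
        ok = 1;
    close(sv[0]);
    waitpid(pid, NULL, 0);
    return ok ? 0 : -1;
}
```

    Because the worker is a separate process, nothing is shared by accident;
    the socket is the only channel between the two sides.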

    Best Regards,
  • Sam Vilain at Jan 19, 2011 at 3:14 am

    On 19/01/11 10:50, Stefan Marr wrote:
    > On 18 Jan 2011, at 22:16, Sam Vilain wrote:
    > > there doesn't seem to
    > > be an interpreter under the sun which has successfully pulled off
    > > threading with shared data.
    > Could you explain what you mean by that statement?
    >
    > Sorry, but that's my topic, and the best-known interpreters that 'pulled off' threading with shared data are for Java. The interpreter I am working on is for manycore systems (running on a 64-core Tilera chip) and executes Smalltalk (https://github.com/smarr/RoarVM).
    You raise a very good point. My statement is too broad and should
    probably apply only to dynamic languages, executed on reference counted
    VMs. Look at some major ones - PHP, Python, Ruby, Perl, most JS engines
    - none of them actually thread properly. Well, Perl's "threading" does
    run full speed, but actually copies every variable on the heap for each
    new thread, massively bloating the process.

    So the question is why should this be so, if C++ and Java, even
    interpreted on a JVM, can do it?

    In general, Java's basic types typically correspond with types that can
    be dealt with atomically by processors, or are small enough to be passed
    by value. This already makes things a lot easier.

    I've had another reason for the differences explained to me. I'm not
    sure I understand it fully enough to be able to re-explain it, but I'll
    try anyway. As I grasped the concept, the key to making VMs fully
    threadable with shared state, is to first allow reference addresses to
    change, such as via generational garbage collection. This allows you to
    have much clearer "stack frames", perhaps even really stored on the
    thread-local/C stack, as opposed to most dynamic language interpreters
    which barely use the C stack at all. Then, when the long-lived objects
    are discovered at scope exit time they can be safely moved into the next
    memory pool, as well as letting access to "old" objects be locked (or
    copied, in the case of Software Transactional Memory). Access to
    objects in your own frame can therefore be fast, and the number of locks
    that have to be held reduced.

    Perhaps to support/refute this argument, in your JVM, how do you handle:

    - memory allocation: object references' timeline and garbage collection
    - call stack frames and/or return continuations - the C stack or the heap?
    - atomicity of functions (that's the "synchronized" keyword?)
    - timely object destruction

    I put it forward that the overall design of the interpreter, and
    therefore what is possible in terms of threading, is highly influenced
    by these factors.

    When threading in C or C++ for instance (and this includes HipHop-TBB),
    the call stack frame is on the C stack, so shared state is possible so
    long as you pass heap pointers around and synchronise appropriately.
    The "virtual" machine is of a different nature, and it can work. For
    JVMs, as far as I know references are temporary and again the nature of
    the execution environment is different.

    For VMs where there is basically nothing on the stack, and everything on
    the heap, it becomes a lot harder. To talk about a VM I know better,
    Perl has about 6 internal stacks all represented on the heap; a function
    call/return stack, a lexical scope stack to represent what is in scope,
    a variable stack (the "tmps" stack) for variables declared in those
    scopes and for timely destruction, a stack to implement local($var)
    called the "save" stack, a "mark" stack used for garbage collection, ok
    well only 5 but I think you get my point. From my reading of the PHP
    internals so far there is a similar set there too, so comparisons are
    quite likely to be instructive. It's a bit hard figuring out everything
    that is going on internally (all these internal void* types don't help
    either), and whether some inherent property of reference counting rules
    out a shared state model, or merely makes it harder, is a question I'm
    not sure is easy to answer.

    In any case, full shared state is not required for a large set of useful
    parallelism APIs, and in fact contains a number of pitfalls which are
    difficult to explain, debug and fix. I'm far more interested in simple
    acceleration of tight loops - to make use of otherwise idle CPU cores
    (perhaps virtual as in hyperthreading) to increase throughput - and APIs
    like "map" express this well. The idea is that the executor can start
    up with no variables in scope, though hopefully shared code segments,
    call some function on the data it is passed in, and pass the answers
    back to the main thread and then set about cleaning itself up.
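
    To make the shape of such a "map" concrete, here is a small sketch in
    plain C with POSIX threads. It is an illustration under assumptions, not
    the ext/tbb code: parallel_map(), map_task, MAX_WORKERS and the example
    callback triple() are all invented names. Each worker is given a private
    copy of one input element and a result slot that only it writes, so the
    workers share code but no data.

```c
#include <pthread.h>
#include <stddef.h>

#define MAX_WORKERS 8

typedef long (*map_fn)(long);

/* Everything one worker may touch: its own input copy and result slot. */
typedef struct {
    map_fn fn;
    long input;    /* private copy of one element */
    long output;   /* written only by this worker */
} map_task;

static void *run_task(void *arg)
{
    map_task *task = arg;
    task->output = task->fn(task->input);
    return NULL;
}

/* Example callback used to exercise parallel_map(). */
long triple(long x) { return 3 * x; }

/* Apply fn to in[0..count-1], one thread per element, storing results in
 * out. Returns 0 on success, -1 on failure. */
int parallel_map(map_fn fn, const long *in, long *out, size_t count)
{
    pthread_t threads[MAX_WORKERS];
    map_task tasks[MAX_WORKERS];
    size_t started = 0;
    int rc = 0;

    if (count > MAX_WORKERS) return -1;

    for (; started < count; started++) {
        tasks[started].fn = fn;
        tasks[started].input = in[started];   /* copy in, share nothing */
        tasks[started].output = 0;
        if (pthread_create(&threads[started], NULL, run_task,
                           &tasks[started]) != 0) {
            rc = -1;
            break;
        }
    }

    /* Join in order and copy each answer back to the caller. */
    for (size_t i = 0; i < started; i++) {
        if (pthread_join(threads[i], NULL) != 0) rc = -1;
        else out[i] = tasks[i].output;
    }
    return rc;
}
```

    The main thread only reads a result after joining the worker that
    produced it, which is what lets the sketch get away without any locks.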

    Sam
  • Stas Malyshev at Jan 19, 2011 at 5:00 am
    Hi!
    > like "map" express this well. The idea is that the executor can start
    > up with no variables in scope, though hopefully shared code segments,
    For that you would probably need to put some severe restrictions on your
    code, such as:

    1. No usage of default properties or statics in classes or functions.
    2. No assigning of constants to any variable (comparison and operators
    may be ok, not sure how refcounts work out)
    3. No defining new functions or classes or including new files

    This probably could still do something useful - such as run 3 sql
    queries in parallel and return the result - but I'm not sure how you
    could enforce such conditions... If you do not, you'll have some
    "interesting" race conditions leading to variables disappearing,
    leaking, being assigned wrong values, etc.
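
    One source of such races is the reference count itself: a plain
    `refcount++` is a load, an add and a store, so two threads can interleave
    and lose an update, which later shows up as a leak or a premature free.
    The sketch below shows what a counter that tolerates sharing would need,
    using C11 atomics. It assumes a hypothetical `shared_value` type and is
    not Zend's zval layout.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical refcounted value; not Zend's zval layout. */
typedef struct {
    atomic_uint refcount;
    long payload;
} shared_value;

/* Create a value holding one reference. Returns NULL on failure. */
shared_value *shared_value_new(long payload)
{
    shared_value *v = malloc(sizeof *v);
    if (v == NULL) return NULL;
    atomic_init(&v->refcount, 1);
    v->payload = payload;
    return v;
}

/* Take another reference. The increment is one indivisible step. */
void shared_value_retain(shared_value *v)
{
    atomic_fetch_add(&v->refcount, 1);
}

/* Drop a reference and free on the last one. Returns 1 if freed. */
int shared_value_release(shared_value *v)
{
    /* fetch_sub returns the previous count, so exactly one caller sees 1. */
    if (atomic_fetch_sub(&v->refcount, 1) == 1) {
        free(v);
        return 1;
    }
    return 0;
}
```

    An atomic counter only protects the count; the data the value points to
    would still need its own synchronisation or copying.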
    --
    Stanislav Malyshev, Software Architect
    SugarCRM: http://www.sugarcrm.com/
    (408)454-6900 ext. 227
  • Sam Vilain at Jan 19, 2011 at 11:40 am

    On 19/01/11 16:14, Sam Vilain wrote:
    > In general, Java's basic types typically correspond with types that can
    > be dealt with atomically by processors, or are small enough to be passed
    > by value. This already makes things a lot easier.
    >
    > I've had another reason for the differences explained to me. I'm not
    > sure I understand it fully enough to be able to re-explain it, but I'll
    > try anyway. As I grasped the concept, the key to making VMs fully
    > threadable with shared state, is to first allow reference addresses to
    > change, such as via generational garbage collection. This allows you to
    > have much clearer "stack frames", perhaps even really stored on the
    > thread-local/C stack, as opposed to most dynamic language interpreters
    > which barely use the C stack at all. Then, when the long-lived objects
    > are discovered at scope exit time they can be safely moved into the next
    > memory pool, as well as letting access to "old" objects be locked (or
    > copied, in the case of Software Transactional Memory). Access to
    > objects in your own frame can therefore be fast, and the number of locks
    > that have to be held reduced.
    Ref:
    http://java.sun.com/docs/books/jvms/second_edition/html/Concepts.doc.html#33308
    and to a lesser extent, the note on
    http://java.sun.com/docs/books/jvms/second_edition/html/Threads.doc.html#22244
    > Perhaps to support/refute this argument, in your JVM, how do you handle:
    >
    > - memory allocation: object references' timeline and garbage collection
    > - call stack frames and/or return continuations - the C stack or the heap?
    > - atomicity of functions (that's the "synchronized" keyword?)
    > - timely object destruction
    >
    > I put it forward that the overall design of the interpreter, and
    > therefore what is possible in terms of threading, is highly influenced
    > by these factors.

    Based on https://github.com/smarr/RoarVM/blob/98caf11d0/README.rst it
    can be seen that indeed it is a completely different architecture. From
    the first of the ACM papers' abstract:

    In addition to the cost of inter-core communication, two hardware
    characteristics influenced our design: the absence of hardware-provided
    cache-coherence, and the inability to move a single object from one
    core's cache to another's without changing its address.
    > In any case, full shared state is not required for a large set of useful
    > parallelism APIs, and in fact contains a number of pitfalls which are
    > difficult to explain, debug and fix. I'm far more interested in simple
    > acceleration of tight loops - to make use of otherwise idle CPU cores
    > (perhaps virtual as in hyperthreading) to increase throughput - and APIs
    > like "map" express this well. The idea is that the executor can start
    > up with no variables in scope, though hopefully shared code segments,
    > call some function on the data it is passed in, and pass the answers
    > back to the main thread and then set about cleaning itself up.
    You could probably support this with any paper on Erlang ;-)

    Sam
  • Martin Scotta at Jan 19, 2011 at 3:42 pm
    I think the point is that the PHP language itself does not provide solid
    constructs for writing rock-solid code. Yes, there are many
    programmers/hackers who can do it, but the effort they put in is huge.

    It's so easy to break well-written, bug-free code that it's impossible for
    developers to share libraries, and even those who do share have the problem
    that the language does not provide the constructs for a system to
    evolve without breaking its clients' code.

    As you were speaking about Java, we must learn from the Java experience. All
    that nonsense it imposes is the same stuff that allows Java
    developers to share their libraries. All you need to do is put the .jar in
    your classpath, and that's it.

    In Java you are free to extend a class, yours or imported, without worrying
    about its internal implementation. Is that possible in PHP? Nope.
    __construct breaks that.

    So instead of hacking the language, why don't we start by adding better
    language constructs?
    Look at the foreach statement and the Iterators: that is a really good
    example of a well-designed language construct.

    I'm really interested in threads for PHP, but as a language construct.
    Threads are not easy; even the most experienced programmer may not get them
    right from scratch.

    IMHO, as a simple PHP programmer, the language should provide the simplest
    language construct and the engine should handle all the complexity under the
    hood.

    Martin Scotta

    On Wed, Jan 19, 2011 at 8:40 AM, Sam Vilain wrote:
    On 19/01/11 16:14, Sam Vilain wrote:
    In general, Java's basic types typically correspond with types that can
    be dealt with atomically by processors, or are small enough to be passed
    by value. This already makes things a lot easier.

    I've had another reason for the differences explained to me. I'm not
    sure I understand it fully enough to be able to re-explain it, but I'll
    try anyway. As I grasped the concept, the key to making VMs fully
    threadable with shared state, is to first allow reference addresses to
    change, such as via generational garbage collection. This allows you to
    have much clearer "stack frames", perhaps even really stored on the
    thread-local/C stack, as opposed to most dynamic language interpreters
    which barely use the C stack at all. Then, when the long-lived objects
    are discovered at scope exit time they can be safely moved into the next
    memory pool, as well as letting access to "old" objects be locked (or
    copied, in the case of Software Transactional Memory). Access to
    objects in your own frame can therefore be fast, and the number of locks
    that have to be held reduced.
    Ref:

    http://java.sun.com/docs/books/jvms/second_edition/html/Concepts.doc.html#33308
    and to a lesser extent, the note on

    http://java.sun.com/docs/books/jvms/second_edition/html/Threads.doc.html#22244
    Perhaps to support/refute this argument, in your JVM, how do you handle:

    - memory allocation: object references' timeline and garbage collection
    - call stack frames and/or return continuations - the C stack or the heap?
    - atomicity of functions (that's the "synchronized" keyword?)
    - timely object destruction

    put it forward that the overall design of the interpreter, and
    therefore what is possible in terms of threading, is highly influenced
    by these factors.

    When threading in C or C++ for instance (and this includes HipHop-TBB),
    the call stack frame is on the C stack, so shared state is possible so
    long as you pass heap pointers around and synchronise appropriately.
    The "virtual" machine is of a different nature, and it can work. For
    JVMs, as far as I know references are temporary and again the nature of
    the execution environment is different.

    For VMs where there is basically nothing on the stack, and everything on
    the heap, it becomes a lot harder. To talk about a VM I know better,
    Perl has about 6 internal stacks all represented on the heap; a function
    call/return stack, a lexical scope stack to represent what is in scope,
    a variable stack (the "tmps" stack) for variables declared in those
    scopes and for timely destruction, a stack to implement local($var)
    called the "save" stack, a "mark" stack used for garbage collection, ok
    well only 5 but I think you get my point. From my reading of the PHP
    internals so far there are similar set there too, so comparisons are
    quite likely to be instructive. It's a bit hard figuring out everything
    that is going on internally (all these internal void* types don't help
    either), and whether or not there is some inherent property of reference
    counting, or whether it just makes a shared state model harder, is a
    question I'm not sure is easy to answer.
    Based on https://github.com/smarr/RoarVM/blob/98caf11d0/README.rst it
    can be seen that indeed it is a completely different architecture. From
    the first of the ACM papers' abstract:

    In addition to the cost of inter-core communication, two hardware
    characteristics influenced our design: the absence of hardware-provided
    cache-coherence, and the inability to move a single object from one
    core's cache to another's without changing its address.
    In any case, full shared state is not required for a large set of useful
    parallelism APIs, and in fact contains a number of pitfalls which are
    difficult to explain, debug and fix. I'm far more interested in simple
    acceleration of tight loops - to make use of otherwise idle CPU cores
    (perhaps virtual as in hyperthreading) to increase throughput - and APIs
    like "map" express this well. The idea is that the executor can start
    up with no variables in scope, though hopefully shared code segments,
    call some function on the data it is passed in, and pass the answers
    back to the main thread and then set about cleaning itself up.
    You could probably support this with any paper on Erlang ;-)
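
    The share-nothing "map" idea above can be modelled in plain PHP. This is
    only a sequential sketch: isolated_map() is a hypothetical name, and the
    serialize()/unserialize() round trip stands in for copying data across an
    executor boundary. No real threads or extra interpreters are involved.

```php
<?php
// Sequential model of a share-nothing map: each item is copied in,
// processed, and the answer is copied back out. Nothing is shared.
function isolated_map(callable $fn, array $items): array
{
    $results = [];
    foreach ($items as $key => $item) {
        // Copy in: the worker sees a private copy, never the caller's value.
        $input = unserialize(serialize($item));
        $output = $fn($input);
        // Copy out: the caller receives a private copy of the answer.
        $results[$key] = unserialize(serialize($output));
    }
    return $results;
}

$original = new stdClass();
$original->n = 1;

// The callback mutates its argument, but only its own copy.
$answers = isolated_map(function ($o) {
    $o->n *= 10;
    return $o->n;
}, [$original]);

var_dump($original->n); // still int(1): the caller's object was not touched
var_dump($answers);     // one element, int(10)
```

    Because nothing is shared, no locking is needed; the cost is the copy in
    each direction.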

    Sam

    --
    PHP Internals - PHP Runtime Development Mailing List
    To unsubscribe, visit: http://www.php.net/unsub.php
  • Pierre Joye at Jan 19, 2011 at 3:50 pm
    hi,
    On Wed, Jan 19, 2011 at 4:41 PM, Martin Scotta wrote:
    I think the point is that the PHP language itself does not provide solid
    constructs for writing rock-solid code. Yes, there are many
    programmers/hackers that can, but the effort they put in is huge.
    Care to enlighten me and tell me what is missing to allow one to write
    rock-solid code?
    It's so easy to break well-written, bug-free code that it's impossible for
    developers to share libraries, and even those who do share have the problem
    that the language does not provide the constructs needed for a system to
    evolve without breaking its clients' code.
    I think that most of PHP is actually thread safe. And almost all
    libraries are now either thread safe or used in a way that makes them
    thread safe.

    Now, about making the engine itself and the userland scripts able to
    implement parallelized functions for multi-core architecture (which is
    very disputable in a web environment, btw), that's a totally different
    topic and I don't think it is worth the effort.

    I'm really interested in threads for PHP, but as a language construct.
    Threads are not easy; even the most experienced programmer could not get them
    right from scratch.
    Most of the time what PHP needs are non-blocking operations, not
    necessarily multi-threaded operations. That's what some of the newly
    implemented features do (like in mysqlnd, to fetch the data).
    IMHO, as a simple PHP programmer, the language should provide the simplest
    language construct and the engine should handle all the complexity under the
    hood.
    Honestly if a given part of an application needs something along this
    line for performance reasons, then doing that on the same box where
    the request is executed may be a bad idea. Tools like Gearman will do
    a far better job and will let you do resource-intensive processing on
    other machines where cores may not be already busy serving other
    requests.

    my 2 cents based on my experiences and benches in this area,

    Cheers,
  • Rasmus Lerdorf at Jan 19, 2011 at 5:14 pm

    On 1/19/11 7:50 AM, Pierre Joye wrote:
    Honestly if a given part of an application needs something along this
    line for performance reasons, then doing that on the same box where
    the request is executed may be a bad idea. Tools like Gearman will do
    a far better job and will let you do resource-intensive processing on
    other machines where cores may not be already busy serving other
    requests.

    my 2 cents based on my experiences and benches in this area,
    In real-world situations this is what I see as well. People either want
    to parallelize operations like fetching data from multiple URLs at once,
    where they think they need threading, but actually just need to learn
    the async calls, or they want to background something that takes a while
    to finish. This second case is much better handled by a separate job
    manager like Gearman.

    One example I have written is a rule engine that calculates a trust
    score for a financial transaction. The rules can get a bit complicated
    so it isn't something I want to have the web request wait on. Using the
    Kohana framework the call to kick off the rule engine looks like this:

    Gearman::doBackground('kohana', "gearman/payment_score/{$payment->id}");

    And I have a 'kohana' gearman worker that loads the entire framework
    which means my actual worker code is just another controller that looks
    exactly like my Web code. Any controller can be backgrounded that way
    with the added advantage that I can distribute these backgrounded jobs
    to a pool of worker servers that are separate from my frontend web
    servers, but they all run the same code stack. To me this is a much
    more flexible way to solve the problem than having to write
    thread-management code in my Web code and have my already overloaded web
    servers take on more work.
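
    For readers without the Kohana wrapper, roughly the same hand-off can be
    sketched with the pecl gearman extension's GearmanClient class. The helper
    names and the server address (127.0.0.1:4730) are assumptions made for
    illustration, not part of the code described above.

```php
<?php
// Hypothetical helper: builds the workload string in the same shape as the
// call shown above ("gearman/payment_score/<id>").
function payment_score_workload(int $paymentId): string
{
    return "gearman/payment_score/{$paymentId}";
}

// Hypothetical helper: queues the job and returns the job handle, or null
// when the gearman extension is not loaded.
function queue_payment_score(int $paymentId): ?string
{
    if (!class_exists('GearmanClient')) {
        return null;
    }
    $client = new GearmanClient();
    $client->addServer('127.0.0.1', 4730); // assumed job server location
    // doBackground() returns as soon as the job is queued; a 'kohana'
    // worker elsewhere picks it up and runs the controller.
    return $client->doBackground('kohana', payment_score_workload($paymentId));
}

echo payment_score_workload(42), PHP_EOL; // gearman/payment_score/42
```

    The web request only pays for queuing the job; the scoring itself runs
    on whichever worker server picks it up.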

    -Rasmus
  • Stas Malyshev at Jan 19, 2011 at 7:58 pm
    Hi!
    I think the point is that the PHP language itself does not provide solid
    constructs for writing rock-solid code. Yes, there are many
    programmers/hackers that can, but the effort they put in is huge.
    I think this is completely untrue.
    In Java you are free to extend a class --yours or imported-- without worries
    about its internal implementation. Is that possible in PHP? Nope.
    __construct breaks that.
    Could you please explain what you mean? How __construct breaks extending
    a class?
    IMHO, as a simple PHP programmer, the language should provide the simplest
    language construct and the engine should handle all the complexity under the
    hood.
    I see no way of hiding threads complexity "under the hood" - if you want
    threads, you'll need to deal with synchronization, locking, race
    conditions, etc. Do you see any way to avoid it?
    --
    Stanislav Malyshev, Software Architect
    SugarCRM: http://www.sugarcrm.com/
    (408)454-6900 ext. 227
  • Martin Scotta at Jan 20, 2011 at 12:06 am
    Many PHP features should be language constructs, but they were made as
    language hacks.

    __construct is evil, like any other language hack.

    It does not provide a safe foundation to build safe abstractions or reusable
    and extensible components, which leads to the lack of PHP libraries.

    Let's suppose there is a library that provides a utility class, which has
    no superclass and no constructor.

    // lives in library.phar
    class Utility { }

    A client uses this class by extending it

    // includes library.phar
    class Client extends Utility {
        function __construct() {
            // client initialization code here
        }
    }

    At that point the Utility class cannot add __construct safely: if it
    does, Client will break it, because Client does not call the parent constructor.

    But what happens if Utility provides a __construct?
    class Utility {
        function __construct() {
            // Utility initialization here
        }
    }

    class Client extends Utility {
        function __construct() {
            // some code
            parent::__construct(); // a good client calls the superclass
            // and then more code
        }
    }

    In this case Utility is forced to keep __construct; if it is removed,
    the Client call will fail, as parent::__construct will not exist.

    In both cases there were no API changes; only the way the objects are
    initialized changed.

    My point is that the language does not provide solid foundations (aka
    language constructs) for systems and libraries to evolve in a safe way.
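
    The first failure mode described above can be reproduced in a few lines.
    This is a sketch, not library code: LegacyClient, DefensiveClient and the
    $ready property are hypothetical names, and the method_exists() guard is
    one possible workaround rather than an established convention.

```php
<?php
// Library class after an upgrade: it has gained a constructor.
class Utility
{
    public $ready = false;

    public function __construct()
    {
        $this->ready = true; // Utility initialization
    }
}

// Client written against the old Utility, which had no constructor.
class LegacyClient extends Utility
{
    public function __construct()
    {
        // Client initialization only; the new parent constructor never runs.
    }
}

// Client that calls the parent constructor only when one exists, so it
// works both before and after Utility gains (or loses) __construct.
class DefensiveClient extends Utility
{
    public function __construct()
    {
        if (method_exists(parent::class, '__construct')) {
            parent::__construct();
        }
    }
}

$legacy = new LegacyClient();
$defensive = new DefensiveClient();

var_dump($legacy->ready);    // bool(false): Utility was never initialized
var_dump($defensive->ready); // bool(true)
```

    The guard also covers the second case: if Utility drops its constructor
    again, DefensiveClient simply skips the parent call instead of failing.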

    Martin Scotta

  • Stas Malyshev at Jan 20, 2011 at 12:19 am
    Hi!
    Many PHP features should be language constructs, but they were made as
    language hacks.

    __construct is evil, like any other language hack.
    Constructors are a standard feature in many languages. There's nothing
    evil in them.
    class Client extends Utility {
        function __construct() {
            // some code
            parent::__construct(); // a good client calls the superclass
            // and then more code
        }
    }
    Arguably, initialization is part of the API, but I see your point -
    it might be useful to supply all objects with an empty default ctor so that
    parent::__construct() always works. Submit a feature request to
    bugs.php.net.
    --
    Stanislav Malyshev, Software Architect
    SugarCRM: http://www.sugarcrm.com/
    (408)454-6900 ext. 227
  • Martin Scotta at Jan 20, 2011 at 2:25 pm
    Martin Scotta

    On Wed, Jan 19, 2011 at 9:19 PM, Stas Malyshev wrote:

    Hi!


    Many PHP features should be language constructs, but they were made as
    language hacks.

    __construct is evil, like any other language hack.
    Constructors are a standard feature in many languages. There's nothing evil
    in them.


    class Client extends Utility {
        function __construct() {
            // some code
            parent::__construct(); // a good client calls the superclass
            // and then more code
        }
    }
    Arguably, initialization is part of the API, but I see your point - it
    might be useful to supply all objects with an empty default ctor so that
    parent::__construct() always works. Submit a feature request to
    bugs.php.net.
    And what happens if the extending class does not call
    parent::__construct()?
    __construct is just like any other function, but with semantics added on
    top.

    Changing the way it behaves will cause many headaches

    ---
    BTW, did you notice that the "self" keyword is allowed as a method name? So I
    believe it is not a keyword at all.

    class Foo {
        function self() {
            echo __METHOD__, PHP_EOL;
        }
    }

    $foo = new Foo();
    $foo->self();

  • David Muir at Jan 21, 2011 at 4:47 am

    On 20/01/11 23:25, Martin Scotta wrote:
    And what happens if the extending class does not call
    parent::__construct()?
    __construct is just like any other function, but with semantics added on
    top.

    Changing the way it behaves will cause many headaches

    ---
    BTW, did you notice that the "self" keyword is allowed as a method name? So I
    believe it is not a keyword at all.

    class Foo{
    function self() {
    echo __METHOD__, PHP_EOL;
    }
    }

    $foo = new Foo();
    $foo->self();
    'self' is not a keyword, but a special predefined class:
    http://www.php.net/manual/en/reserved.classes.php

    'parent' also works as a method name for the same reason.
    'static' does not work, as it's a keyword.

    Cheers,
    David
  • Pierre Joye at Jan 21, 2011 at 10:16 am

    On Thu, Jan 20, 2011 at 3:25 PM, Martin Scotta wrote:

    And what happens if the extending class does not call
    parent::__construct()?
    __construct is just like any other function, but with semantics added on
    top.

    Changing the way it behaves will cause many headaches
    What does that have to do with the topic of this thread? Also, the
    __construct behavior and how/why one should call the parent
    __construct are well documented. Anyway, that does not prevent
    anyone from writing rock-solid code.


    Cheers,
  • Stefan Marr at Jan 19, 2011 at 11:01 pm
    Hi Sam:

    (becomes off-topic here, but for the sake of argument)
    On 19 Jan 2011, at 04:14, Sam Vilain wrote:
    On 19/01/11 10:50, Stefan Marr wrote:
    On 18 Jan 2011, at 22:16, Sam Vilain wrote:
    there doesn't seem to
    be an interpreter under the sun which has successfully pulled off
    threading with shared data.
    Could you explain what you mean with that statement?

    Sorry, but that's my topic, and the best-known interpreters that 'pulled off' threading with shared data are for Java. The interpreter I am working on is for manycore systems (running on a 64-core Tilera chip) and executes Smalltalk (https://github.com/smarr/RoarVM).
    You raise a very good point. My statement is too broad and should
    probably apply only to dynamic languages, executed on reference counted
    VMs. Look at some major ones - PHP, Python, Ruby, Perl, most JS engines
    - none of them actually thread properly.
    Ok, but the reason here is that building such VMs is inherently complex.
    And it has nothing to do with dynamic or not, with typed or what ever.
    The mentioned languages happen to be very successful in the domain of web applications, and as others already mentioned, the need for fine-grained shared-memory parallelism here is not clear. So, why don't we have Python without the GIL? Because nobody cared enough. However, there is still JRuby...
    Well, Perl's "threading" does
    run full speed, but actually copies every variable on the heap for each
    new thread, massively bloating the process.
    Cutting corners is the only way, if you do not have a great team of engineers.
    For the RoarVM we also have to cut more corners than we would like.
    So the question is why should this be so, if C++ and Java, even
    interpreted on a JVM, can do it?
    JVMs suffer from the same complexity. And C++, well, last time I checked there is just no threading model.
    There will be a memory model in C++0x, but there is nothing which makes it inherently hard to implement,
    since you don't get any guarantees (besides the memory model semantics) and you don't have any GC either.
    In general, Java's basic types typically correspond with types that can
    be dealt with atomically by processors, or are small enough to be passed
    by value. This already makes things a lot easier.
    I don't think that buys you anything. Which basic types can be passed by copy?
    Ints and bools, perhaps. That takes a bit of pressure off the GC, but does not really help with making things safe. Smalltalk does not know basic types. However, it knows an implementation technique called tagged pointers/tagged integers. This allows you to have 31-bit integers, since pointers are aligned and do not need all their bits. However, that really helps only with GC pressure.
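
    The tagged-integer technique can be sketched as follows. This is a model
    only: a PHP int stands in for a machine word, the function names are
    hypothetical, and the low bit is assumed to be the tag (1 for a small
    integer, 0 for an aligned pointer). On a 32-bit word this leaves 31 bits
    for the integer; a 64-bit PHP build leaves 63.

```php
<?php
// Encode a small integer into a word: shift left, set the low tag bit.
function tag_int(int $value): int
{
    return ($value << 1) | 1;
}

// Aligned pointers always have a low bit of 0, so a low bit of 1 marks
// a word that holds an immediate integer instead of an address.
function is_tagged_int(int $word): bool
{
    return ($word & 1) === 1;
}

// Decode: PHP's >> is an arithmetic shift, so the sign is preserved.
function untag_int(int $word): int
{
    return $word >> 1;
}

$word = tag_int(-5);
$pointer = 0x1000; // stand-in for a 4-byte-aligned object address

var_dump(is_tagged_int($word));    // bool(true)
var_dump(untag_int($word));        // int(-5)
var_dump(is_tagged_int($pointer)); // bool(false)
```

    Such integers never become heap objects, which is why the benefit is
    reduced GC pressure rather than any kind of thread safety.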
    I've had another reason for the differences explained to me. I'm not
    sure I understand it fully enough to be able to re-explain it, but I'll
    try anyway. As I grasped the concept, the key to making VMs fully
    threadable with shared state, is to first allow reference addresses to
    change, such as via generational garbage collection.
    Hm, there is usually the wish that you can run your GC threads in parallel with mutator threads; here it is indeed helpful to support moving GCs. But how does it help with threads working in parallel on some shared object? Any point where an object is allowed to move requires synchronization. So, either someone has to change the pointer you own to that object, or you need an additional level of indirection.

    I guess you are talking here about having such an additional indirection, object handles?
    This allows you to
    have much clearer "stack frames", perhaps even really stored on the
    thread-local/C stack, as opposed to most dynamic language interpreters
    which barely use the C stack at all.
    Why does having object handles give you a better stack frame layout?
    Using the C stack can be helpful for performance, but it makes other language features harder to implement.
    For instance, what about closures?
    Other techniques, like recycling your stack-frame objects, are usually a simpler optimization that does not make things like closures harder.

    Then, when the long-lived objects
    are discovered at scope exit time they can be safely moved into the next
    memory pool,
    Ui ui ui. Slooow. I don't follow. Ok, there are things like escape analysis.
    And then there are techniques like on-stack-allocation. Both usually done in JIT compilers, not so much in interpreters. Are we still talking about interpreters?
    Or are you implying an incremental GC that is triggered on the return of method calls?

    as well as letting access to "old" objects be locked (or
    copied, in the case of Software Transactional Memory).
    There are too many things discussed here in a single sentence. Sorry, I am lost.
    Access to
    objects in your own frame can therefore be fast, and the number of locks
    that have to be held reduced.
    Ok, on-stack-allocation and biased locking?
    - memory allocation: object references' timeline and garbage collection
    - call stack frames and/or return continuations - the C stack or the heap?
    - atomicity of functions (that's the "synchronized" keyword?)
    - timely object destruction

    I put it forward that the overall design of the interpreter, and
    therefore what is possible in terms of threading, is highly influenced
    by these factors.
    Sure, but neither the fact that it is implemented as an interpreter nor whether the language is dynamic is highly relevant. (Being an interpreter matters for performance, and for whether some of these techniques pay off in terms of overhead vs. performance benefit.)

    When threading in C or C++ for instance (and this includes HipHop-TBB),
    the call stack frame is on the C stack, so shared state is possible so
    long as you pass heap pointers around and synchronise appropriately.
    Nobody prevents you from handing out a pointer to a stack object in C/C++.
    Nobody prevents you from not synchronizing properly.
    How is that related to the implementation of an interpreter? If you are satisfied with those kinds of guarantees, then it is pretty easy to implement such an interpreter, no? (minus the GC question of course)

    The "virtual" machine is of a different nature, and it can work. For
    JVMs, as far as I know references are temporary and again the nature of
    the execution environment is different.
    References are temporary?

    For VMs where there is basically nothing on the stack, and everything on
    the heap, it becomes a lot harder.
    What exactly becomes harder?

    To talk about a VM I know better,
    Perl has about 6 internal stacks all represented on the heap; a function
    call/return stack, a lexical scope stack to represent what is in scope,
    a variable stack (the "tmps" stack) for variables declared in those
    scopes and for timely destruction, a stack to implement local($var)
    called the "save" stack, a "mark" stack used for garbage collection, ok
    well only 5 but I think you get my point. From my reading of the PHP
    internals so far there are similar set there too, so comparisons are
    quite likely to be instructive. It's a bit hard figuring out everything
    that is going on internally (all these internal void* types don't help
    either), and whether or not there is some inherent property of reference
    counting, or whether it just makes a shared state model harder, is a
    question I'm not sure is easy to answer.
    GC is always a problem as you already pointed out earlier.
    But the fact that reference counting is used should actually make the model simpler.


    Anyway, back to the original question. I think by now it is: what are the fundamentally hard things when it comes to implementing threaded shared-memory VMs? Well, and the answer is: GC.
    If you don't have GC and you don't give any fancy guarantees then you only have to care about following what your memory model promises, and probably restrict yourself to insert a memory fence here and there, no?


    Best regards
    Stefan


    --
    Stefan Marr
    Software Languages Lab
    Vrije Universiteit Brussel
    Pleinlaan 2 / B-1050 Brussels / Belgium
    http://soft.vub.ac.be/~smarr
    Phone: +32 2 629 2974
    Fax: +32 2 629 3525
  • Stas Malyshev at Jan 19, 2011 at 4:47 am
    Hi!
    Yes, I expected the two functions - tsrm_new_interpreter() and
    init_executor() to do that, as it is the function called in
    php_request_startup() in main/main.c
    As far as I remember, you need to run the whole request startup for the
    thread, otherwise there will be uninitialized pieces. TSRM magic
    will create the needed per-thread structures and call ctors, but ctors
    usually just null out stuff; you'd still need to fill it in.
    Another possible application would be a parallel_include() type call,
    which would call a given PHP file for each member of an array (or a PDO
    result set), buffering the output from each, and inserting into the
    output stream in sequence once each fragment is done (hopefully
    interacting well with normal output buffering, if you didn't want the
    results sent yet). This would allow a large number of results to be
    rendered in parallel on multicore systems.
    That's what webservers do already, don't they? :)
    I hope it will be possible to share already compiled code between
    threads; this may mean disabling "eval" inside the thread or otherwise
    The main problems you will be facing are the following:

    1. All ZE structures are per-thread. This means using one thread's
    structures in another will be a non-trivial task, as all code assumes that
    the current thread's structures are used.

    2. Even if you manage to hack around it by always passing the tsrm_ls
    pointers, etc. - memory managers are per-thread too. Which means you
    will be using data in one thread that is controlled by MM residing in
    another thread. Without locking.

    3. You may think this is not very bad, since you'll be using stuff
    that's quite static, like classes and functions - they don't get
    deallocated inside request, so who cares which MM uses them? However,
    while classes themselves don't, structures containing them - hashtables
    - can change, be rebuilt, etc. and if it happens in a wrong moment,
    you're in trouble.

    4. Next problem with using classes/functions is that they can contain
    variables - zvals, as default properties, static variables, etc. Since
    ZE is refcounting, these zvals may be modified by anybody who uses these
    variables - even just for reading. Again, no locking. Which, again,
    means trouble.

    5. Then come resources and module globals. Imagine some function touches
    in some way some resource - connection, file, etc. - that another thread
    is using at the same time, without locking? Modules generally assume
    resources belong to their respective threads, so you'll need to run
    module initializations for each thread separately.
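
    Point 4 is the classic lost-update problem. PHP itself has no threads to
    demonstrate it with, so the following is a deterministic model in which
    the two "threads" are just interleaved statements; the variable names
    are hypothetical and nothing here touches real zval refcounts.

```php
<?php
// One shared value, currently held by a single owner.
$refcount = 1;

// Two threads each take a reference "just for reading". An unlocked
// increment is really read, add, write, and the steps can interleave:
$seenByA = $refcount;     // thread A reads 1
$seenByB = $refcount;     // thread B reads 1, before A has written
$refcount = $seenByA + 1; // thread A writes 2
$refcount = $seenByB + 1; // thread B writes 2, A's increment is lost

echo $refcount, PHP_EOL;  // prints 2, although three holders now exist
```

    Once the count is too low, the value can be freed while one of the
    holders is still using it, which is why even read-only sharing needs
    locking or atomic counters.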
    hobbling the compiler to avoid separate threads trying to modify the
    optree at once. If a shared optree cannot be achieved, then I guess it
    would have to go back to the APC, but it would be good to avoid
    overheads where possible to keep the thread startup cost low.
    Because of the things described above, it will be very challenging to
    avoid those startup costs.
    Even extremely restricted parallelism can help speed up some types of
    work, so limitations I am happy to accept.
    If you restrict it to using only copied data and never running any PHP
    code, it might work. Alternatively, you might launch independent engine
    instances that don't share structures and have them communicate, like
    Erlang does. Though, unlike Erlang, PHP engine would not help you much
    in this, I'm afraid.
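
    The "independent engine instances that communicate" option can be
    approximated from userland with separate processes. The sketch below is
    an assumption-laden illustration: run_isolated() is a hypothetical
    helper, it relies on the CLI binary named by PHP_BINARY being available,
    and messages are passed as serialize()d data over pipes.

```php
<?php
// Run $workerCode in a separate PHP process, send it $input on stdin,
// and return whatever value it writes back on stdout.
function run_isolated(string $workerCode, $input)
{
    $command = escapeshellarg(PHP_BINARY) . ' -r ' . escapeshellarg($workerCode);
    $descriptors = [
        0 => ['pipe', 'r'], // child's stdin: we write the request here
        1 => ['pipe', 'w'], // child's stdout: we read the reply here
    ];
    $process = proc_open($command, $descriptors, $pipes);
    if (!is_resource($process)) {
        throw new RuntimeException('could not start worker process');
    }
    fwrite($pipes[0], serialize($input));
    fclose($pipes[0]); // signals end of input to the worker
    $reply = stream_get_contents($pipes[1]);
    fclose($pipes[1]);
    proc_close($process);
    return unserialize($reply);
}

// The worker has its own engine, memory manager and globals; it shares
// nothing with the parent except the two pipes.
$worker = '$data = unserialize(file_get_contents("php://stdin")); '
        . 'echo serialize(array_sum($data));';

$total = run_isolated($worker, [1, 2, 3]);
echo $total, PHP_EOL; // prints 6
```

    Each call pays the full engine startup cost mentioned above, so this
    only makes sense when the work per message is large.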

    --
    Stanislav Malyshev, Software Architect
    SugarCRM: http://www.sugarcrm.com/
    (408)454-6900 ext. 227
  • Ángel González at Jan 19, 2011 at 9:15 pm
    Have you taken a look at Runkit_Sandbox? It may provide useful tips.
  • Sam Vilain at Jan 19, 2011 at 10:11 pm

    On 20/01/11 10:17, Ángel González wrote:
    Have you taken a look at Runkit_Sandbox? It may provide useful tips.
    *headdesk*

    No, I hadn't seen that. Thanks for pointing this out, it looks like
    exactly what I was trying to reinvent...

    Cheers,
    Sam.
  • Ángel González at Jan 19, 2011 at 10:25 pm

    On 19/01/11 23:10, Sam Vilain wrote:
    On 20/01/11 10:17, Ángel González wrote:
    Have you taken a look at Runkit_Sandbox? It may provide useful tips.
    *headdesk*

    No, I hadn't seen that. Thanks for pointing this out, it looks like
    exactly what I was trying to reinvent...

    Cheers,
    Sam.
    You may need to patch it to work on 5.3 as-is. Patches at its bugzilla
    are your friend.

    Dmitry Zenovich was going to take care of maintaining it, but I don't know if he
    finally got his account or not.

Discussion Overview
group: php-internals
categories: php
posted: Jan 18, '11 at 4:21a
active: Jan 22, '11 at 10:18p
posts: 28
users: 11
website: php.net

site design / logo © 2022 Grokbase