Dear Experts,

Is there any hope of a parallel processing toolkit being incorporated into
the Python standard library? I've seen a wide variety of toolkits, each with
various features and limitations. Unfortunately, each has its own API. For
coarse-grained parallelism, I suspect I'd be pretty happy with many of the
existing toolkits, but if I'm going to pick one API to learn and program to,
I'd rather pick one that I'm confident is going to be supported for a while.

So is there any hope of adoption of a parallel processing system into the
Python standard library? If not, is there any hope of something like the
DB-API for coarse-grained parallelism (i.e., a common API that different
toolkits can support)?

Thanks,
-Emin

  • Robert Kern at Dec 27, 2007 at 9:13 pm

    Emin.shopper Martinian.shopper wrote:
    > Dear Experts,
    >
    > Is there any hope of a parallel processing toolkit being incorporated
    > into the Python standard library? I've seen a wide variety of toolkits,
    > each with various features and limitations. Unfortunately, each has its
    > own API. For coarse-grained parallelism, I suspect I'd be pretty happy
    > with many of the existing toolkits, but if I'm going to pick one API to
    > learn and program to, I'd rather pick one that I'm confident is going to
    > be supported for a while.
    >
    > So is there any hope of adoption of a parallel processing system into
    > the Python standard library? If not, is there any hope of something like
    > the DB-API for coarse-grained parallelism (i.e., a common API that
    > different toolkits can support)?
    The problem is that for SQL databases, there is a substantial API that they can
    all share. The implementations are primarily differentiated by other factors
    like speed, in-memory or on-disk, embedded or server, the flavor of SQL, etc.
    and only secondarily differentiated by their extensions to the DB-API. With
    parallel processing, the API itself is a key differentiator between toolkits and
    approaches. Different problems require different APIs, not just different
    implementations.
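To make the DB-API comparison concrete: the function below is a minimal sketch that calls only methods defined by PEP 249 (`cursor()`, `execute()`, `executemany()`, `fetchone()`, `commit()`), so in principle any compliant driver's connection could be passed in. It uses the stdlib `sqlite3` module; the `jobs` table and its columns are made up for illustration. Even DB-API leaves some details to each driver, such as the placeholder style advertised in the module-level `paramstyle`.

```python
import sqlite3


def load_and_count(conn):
    """Insert two rows and summarize them using only DB-API 2.0 (PEP 249) calls."""
    cur = conn.cursor()
    cur.execute("CREATE TABLE jobs (name TEXT, seconds REAL)")
    # Placeholder syntax is driver-specific: sqlite3 uses the "qmark" style (?),
    # while other drivers may expect %s or :name (see the module's paramstyle).
    cur.executemany("INSERT INTO jobs VALUES (?, ?)", [("a", 1.5), ("b", 2.0)])
    conn.commit()
    cur.execute("SELECT COUNT(*), SUM(seconds) FROM jobs")
    return cur.fetchone()


conn = sqlite3.connect(":memory:")
print(sqlite3.paramstyle)    # qmark
print(load_and_count(conn))  # (2, 3.5)
conn.close()
```

Swapping in another driver would, in this sketch, mean changing only the `connect()` call and the placeholder characters; that narrow, shared surface is what the thread is contrasting with parallel-processing toolkits.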

    I suspect that one of the smaller implementations like processing.py might get
    adopted into the standard library if the author decides to push for it. The ones
    I am thinking of are relatively new, so I imagine that it might take a couple of
    years of vigorous use by the community before it gets into the standard library.
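For context, processing was in fact later added to the standard library as `multiprocessing` (Python 2.6, PEP 371). The sketch below shows coarse-grained parallelism with its `Pool` in modern Python. `math.factorial` on large inputs merely stands in for real CPU-bound work, and `run_parallel` and `INPUTS` are illustrative names, not part of any library.

```python
import math
from multiprocessing import Pool

# Stand-in workload: factorials of large integers are CPU-bound.
# A stdlib function is used so the task pickles cleanly under any start method.
INPUTS = [2000, 3000, 4000, 5000]


def run_parallel(inputs, workers=4):
    # Each input is sent to a worker process; map() returns results in input order.
    with Pool(processes=workers) as pool:
        return pool.map(math.factorial, inputs)


if __name__ == "__main__":
    # The __main__ guard matters when workers are started with "spawn"
    # (the default on Windows and macOS): each worker re-imports this module.
    results = run_parallel(INPUTS)
    print(results[0] == math.factorial(2000))  # True
```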

    My recommendation to you is to pick one of the smaller implementations that
    solves the problems in front of you. Read and understand that module so you
    could maintain it yourself if you had to. Post to this list about how you use
    it. Blog about it if you blog. Write some Python Cookbook recipes to show how
    you solve problems with it. If there is a lively community around it, that will
    help it get into the standard library. Things get into the standard library
    *because* they are supported, not the other way around.

    --
    Robert Kern

    "I have come to believe that the whole world is an enigma, a harmless enigma
    that is made terrible by our own mad attempt to interpret it as though it had
    an underlying truth."
    -- Umberto Eco
  • Emin.shopper Martinian.shopper at Dec 28, 2007 at 3:23 pm

    On Dec 27, 2007 4:13 PM, Robert Kern wrote:

    > Emin.shopper Martinian.shopper wrote:
    >> If not, is there any hope of something like
    >> the DB-API for coarse-grained parallelism (i.e., a common API that
    >> different toolkits can support)?
    >
    > The problem is that for SQL databases, there is a substantial API that they can
    > all share. The implementations are primarily differentiated by other factors
    > like speed, in-memory or on-disk, embedded or server, the flavor of SQL, etc.
    > and only secondarily differentiated by their extensions to the DB-API. With
    > parallel processing, the API itself is a key differentiator between toolkits and
    > approaches. Different problems require different APIs, not just different
    > implementations.

    I disagree. Most of the implementations of coarse-grained parallelism I have
    seen and used share many features. For example, they generally have a notion
    of spawning processes/tasks, scheduling/load-balancing, checking tasks on a
    server, sending messages to/from tasks, detecting when tasks finish or die,
    logging the results for debugging purposes, etc. Sure, they all do these
    things in slightly different ways, but for coarse-grained parallelism the
    API differences rarely matter (although the implementation differences can
    matter).
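Purely as an illustration of the kind of common interface being argued for, here is a minimal sketch covering a few of the listed features: spawning a task, checking its status, detecting that it died, and logging. `TaskRunner`, `SerialRunner`, and `Status` are hypothetical names, not part of any existing toolkit, and scheduling and inter-task messaging are deliberately left out. `SerialRunner` runs each task immediately in-process; a real toolkit would supply its own subclass backed by processes or remote workers.

```python
import abc
import enum
import logging
import uuid

log = logging.getLogger("taskrunner")


class Status(enum.Enum):
    PENDING = "pending"
    DONE = "done"
    FAILED = "failed"


class TaskRunner(abc.ABC):
    """Hypothetical toolkit-neutral API; each toolkit would provide a subclass."""

    @abc.abstractmethod
    def submit(self, func, *args):
        """Spawn a task and return an opaque task id."""

    @abc.abstractmethod
    def status(self, task_id):
        """Report whether the task is pending, finished, or died."""

    @abc.abstractmethod
    def result(self, task_id):
        """Return the task's value, or re-raise the exception it died with."""


class SerialRunner(TaskRunner):
    """Trivial reference backend: runs each task to completion inside submit()."""

    def __init__(self):
        self._outcomes = {}  # task id -> (Status, return value or exception)

    def submit(self, func, *args):
        task_id = uuid.uuid4().hex
        try:
            self._outcomes[task_id] = (Status.DONE, func(*args))
        except Exception as exc:
            # A task that dies is recorded and logged instead of crashing the caller.
            log.warning("task %s failed: %r", task_id, exc)
            self._outcomes[task_id] = (Status.FAILED, exc)
        return task_id

    def status(self, task_id):
        # Ids this runner has not recorded are reported as pending.
        return self._outcomes.get(task_id, (Status.PENDING, None))[0]

    def result(self, task_id):
        state, value = self._outcomes[task_id]
        if state is Status.FAILED:
            raise value
        return value


runner = SerialRunner()
ok = runner.submit(pow, 2, 10)
bad = runner.submit(int, "not a number")
print(runner.status(ok), runner.result(ok))  # Status.DONE 1024
print(runner.status(bad))                    # Status.FAILED
```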

    > I suspect that one of the smaller implementations like processing.py might get
    > adopted into the standard library if the author decides to push for it.

    That would be great.

    > My recommendation to you is to pick one of the smaller implementations that
    > solves the problems in front of you. Read and understand that module so you
    > could maintain it yourself if you had to. Post to this list about how you use
    > it. Blog about it if you blog. Write some Python Cookbook recipes to show how
    > you solve problems with it.

    That is a good suggestion, but for most of the coarse-grained parallelism
    tasks I've worked on, it would be easier to roll my own system than do that.
    To put it another way, why spend the effort to use a particular API if I
    don't know it's going to be around for a while? Since a lot of the value is
    in the API as opposed to the implementation, unless there is something
    special about the API (e.g., it is an official or at least de facto
    standard) the learning curve may not be worth it.

    > If there is a lively community around it, that will
    > help it get into the standard library. Things get into the standard library
    > *because* they are supported, not the other way around.

    You make a good point and in general I would agree with you. Isn't it
    possible, however, that there are cases where inclusion in the standard
    library would build a better community? I think this is the argument for
    many types of standards. A good example is wireless networking. The
    development of a standard like 802.11 provided hardware manufacturers the
    incentive to build devices that could communicate with each other and that
    made people want to buy the products.

    Still, I take your basic point to heart: if I want a good API, I should get
    off my butt and contribute to it somehow.

    How would you or the rest of the community react to a proposal for a generic
    parallelism API? I suspect the response would be "show us an implementation
    of the code". I could whip up an implementation or adapt one of the existing
    systems, but then I worry that the discussion would devolve into an argument
    about the pros and cons of the particular implementation instead of the API.
    Even worse, it might devolve into an argument about the value of fine-grained
    vs. coarse-grained parallelism or the GIL. Considering that these issues
    seem to have been discussed quite a bit already and there are already
    multiple parallel processing implementations, it seems like the way forward
    lies in either a blessing of a particular package that already exists or
    adoption of an API instead of a particular implementation.

    Thanks for your thoughts,
    -Emin
  • Robert Kern at Dec 29, 2007 at 6:32 am

    Emin.shopper Martinian.shopper wrote:
    > On Dec 27, 2007 4:13 PM, Robert Kern <robert.kern at gmail.com> wrote:
    >> My recommendation to you is to pick one of the smaller implementations that
    >> solves the problems in front of you. Read and understand that module so you
    >> could maintain it yourself if you had to. Post to this list about how you use
    >> it. Blog about it if you blog. Write some Python Cookbook recipes to show how
    >> you solve problems with it.
    >
    > That is a good suggestion, but for most of the coarse-grained
    > parallelism tasks I've worked on, it would be easier to roll my own
    > system than do that. To put it another way, why spend the effort to use
    > a particular API if I don't know it's going to be around for a while?
    > Since a lot of the value is in the API as opposed to the implementation,
    > unless there is something special about the API (e.g., it is an
    > official or at least de facto standard) the learning curve may not be
    > worth it.
    And you think that you will encounter no learning curve writing your own code?
    At least take the opportunity to see how other people have solved your problem.
    Some of the implementations floating around now fit into one module. Surely, it
    would take less time to understand one of them than write your own. And let's
    not forget testing your module. The initial writing is never the timesink; it's
    the testing!
    >> If there is a lively community around it, that will
    >> help it get into the standard library. Things get into the standard library
    >> *because* they are supported, not the other way around.
    >
    > You make a good point and in general I would agree with you. Isn't it
    > possible, however, that there are cases where inclusion in the standard
    > library would build a better community?
    Not inclusion by itself, no. The standard library's APIs are only as supported
    as there are people willing to support them. *Their being in the standard
    library does not create people out of thin air*. That's why the python-dev team
    now has a hard requirement that new contributions must come with a guarantee of
    support. Asking for inclusion without offering the corresponding guarantee will
    be met with rejection, and rightly so.
    > How would you or the rest of the community react to a proposal for a
    > generic parallelism API? I suspect the response would be "show us an
    > implementation of the code". I could whip up an implementation or adapt
    > one of the existing systems, but then I worry that the discussion would
    > devolve into an argument about the pros and cons of the particular
    > implementation instead of the API. Even worse, it might devolve into an
    > argument about the value of fine-grained vs. coarse-grained parallelism or
    > the GIL. Considering that these issues seem to have been discussed quite
    > a bit already and there are already multiple parallel processing
    > implementations, it seems like the way forward lies in either a blessing
    > of a particular package that already exists or adoption of an API
    > instead of a particular implementation.
    Well, you can't design a good API without having an implementation of it. If you
    can't use the API in real problems, then you won't know what problems it has.
    Preferably, for an API that's intended to have multiple "vendors", you should
    have 2 implementations taking different approaches so you can get some idea of
    whether the API generalizes well.
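Python later gained an API with exactly this property: `concurrent.futures` (PEP 3148, Python 3.2) defines a single `Executor` interface with a thread-based and a process-based implementation. A minimal sketch; `sum_of_roots` is a made-up example function written only against the abstract interface, and `math.isqrt` stands in for real work.

```python
import math
from concurrent.futures import Executor, ProcessPoolExecutor, ThreadPoolExecutor


def sum_of_roots(executor: Executor, inputs):
    # Uses only the abstract Executor API; the caller picks threads or processes.
    return sum(executor.map(math.isqrt, inputs))


if __name__ == "__main__":
    inputs = [10**12, 10**14, 10**16]
    # Same call, two implementations taking different approaches.
    with ThreadPoolExecutor(max_workers=3) as pool:
        print(sum_of_roots(pool, inputs))  # 111000000
    with ProcessPoolExecutor(max_workers=3) as pool:
        print(sum_of_roots(pool, inputs))  # 111000000
```

Having two backends behind one interface is what lets a caller check, in practice, whether the API generalizes or quietly assumes one execution model.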

  • Stefan Behnel at Dec 28, 2007 at 9:07 am

    Robert Kern wrote:
    > The problem is that for SQL databases, there is a substantial API that they can
    > all share. The implementations are primarily differentiated by other factors
    > like speed, in-memory or on-disk, embedded or server, the flavor of SQL, etc.
    > and only secondarily differentiated by their extensions to the DB-API. With
    > parallel processing, the API itself is a key differentiator between toolkits and
    > approaches. Different problems require different APIs, not just different
    > implementations.
    Well, there is one parallel processing API that already *is* part of stdlib:
    the threading module. So the processing module would fit just nicely into the
    idea of a "standard" library.
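The `multiprocessing` documentation describes the package (the descendant of processing) as offering an API similar to the `threading` module. A minimal sketch of that symmetry: `run_and_wait` is a hypothetical helper that works unchanged with either class, because both accept `target`/`args` and share the `start()`/`join()`/`is_alive()` lifecycle.

```python
import multiprocessing
import threading


def run_and_wait(worker_cls, *args):
    # threading.Thread and multiprocessing.Process take the same target/args
    # keywords. The builtin print stands in for a real task and pickles
    # cleanly when a child process is started.
    worker = worker_cls(target=print, args=args)
    worker.start()
    worker.join()
    return worker.is_alive()


if __name__ == "__main__":
    print(run_and_wait(threading.Thread, "hello from a thread"))
    print(run_and_wait(multiprocessing.Process, "hello from a process"))
```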

    Stefan
  • Christian Heimes at Dec 28, 2007 at 1:46 pm

    Stefan Behnel wrote:
    > Well, there is one parallel processing API that already *is* part of stdlib:
    > the threading module. So the processing module would fit just nicely into the
    > idea of a "standard" library.
    Don't you forget the select module and its siblings for I/O bound
    concurrency?
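A minimal sketch of the I/O-bound style referred to here, using `selectors` (a higher-level wrapper over `select`, added in Python 3.4). Local socket pairs stand in for real network connections, and `gather_ready` is a made-up name: a single thread registers every connection and services whichever ones become readable.

```python
import selectors
import socket


def gather_ready(messages):
    """Receive one message per connection, all from a single thread."""
    sel = selectors.DefaultSelector()
    pairs = [socket.socketpair() for _ in messages]
    try:
        for (reader, writer), msg in zip(pairs, messages):
            reader.setblocking(False)
            sel.register(reader, selectors.EVENT_READ)
            writer.sendall(msg)
        received = []
        while len(received) < len(messages):
            # Block (up to 1 s per call) until at least one registered socket is readable.
            for key, _events in sel.select(timeout=1.0):
                received.append(key.fileobj.recv(1024))
                sel.unregister(key.fileobj)
        # Readiness order is not guaranteed, so sort for a stable result.
        return sorted(received)
    finally:
        sel.close()
        for reader, writer in pairs:
            reader.close()
            writer.close()


print(gather_ready([b"alpha", b"beta", b"gamma"]))  # [b'alpha', b'beta', b'gamma']
```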

    Christian
  • Stefan Behnel at Dec 28, 2007 at 2:15 pm

    Christian Heimes wrote:
    > Stefan Behnel wrote:
    >> Well, there is one parallel processing API that already *is* part of stdlib:
    >> the threading module. So the processing module would fit just nicely into the
    >> idea of a "standard" library.
    >
    > Don't you forget the select module and its siblings for I/O bound
    > concurrency?
    Hmm, when I think of "parallel processing", it's usually about processing, not
    about I/O. If it starts getting I/O-bound, it's worth considering
    single-threaded processing instead.

    Stefan
  • Robert Kern at Dec 29, 2007 at 6:16 am

    Stefan Behnel wrote:
    > Robert Kern wrote:
    >> The problem is that for SQL databases, there is a substantial API that they can
    >> all share. The implementations are primarily differentiated by other factors
    >> like speed, in-memory or on-disk, embedded or server, the flavor of SQL, etc.
    >> and only secondarily differentiated by their extensions to the DB-API. With
    >> parallel processing, the API itself is a key differentiator between toolkits and
    >> approaches. Different problems require different APIs, not just different
    >> implementations.
    >
    > Well, there is one parallel processing API that already *is* part of stdlib:
    > the threading module. So the processing module would fit just nicely into the
    > idea of a "standard" library.
    True. I suspect that if any of them get into the standard library, it will be
    that one.

  • Calvin Spealman at Dec 28, 2007 at 3:44 pm
    I think we are a ways off from the point where any of the solutions
    are well used, matured, and trusted to promote as a Python standard
    module. I'd love to see it happen, but even worse than it never
    happening is it happening too soon.

    On Dec 27, 2007 8:52 AM, Emin.shopper Martinian.shopper wrote:
    > Dear Experts,
    >
    > Is there any hope of a parallel processing toolkit being incorporated into
    > the Python standard library? I've seen a wide variety of toolkits, each with
    > various features and limitations. Unfortunately, each has its own API. For
    > coarse-grained parallelism, I suspect I'd be pretty happy with many of the
    > existing toolkits, but if I'm going to pick one API to learn and program to,
    > I'd rather pick one that I'm confident is going to be supported for a while.
    >
    > So is there any hope of adoption of a parallel processing system into the
    > Python standard library? If not, is there any hope of something like the
    > DB-API for coarse-grained parallelism (i.e., a common API that different
    > toolkits can support)?
    >
    > Thanks,
    > -Emin



    --
    Read my blog! I depend on your acceptance of my opinion! I am interesting!
    http://ironfroggy-code.blogspot.com/

Discussion Overview
group: python-list
categories: python
posted: Dec 27, '07 at 1:52p
active: Dec 29, '07 at 6:32a
posts: 9
users: 5
website: python.org
