FAQ

[Cython] Out of order side effects of argument evaluation in function calls (ticket #654)

Stefan Behnel
Mar 11, 2011 at 12:45 pm
Hi,

ticket 654 describes a code generation problem where the arguments to a
cdef function are not being evaluated in the order they are written down in
the code.

http://trac.cython.org/cython_trac/ticket/654

This introduces problems when the arguments have side effects or are not
simple, e.g. they are function calls themselves or are taken from object
attributes or live in a closure. For example,

f(g(a), a.x, h(a))

will produce different results depending on the evaluation order if g or h
change the value of a.x.

However, apparently, the order of evaluation is only guaranteed by Python,
not by C. Now, the question is: what are the right semantics for Cython
here: follow Python or C?

Personally, I think it would be nice to keep up Python's semantics, but
when I implemented this I broke quite some code in Sage (you may have
noticed that the sage-build project in Hudson has been red for a while).
There are things in C and especially in C++ that cannot be easily copied
into a temporary variable in order to make sure they are evaluated before
the following arguments. This is not a problem for Python function calls
where all arguments end up being copied (and often converted) anyway. It is
a problem for C function calls, though.

What do you think about this?

Stefan
reply

Search Discussions

17 responses

  • Vitja Makarov at Mar 11, 2011 at 2:04 pm

    2011/3/11 Stefan Behnel <stefan_ml at behnel.de>:
    Hi,

    ticket 654 describes a code generation problem where the arguments to a cdef
    function are not being evaluated in the order they are written down in the
    code.

    http://trac.cython.org/cython_trac/ticket/654

    This introduces problems when the arguments have side effects or are not
    simple, e.g. they are function calls themselves or are taken from object
    attributes or live in a closure. For example,

    ? ?f(g(a), a.x, h(a))

    will produce different results depending on the evaluation order if g or h
    change the value of a.x.

    However, apparently, the order of evaluation is only guaranteed by Python,
    not by C. Now, the question is: what are the right semantics for Cython
    here: follow Python or C?
    +1 for Python, Cython is mostly Python than C, btw I don't think that
    it makes much sense.
    Personally, I think it would be nice to keep up Python's semantics, but when
    I implemented this I broke quite some code in Sage (you may have noticed
    that the sage-build project in Hudson has been red for a while). There are
    things in C and especially in C++ that cannot be easily copied into a
    temporary variable in order to make sure they are evaluated before the
    following arguments. This is not a problem for Python function calls where
    all arguments end up being copied (and often converted) anyway. It is a
    problem for C function calls, though.

    What do you think about this?
    ? ?f(g(a), a.x, h(a))
    Why could not this be translated into:

    tmp1 = g(a)
    tmp2 = a.x
    tmp3 = h(a)

    f(tmp1, tmp2, tmp3)


    --
    vitja.
  • Stefan Behnel at Mar 11, 2011 at 2:08 pm

    Vitja Makarov, 11.03.2011 15:04:
    2011/3/11 Stefan Behnel:
    Personally, I think it would be nice to keep up Python's semantics, but when
    I implemented this I broke quite some code in Sage (you may have noticed
    that the sage-build project in Hudson has been red for a while). There are
    things in C and especially in C++ that cannot be easily copied into a
    temporary variable in order to make sure they are evaluated before the
    following arguments. This is not a problem for Python function calls where
    all arguments end up being copied (and often converted) anyway. It is a
    problem for C function calls, though.
    f(g(a), a.x, h(a))
    Why could not this be translated into:

    tmp1 = g(a)
    tmp2 = a.x
    tmp3 = h(a)

    f(tmp1, tmp2, tmp3)
    See above.

    Stefan
  • Stefan Behnel at Mar 11, 2011 at 5:36 pm

    Stefan Behnel, 11.03.2011 15:08:
    Vitja Makarov, 11.03.2011 15:04:
    2011/3/11 Stefan Behnel:
    Personally, I think it would be nice to keep up Python's semantics, but
    when
    I implemented this I broke quite some code in Sage (you may have noticed
    that the sage-build project in Hudson has been red for a while). There are
    things in C and especially in C++ that cannot be easily copied into a
    temporary variable in order to make sure they are evaluated before the
    following arguments. This is not a problem for Python function calls where
    all arguments end up being copied (and often converted) anyway. It is a
    problem for C function calls, though.
    f(g(a), a.x, h(a))
    Why could not this be translated into:

    tmp1 = g(a)
    tmp2 = a.x
    tmp3 = h(a)

    f(tmp1, tmp2, tmp3)
    See above.
    To be a little clearer here, it's a problem in C for example with struct
    values. Copying them by value into a temp variable can be expensive,
    potentially twice as expensive as simply passing them into the function
    normally.

    Not sure what kind of additional devilry C++ provides here, but I'd expect
    that object values can exhibit bizarre behaviour when being assigned. Maybe
    others can enlighten me here.

    I have no idea how many cases there actually are that we can't handle or
    that may lead to a performance degradation when using temps, but the
    problem is that they exist at all.

    Stefan
  • Robert Bradshaw at Mar 11, 2011 at 6:33 pm

    On Fri, Mar 11, 2011 at 9:36 AM, Stefan Behnel wrote:
    Stefan Behnel, 11.03.2011 15:08:
    Vitja Makarov, 11.03.2011 15:04:
    2011/3/11 Stefan Behnel:
    Personally, I think it would be nice to keep up Python's semantics, but
    when
    I implemented this I broke quite some code in Sage (you may have noticed
    that the sage-build project in Hudson has been red for a while). There
    are
    things in C and especially in C++ that cannot be easily copied into a
    temporary variable in order to make sure they are evaluated before the
    following arguments. This is not a problem for Python function calls
    where
    all arguments end up being copied (and often converted) anyway. It is a
    problem for C function calls, though.
    f(g(a), a.x, h(a))
    Why could not this be translated into:

    tmp1 = g(a)
    tmp2 = a.x
    tmp3 = h(a)

    f(tmp1, tmp2, tmp3)
    See above.
    To be a little clearer here, it's a problem in C for example with struct
    values. Copying them by value into a temp variable can be expensive,
    potentially twice as expensive as simply passing them into the function
    normally.
    Yep, and some types (e.g. array types) can't be assigned to at all.
    FWIW, the issues with Sage is that many libraries use the "stack
    allocated, pass-by-reference" trick

    typedef foo_struct foo_t[1]

    but we have declared them to be of type "void*" because we don't care
    about or want to muck with the internals. Cleaning this up is
    something we should do, but is taking a while, and aside from that it
    makes Cython even more dependent on correct type declarations (and
    backwards incompatible in this regard). Sage is a great test for the
    buildbot, so keeping it red for so long is not a good thing either.
    Not sure what kind of additional devilry C++ provides here, but I'd expect
    that object values can exhibit bizarre behaviour when being assigned. Maybe
    others can enlighten me here.
    Yes, C++ allows overloading of the assignment operator, so assigning
    may lead to arbitrary code (as well as probably an expensive copy, as
    with structs, and structs in C++ are really just classes with a
    different visibility).
    I have no idea how many cases there actually are that we can't handle or
    that may lead to a performance degradation when using temps, but the problem
    is that they exist at all.
    Note that this applies not just to function arguments, but a host of
    other places, e.g. the order of evaluation of "f() + g()" in C is
    unspecified. Fleshing this out completely will lead to a lot more
    temps and verbose C code. And then we'd just cross our fingers that
    the C compiler was able to optimize all these temps away (though still
    possibly producing inferior code if it were able to switch order of
    execution).

    Whatever we do, we need a flag/directive. Perhaps there's a way to
    guarantee correct ordering for all valid Python code, even in the
    presence of (function and variable) type inference, but allow some
    leeway for explicitly declared C functions.

    - Robert
  • Stefan Behnel at Mar 16, 2011 at 2:53 pm

    Robert Bradshaw, 11.03.2011 19:33:
    On Fri, Mar 11, 2011 at 9:36 AM, Stefan Behnelwrote:
    Stefan Behnel, 11.03.2011 15:08:
    Vitja Makarov, 11.03.2011 15:04:
    2011/3/11 Stefan Behnel:
    Personally, I think it would be nice to keep up Python's semantics, but
    when
    I implemented this I broke quite some code in Sage (you may have noticed
    that the sage-build project in Hudson has been red for a while). There
    are
    things in C and especially in C++ that cannot be easily copied into a
    temporary variable in order to make sure they are evaluated before the
    following arguments. This is not a problem for Python function calls
    where
    all arguments end up being copied (and often converted) anyway. It is a
    problem for C function calls, though.
    f(g(a), a.x, h(a))
    Why could not this be translated into:

    tmp1 = g(a)
    tmp2 = a.x
    tmp3 = h(a)

    f(tmp1, tmp2, tmp3)
    See above.
    To be a little clearer here, it's a problem in C for example with struct
    values. Copying them by value into a temp variable can be expensive,
    potentially twice as expensive as simply passing them into the function
    normally.
    Yep, and some types (e.g. array types) can't be assigned to at all.
    FWIW, the issues with Sage is that many libraries use the "stack
    allocated, pass-by-reference" trick

    typedef foo_struct foo_t[1]

    but we have declared them to be of type "void*" because we don't care
    about or want to muck with the internals. Cleaning this up is
    something we should do, but is taking a while, and aside from that it
    makes Cython even more dependent on correct type declarations (and
    backwards incompatible in this regard). Sage is a great test for the
    buildbot, so keeping it red for so long is not a good thing either.
    Not sure what kind of additional devilry C++ provides here, but I'd expect
    that object values can exhibit bizarre behaviour when being assigned. Maybe
    others can enlighten me here.
    Yes, C++ allows overloading of the assignment operator, so assigning
    may lead to arbitrary code (as well as probably an expensive copy, as
    with structs, and structs in C++ are really just classes with a
    different visibility).
    I have no idea how many cases there actually are that we can't handle or
    that may lead to a performance degradation when using temps, but the problem
    is that they exist at all.
    Note that this applies not just to function arguments, but a host of
    other places, e.g. the order of evaluation of "f() + g()" in C is
    unspecified. Fleshing this out completely will lead to a lot more
    temps and verbose C code. And then we'd just cross our fingers that
    the C compiler was able to optimize all these temps away (though still
    possibly producing inferior code if it were able to switch order of
    execution).

    Whatever we do, we need a flag/directive. Perhaps there's a way to
    guarantee correct ordering for all valid Python code
    Plain Python code should never suffer from the original problem (with the
    obvious exception of cpdef functions), but I also see how type inference
    can affect this guarantee for the evaluation order in expressions.

    I agree with Greg, though, that it has a code smell if (multiple)
    subexpressions in an expression have side-effects, especially when they are
    impacted by the evaluation order. That means that the average innocent
    future maintainer could break the code by manually rearranging the
    expression in a way that's perfectly valid mathematically, e.g. as a
    readability fix.

    even in the
    presence of (function and variable) type inference, but allow some
    leeway for explicitly declared C functions.
    I'm actually leaning towards not guaranteeing the order of execution if C
    doesn't do it either. If this is really required, it's easy to work around
    for users, but it's severely hard to fix for Cython in all cases, and the
    gain is truly small. After all, we'd only make it easier for users to write
    bad code.

    Stefan
  • Greg Ewing at Mar 16, 2011 at 11:13 pm

    Stefan Behnel wrote:
    That means that the average
    innocent future maintainer could break the code by manually rearranging
    the expression in a way that's perfectly valid mathematically,
    Another thing is that when the subexpressions concerned
    are function arguments, it only works if the order of the
    arguments happens to be the same as the required order of
    evaluation. Changes in the function signature can therefore
    lead to subtle breakage.

    --
    Greg
  • Robert Bradshaw at Mar 17, 2011 at 4:05 am

    On Wed, Mar 16, 2011 at 7:53 AM, Stefan Behnel wrote:
    Robert Bradshaw, 11.03.2011 19:33:
    On Fri, Mar 11, 2011 at 9:36 AM, Stefan Behnel<stefan_ml at behnel.de>
    ?wrote:
    Stefan Behnel, 11.03.2011 15:08:
    Vitja Makarov, 11.03.2011 15:04:
    2011/3/11 Stefan Behnel:
    Personally, I think it would be nice to keep up Python's semantics,
    but
    when
    I implemented this I broke quite some code in Sage (you may have
    noticed
    that the sage-build project in Hudson has been red for a while). There
    are
    things in C and especially in C++ that cannot be easily copied into a
    temporary variable in order to make sure they are evaluated before the
    following arguments. This is not a problem for Python function calls
    where
    all arguments end up being copied (and often converted) anyway. It is
    a
    problem for C function calls, though.
    f(g(a), a.x, h(a))
    Why could not this be translated into:

    tmp1 = g(a)
    tmp2 = a.x
    tmp3 = h(a)

    f(tmp1, tmp2, tmp3)
    See above.
    To be a little clearer here, it's a problem in C for example with struct
    values. Copying them by value into a temp variable can be expensive,
    potentially twice as expensive as simply passing them into the function
    normally.
    Yep, and some types (e.g. array types) can't be assigned to at all.
    FWIW, the issues with Sage is that many libraries use the "stack
    allocated, pass-by-reference" trick

    ? ? typedef foo_struct foo_t[1]

    but we have declared them to be of type "void*" because we don't care
    about or want to muck with the internals. Cleaning this up is
    something we should do, but is taking a while, and aside from that it
    makes Cython even more dependent on correct type declarations (and
    backwards incompatible in this regard). Sage is a great test for the
    buildbot, so keeping it red for so long is not a good thing either.
    Not sure what kind of additional devilry C++ provides here, but I'd
    expect
    that object values can exhibit bizarre behaviour when being assigned.
    Maybe
    others can enlighten me here.
    Yes, C++ allows overloading of the assignment operator, so assigning
    may lead to arbitrary code (as well as probably an expensive copy, as
    with structs, and structs in C++ are really just classes with a
    different visibility).
    I have no idea how many cases there actually are that we can't handle or
    that may lead to a performance degradation when using temps, but the
    problem
    is that they exist at all.
    Note that this applies not just to function arguments, but a host of
    other places, e.g. the order of evaluation of "f() + g()" in C is
    unspecified. Fleshing this out completely will lead to a lot more
    temps and verbose C code. And then we'd just cross our fingers that
    the C compiler was able to optimize all these temps away (though still
    possibly producing inferior code if it were able to switch order of
    execution).

    Whatever we do, we need a flag/directive. Perhaps there's a way to
    guarantee correct ordering for all valid Python code
    Plain Python code should never suffer from the original problem (with the
    obvious exception of cpdef functions), but I also see how type inference can
    affect this guarantee for the evaluation order in expressions.
    In particular, we need to be careful if we ever automatically cpdef a
    function (as part of a future optimization).
    I agree with Greg, though, that it has a code smell if (multiple)
    subexpressions in an expression have side-effects, especially when they are
    impacted by the evaluation order. That means that the average innocent
    future maintainer could break the code by manually rearranging the
    expression in a way that's perfectly valid mathematically, e.g. as a
    readability fix.

    even in the
    presence of (function and variable) type inference, but allow some
    leeway for explicitly declared C functions.
    I'm actually leaning towards not guaranteeing the order of execution if C
    doesn't do it either. If this is really required, it's easy to work around
    for users, but it's severely hard to fix for Cython in all cases, and the
    gain is truly small. After all, we'd only make it easier for users to write
    bad code.
    Yep. Lets keep the code in for the above case.

    - Robert
  • Jason Grout at Mar 17, 2011 at 1:15 pm

    On 3/16/11 11:05 PM, Robert Bradshaw wrote:
    On Wed, Mar 16, 2011 at 7:53 AM, Stefan Behnelwrote:
    I'm actually leaning towards not guaranteeing the order of execution if C
    doesn't do it either. If this is really required, it's easy to work around
    for users, but it's severely hard to fix for Cython in all cases, and the
    gain is truly small. After all, we'd only make it easier for users to write
    bad code.
    Yep. Lets keep the code in for the above case.

    Is there a huge big warning in the docs? Maybe on this page would be a
    good place: http://docs.cython.org/src/userguide/limitations.html

    Jason
  • Robert Bradshaw at Mar 17, 2011 at 8:37 pm

    On Thu, Mar 17, 2011 at 6:15 AM, Jason Grout wrote:
    On 3/16/11 11:05 PM, Robert Bradshaw wrote:

    On Wed, Mar 16, 2011 at 7:53 AM, Stefan Behnel<stefan_ml at behnel.de>
    ?wrote:
    I'm actually leaning towards not guaranteeing the order of execution if C
    doesn't do it either. If this is really required, it's easy to work
    around
    for users, but it's severely hard to fix for Cython in all cases, and the
    gain is truly small. After all, we'd only make it easier for users to
    write
    bad code.
    Yep. Lets keep the code in for the above case.

    Is there a huge big warning in the docs? ?Maybe on this page would be a good
    place: http://docs.cython.org/src/userguide/limitations.html
    This doesn't affect Python functions at all, so I'm not sure it
    belongs on that page. I agree that it should be mentioned that c(p)def
    functions have C calling semantics, *including* an unspecified order
    or argument evaluation.

    - Robert
  • Vitja Makarov at Mar 17, 2011 at 8:48 pm

    2011/3/17 Robert Bradshaw <robertwb at math.washington.edu>:
    On Thu, Mar 17, 2011 at 6:15 AM, Jason Grout
    wrote:
    On 3/16/11 11:05 PM, Robert Bradshaw wrote:

    On Wed, Mar 16, 2011 at 7:53 AM, Stefan Behnel<stefan_ml at behnel.de>
    ?wrote:
    I'm actually leaning towards not guaranteeing the order of execution if C
    doesn't do it either. If this is really required, it's easy to work
    around
    for users, but it's severely hard to fix for Cython in all cases, and the
    gain is truly small. After all, we'd only make it easier for users to
    write
    bad code.
    Yep. Lets keep the code in for the above case.

    Is there a huge big warning in the docs? ?Maybe on this page would be a good
    place: http://docs.cython.org/src/userguide/limitations.html
    This doesn't affect Python functions at all, so I'm not sure it
    belongs on that page. I agree that it should be mentioned that c(p)def
    functions have C calling semantics, *including* an unspecified order
    or argument evaluation.
    As you noticed above, how should be handled def function inlining?

    I guess there are restrictions on def function argument type, so
    side-effect isn't issue here.

    --
    vitja.
  • Robert Bradshaw at Mar 17, 2011 at 8:57 pm

    On Thu, Mar 17, 2011 at 1:48 PM, Vitja Makarov wrote:
    2011/3/17 Robert Bradshaw <robertwb at math.washington.edu>:
    On Thu, Mar 17, 2011 at 6:15 AM, Jason Grout
    wrote:
    On 3/16/11 11:05 PM, Robert Bradshaw wrote:

    On Wed, Mar 16, 2011 at 7:53 AM, Stefan Behnel<stefan_ml at behnel.de>
    ?wrote:
    I'm actually leaning towards not guaranteeing the order of execution if C
    doesn't do it either. If this is really required, it's easy to work
    around
    for users, but it's severely hard to fix for Cython in all cases, and the
    gain is truly small. After all, we'd only make it easier for users to
    write
    bad code.
    Yep. Lets keep the code in for the above case.

    Is there a huge big warning in the docs? ?Maybe on this page would be a good
    place: http://docs.cython.org/src/userguide/limitations.html
    This doesn't affect Python functions at all, so I'm not sure it
    belongs on that page. I agree that it should be mentioned that c(p)def
    functions have C calling semantics, *including* an unspecified order
    or argument evaluation.
    As you noticed above, how should be handled def function inlining?

    I guess there are restrictions on def function argument type, so
    side-effect isn't issue here.
    Yep, or at least not near as much of an issue. (I think the C compiler
    can optimize away an, e.g, extra copy of, e.g, int, double and
    PyObject* temps in most cases.)

    - Robert
  • Stefan Behnel at Mar 17, 2011 at 9:25 pm

    Robert Bradshaw, 17.03.2011 05:05:
    On Wed, Mar 16, 2011 at 7:53 AM, Stefan Behnel wrote:
    I'm actually leaning towards not guaranteeing the order of execution if C
    doesn't do it either. If this is really required, it's easy to work around
    for users, but it's severely hard to fix for Cython in all cases, and the
    gain is truly small. After all, we'd only make it easier for users to write
    bad code.
    Yep. Lets keep the code in for the above case.
    Erm, the current status is that we try to guarantee the order by pushing
    everything into temps, thus breaking Sage. What code exactly did you intend
    to keep here?

    Stefan
  • Robert Bradshaw at Mar 17, 2011 at 9:26 pm

    On Thu, Mar 17, 2011 at 2:25 PM, Stefan Behnel wrote:
    Robert Bradshaw, 17.03.2011 05:05:
    On Wed, Mar 16, 2011 at 7:53 AM, Stefan Behnel wrote:

    I'm actually leaning towards not guaranteeing the order of execution if C
    doesn't do it either. If this is really required, it's easy to work
    around
    for users, but it's severely hard to fix for Cython in all cases, and the
    gain is truly small. After all, we'd only make it easier for users to
    write
    bad code.
    Yep. Lets keep the code in for the above case.
    Erm, the current status is that we try to guarantee the order by pushing
    everything into temps, thus breaking Sage. What code exactly did you intend
    to keep here?
    I was thinking about guarding it with an if False (or flag in Options.py).

    - Robert
  • Stefan Behnel at Mar 17, 2011 at 9:35 pm

    Robert Bradshaw, 17.03.2011 22:26:
    On Thu, Mar 17, 2011 at 2:25 PM, Stefan Behnel wrote:
    Robert Bradshaw, 17.03.2011 05:05:
    On Wed, Mar 16, 2011 at 7:53 AM, Stefan Behnel wrote:

    I'm actually leaning towards not guaranteeing the order of execution if C
    doesn't do it either. If this is really required, it's easy to work
    around
    for users, but it's severely hard to fix for Cython in all cases, and the
    gain is truly small. After all, we'd only make it easier for users to
    write
    bad code.
    Yep. Lets keep the code in for the above case.
    Erm, the current status is that we try to guarantee the order by pushing
    everything into temps, thus breaking Sage. What code exactly did you intend
    to keep here?
    I was thinking about guarding it with an if False (or flag in Options.py).
    Ah - the "poor man's VCS" approach? ;)

    Stefan
  • Greg Ewing at Mar 11, 2011 at 11:37 pm

    Stefan Behnel wrote:

    To be a little clearer here, it's a problem in C for example with struct
    values. Copying them by value into a temp variable can be expensive,
    potentially twice as expensive as simply passing them into the function
    normally.
    What are you actually proposing to do here? Anything that
    Cython does in the generated code to guarantee evaluation
    order is going to involve using temp variables, and thus
    incur the same overhead.

    --
    Greg
  • Lisandro Dalcin at Mar 11, 2011 at 3:05 pm

    On 11 March 2011 09:45, Stefan Behnel wrote:
    Hi,

    ticket 654 describes a code generation problem where the arguments to a cdef
    function are not being evaluated in the order they are written down in the
    code.

    http://trac.cython.org/cython_trac/ticket/654

    This introduces problems when the arguments have side effects or are not
    simple, e.g. they are function calls themselves or are taken from object
    attributes or live in a closure. For example,

    ? ?f(g(a), a.x, h(a))

    will produce different results depending on the evaluation order if g or h
    change the value of a.x.

    However, apparently, the order of evaluation is only guaranteed by Python,
    not by C. Now, the question is: what are the right semantics for Cython
    here: follow Python or C?

    Personally, I think it would be nice to keep up Python's semantics, but when
    I implemented this I broke quite some code in Sage (you may have noticed
    that the sage-build project in Hudson has been red for a while). There are
    things in C and especially in C++ that cannot be easily copied into a
    temporary variable in order to make sure they are evaluated before the
    following arguments. This is not a problem for Python function calls where
    all arguments end up being copied (and often converted) anyway. It is a
    problem for C function calls, though.

    What do you think about this?
    Regarding our previous history, I would go for Python semantics...

    Perhaps you could add a compiler directive for these rare cases where
    you need/want C/C++ semantics?


    --
    Lisandro Dalcin
    ---------------
    CIMEC (INTEC/CONICET-UNL)
    Predio CONICET-Santa Fe
    Colectora RN 168 Km 472, Paraje El Pozo
    3000 Santa Fe, Argentina
    Tel: +54-342-4511594 (ext 1011)
    Tel/Fax: +54-342-4511169
  • Greg Ewing at Mar 11, 2011 at 11:29 pm

    Stefan Behnel wrote:

    This introduces problems when the arguments have side effects or are not
    simple, e.g.

    f(g(a), a.x, h(a))

    What do you think about this?
    I think it's a bad idea to write code that relies on the order
    of evaluation like this. If the order matters, it's better to
    be explicit about it:

    arg1 = g(a)
    arg2 = h(a)
    f(arg1, a.x, arg2)

    So I would say it's not something worth worrying about overly
    much.

    --
    Greg

Related Discussions