Dear All,

For the first time I have come across a Python feature that seems
completely wrong. After the introduction of rich comparisons, equality
comparison does not have to return a truth value, and may indeed return
nothing at all and throw an error instead. As a result, code like
if foo == bar:
or
foo in alist
cannot be relied on to work.

This is clearly no accident. According to the documentation all comparison
operators are allowed to return non-booleans, or to throw errors. There is
explicitly no guarantee that x == x is True.

Personally I would like to get these !@#$%&* misfeatures removed, and
constrain the __eq__ function to always return a truth value. That is
clearly not likely to happen. Unless I have misunderstood something, could
somebody explain to me

1) Why was this introduced? I can understand relaxing the restrictions on
'<', '<=' etc. - after all you cannot define an ordering for all types of
object. But surely you can define an equal/unequal classification for all
types of object, if you want to? Is it just the numpy people wanting to
type 'a == b' instead of 'equals(a,b)', or is there a better reason?

2) If I want to write generic code, can I somehow work around the fact
that
if foo == bar:
or
foo in alist
does not work for arbitrary objects?

Yours,

Rasmus



Some details:

CCPN has a table display class that maintains a list of arbitrary objects,
one per line in the table. The table class is completely generic, and
subclassed for individual cases. It contains the code:

if foo in tbllist:
    ...
else:
    ...
    tbllist.append(foo)
    ...

One day the 'if' statement gave this rather obscure error:
"ValueError:
The truth value of an array with more than one element is ambiguous.
Use a.any() or a.all()"
A subclass had used objects passed in from some third party code, and as
it turned out foo happened to be a tuple containing a tuple containing a
numpy array.

Some more precise tests gave the following:
# Python 2.5.2 (r252:60911, Jul 31 2008, 17:31:22)
# [GCC 4.2.3 (Ubuntu 4.2.3-2ubuntu7)] on linux2
# set up
import numpy
a = float('NaN')
b = float('NaN')
ll = [a,b]
c = numpy.zeros((2,3))
d = numpy.zeros((2,3))
mm = [c,d]

# try NaN
print (a == a) # gives False
print (a is a) # gives True
print (a == b) # gives False
print (a is b) # gives False
print (a in ll) # gives True
print (b in ll) # gives True
print (ll.index(a)) # gives 0
print (ll.index(b)) # gives 1

# try numpy array
print (c is c) # gives True
print (c is d) # gives False
print (c in mm) # gives True
print (mm.index(c)) # 0
print (c == c) # gives [[ True True True][ True True True]]
print (c == d) # gives [[ True True True][ True True True]]
print (bool(1 == c)) # raises error - see below
print (d in mm) # raises error - see below
print (mm.index(d)) # raises error - see below
print (c in ll) # raises error - see below
print (ll.index(c)) # raises error - see below

The error was the same in each case:
"ValueError:
The truth value of an array with more than one element is ambiguous.
Use a.any() or a.all()"
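
The pattern in these results follows from how list containment and index() compare elements: CPython tests identity first, falls back to '==' only when the two objects are different, and then takes the truth value of the result. A rough Python 3 sketch of that rule follows. The helper name `contains` and the stand-in class `Elementwise` are illustrative only and are not part of Python or numpy; `Elementwise` merely mimics an array whose '==' result has no single truth value.

```python
class Elementwise:
    """Stand-in for an array-like type: == compares element by element."""

    def __init__(self, values):
        self.values = list(values)

    def __eq__(self, other):
        if isinstance(other, Elementwise):
            other_values = other.values
        else:
            other_values = [other] * len(self.values)
        return Elementwise(a == b for a, b in zip(self.values, other_values))

    def __bool__(self):
        # Like a multi-element numpy array, refuse to pick a truth value.
        raise ValueError("truth value of an elementwise result is ambiguous")


def contains(items, target):
    """Approximate what `target in items` does for a list."""
    for item in items:
        # Identity is tested first, so == is never called for the same object.
        if item is target or item == target:
            return True
    return False


nan = float("nan")
c, d = Elementwise([0, 0]), Elementwise([0, 0])

print(contains([nan], nan))   # True: found by identity although nan != nan
print(contains([c, d], c))    # True: identity match on the first element
try:
    contains([c, d], d)       # c == d must be evaluated and has no truth value
except ValueError as exc:
    print("ValueError:", exc)
```

This is why `c in mm` succeeds while `d in mm` fails: the search for d has to get past c first.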


---------------------------------------------------------------------------
Dr. Rasmus H. Fogh Email: r.h.fogh at bioc.cam.ac.uk
Dept. of Biochemistry, University of Cambridge,
80 Tennis Court Road, Cambridge CB2 1GA, UK. FAX (01223)766002


  • Terry Reedy at Dec 6, 2008 at 7:56 pm

    Rasmus Fogh wrote:
    Dear All,

    For the first time I have come across a Python feature that seems
    completely wrong. After the introduction of rich comparisons, equality
    comparison does not have to return a truth value, and may indeed return
    nothing at all and throw an error instead. As a result, code like
    if foo == bar:
    or
    foo in alist
    cannot be relied on to work.

    This is clearly no accident. According to the documentation all comparison
    operators are allowed to return non-booleans, or to throw errors. There is
    explicitly no guarantee that x == x is True.
    You have touched on a real and known issue that accompanies dynamic
    typing and the design of Python. *Every* Python function can return any
    Python object and may raise any exception either actively, by design, or
    passively, by not catching exceptions raised in the functions *it* calls.
    Personally I would like to get these !@#$%&* misfeatures removed,
    What you are calling a misfeature is an absence, not a presence that can
    be removed.
    and constrain the __eq__ function to always return a truth value.
    It is impossible to do that with certainty by any mechanical
    creation-time checking. So the implementation of operator.eq would have
    to check the return value of the ob.__eq__ function it calls *every
    time*. That would slow down the speed of the 99.xx% of cases where the
    check is not needed and would still not prevent exceptions. And if the
    return value was bad, all operator.eq could do is raise an exception
    anyway.
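
    To make that cost concrete, here is a minimal Python 3 sketch of what such a per-call check could look like. `strict_eq` is a hypothetical helper, not something in the operator module, and `Weird` is an invented class. Note that the check can only turn a bad return value into an exception, and cannot stop the comparison itself from raising.

```python
def strict_eq(a, b):
    """Hypothetical checked equality: insist that `a == b` yields a real bool."""
    result = a == b          # may itself raise; the check cannot prevent that
    if not isinstance(result, bool):
        raise TypeError(
            "== returned %s, expected bool" % type(result).__name__
        )
    return result


class Weird:
    def __eq__(self, other):
        return "maybe"       # legal today: __eq__ may return any object


print(strict_eq(1, 1.0))     # True
try:
    strict_eq(Weird(), 1)
except TypeError as exc:
    print("TypeError:", exc)  # TypeError: == returned str, expected bool
```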
    That is clearly not likely to happen. Unless I have misunderstood something, could
    somebody explain to me.
    a. See above.
    b. Python programmers are allowed to define 'weird' but possibly
    useful-in-context behaviors, such as trying out 3-value logic or
    operating on collections element by element (as with numpy).
    1) Why was this introduced?
    The 6 comparisons were previously done with one __cmp__ function that
    was supposed to return -1, 0, or 1 and which worked with negative, 0, or
    positive response, but which could return anything or raise an
    exception. The compare functions could mask but not prevent weird returns.

    I can understand relaxing the restrictions on
    '<', '<=' etc. - after all you cannot define an ordering for all types of
    object. But surely you can define an equal/unequal classification for all
    types of object, if you want to? Is it just the numpy people wanting to
    type 'a == b' instead of 'equals(a,b)', or is there a better reason?

    2) If I want to write generic code, can I somehow work around the fact
    that
    if foo == bar:
    or
    foo in alist
    does not work for arbitrary objects?
    Every Python function is 'generic' unless restrained by type tests.
    However, even 'generic' functions can only work as expected with objects
    that meet the assumptions embodied in the function. In my Python-based
    algorithm book-in-progress, I am stating this explicitly. In particular,
    I say that the book only applies to objects for which '==' gives a
    boolean result that is reflexive, symmetric, and transitive. This
    excludes float('nan'), for instance (as I see you discovered), which
    follows the IEEE mandate to act otherwise.
    CCPN has a table display class that maintains a list of arbitrary objects,
    one per line in the table. The table class is completely generic,
    but only for the objects that meet the implied assumption. This is true
    for *all* Python code. If you want to apply the function to other
    objects, you must either adapt the function or adapt or wrap the objects
    to give them an interface that does meet the assumptions.
    and subclassed for individual cases. It contains the code:

    if foo in tbllist:
        ...
    else:
        ...
        tbllist.append(foo)
        ...

    One day the 'if' statement gave this rather obscure error:
    "ValueError:
    The truth value of an array with more than one element is ambiguous.
    Use a.any() or a.all()"
    A subclass had used objects passed in from some third party code, and as
    it turned out foo happened to be a tuple containing a tuple containing a
    numpy array.
    Right. 'in' calls '==' and assumes a boolean return. Assumption
    violated, exception raised. Completely normal. The error message even
    suggests a solution: wrap the offending objects in an adaptor class that
    gives them a normal interface with .all (or perhaps the all() builtin).
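
    One possible shape for such an adaptor, as a Python 3 sketch. `ArrayLike` is a made-up stand-in for a numpy array so that the example runs without numpy, and `ByValue` is a hypothetical wrapper name; with real arrays the reduction would be the .all() method that the error message mentions.

```python
class ArrayLike:
    """Stand-in for a numpy array: == is elementwise, truth value is refused."""

    def __init__(self, values):
        self.values = list(values)

    def __eq__(self, other):
        if not isinstance(other, ArrayLike) or len(other.values) != len(self.values):
            return NotImplemented
        return ArrayLike(a == b for a, b in zip(self.values, other.values))

    def __bool__(self):
        raise ValueError("truth value of an array-like is ambiguous")

    def all(self):
        return all(self.values)


class ByValue:
    """Hypothetical adaptor giving the wrapped object a boolean __eq__."""

    def __init__(self, wrapped):
        self.wrapped = wrapped

    def __eq__(self, other):
        if not isinstance(other, ByValue):
            return NotImplemented
        result = self.wrapped == other.wrapped
        # Reduce elementwise results with .all(); plain bools pass through.
        reduce_all = getattr(result, "all", None)
        return bool(reduce_all()) if callable(reduce_all) else bool(result)


tbllist = [ByValue(ArrayLike([0, 0])), ByValue(1)]
print(ByValue(ArrayLike([0, 0])) in tbllist)   # True
print(ByValue(ArrayLike([0, 1])) in tbllist)   # False
```

    The table code itself stays unchanged; only the objects placed in the list are wrapped.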

    Terry Jan Reedy
  • Robert Kern at Dec 6, 2008 at 11:57 pm

    Terry Reedy wrote:
    Rasmus Fogh wrote:
    Dear All,

    For the first time I have come across a Python feature that seems
    completely wrong. After the introduction of rich comparisons, equality
    comparison does not have to return a truth value, and may indeed return
    nothing at all and throw an error instead. As a result, code like
    if foo == bar:
    or
    foo in alist
    cannot be relied on to work.

    This is clearly no accident. According to the documentation all
    comparison
    operators are allowed to return non-booleans, or to throw errors.
    There is
    explicitly no guarantee that x == x is True.
    You have touched on a real and known issue that accompanies dynamic
    typing and the design of Python. *Every* Python function can return any
    Python object and may raise any exception either actively, by design, or
    passively, by not catching exceptions raised in the functions *it* calls.
    Personally I would like to get these !@#$%&* misfeatures removed,
    What you are calling a misfeature is an absence, not a presence that can
    be removed.
    That's not quite true. Rich comparisons explicitly allow non-boolean return
    values. Breaking up __cmp__ into multiple __special__ methods was not the sole
    purpose of rich comparisons. One of the prime examples at the time was numpy
    (well, Numeric at the time). We wanted to use == to be able to return an array
    with boolean values where the two operand arrays were equal. E.g.

    In [1]: from numpy import *

    In [2]: array([1, 2, 3]) == array([4, 2, 3])
    Out[2]: array([False, True, True], dtype=bool)

    SQLAlchemy uses these operators to build up objects that will be turned into SQL
    expressions.
    print users.c.id==addresses.c.user_id
    users.id = addresses.user_id

    Basically, the idea was to turn these operators into full-fledged operators like
    +-/*. Returning a non-boolean violates neither the letter, nor the spirit of the
    feature.

    Unfortunately, if you do overload __eq__ to build up expressions or whatnot, the
    other places where users of __eq__ are implicitly expecting a boolean break.
    While I was (and am) a supporter of rich comparisons, I feel Rasmus's pain from
    time to time. It would be nice to have an alternate method to express the
    boolean "yes, this thing is equal in value to that other thing". Unfortunately,
    I haven't figured out a good way to fit it in now without sacrificing rich
    comparisons entirely.
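
    A toy illustration of both halves of that point, in Python 3. `Column` and `Equals` are invented for this sketch and are not SQLAlchemy's API; they only show the general technique of building an expression object from '==', and how a consumer such as 'in' that implicitly wants a boolean then gives a misleading answer.

```python
class Equals:
    """Expression node produced by comparing two columns."""

    def __init__(self, left, right):
        self.left, self.right = left, right

    def __str__(self):
        return "%s = %s" % (self.left.name, self.right.name)


class Column:
    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        # Build an expression instead of answering "are these equal?".
        return Equals(self, other)

    __hash__ = object.__hash__   # keep instances hashable despite __eq__


user_id = Column("users.id")
address_user_id = Column("addresses.user_id")

print(user_id == address_user_id)      # users.id = addresses.user_id

# `in` takes the truth value of the Equals object, which is True by default,
# so an unrelated column is reported as present.
print(Column("orders.id") in [user_id])   # True
```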
    and constrain the __eq__ function to always return a truth value.
    It is impossible to do that with certainty by any mechanical
    creation-time checking. So the implementation of operator.eq would have
    to check the return value of the ob.__eq__ function it calls *every
    time*. That would slow down the speed of the 99.xx% of cases where the
    check is not needed and would still not prevent exceptions. And if the
    return value was bad, all operator.eq could do is raise an exception
    anyway.
    Sure, but then it would be a bug to return a non-boolean from __eq__ and
    friends. It is not a bug today. I think that's what Rasmus is proposing.

    --
    Robert Kern

    "I have come to believe that the whole world is an enigma, a harmless enigma
    that is made terrible by our own mad attempt to interpret it as though it had
    an underlying truth."
    -- Umberto Eco
  • Terry Reedy at Dec 7, 2008 at 10:11 pm

    Robert Kern wrote:
    Terry Reedy wrote:
    Rasmus Fogh wrote:
    Personally I would like to get these !@#$%&* misfeatures removed,
    What you are calling a misfeature is an absence, not a presence that
    can be removed.
    That's not quite true.
    In what way, pray tell. My statement still looks quite true to me.
    Rich comparisons explicitly allow non-boolean return values.
    They do so by not doing anything to the return value of the underlying
    method. As I said, the OP is complaining about an absence of a check.
    Moreover, the absence is intentional as I explained in the part snipped
    and as you further explained.

    And if the return value was bad, all operator.eq could do is raise an
    exception anyway.
    Sure, but then it would be a bug to return a non-boolean from __eq__ and
    friends. It is not a bug today. I think that's what Rasmus is proposing.
    Right, the addition of a check that is absent today.

    tjr
  • Robert Kern at Dec 8, 2008 at 12:27 am

    Terry Reedy wrote:
    Robert Kern wrote:
    Terry Reedy wrote:
    Rasmus Fogh wrote:
    Personally I would like to get these !@#$%&* misfeatures removed,
    What you are calling a misfeature is an absence, not a presence that
    can be removed.
    That's not quite true.
    In what way, pray tell. My statement still looks quite true to me.
    There is an explicit policy that __eq__() methods can return non-bools for
    various purposes. I consider that policy to be a "presence that can be removed".
    There is no check because that policy exists, not the other way around.

    Anyways, this is really a semantic digression, and not particularly important.
    Peace?

    --
    Robert Kern
  • Terry Reedy at Dec 8, 2008 at 6:40 pm

    Robert Kern wrote:

    There is an explicit policy that __eq__() methods can return non-bools
    for various purposes. I consider that policy to be a "presence that can be
    removed". There is no check because that policy exists, not the other
    way around.
    OK, presence in manual versus presence in code.
    Anyways, this is really a semantic digression, and not particularly
    important. Peace?
    Yes
  • James Stroud at Dec 7, 2008 at 11:57 am

    Rasmus Fogh wrote:
    Dear All,

    For the first time I have come across a Python feature that seems
    completely wrong. After the introduction of rich comparisons, equality
    comparison does not have to return a truth value, and may indeed return
    nothing at all and throw an error instead. As a result, code like
    if foo == bar:
    or
    foo in alist
    cannot be relied on to work.

    This is clearly no accident. According to the documentation all comparison
    operators are allowed to return non-booleans, or to throw errors. There is
    explicitly no guarantee that x == x is True.
    I'm not a computer scientist, so my language and perspective on the
    topic may be a bit naive, but I'll try to demonstrate my caveman
    understanding by example.

    First, here is why the ability to throw an error is a feature:

    class Apple(object):
        def __init__(self, appleness):
            self.appleness = appleness
        def __cmp__(self, other):
            assert isinstance(other, Apple), 'must compare apples to apples'
            return cmp(self.appleness, other.appleness)

    class Orange(object): pass

    Apple(42) == Orange()


    Second, consider that any value in python also evaluates to a truth
    value in boolean context.

    Third, every function returns something. A function's returning nothing
    is not a possibility in the python language. None is something but
    evaluates to False in boolean context.
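
    Both points are easy to check interactively. A short Python 3 sketch (the function name `returns_nothing` is just for illustration):

```python
def returns_nothing():
    pass                      # no return statement at all


result = returns_nothing()
print(result is None)         # True: the call still produced an object
print(bool(result))           # False: None is false in boolean context
print(bool([]), bool([0]), bool("x"))   # False True True
```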
    But surely you can define an equal/unequal classification for all
    types of object, if you want to?
    This reminds me of complex numbers: would 4 + 4i be equal to sqrt(32)?
    Even in the realm of pure mathematics, the generality of objects (i.e.
    numbers) can not be assumed.


    James


    --
    James Stroud
    UCLA-DOE Institute for Genomics and Proteomics
    Box 951570
    Los Angeles, CA 90095

    http://www.jamesstroud.com
  • Luis Zarrabeitia at Dec 7, 2008 at 2:23 pm

    Quoting James Stroud <jstroud at mbi.ucla.edu>:

    First, here is why the ability to throw an error is a feature:

    class Apple(object):
        def __init__(self, appleness):
            self.appleness = appleness
        def __cmp__(self, other):
            assert isinstance(other, Apple), 'must compare apples to apples'
            return cmp(self.appleness, other.appleness)

    class Orange(object): pass

    Apple(42) == Orange()
    I beg to disagree.
    The right answer for the question "Am I equal to this chair right here?" is not
    "I don't know", nor "I can't compare". The answer is "No, I'm not a chair, thus
    I'm not equal to this chair right here". If someone comes to my house, looking
    for me, he will not run away because he sees a chair before he sees me. Your
    assert doesn't belong inside the method; it should be up to the caller to decide
    if the human-chair comparisons make sense or not. I certainly don't want to be
    type-checking when looking for an object within a mixed-type collection.
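
    In code, this position corresponds to an __eq__ that declines foreign types instead of asserting. A sketch in Python 3, reusing the Apple and Orange names from the example above; returning NotImplemented lets Python fall back to its default, which treats distinct objects of unrelated types as unequal.

```python
class Apple:
    def __init__(self, appleness):
        self.appleness = appleness

    def __eq__(self, other):
        if not isinstance(other, Apple):
            return NotImplemented     # "I can't tell"; Python then answers False
        return self.appleness == other.appleness

    def __hash__(self):
        return hash(self.appleness)


class Orange:
    pass


print(Apple(42) == Orange())                 # False, no exception
print(Apple(42) == Apple(42))                # True
print(Apple(42) in [Orange(), Apple(42)])    # True: mixed-type search works
```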
    This reminds me of complex numbers: would 4 + 4i be equal to sqrt(32)?
    I assume you meant sqrt(32i).
    Well, sqrt is a function, and if its result value is defined as 4+4i, then the
    answer is 'yes', otherwise, the answer should be no.

    sqrt(4) is *not* -2, and should not be equal to -2. The standard definition of
    the square root _function_ for real numbers is to take the non-negative real
    root. I haven't heard of a standard square root _function_ for complex numbers
    (there is of course, a definition of square root, but it is not a function).

    So, if by your definition of sqrt, sqrt(32i) returns a number, there is no
    ambiguity. -2 is not sqrt(4). If you need the answer to be 'True', you may be
    asking the wrong question.
  • James Stroud at Dec 7, 2008 at 9:53 pm

    Luis Zarrabeitia wrote:
    Quoting James Stroud <jstroud at mbi.ucla.edu>:
    First, here is why the ability to throw an error is a feature:

    class Apple(object):
        def __init__(self, appleness):
            self.appleness = appleness
        def __cmp__(self, other):
            assert isinstance(other, Apple), 'must compare apples to apples'
            return cmp(self.appleness, other.appleness)

    class Orange(object): pass

    Apple(42) == Orange()
    I beg to disagree.
    The right answer for the question "Am I equal to this chair right here?" is not
    "I don't know", nor "I can't compare". The answer is "No, I'm not a chair, thus
    I'm not equal to this chair right here". If someone comes to my house, looking
    for me, he will not run away because he sees a chair before he sees me. Your
    assert doesn't belong inside the method; it should be up to the caller to decide
    if the human-chair comparisons make sense or not. I certainly don't want to be
    type-checking when looking for an object within a mixed-type collection.
    This reminds me of complex numbers: would 4 + 4i be equal to sqrt(32)?
    I assume you meant sqrt(32i).
    No, I definitely didn't mean sqrt(32i). I'm using sqrt() to represent
    the mathematical square root, and not an arbitrary function one might
    define, by the way.

    My point is that 4 + 4i, sqrt(32), and sqrt(-32) all exist in different
    spaces. They are not comparable, even when testing for equality in a
    pure mathematical sense. If we encounter these values in our programs,
    we might like the power to decide the results of these comparisons. In
    one context it might make sense to throw an exception, in another, it
    might make sense to return False based on the fact that we consider them
    different "types", in yet another context, it might make sense to look
    at complex plane values as vectors and return their scalar magnitude for
    comparison to real numbers. I think this ability to define the results
    of comparisons is not a shortcoming of the language but a strength.

    --
    James Stroud
  • Rasmus Fogh at Dec 7, 2008 at 12:43 pm

    Robert Kern Wrote:
    Terry Reedy wrote:
    Rasmus Fogh wrote:
    Personally I would like to get these !@#$%&* misfeatures removed,
    What you are calling a misfeature is an absence, not a presence that
    can be removed.
    That's not quite true. Rich comparisons explicitly allow non-boolean
    return values. Breaking up __cmp__ into multiple __special__ methods was
    not the sole purpose of rich comparisons. One of the prime examples at the
    time was numpy (well, Numeric at the time). We wanted to use == to be able
    to return an array
    with boolean values where the two operand arrays were equal. E.g.

    In [1]: from numpy import *

    In [2]: array([1, 2, 3]) == array([4, 2, 3])
    Out[2]: array([False, True, True], dtype=bool)

    SQLAlchemy uses these operators to build up objects that will be turned
    into SQL expressions.
    print users.c.id==addresses.c.user_id
    users.id = addresses.user_id

    Basically, the idea was to turn these operators into full-fledged
    operators like +-/*. Returning a non-boolean violates neither the letter,
    nor the spirit of the feature.

    Unfortunately, if you do overload __eq__ to build up expressions or
    whatnot, the other places where users of __eq__ are implicitly expecting
    a boolean break.
    While I was (and am) a supporter of rich comparisons, I feel Rasmus's
    pain from time to time. It would be nice to have an alternate method to
    express the boolean "yes, this thing is equal in value to that other thing".
    Unfortunately, I haven't figured out a good way to fit it in now without
    sacrificing rich comparisons entirely.
    The best way, IMHO, would have been to use an alternative notation in
    numpy and SQLAlchemy, and have '==' always return only a truth value - it
    could be a non-boolean as long as the bool() function gave the correct
    result. Surely the extra convenience of overloading '==' in special cases
    was not worth breaking such basic operations as 'bool(x == y)' or
    'x in alist'. Again, the problem is only with '==', not with '>', '<='
    etc. Of course it is done now, and unlikely to be reversed.
    and constrain the __eq__ function to always return a truth value.
    It is impossible to do that with certainty by any mechanical
    creation-time checking. So the implementation of operator.eq would
    have to check the return value of the ob.__eq__ function it calls *every
    time*. That would slow down the speed of the 99.xx% of cases where the
    check is not needed and would still not prevent exceptions. And if the
    return value was bad, all operator.eq could do is raise an exception
    anyway.
    Sure, but then it would be a bug to return a non-boolean from __eq__ and
    friends. It is not a bug today. I think that's what Rasmus is proposing.
    Yes, that is the point. If __eq__ functions are *supposed* to return
    booleans I can write generic code that will work for well-behaved objects,
    and any errors will be somebody else's fault. If __eq__ is free to return
    anything, or throw an error, it becomes my responsibility to write generic
    code that will work anyway, including with floating point numbers, numpy,
    or SQLAlchemy. And I cannot see any way to do that (suggestions welcome).
    If purportedly general code does not work with numpy, your average numpy
    user will not be receptive to the idea that it is all numpy's fault.
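
    One defensive pattern that generic code can fall back on, sketched in Python 3. The helper names `safe_equal` and `safe_contains` are hypothetical, and `Ambiguous` is an invented stand-in. Treating a failed or ambiguous comparison as "not equal" is a policy choice of this sketch, not a general fix: it only keeps the container code from raising.

```python
def safe_equal(a, b):
    """Identity first; otherwise trust == only if it yields a usable truth value."""
    if a is b:
        return True
    try:
        return bool(a == b)
    except Exception:
        # Comparison raised or had no single truth value: treat as different.
        return False


def safe_contains(items, target):
    return any(safe_equal(item, target) for item in items)


class Ambiguous:
    """Stand-in for an object whose == result cannot be used as a bool."""

    def __eq__(self, other):
        return self

    def __bool__(self):
        raise ValueError("ambiguous truth value")


x, y = Ambiguous(), Ambiguous()
nan = float("nan")
print(safe_contains([1, x], x))      # True, by identity
print(safe_contains([1, x], y))      # False instead of ValueError
print(safe_contains([nan], nan))     # True, same answer as the built-in `in`
```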

    Current behaviour is both inconsistent and counterintuitive, as these
    examples show.
    >>> x = float('NaN')
    >>> x == x
    False
    >>> ll = [x]
    >>> x in ll
    True
    >>> x == ll[0]
    False
    >>> import numpy
    >>> y = numpy.zeros((3,))
    >>> y
    array([ 0., 0., 0.])
    >>> bool(y==y)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: The truth value of an array with more than one element is
    ambiguous. Use a.any() or a.all()
    >>> ll1 = [y,1]
    >>> y in ll1
    True
    >>> ll2 = [1,y]
    >>> y in ll2
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: The truth value of an array with more than one element is
    ambiguous. Use a.any() or a.all()

    Can anybody see a way this could be fixed (please)? I may well have to
    live with it, but I would really prefer not to.

  • Robert Kern at Dec 7, 2008 at 9:32 pm

    Rasmus Fogh wrote:

    Current behaviour is both inconsistent and counterintuitive, as these
    examples show.
    >>> x = float('NaN')
    >>> x == x
    False
    Blame IEEE for that one. Rich comparisons have nothing to do with that one.
    >>> ll = [x]
    >>> x in ll
    True
    >>> x == ll[0]
    False
    >>> import numpy
    >>> y = numpy.zeros((3,))
    >>> y
    array([ 0., 0., 0.])
    >>> bool(y==y)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: The truth value of an array with more than one element is
    ambiguous. Use a.any() or a.all()
    >>> ll1 = [y,1]
    >>> y in ll1
    True
    >>> ll2 = [1,y]
    >>> y in ll2
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: The truth value of an array with more than one element is
    ambiguous. Use a.any() or a.all()

    Can anybody see a way this could be fixed (please)? I may well have to
    live with it, but I would really prefer not to.
    Make a concrete proposal for fixing it that does not break backwards compatibility.

    --
    Robert Kern
  • Terry Reedy at Dec 7, 2008 at 10:16 pm

    Rasmus Fogh wrote:
    Can anybody see a way this could be fixed (please)? I may well have to
    live with it, but I would really prefer not to.
    I made a suggestion in my first response, which perhaps you missed.

    tjr
  • Steven D'Aprano at Dec 7, 2008 at 11:20 pm

    On Sun, 07 Dec 2008 15:32:53 -0600, Robert Kern wrote:

    Rasmus Fogh wrote:
    Current behaviour is both inconsistent and counterintuitive, as these
    examples show.
    x = float('NaN')
    x == x
    False
    Blame IEEE for that one. Rich comparisons have nothing to do with that
    one.
    There is nothing to blame them for. This is the correct behaviour. NaNs
    should *not* compare equal to themselves, that's mathematically
    incoherent.

    --
    Steven
  • Steven D'Aprano at Dec 7, 2008 at 11:37 pm

    On Sun, 07 Dec 2008 23:20:12 +0000, Steven D'Aprano wrote:
    On Sun, 07 Dec 2008 15:32:53 -0600, Robert Kern wrote:

    Rasmus Fogh wrote:
    Current behaviour is both inconsistent and counterintuitive, as these
    examples show.
    x = float('NaN')
    x == x
    False
    Blame IEEE for that one. Rich comparisons have nothing to do with that
    one.
    There is nothing to blame them for. This is the correct behaviour. NaNs
    should *not* compare equal to themselves, that's mathematically
    incoherent.

    Sorry, I should explain why.

    Given:

    x = log(-5) # a NaN
    y = log(-2) # the same NaN
    x == y # Some people want this to be true for NaNs.

    Then:

    # Compare x and y directly.
    log(-5) == log(-2)
    # If x == y then exp(x) == exp(y) for all x, y.
    exp(log(-5)) == exp(log(-2))
    -5 == -2


    and now the entire foundation of mathematics collapses into a steaming
    pile of rubble.


    --
    Steven
  • Robert Kern at Dec 8, 2008 at 12:14 am

    Steven D'Aprano wrote:
    On Sun, 07 Dec 2008 23:20:12 +0000, Steven D'Aprano wrote:
    On Sun, 07 Dec 2008 15:32:53 -0600, Robert Kern wrote:

    Rasmus Fogh wrote:
    Current behaviour is both inconsistent and counterintuitive, as these
    examples show.
    x = float('NaN')
    x == x
    False
    Blame IEEE for that one. Rich comparisons have nothing to do with that
    one.
    There is nothing to blame them for. This is the correct behaviour. NaNs
    should *not* compare equal to themselves, that's mathematically
    incoherent.
    Sorry, I should explain why.

    Given:

    x = log(-5) # a NaN
    y = log(-2) # the same NaN
    x == y # Some people want this to be true for NaNs.

    Then:

    # Compare x and y directly.
    log(-5) == log(-2)
    # If x == y then exp(x) == exp(y) for all x, y.
    exp(log(-5)) == exp(log(-2))
    -5 == -2


    and now the entire foundations of mathematics collapses into a steaming
    pile of rubble.
    I didn't mean to suggest that it was incorrect, just that that particular
    surprising behavior is not related to rich comparisons. Even if the OP gets an
    __equals__() or some such, NaN will still not compare equal to NaN.
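
    A quick Python 3 check of that point, using only the standard library. Note that in Python math.log(-5) raises ValueError rather than returning a NaN, so the sketch builds its NaN with float('nan'); math.isnan is the dependable test because '==' never matches.

```python
import math

x = float("nan")

print(x == x)            # False: IEEE 754 comparison semantics
print(x != x)            # True
print(math.isnan(x))     # True: the dependable way to detect a NaN

try:
    math.log(-5)
except ValueError as exc:
    # The log of a negative number is rejected outright.
    print("ValueError:", exc)
```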

    --
    Robert Kern
  • George Sakkis at Dec 8, 2008 at 12:24 am

    On Dec 7, 6:37 pm, Steven D'Aprano <st... at REMOVE-THIS- cybersource.com.au> wrote:
    On Sun, 07 Dec 2008 23:20:12 +0000, Steven D'Aprano wrote:
    On Sun, 07 Dec 2008 15:32:53 -0600, Robert Kern wrote:

    Rasmus Fogh wrote:
    Current behaviour is both inconsistent and counterintuitive, as these
    examples show.
    x = float('NaN')
    x == x
    False
    Blame IEEE for that one. Rich comparisons have nothing to do with that
    one.
    There is nothing to blame them for. This is the correct behaviour. NaNs
    should *not* compare equal to themselves, that's mathematically
    incoherent.
    Sorry, I should explain why.

    Given:

    x = log(-5) # a NaN
    y = log(-2) # the same NaN
    x == y # Some people want this to be true for NaNs.

    Then:

    # Compare x and y directly.
    log(-5) == log(-2)
    # If x == y then exp(x) == exp(y) for all x, y.
    exp(log(-5)) == exp(log(-2))
    -5 == -2

    and now the entire foundations of mathematics collapses into a steaming
    pile of rubble.
    And why doesn't this happen with the current behavior if x = y = log
    (-5) ? According to the same proof, -5 != -5.

    George
  • Steven D'Aprano at Dec 9, 2008 at 2:45 am

    On Sun, 07 Dec 2008 16:24:58 -0800, George Sakkis wrote:

    On Dec 7, 6:37 pm, Steven D'Aprano <st... at REMOVE-THIS-
    cybersource.com.au> wrote:
    ...
    Given:

    x = log(-5) # a NaN
    y = log(-2) # the same NaN
    x == y # Some people want this to be true for NaNs.

    Then:

    # Compare x and y directly.
    log(-5) == log(-2)
    # If x == y then exp(x) == exp(y) for all x, y.
    exp(log(-5)) == exp(log(-2))
    -5 == -2

    and now the entire foundations of mathematics collapses into a steaming
    pile of rubble.
    And why doesn't this happen with the current behavior if x = y = log
    (-5) ? According to the same proof, -5 != -5.
    You're right, I was a little sloppy in my "proof". There are additional
    subtleties going on.



    --
    Steven
  • Rhamphoryncus at Dec 8, 2008 at 6:20 pm

    On Dec 7, 4:20 pm, Steven D'Aprano <st... at REMOVE-THIS- cybersource.com.au> wrote:
    On Sun, 07 Dec 2008 15:32:53 -0600, Robert Kern wrote:
    Rasmus Fogh wrote:
    Current behaviour is both inconsistent and counterintuitive, as these
    examples show.
    x = float('NaN')
    x == x
    False
    Blame IEEE for that one. Rich comparisons have nothing to do with that
    one.
    There is nothing to blame them for. This is the correct behaviour. NaNs
    should *not* compare equal to themselves, that's mathematically
    incoherent.
    Mathematically, NaNs shouldn't be comparable at all. They should
    raise an exception when compared. In fact, they should raise an
    exception when *created*. But that's not what we want. What we want
    is a dummy value that silently plods through our calculations. For a
    dummy value it seems a lot more sense to pick an arbitrary yet
    consistent sort order (I suggest just above -Inf), rather than quietly
    screwing up the sort.

    Regarding the mythical IEEE 754, although it's extremely rare to find
    quotations, I have one on just this subject. And it does NOT say "x
    == NaN gives false". It says it gives *unordered*. It is C and
    probably most other languages that turn that into false (as they want
    a dummy value, not an error.)

    http://groups.google.ca/group/sci.math.num-analysis/browse_thread/thread/ead0392e646b7cc0/a5bc354cd46f2c49?lnk=st&q=why+does+NaN+not+equal+itself%3F&rnum=3&hl=en&pli=1
  • Robert Kern at Dec 8, 2008 at 6:54 pm

    Rhamphoryncus wrote:
    On Dec 7, 4:20 pm, Steven D'Aprano <st... at REMOVE-THIS-
    cybersource.com.au> wrote:
    On Sun, 07 Dec 2008 15:32:53 -0600, Robert Kern wrote:
    Rasmus Fogh wrote:
    Current behaviour is both inconsistent and counterintuitive, as these
    examples show.
    x = float('NaN')
    x == x
    False
    Blame IEEE for that one. Rich comparisons have nothing to do with that
    one.
    There is nothing to blame them for. This is the correct behaviour. NaNs
    should *not* compare equal to themselves, that's mathematically
    incoherent.
    Mathematically, NaNs shouldn't be comparable at all. They should
    raise an exception when compared. In fact, they should raise an
    exception when *created*. But that's not what we want. What we want
    is a dummy value that silently plods through our calculations. For a
    dummy value it seems a lot more sense to pick an arbitrary yet
    consistent sort order (I suggest just above -Inf), rather than quietly
    screwing up the sort.
    Well, there are explicitly two kinds of NaNs: signalling NaNs and quiet NaNs, to
    accommodate both requirements. Additionally, there is significant flexibility in
    trapping the signals.
    Regarding the mythical IEEE 754, although it's extremely rare to find
    quotations, I have one on just this subject. And it does NOT say "x
    == NaN gives false". It says it gives *unordered*. It is C and
    probably most other languages that turn that into false (as they want
    a dummy value, not an error.)

    http://groups.google.ca/group/sci.math.num-analysis/browse_thread/thread/ead0392e646b7cc0/a5bc354cd46f2c49?lnk=st&q=why+does+NaN+not+equal+itself%3F&rnum=3&hl=en&pli=1
    Table 4 on page 9 of the standard is pretty clear on the subject. When the two
    operands are unordered, the operator == returns False. The standard defines how
    to do comparisons notionally; two operands can be "greater than", "less than",
    "equal" or "unordered". It then goes on to map these notional concepts to
    programming language boolean predicates.

    --
    Robert Kern

    "I have come to believe that the whole world is an enigma, a harmless enigma
    that is made terrible by our own mad attempt to interpret it as though it had
    an underlying truth."
    -- Umberto Eco
  • Rhamphoryncus at Dec 8, 2008 at 7:32 pm

    On Dec 8, 11:54 am, Robert Kern wrote:
    Rhamphoryncus wrote:
    On Dec 7, 4:20 pm, Steven D'Aprano <st... at REMOVE-THIS-
    cybersource.com.au> wrote:
    On Sun, 07 Dec 2008 15:32:53 -0600, Robert Kern wrote:
    Rasmus Fogh wrote:
    Current behaviour is both inconsistent and counterintuitive, as these
    examples show.
    x = float('NaN')
    x == x
    False
    Blame IEEE for that one. Rich comparisons have nothing to do with that
    one.
    There is nothing to blame them for. This is the correct behaviour. NaNs
    should *not* compare equal to themselves, that's mathematically
    incoherent.
    Mathematically, NaNs shouldn't be comparable at all. They should
    raise an exception when compared. In fact, they should raise an
    exception when *created*. But that's not what we want. What we want
    is a dummy value that silently plods through our calculations. For a
    dummy value it seems a lot more sense to pick an arbitrary yet
    consistent sort order (I suggest just above -Inf), rather than quietly
    screwing up the sort.
    Well, there are explicitly two kinds of NaNs: signalling NaNs and quiet NaNs, to
    accommodate both requirements. Additionally, there is significant flexibility in
    trapping the signals.
    Right, but most of that's lower level. By the time it reaches Python
    we only care about quiet NaNs.

    Regarding the mythical IEEE 754, although it's extremely rare to find
    quotations, I have one on just this subject. And it does NOT say "x
    == NaN gives false". It says it gives *unordered*. It is C and
    probably most other languages that turn that into false (as they want
    a dummy value, not an error.)
    http://groups.google.ca/group/sci.math.num-analysis/browse_thread/thr...
    Table 4 on page 9 of the standard is pretty clear on the subject. When the two
    operands are unordered, the operator == returns False. The standard defines how
    to do comparisons notionally; two operands can be "greater than", "less than",
    "equal" or "unordered". It then goes on to map these notional concepts to
    programming language boolean predicates.
    Ahh, interesting. Still though, does it give an explanation for such
    behaviour, or use cases? There must be some situation where blindly
    returning false is enough benefit to trump screwing up sorting.
  • Robert Kern at Dec 8, 2008 at 8:04 pm

    Rhamphoryncus wrote:
    On Dec 8, 11:54 am, Robert Kern wrote:
    Rhamphoryncus wrote:
    On Dec 7, 4:20 pm, Steven D'Aprano <st... at REMOVE-THIS-
    cybersource.com.au> wrote:
    On Sun, 07 Dec 2008 15:32:53 -0600, Robert Kern wrote:
    Rasmus Fogh wrote:
    Current behaviour is both inconsistent and counterintuitive, as these
    examples show.
    x = float('NaN')
    x == x
    False
    Blame IEEE for that one. Rich comparisons have nothing to do with that
    one.
    There is nothing to blame them for. This is the correct behaviour. NaNs
    should *not* compare equal to themselves, that's mathematically
    incoherent.
    Mathematically, NaNs shouldn't be comparable at all. They should
    raise an exception when compared. In fact, they should raise an
    exception when *created*. But that's not what we want. What we want
    is a dummy value that silently plods through our calculations. For a
    dummy value it seems a lot more sense to pick an arbitrary yet
    consistent sort order (I suggest just above -Inf), rather than quietly
    screwing up the sort.
    Well, there are explicitly two kinds of NaNs: signalling NaNs and quiet NaNs, to
    accommodate both requirements. Additionally, there is significant flexibility in
    trapping the signals.
    Right, but most of that's lower level. By the time it reaches Python
    we only care about quiet NaNs.
    No, signaling NaNs raise the exception that you are asking for. You're right
    that if you get a Python float object that is a NaN, it is probably going to be
    quiet, but signaling NaNs can affect Python in the way that you want.
    Regarding the mythical IEEE 754, although it's extremely rare to find
    quotations, I have one on just this subject. And it does NOT say "x
    == NaN gives false". It says it gives *unordered*. It is C and
    probably most other languages that turn that into false (as they want
    a dummy value, not an error.)
    http://groups.google.ca/group/sci.math.num-analysis/browse_thread/thr...
    Table 4 on page 9 of the standard is pretty clear on the subject. When the two
    operands are unordered, the operator == returns False. The standard defines how
    to do comparisons notionally; two operands can be "greater than", "less than",
    "equal" or "unordered". It then goes on to map these notional concepts to
    programming language boolean predicates.
    Ahh, interesting. Still though, does it give an explanation for such
    behaviour, or use cases? There must be some situation where blindly
    returning false is enough benefit to trump screwing up sorting.
    Well, the standard was written in the days of Fortran. You didn't really have
    generic sorting routines. You *could* implement whatever ordering you wanted
    because you *had* to implement the ordering yourself. You didn't have to use a
    limited boolean predicate.

    Basically, the boolean predicates have to return either True or False. Neither
    one is really satisfactory, but that's the constraint you're under.

    --
    Robert Kern

    "I have come to believe that the whole world is an enigma, a harmless enigma
    that is made terrible by our own mad attempt to interpret it as though it had
    an underlying truth."
    -- Umberto Eco
  • Rhamphoryncus at Dec 8, 2008 at 9:10 pm

    On Dec 8, 1:04 pm, Robert Kern wrote:
    Rhamphoryncus wrote:
    On Dec 8, 11:54 am, Robert Kern wrote:
    Rhamphoryncus wrote:
    On Dec 7, 4:20 pm, Steven D'Aprano <st... at REMOVE-THIS-
    cybersource.com.au> wrote:
    On Sun, 07 Dec 2008 15:32:53 -0600, Robert Kern wrote:
    Rasmus Fogh wrote:
    Current behaviour is both inconsistent and counterintuitive, as these
    examples show.
    x = float('NaN')
    x == x
    False
    Blame IEEE for that one. Rich comparisons have nothing to do with that
    one.
    There is nothing to blame them for. This is the correct behaviour. NaNs
    should *not* compare equal to themselves, that's mathematically
    incoherent.
    Mathematically, NaNs shouldn't be comparable at all. They should
    raise an exception when compared. In fact, they should raise an
    exception when *created*. But that's not what we want. What we want
    is a dummy value that silently plods through our calculations. For a
    dummy value it seems a lot more sense to pick an arbitrary yet
    consistent sort order (I suggest just above -Inf), rather than quietly
    screwing up the sort.
    Well, there are explicitly two kinds of NaNs: signalling NaNs and quiet NaNs, to
    accommodate both requirements. Additionally, there is significant flexibility in
    trapping the signals.
    Right, but most of that's lower level. By the time it reaches Python
    we only care about quiet NaNs.
    No, signaling NaNs raise the exception that you are asking for. You're right
    that if you get a Python float object that is a NaN, it is probably going to be
    quiet, but signaling NaNs can affect Python in the way that you want.
    Regarding the mythical IEEE 754, although it's extremely rare to find
    quotations, I have one on just this subject. And it does NOT say "x
    == NaN gives false". It says it gives *unordered*. It is C and
    probably most other languages that turn that into false (as they want
    a dummy value, not an error.)
    http://groups.google.ca/group/sci.math.num-analysis/browse_thread/thr...
    Table 4 on page 9 of the standard is pretty clear on the subject. When the two
    operands are unordered, the operator == returns False. The standard defines how
    to do comparisons notionally; two operands can be "greater than", "less than",
    "equal" or "unordered". It then goes on to map these notional concepts to
    programming language boolean predicates.
    Ahh, interesting. Still though, does it give an explanation for such
    behaviour, or use cases? There must be some situation where blindly
    returning false is enough benefit to trump screwing up sorting.
    Well, the standard was written in the days of Fortran. You didn't really have
    generic sorting routines. You *could* implement whatever ordering you wanted
    because you *had* to implement the ordering yourself. You didn't have to use a
    limited boolean predicate.

    Basically, the boolean predicates have to return either True or False. Neither
    one is really satisfactory, but that's the constraint you're under.
    "We've always done it that way" is NOT a use case! Certainly, it's a
    factor, but it seems quite weak compared to the sort use case.

    I suppose what I'm hoping for is an small example program (one or a
    few functions) that needs the "always false" behaviour of NaN.
  • Robert Kern at Dec 8, 2008 at 9:51 pm

    Rhamphoryncus wrote:
    On Dec 8, 1:04 pm, Robert Kern wrote:
    Rhamphoryncus wrote:
    On Dec 8, 11:54 am, Robert Kern wrote:
    Rhamphoryncus wrote:
    On Dec 7, 4:20 pm, Steven D'Aprano <st... at REMOVE-THIS-
    cybersource.com.au> wrote:
    On Sun, 07 Dec 2008 15:32:53 -0600, Robert Kern wrote:
    Rasmus Fogh wrote:
    Current behaviour is both inconsistent and counterintuitive, as these
    examples show.
    x = float('NaN')
    x == x
    False
    Blame IEEE for that one. Rich comparisons have nothing to do with that
    one.
    There is nothing to blame them for. This is the correct behaviour. NaNs
    should *not* compare equal to themselves, that's mathematically
    incoherent.
    Mathematically, NaNs shouldn't be comparable at all. They should
    raise an exception when compared. In fact, they should raise an
    exception when *created*. But that's not what we want. What we want
    is a dummy value that silently plods through our calculations. For a
    dummy value it seems a lot more sense to pick an arbitrary yet
    consistent sort order (I suggest just above -Inf), rather than quietly
    screwing up the sort.
    Well, there are explicitly two kinds of NaNs: signalling NaNs and quiet NaNs, to
    accommodate both requirements. Additionally, there is significant flexibility in
    trapping the signals.
    Right, but most of that's lower level. By the time it reaches Python
    we only care about quiet NaNs.
    No, signaling NaNs raise the exception that you are asking for. You're right
    that if you get a Python float object that is a NaN, it is probably going to be
    quiet, but signaling NaNs can affect Python in the way that you want.
    Regarding the mythical IEEE 754, although it's extremely rare to find
    quotations, I have one on just this subject. And it does NOT say "x
    == NaN gives false". It says it gives *unordered*. It is C and
    probably most other languages that turn that into false (as they want
    a dummy value, not an error.)
    http://groups.google.ca/group/sci.math.num-analysis/browse_thread/thr...
    Table 4 on page 9 of the standard is pretty clear on the subject. When the two
    operands are unordered, the operator == returns False. The standard defines how
    to do comparisons notionally; two operands can be "greater than", "less than",
    "equal" or "unordered". It then goes on to map these notional concepts to
    programming language boolean predicates.
    Ahh, interesting. Still though, does it give an explanation for such
    behaviour, or use cases? There must be some situation where blindly
    returning false is enough benefit to trump screwing up sorting.
    Well, the standard was written in the days of Fortran. You didn't really have
    generic sorting routines. You *could* implement whatever ordering you wanted
    because you *had* to implement the ordering yourself. You didn't have to use a
    limited boolean predicate.

    Basically, the boolean predicates have to return either True or False. Neither
    one is really satisfactory, but that's the constraint you're under.
    "We've always done it that way" is NOT a use case! Certainly, it's a
    factor, but it seems quite weak compared to the sort use case.
    I didn't say it was. I was explaining that sorting was probably *not* a use case
    for the boolean predicates at the time of writing of the standard. In fact, it
    suggests implementing a Compare() function that returns "greater than", "less
    than", "equal" or "unordered" in addition to the boolean predicates. That Python
    eventually chose to use a generic boolean predicate as the basis of its sorting
    routine many years after the IEEE-754 standard is another matter entirely.

    In any case, the standard itself is quite short, and does not spend much time
    justifying itself in any detail.
    I suppose what I'm hoping for is an small example program (one or a
    few functions) that needs the "always false" behaviour of NaN.
    Steven D'Aprano gave one earlier in the thread. Additionally, (x!=x) is a simple
    test for NaNs if an IsNaN(x) function is not available. Really, though, the
    result falls out from the way that IEEE-754 constructed the logic of the
    system. It is not defined that (NaN==NaN) should return False, per se. Rather,
    all of the boolean predicates are defined in terms of that Compare(x,y)
    function. If that function returns "unordered", then (x==y) is False. It doesn't
    matter if one or both are NaNs; in either case, the result is "unordered".
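
    A rough sketch of that construction in Python, for illustration only. The names Relation, compare, eq, ne and lt are made up here; they are not taken from the standard or from any library. The point is just that once a four-way comparison can answer "unordered", every boolean predicate built on top of it has to pick True or False for that case, and == ends up False.

```python
import math
from enum import Enum


class Relation(Enum):
    """The four notional outcomes of comparing two floats."""
    LESS = "less than"
    EQUAL = "equal"
    GREATER = "greater than"
    UNORDERED = "unordered"


def compare(x, y):
    """Hypothetical four-way comparison: unordered if either operand is a NaN."""
    if math.isnan(x) or math.isnan(y):
        return Relation.UNORDERED
    if x < y:
        return Relation.LESS
    if x > y:
        return Relation.GREATER
    return Relation.EQUAL


# Boolean predicates defined purely in terms of compare().
def eq(x, y):
    return compare(x, y) is Relation.EQUAL


def ne(x, y):
    return compare(x, y) is not Relation.EQUAL


def lt(x, y):
    return compare(x, y) is Relation.LESS


nan = float("nan")
print(compare(nan, nan))   # the unordered case
print(eq(nan, nan), ne(nan, nan), lt(nan, 1.0))
```

    With these definitions ne(x, x) is True only when x is a NaN, which is the (x != x) test mentioned above.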

    --
    Robert Kern

    "I have come to believe that the whole world is an enigma, a harmless enigma
    that is made terrible by our own mad attempt to interpret it as though it had
    an underlying truth."
    -- Umberto Eco
  • Rhamphoryncus at Dec 8, 2008 at 11:30 pm

    On Dec 8, 2:51 pm, Robert Kern wrote:
    Rhamphoryncus wrote:
    "We've always done it that way" is NOT a use case! Certainly, it's a
    factor, but it seems quite weak compared to the sort use case.
    I didn't say it was. I was explaining that sorting was probably *not* a use case
    for the boolean predicates at the time of writing of the standard. In fact, it
    suggests implementing a Compare() function that returns "greater than", "less
    than", "equal" or "unordered" in addition to the boolean predicates. That Python
    eventually chose to use a generic boolean predicate as the basis of its sorting
    routine many years after the IEEE-754 standard is another matter entirely.
    I interpret that to mean IEEE 754's semantics are for different
    circumstances and are inapplicable to Python.

    In any case, the standard itself is quite short, and does not spend much time
    justifying itself in any detail.
    A pity, as it is often invoked to explain language design.

    I suppose what I'm hoping for is an small example program (one or a
    few functions) that needs the "always false" behaviour of NaN.
    Steven D'Aprano gave one earlier in the thread.
    I see examples of behaviour, but no use cases.

    Additionally, (x!=x) is a simple
    test for NaNs if an IsNaN(x) function is not available.
    That's a trick to work around the lack of IsNaN(x). Again, not a use
    case.

    Really, though, the
    result falls out from the way that IEEE-754 constructed the logic of the
    system. It is not defined that (NaN==NaN) should return False, per se. Rather,
    all of the boolean predicates are defined in terms of that Compare(x,y)
    function. If that function returns "unordered", then (x==y) is False. It doesn't
    matter if one or both are NaNs; in either case, the result is "unordered".
    And if I arbitrarily dictate that NaN is a single value which is
    orderable, sorting just above -Infinity, then all the behaviour makes
    a lot more sense AND I fix sort.
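
    For what it's worth, that ordering can already be had for sorting without changing how == behaves, by passing a key function. This is only a sketch assuming Python 3; nan_low_key is a made-up name, and placing NaN "just above -inf" is the arbitrary choice suggested above, not anything the standard prescribes.

```python
import math


def nan_low_key(x):
    """Sort key that places every NaN just above -inf (an arbitrary but consistent choice)."""
    if math.isnan(x):
        return (1, 0.0)      # all NaNs share one slot
    if x == -math.inf:
        return (0, 0.0)      # -inf sorts first
    return (2, x)            # everything else in normal numeric order


values = [3.0, float("nan"), -math.inf, 1.0, math.inf, float("nan"), -2.0]
ordered = sorted(values, key=nan_low_key)
print(ordered)
```

    The key never compares a NaN with anything, so the sort sees a total order.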

    So you see the predicament I'm in. On the one hand we have a problem
    and an obvious solution. On the other hand we've got historical
    behaviour which everybody insists *must* remain, reasons unknown. It
    reeks of the Parable of the Monkeys.

    I think I should head over to one of the math groups and see if they
    can find a reason for it.
  • Steven D'Aprano at Dec 9, 2008 at 2:44 am

    On Mon, 08 Dec 2008 10:20:56 -0800, Rhamphoryncus wrote:

    On Dec 7, 4:20 pm, Steven D'Aprano <st... at REMOVE-THIS-cybersource.com.au> wrote:
    On Sun, 07 Dec 2008 15:32:53 -0600, Robert Kern wrote:
    Rasmus Fogh wrote:
    Current behaviour is both inconsistent and counterintuitive, as
    these examples show.
    x = float('NaN')
    x == x
    False
    Blame IEEE for that one. Rich comparisons have nothing to do with
    that one.
    There is nothing to blame them for. This is the correct behaviour. NaNs
    should *not* compare equal to themselves, that's mathematically
    incoherent.
    Mathematically, NaNs shouldn't be comparable at all. They should raise
    an exception when compared. In fact, they should raise an exception
    when *created*. But that's not what we want. What we want is a dummy
    value that silently plods through our calculations. For a dummy value
    it seems a lot more sense to pick an arbitrary yet consistent sort order
    (I suggest just above -Inf), rather than quietly screwing up the sort.

    Regarding the mythical IEEE 754,
    It's hardly mythical.

    http://ieeexplore.ieee.org/ISOL/standardstoc.jsp?punumber=4610933

    although it's extremely rare to find
    quotations, I have one on just this subject. And it does NOT say "x == NaN gives false". It says it gives *unordered*.

    Unordered means that none of the following is true:

    x > NaN
    x < NaN
    x == NaN


    It doesn't mean that comparing a NaN with something else is an error.


    --
    Steven
  • Rhamphoryncus at Dec 9, 2008 at 4:44 am

    On Dec 8, 7:44 pm, Steven D'Aprano wrote:
    On Mon, 08 Dec 2008 10:20:56 -0800, Rhamphoryncus wrote:
    On Dec 7, 4:20 pm, Steven D'Aprano <st... at REMOVE-THIS-cybersource.com.au> wrote:
    On Sun, 07 Dec 2008 15:32:53 -0600, Robert Kern wrote:
    Rasmus Fogh wrote:
    Current behaviour is both inconsistent and counterintuitive, as
    these examples show.
    x = float('NaN')
    x == x
    False
    Blame IEEE for that one. Rich comparisons have nothing to do with
    that one.
    There is nothing to blame them for. This is the correct behaviour. NaNs
    should *not* compare equal to themselves, that's mathematically
    incoherent.
    Mathematically, NaNs shouldn't be comparable at all. They should raise
    an exception when compared. In fact, they should raise an exception
    when *created*. But that's not what we want. What we want is a dummy
    value that silently plods through our calculations. For a dummy value
    it seems a lot more sense to pick an arbitrary yet consistent sort order
    (I suggest just above -Inf), rather than quietly screwing up the sort.
    Regarding the mythical IEEE 754,
    It's hardly mythical.

    http://ieeexplore.ieee.org/ISOL/standardstoc.jsp?punumber=4610933
    I consider it to be mythical because most knowledge of it is
    indirect. Few who use floating point have the documents available to
    them. Requiring purchase/membership is the cause of this.

    although it's extremely rare to find
    quotations, I have one on just this subject. And it does NOT say "x == NaN gives false". It says it gives *unordered*.
    Unordered means that none of the following is true:

    x > NaN
    x < NaN
    x == NaN

    It doesn't mean that comparing a NaN with something else is an error.
    Robert Kern already clarified that. My confusion was due to relying
    on second-hand knowledge.
  • Mark Wooding at Jan 6, 2009 at 12:55 am

    Steven D'Aprano wrote:

    There is nothing to blame them for. This is the correct behaviour. NaNs
    should *not* compare equal to themselves, that's mathematically
    incoherent.
    Indeed. The problem is a paucity of equality predicates. This is
    hardly surprising: Common Lisp has four general-purpose equality
    predicates (EQ, EQL, EQUAL and EQUALP), and many more type-specific ones
    (=, STRING=, STRING-EQUAL (yes, I know...), CHAR=, ...), and still
    doesn't really have enough. For example, EQUAL compares strings
    case-sensitively, but other arrays are compared by address; EQUALP will
    recurse into arbitrary arrays, but compares strings
    case-insensitively...

    For the purposes of this discussion, however, it has enough to be able
    to distinguish between

    * numerical comparisons, which (as you explain later) should /not/
    claim that two NaNs are equal, and

    * object comparisons, which clearly must declare an object equal to
    itself.

    For example, I had the following edifying conversation with SBCL.

    CL-USER> ;; Return NaNs rather than signalling errors.
    (sb-int:set-floating-point-modes :traps nil)
    ; No value
    CL-USER> (defconstant nan (/ 0.0 0.0))
    NAN
    CL-USER> (loop for func in '(eql equal equalp =)
    collect (list func (funcall func nan nan)))
    ((EQL T) (EQUAL T) (EQUALP T) (= NIL))
    CL-USER>

    That is, a NaN is EQL, EQUAL and EQUALP to itself, but not = to itself.
    (Due to the vagaries of EQ, a NaN might or might not be EQ to itself or
    other NaNs.)

    Python has a much more limited selection of equality predicates -- in
    fact, just == and is. The is operator is Python's equivalent of Lisp's
    EQ predicate: it compares objects by address. I can have a similar chat
    with Python.

    In [12]: nan = float('nan')

    In [13]: nan is nan
    Out[13]: True

    In [14]: nan == nan
    Out[14]: False

    In [16]: nan is float('nan')
    Out[16]: False

    Python numbers are the same as themselves reliably, unlike in Lisp. But
    there's no sensible way of asking whether something is `basically the
    same as' nan, like Lisp's EQL or EQUAL. I agree that the primary
    equality predicate for numbers must be the numerical comparison, and
    NaNs can't (sensibly) be numerically equal to themselves.
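
    One way such a `basically the same as' predicate might look, as a sketch assuming Python 3. The helper same_value is hypothetical; nothing like it is built in. It tries identity first, then treats any two float NaNs as the same, and only then falls back on ==.

```python
import math


def same_value(a, b):
    """Hypothetical EQL-like predicate: identity, or both float NaNs, else ==."""
    if a is b:
        return True
    if isinstance(a, float) and isinstance(b, float):
        if math.isnan(a) and math.isnan(b):
            return True
    return bool(a == b)


nan = float("nan")
other_nan = float("nan")

print(nan == other_nan)            # numerical comparison
print(same_value(nan, other_nan))  # object-style comparison
print(nan in [nan], other_nan in [nan])
```

    The last line shows that list membership already does part of this: it checks identity before ==, so a NaN is found in a list that holds that very object, but a different NaN object is not.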

    Address comparisons are great when you're dealing with singletons, or
    when you carefully intern your objects. In other cases, you're left
    with ==. This puts a great deal of responsibility on the programmer of
    an == method to weigh carefully the potentially conflicting demands of
    compatibility (many other libraries just expect == to be an equality
    operator returning a straightforward truth value, and given that there
    isn't a separate dedicated equality operator, this isn't unreasonable),
    and doing something more domain-specifically useful.

    It's worth pointing out that numpy isn't unique in having == not return
    a straightforward truth value. The SAGE computer algebra system (and
    sympy, I believe) implement the == operator on algebraic formulae so as
    to construct equations. For example, the following is syntactically and
    semantically Python, with fancy libraries.

    sage: var('x') # x is now a variable
    x
    sage: solve(x**2 + 2*x - 4 == 1)
    [x == -sqrt(6) - 1, x == sqrt(6) - 1]

    (SAGE has some syntactic tweaks, such as ^ meaning the same as **, but I
    didn't use them.)

    I think this is an excellent use of the == operator -- but it does have
    some potential to interfere with other libraries which make assumptions
    about how == behaves. The SAGE developers have been clever here,
    though:

    sage: 2*x + 1 == (2 + 4*x)/2
    2*x + 1 == (4*x + 2)/2
    sage: bool(2*x + 1 == (2 + 4*x)/2)
    True
    sage: bool(2*x + 1 == (2 + 4*x)/3)
    False

    I think Python manages surprisingly well with its limited equality
    predicates. But the keyword there is `surprisingly' -- and it may not
    continue this trick forever.

    -- [mdw]
  • Rasmus Fogh at Dec 7, 2008 at 1:03 pm

    James Stroud wrote:
    Rasmus Fogh wrote:
    Dear All,
    For the first time I have come across a Python feature that seems
    completely wrong. After the introduction of rich comparisons, equality
    comparison does not have to return a truth value, and may indeed return
    nothing at all and throw an error instead. As a result, code like
    if foo == bar:
    or
    foo in alist
    cannot be relied on to work.
    This is clearly no accident. According to the documentation all
    comparison operators are allowed to return non-booleans, or to throw
    errors. There is
    explicitly no guarantee that x == x is True.
    I'm not a computer scientist, so my language and perspective on the
    topic may be a bit naive, but I'll try to demonstrate my caveman
    understanding example.
    First, here is why the ability to throw an error is a feature:
    class Apple(object):
        def __init__(self, appleness):
            self.appleness = appleness
        def __cmp__(self, other):
            assert isinstance(other, Apple), 'must compare apples to apples'
            return cmp(self.appleness, other.appleness)

    class Orange(object): pass

    Apple(42) == Orange()
    True, but that does not hold for __eq__, only for __cmp__, and
    for __gt__, __le__, etc.
    Consider:

    class Apple(object):
        def __init__(self, appleness):
            self.appleness = appleness
        def __gt__(self, other):
            assert isinstance(other, Apple), 'must compare apples to apples'
            return (self.appleness > other.appleness)
        def __eq__(self, other):
            if isinstance(other, Apple):
                return (self.appleness == other.appleness)
            else:
                return False
    Second, consider that any value in python also evaluates to a truth
    value in boolean context.

    Third, every function returns something. A function's returning nothing
    is not a possibility in the python language. None is something but
    evaluates to False in boolean context.
    Indeed. The requirement would be not that return_value was a boolean, but
    that bool(return_value) was defined and gave the correct result. I
    understand that in some old Numeric/numpy version the numpy array __eq__
    function returned a non-empty array, so that
    bool(numarray1 == numarray2)
    was true for any pair of arguments, which is one way of breaking '=='.
    In current numpy, even
    bool(numarray1 == 1)
    throws an error, which is another way of breaking '=='.
    But surely you can define an equal/unequal classification for all
    types of object, if you want to?
    This reminds me of complex numbers: would 4 + 4i be equal to sqrt(32)?
    Even in the realm of pure mathematics, the generality of objects (i.e.
    numbers) can not be assumed.
    It sounds like that problem is simpler in computing. sqrt(32) evaluates to
    5.6568542494923806 on my computer. A complex number c with non-zero
    imaginary part would be unequal to sqrt(32) even if it so happened that
    c*c==32.

    Yours,

    Rasmus

    ---------------------------------------------------------------------------
    Dr. Rasmus H. Fogh Email: r.h.fogh at bioc.cam.ac.uk
    Dept. of Biochemistry, University of Cambridge,
    80 Tennis Court Road, Cambridge CB2 1GA, UK. FAX (01223)766002
  • Steven D'Aprano at Dec 7, 2008 at 2:44 pm

    On Sun, 07 Dec 2008 13:03:43 +0000, Rasmus Fogh wrote:

    James Stroud wrote:
    ...
    Second, consider that any value in python also evaluates to a truth
    value in boolean context.
    But bool(x) can fail too. So not every object in Python can be
    interpreted as a truth value.

    Third, every function returns something.
    Unless it doesn't return at all.

    A function's returning nothing
    is not a possibility in the python language. None is something but
    evaluates to False in boolean context.
    Indeed. The requirement would be not that return_value was a boolean,
    but that bool(return_value) was defined and gave the correct result.
    If __bool__ or __nonzero__ raises an exception, you would like Python to
    ignore the exception and return True or False. Which should it be? How do
    you know what the correct result should be?
    From the Zen of Python:
    "In the face of ambiguity, refuse the temptation to guess."


    All binary operators are ambiguous when dealing with vector or array
    operands. Should the operator operate on the array as a whole, or on each
    element? The numpy people have decided that element-wise equality testing
    is more useful for them, and this is their prerogative to do so. In fact,
    the move to rich comparisons was driven by the needs of numpy.

    http://www.python.org/dev/peps/pep-0207/

    It is a *VERY* important third-party library, and this was not the first
    and probably won't be the last time that their needs will move into
    Python the language.

    Python encourages such domain-specific behaviour. In fact, that's what
    operator-overloading is all about: classes can define what any operator
    means for *them*. There's no requirement that the infinity of potential
    classes must all define operators in a mutually compatible fashion, not
    even for comparison operators.

    For example, consider a class implementing one particular version of
    three-value logic. It isn't enough for == to only return True or False,
    because you also need Maybe:

    True == False => returns False
    True == True => returns True
    True == Maybe => returns Maybe
    etc.
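
    A minimal sketch of such a class, assuming one particular three-valued
    logic in which any comparison involving Maybe yields Maybe. The names
    Tri, TRUE, FALSE and MAYBE are made up for illustration and do not come
    from any library:

```python
class Tri(object):
    """One hypothetical three-valued truth value."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return self.name

    def __eq__(self, other):
        # Anything compared with Maybe is Maybe: the result is a Tri, not a bool.
        if self is MAYBE or other is MAYBE:
            return MAYBE
        return TRUE if self is other else FALSE

    def __bool__(self):
        # Refuse to guess: Maybe has no definite truth value.
        if self is MAYBE:
            raise ValueError("Maybe is neither true nor false")
        return self is TRUE

    __nonzero__ = __bool__  # Python 2 name for the same hook


TRUE, FALSE, MAYBE = Tri("True"), Tri("False"), Tri("Maybe")

print(TRUE == FALSE)   # a Tri instance, not a bool
print(TRUE == MAYBE)
```

    Here "if TRUE == MAYBE:" raises ValueError rather than silently picking
    a branch, which is the same design choice numpy makes for arrays.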

    Or consider fuzzy logic, where instead of two truth values, you have a
    continuum of truth values between 0.0 and 1.0. What should comparing two
    such fuzzy values for equality return? A boolean True/False? Another
    fuzzy value?


    Another one from the Zen:

    "Special cases aren't special enough to break the rules."

    The rules are that classes can customize their behaviour, that methods
    can fail, and that Python should not try to guess what the correct value
    should have been in the event of such a failure. Equality is a special
    case, but it isn't so special that it needs to be an exception from those
    rules.

    If you really need a guaranteed-can't-fail[1] equality test, try
    something like this untested wrapper class:

    class EqualityWrapper(object):
        def __init__(self, obj):
            self.wrapped = obj

        def __eq__(self, other):
            try:
                return bool(self.wrapped == other)
            except Exception:
                return False # or maybe True?

    Now wrap all your data:

    data = [a list of arbitrary objects]
    data = map(EqualityWrapper, data)
    process(data)




    [1] Not a guarantee.

    --
    Steven
  • Rasmus Fogh at Dec 7, 2008 at 4:23 pm

    Well, lots to think about.

    Just to keep you from shooting at straw men:

    I would have liked it to be part of the design contract (a convention, if
    you like) that
    1) bool(x == y) should return a boolean and never throw an error
    2) x == x return True

    I do *not* say that bool(x) should never throw an error.
    I do *not* say that Python should guess a return value if an __eq__
    function throws an error, only that it should have been considered a bug,
    or at least bad form, for __eq__ functions to do so.

    What might be a sensible behaviour (unlike your proposed wrapper) would be
    the following:

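    def eq(x, y):
        if x is y:
            return True
        else:
            try:
                # bool() belongs inside the try: for a numpy array it is
                # the conversion to bool, not ==, that raises
                return bool(x == y)
            except Exception:
                return False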

    If it is possible to change the language, how about having two
    different functions, one for overloading the '==' operator, and another
    for testing list and set membership, dictionary key identity, etc.?
    For instance like this:
    - Add a new function __equals__; x.__equals__(y) could default to
      bool(x.__eq__(y))
    - Establish by convention that x.__equals__(y) must return a boolean and
      may not intentionally throw an error.
    - Establish by convention that 'x is y' implies 'x.__equals__(y)'
      in the sense that (not (x is y and not x.__equals__(y))) must always hold
    - Have the Python data structures call __equals__ when they want to
      compare objects internally (e.g. for 'x in alist', 'x in adict',
      'set(alist)', etc.)
    - Provide an equals(x,y) built-in that calls the __equals__ function
    - numpy and others who (mis)use '==' for their own purposes could use
          def __equals__(self, other): return (self is other)


    For the float NaN case it looks like things are already behaving like
    this. For numpy objects you would not lose anything, since
    'numpyArray in alist' is broken anyway.

    I still think it is a bad choice that numpy got to write
    array1 == array2
    for their purposes, while everybody else has to use
    if equals(x, y):
    but at least both sides could get the behaviour they want.

    Yours,

    Rasmus

    ---------------------------------------------------------------------------
    Dr. Rasmus H. Fogh Email: r.h.fogh at bioc.cam.ac.uk
    Dept. of Biochemistry, University of Cambridge,
    80 Tennis Court Road, Cambridge CB2 1GA, UK. FAX (01223)766002
  • Mark Dickinson at Dec 7, 2008 at 6:14 pm

    On Dec 7, 4:23 pm, Rasmus Fogh wrote:

    If it is possible to change the language, how about having two
    different functions, one for overloading the '==' operator, and another
    for testing list and set membership, dictionary key identity, etc.?
    I've often thought that this would have made a lot of sense too, though
    I'd probably choose to spell the well-behaved structural equality "=="
    and the flexible numeric equality "eq" (a la Fortran). Hey, we could
    have *six* new keywords: eq, ne, le, lt, ge, gt!

    See the recent (September?) thread "Comparing float and decimal"
    for some of the fun that results from lack of transitivity of
    equality.

    But I think there's essentially no chance of Python changing to
    support this. And even if there were, Python's conflation of
    structural equality with numeric equality brings significant
    benefits in terms of readability of code, ease of learning,
    and general friendliness; it's only really troublesome in
    a few corner cases. Is the tradeoff worth it?

    So for me, this comes down to a case of 'practicality beats purity'.

    Mark
  • James Stroud at Dec 7, 2008 at 9:57 pm

    Rasmus Fogh wrote:
    Current behaviour is both inconsistent and counterintuitive, as these
    examples show.
    >>> x = float('NaN')
    >>> x == x
    False
    Perhaps this should raise an exception? I think the problem is not with
    comparisons in general but with the fact that nan is type float:

    py> type(float('NaN'))
    <type 'float'>

    No float can be equal to nan, but nan is a float. How can something be
    not a number and a float at the same time? The illogicality of nan's
    type creates the possibility for the illogical results of comparisons to
    nan including comparing nan to itself.
    >>> ll = [x]
    >>> x in ll
    True
    >>> x == ll[0]
    False
    But there is consistency on the basis of identity which is the test for
    containment (in):

    py> x is x
    True
    py> x in [x]
    True

    Identity and equality are two different concepts. Comparing identity to
    equality is like comparing apples to oranges ;o)
    >>> import numpy
    >>> y = numpy.zeros((3,))
    >>> y
    array([ 0., 0., 0.])
    >>> bool(y==y)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: The truth value of an array with more than one element is
    ambiguous. Use a.any() or a.all()
    But the equality test is not what fails here. It's the cast to bool that
    fails, which for numpy works like a unary ufunc. The designers of numpy
    thought that this would be a more desirable behavior. The test for
    equality likewise is a binary ufunc and the behavior was chosen in numpy
    for practical reasons. I don't know if you can overload the == operator
    in C, but if you can, you would be able to achieve the same behavior.
    >>> ll1 = [y,1]
    >>> y in ll1
    True
    >>> ll2 = [1,y]
    >>> y in ll2
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: The truth value of an array with more than one element is
    ambiguous. Use a.any() or a.all()
    I think you could be safe calling this a bug with numpy. But the fact
    that someone can create a bug with a language is not a condemnation of
    the language. For example, C makes it real easy to crash a program by
    overrunning the limits of an array, but no one would suggest to remove
    arrays from C.
    Can anybody see a way this could be fixed (please)? I may well have to
    live with it, but I would really prefer not to.
    Your only hope is to somehow convince the language designers to remove
    the ability to overload == then get them to agree on what you think the
    proper behavior should be for comparisons. I think the probability of
    that happening is about zero, though, because such a change would run
    counter to the dynamic nature of the language.

    James


    --
    James Stroud
    UCLA-DOE Institute for Genomics and Proteomics
    Box 951570
    Los Angeles, CA 90095

    http://www.jamesstroud.com
  • James Stroud at Dec 7, 2008 at 10:12 pm

    James Stroud wrote:
    [cast to bool] for numpy works like a unary ufunc.
    Scratch that. Not thinking and typing at same time.


    --
    James Stroud
    UCLA-DOE Institute for Genomics and Proteomics
    Box 951570
    Los Angeles, CA 90095

    http://www.jamesstroud.com
  • Acerimusdux at Dec 7, 2008 at 11:33 pm

    James Stroud wrote:
    Rasmus Fogh wrote:
    Current behaviour is both inconsistent and counterintuitive, as these
    examples show.
    >>> x = float('NaN')
    >>> x == x
    False
    Perhaps this should raise an exception? I think the problem is not
    with comparisons in general but with the fact that nan is type float:

    py> type(float('NaN'))
    <type 'float'>

    No float can be equal to nan, but nan is a float. How can something be
    not a number and a float at the same time? The illogicality of nan's
    type creates the possibility for the illogical results of comparisons
    to nan including comparing nan to itself.
    I initially thought that looked like a bug to me. But, this is
    apparently standard behavior required for "NaN". I'm only using
    Wikipedia as a reference here, but about 80% of the way down, under
    "standard operations":
    http://en.wikipedia.org/wiki/IEEE_754-1985

    "Comparison operations. NaN is treated specially in that NaN=NaN always
    returns false."

    Presumably since floating point calculations return "NaN" for some
    operations, and one "NaN" is usually not equal to another, this is the
    required behavior. So not a Python issue (though understandably a bit
    confusing).

    The array issue seems to be with one 3rd party library, and one can
    choose to use or not use their library, to ask them to change it, or
    even to decide to override their == operator, if one doesn't like the
    way it is designed.
  • Steven D'Aprano at Dec 7, 2008 at 11:51 pm

    On Sun, 07 Dec 2008 13:57:54 -0800, James Stroud wrote:

    Rasmus Fogh wrote:
    Current behaviour is both inconsistent and counterintuitive, as these
    examples show.
    >>> x = float('NaN')
    >>> x == x
    False
    Perhaps this should raise an exception?
    Why on earth would you want checking equality on NaN to raise an
    exception??? What benefit does it give?

    I think the problem is not with
    comparisons in general but with the fact that nan is type float:

    py> type(float('NaN'))
    <type 'float'>

    No float can be equal to nan, but nan is a float. How can something be
    not a number and a float at the same time?
    Because floats are not real numbers. They are *almost* numbers, they
    often (but not always) behave like numbers, but they're actually not
    numbers.

    The difference is subtle enough that it is easy to forget that floats are
    not numbers, but it's easy enough to find examples proving it:

    Some perfectly good numbers don't exist as floats:
    >>> 2**-10000 == 0.0
    True

    Try as you might, you can't get the number 0.1 *exactly* as a float:
    >>> 0.1
    0.10000000000000001


    For any numbers x and y not equal to zero, x+y != x. But that fails for
    floats:
    >>> 1001.0 + 1e99 == 1e99
    True

    The above is because of limited precision: 1001.0 is far smaller than
    the gap between neighbouring floats near 1e99, so the sum rounds back to
    1e99. But even avoiding such extreme magnitudes doesn't solve the
    problem. With a little effort, you can also find examples of
    "ordinary sized" floats where (x+y)-y != x.
    >>> 0.9+0.1-0.9 == 0.1
    False


    >>> import numpy
    >>> y = numpy.zeros((3,))
    >>> y
    array([ 0., 0., 0.])
    >>> bool(y==y)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: The truth value of an array with more than one element is
    ambiguous. Use a.any() or a.all()
    But the equality test is not what fails here. It's the cast to bool that
    fails
    And it is right to do so, because it is ambiguous and the library
    designers rightly avoided the temptation of guessing what result is
    needed.

    >>> ll1 = [y,1]
    >>> y in ll1
    True
    >>> ll2 = [1,y]
    >>> y in ll2
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: The truth value of an array with more than one element is
    ambiguous. Use a.any() or a.all()
    I think you could be safe calling this a bug with numpy.
    Only in the sense that there are special cases where the array elements
    are all true, or all false, and numpy *could* safely return a bool. But
    special cases are not special enough to break the rules. Better for the
    numpy caller to write this:

    a.all() # or any()

    instead of:

    try:
        bool(a)
    except ValueError:
        a.all()

    as they would need to do if numpy sometimes returned a bool and sometimes
    raised an exception.



    --
    Steven
  • James Stroud at Dec 8, 2008 at 12:39 am

    Steven D'Aprano wrote:
    On Sun, 07 Dec 2008 13:57:54 -0800, James Stroud wrote:

    Rasmus Fogh wrote:
    >>> ll1 = [y,1]
    >>> y in ll1
    True
    >>> ll2 = [1,y]
    >>> y in ll2
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: The truth value of an array with more than one element is
    ambiguous. Use a.any() or a.all()
    I think you could be safe calling this a bug with numpy.
    Only in the sense that there are special cases where the array elements
    are all true, or all false, and numpy *could* safely return a bool. But
    special cases are not special enough to break the rules. Better for the
    numpy caller to write this:

    a.all() # or any()

    instead of:

    try:
        bool(a)
    except ValueError:
        a.all()

    as they would need to do if numpy sometimes returned a bool and sometimes
    raised an exception.
    I'm missing how a.all() solves the problem Rasmus describes, namely that
    the order of a python *list* affects the results of containment tests by
    numpy.array. E.g. "y in ll1" and "y in ll2" evaluate to different
    results in his example. It still seems like a bug in numpy to me, even
    if too much other stuff is broken if you fix it (in which case it
    apparently becomes an "issue").

    James
  • Robert Kern at Dec 8, 2008 at 2:21 am

    James Stroud wrote:
    Steven D'Aprano wrote:
    On Sun, 07 Dec 2008 13:57:54 -0800, James Stroud wrote:

    Rasmus Fogh wrote:
    >>> ll1 = [y,1]
    >>> y in ll1
    True
    >>> ll2 = [1,y]
    >>> y in ll2
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: The truth value of an array with more than one element is
    ambiguous. Use a.any() or a.all()
    I think you could be safe calling this a bug with numpy.
    Only in the sense that there are special cases where the array
    elements are all true, or all false, and numpy *could* safely return a
    bool. But special cases are not special enough to break the rules.
    Better for the numpy caller to write this:

    a.all() # or any()

    instead of:

    try:
        bool(a)
    except ValueError:
        a.all()

    as they would need to do if numpy sometimes returned a bool and
    sometimes raised an exception.
    I'm missing how a.all() solves the problem Rasmus describes, namely that
    the order of a python *list* affects the results of containment tests by
    numpy.array. E.g. "y in ll1" and "y in ll2" evaluate to different
    results in his example. It still seems like a bug in numpy to me, even
    if too much other stuff is broken if you fix it (in which case it
    apparently becomes an "issue").
    It's an issue, if anything, not a bug. There is no consistent implementation of
    bool(some_array) that works in all cases. numpy's predecessor Numeric used to
    implement this as returning True if at least one element was non-zero. This
    works well for bool(x!=y) (which is equivalent to (x!=y).any()) but does not
    work well for bool(x==y) (which should be (x==y).all()), but many people got
    confused and thought that bool(x==y) worked. When we made numpy, we decided to
    explicitly not allow bool(some_array) so that people will not write buggy code
    like this again.
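
    The any()/all() asymmetry can be seen without numpy. The following is
    only an illustration using plain lists as stand-ins for arrays; the
    variable names are made up, and the old Numeric rule is modelled by
    any() as described above:

```python
x = [1, 2, 3]
y = [1, 2, 4]

# Element-wise comparisons, as an array library would compute them.
eq = [a == b for a, b in zip(x, y)]   # [True, True, False]
ne = [a != b for a, b in zip(x, y)]   # [False, False, True]

# "Do the arrays differ?" is correctly answered by any().
arrays_differ = any(ne)

# "Are the arrays equal?" needs all(), not any().
arrays_equal = all(eq)

# The old rule (true if any element is non-zero) applied to x == y
# claims equality although the last elements differ.
misleading = any(eq)

print(arrays_differ, arrays_equal, misleading)
```

    Because one implicit rule cannot serve both questions, raising an error
    and making the caller choose .any() or .all() is the conservative option.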

    The deficiency is in the feature of rich comparisons, not numpy's implementation
    of it. __eq__() is allowed to return non-booleans; however, there are some parts
    of Python's implementation like list.__contains__() that still expect the return
    value of __eq__() to be meaningfully cast to a boolean.

    --
    Robert Kern

    "I have come to believe that the whole world is an enigma, a harmless enigma
    that is made terrible by our own mad attempt to interpret it as though it had
    an underlying truth."
    -- Umberto Eco
  • Luis Zarrabeitia at Dec 10, 2008 at 10:58 pm

    On Sunday 07 December 2008 09:21:18 pm Robert Kern wrote:
    The deficiency is in the feature of rich comparisons, not numpy's
    implementation of it. __eq__() is allowed to return non-booleans; however,
    there are some parts of Python's implementation like list.__contains__()
    that still expect the return value of __eq__() to be meaningfully cast to a
    boolean.
    list.__contains__, tuple.__contains__, the 'if' keyword...

    How do you suggest fixing the list.__contains__ implementation?

    Should I wrap all my "if"s with this?:

    if isinstance(a, numpy.ndarray) or isinstance(b, numpy.ndarray):
        res = compare_numpy(a, b)
    elif isinstance(a, some_otherclass) or isinstance(b, some_otherclass):
        res = compare_someotherclass(a, b)
    ...
    else:
        res = (a == b)
    if res:
        # do whatever

    --
    Luis Zarrabeitia (aka Kyrie)
    Fac. de Matemática y Computación, UH.
    http://profesores.matcom.uh.cu/~kyrie
  • Steven D'Aprano at Dec 11, 2008 at 8:10 am

    On Wed, 10 Dec 2008 17:58:49 -0500, Luis Zarrabeitia wrote:
    On Sunday 07 December 2008 09:21:18 pm Robert Kern wrote:
    The deficiency is in the feature of rich comparisons, not numpy's
    implementation of it. __eq__() is allowed to return non-booleans;
    however, there are some parts of Python's implementation like
    list.__contains__() that still expect the return value of __eq__() to
    be meaningfully cast to a boolean.
    list.__contains__, tuple.__contains__, the 'if' keyword...

    How do you suggest fixing the list.__contains__ implementation?

    I suggest you don't, because I don't think it's broken. I think it's
    working as designed. It doesn't succeed with arbitrary data types which
    may be broken, buggy or incompatible with __contains__'s design, but
    that's okay, it's not supposed to.

    Should I wrap all my "if"s with this?:

    if isinstance(a, numpy.ndarray) or isinstance(b, numpy.ndarray):
        res = compare_numpy(a, b)
    elif isinstance(a, some_otherclass) or isinstance(b, some_otherclass):
        res = compare_someotherclass(a, b)
    ...
    else:
        res = (a == b)
    if res:
        # do whatever
    No, inlining that code everywhere you have an if would be stupid. What
    you should do is write a single function equals(x, y) that does precisely
    what you want it to do, in whatever way you want, and then call it:

    if equals(a, b):

    Or, put your data inside a wrapper. If you read back over my earlier
    posts in this thread, I suggested a lightweight wrapper class you could
    use. You could make it even more useful by using delegation to make the
    wrapped class behave *exactly* like the original, except for __eq__.
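
    A rough sketch of that delegation idea follows. It builds on the
    EqualityWrapper from earlier in the thread; DelegatingEqualityWrapper,
    Ambiguous and Awkward are hypothetical names, and Awkward merely imitates
    a type whose == cannot be turned into a bool:

```python
class DelegatingEqualityWrapper(object):
    """Behaves like the wrapped object, except that == always yields a bool."""

    def __init__(self, obj):
        self.wrapped = obj

    def __getattr__(self, name):
        # Only called when normal lookup fails; forward to the wrapped object.
        if name == "wrapped":            # guard against infinite recursion
            raise AttributeError(name)
        return getattr(self.wrapped, name)

    def __eq__(self, other):
        if isinstance(other, DelegatingEqualityWrapper):
            other = other.wrapped
        if self.wrapped is other:
            return True
        try:
            return bool(self.wrapped == other)
        except Exception:
            return False                 # a policy choice, not a proven answer

    def __ne__(self, other):
        return not self.__eq__(other)


class Ambiguous(object):
    def __bool__(self):
        raise ValueError("truth value is ambiguous")

    __nonzero__ = __bool__               # Python 2 name for the same hook


class Awkward(object):
    label = "awkward"

    def __eq__(self, other):
        return Ambiguous()               # never a plain bool


w = DelegatingEqualityWrapper(Awkward())
print(w.label, w == 1, w in [1, w])
```

    Note that __getattr__ forwards ordinary attribute access only. On
    new-style classes, special methods such as __len__ or __add__ are looked
    up on the type, so they would need to be forwarded explicitly.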

    You don't even need to wrap every single item:

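    def wrap_or_not(obj):
        # isinstance() rather than "in": a membership test would itself
        # call == on the badly behaved object
        if isinstance(obj, tuple_of_bad_types_i_know_about):
            return EqualityWrapper(obj)
        return obj

    data = [1, 2, 3, BadData, 4]
    data = map(wrap_or_not, data)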



    It isn't really that hard to deal with these things, once you give up the
    illusion that your code should automatically work with arbitrarily wacky
    data types that you don't control.


    --
    Steven
  • James Stroud at Dec 8, 2008 at 3:36 am

    Robert Kern wrote:
    James Stroud wrote:
    I'm missing how a.all() solves the problem Rasmus describes, namely
    that the order of a python *list* affects the results of containment
    tests by numpy.array. E.g. "y in ll1" and "y in ll2" evaluate to
    different results in his example. It still seems like a bug in numpy
    to me, even if too much other stuff is broken if you fix it (in which
    case it apparently becomes an "issue").
    It's an issue, if anything, not a bug. There is no consistent
    implementation of bool(some_array) that works in all cases. numpy's
    predecessor Numeric used to implement this as returning True if at least
    one element was non-zero. This works well for bool(x!=y) (which is
    equivalent to (x!=y).any()) but does not work well for bool(x==y) (which
    should be (x==y).all()), but many people got confused and thought that
    bool(x==y) worked. When we made numpy, we decided to explicitly not
    allow bool(some_array) so that people will not write buggy code like
    this again.

    The deficiency is in the feature of rich comparisons, not numpy's
    implementation of it. __eq__() is allowed to return non-booleans;
    however, there are some parts of Python's implementation like
    list.__contains__() that still expect the return value of __eq__() to be
    meaningfully cast to a boolean.
    You have explained

    py> ll2 = [1, y]
    py> y in ll2
    Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    ValueError: The truth value of an array with more than one element is...

    but not

    py> ll1 = [y,1]
    py> y in ll1
    True

    It's this discrepancy that seems like a bug, not that a ValueError is
    raised in the former case, which is perfectly reasonable to me.


    All I can imagine is that something like the following lives in the
    bowels of the python code for list:

    def __contains__(self, other):
        foundit = False
        for i, v in enumerate(self):
            if i == 0:
                # evaluates to bool numpy array
                foundit = one_kind_of_test(v, other)
            else:
                # raises exception for numpy array
                foundit = another_kind_of_test(v, other)
            if foundit:
                break
        return foundit

    I'm trying to imagine some other way to get the results mentioned but I
    honestly can't. It's beyond me why someone would do such a thing, but
    perhaps it's an optimization of some sort.

    James
  • Robert Kern at Dec 8, 2008 at 5:04 am

    James Stroud wrote:
    Robert Kern wrote:
    James Stroud wrote:
    I'm missing how a.all() solves the problem Rasmus describes, namely
    that the order of a python *list* affects the results of containment
    tests by numpy.array. E.g. "y in ll1" and "y in ll2" evaluate to
    different results in his example. It still seems like a bug in numpy
    to me, even if too much other stuff is broken if you fix it (in which
    case it apparently becomes an "issue").
    It's an issue, if anything, not a bug. There is no consistent
    implementation of bool(some_array) that works in all cases. numpy's
    predecessor Numeric used to implement this as returning True if at
    least one element was non-zero. This works well for bool(x!=y) (which
    is equivalent to (x!=y).any()) but does not work well for bool(x==y)
    (which should be (x==y).all()), but many people got confused and
    thought that bool(x==y) worked. When we made numpy, we decided to
    explicitly not allow bool(some_array) so that people will not write
    buggy code like this again.

    The deficiency is in the feature of rich comparisons, not numpy's
    implementation of it. __eq__() is allowed to return non-booleans;
    however, there are some parts of Python's implementation like
    list.__contains__() that still expect the return value of __eq__() to
    be meaningfully cast to a boolean.
    You have explained

    py> ll2 = [1, y]
    py> y in ll2
    Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    ValueError: The truth value of an array with more than one element is...

    but not

    py> ll1 = [y,1]
    py> y in ll1
    True

    It's this discrepancy that seems like a bug, not that a ValueError is
    raised in the former case, which is perfectly reasonable to me.
    Nothing to do with numpy. list.__contains__() checks for identity with "is"
    before it goes to __eq__().

    --
    Robert Kern

    "I have come to believe that the whole world is an enigma, a harmless enigma
    that is made terrible by our own mad attempt to interpret it as though it had
    an underlying truth."
    -- Umberto Eco
  • James Stroud at Dec 8, 2008 at 6:05 am

    Robert Kern wrote:
    James Stroud wrote:
    py> ll2 = [1, y]
    py> y in ll2
    Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    ValueError: The truth value of an array with more than one element is...

    but not

    py> ll1 = [y,1]
    py> y in ll1
    True

    It's this discrepancy that seems like a bug, not that a ValueError is
    raised in the former case, which is perfectly reasonable to me.
    Nothing to do with numpy. list.__contains__() checks for identity with
    "is" before it goes to __eq__().
    ...but only for the first element of the list:

    py> import numpy
    py> y = numpy.array([1,2,3])
    py> y
    array([1, 2, 3])
    py> y in [1, y]
    ------------------------------------------------------------
    Traceback (most recent call last):
    File "<ipython console>", line 1, in <module>
    <type 'exceptions.ValueError'>: The truth value of an array with more
    than one element is ambiguous. Use a.any() or a.all()
    py> y is [1, y][1]
    True

    I think it skips straight to __eq__ if the element is not the first in
    the list. That no one acknowledges this makes me feel like a conspiracy
    is afoot.
  • Robert Kern at Dec 8, 2008 at 7:13 am

    James Stroud wrote:
    Robert Kern wrote:
    James Stroud wrote:
    py> ll2 = [1, y]
    py> y in ll2
    Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    ValueError: The truth value of an array with more than one element is...

    but not

    py> ll1 = [y,1]
    py> y in ll1
    True

    It's this discrepancy that seems like a bug, not that a ValueError is
    raised in the former case, which is perfectly reasonable to me.
    Nothing to do with numpy. list.__contains__() checks for identity with
    "is" before it goes to __eq__().
    ...but only for the first element of the list:

    py> import numpy
    py> y = numpy.array([1,2,3])
    py> y
    array([1, 2, 3])
    py> y in [1, y]
    ------------------------------------------------------------
    Traceback (most recent call last):
    File "<ipython console>", line 1, in <module>
    <type 'exceptions.ValueError'>: The truth value of an array with more
    than one element is ambiguous. Use a.any() or a.all()
    py> y is [1, y][1]
    True

    I think it skips straight to __eq__ if the element is not the first in
    the list.
    No, it doesn't skip straight to __eq__(). "y is 1" returns False, so (y==1) is
    checked. When y is a numpy array, this returns an array of bools.
    list.__contains__() tries to convert this array to a bool and
    ndarray.__nonzero__() raises the exception.

    list.__contains__() checks "is" then __eq__() for each element before moving on
    to the next element. It does not try "is" for all elements, then try __eq__()
    for all elements.
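
    The element-by-element order described here can be modelled in pure
    Python. This is a sketch of the behaviour, not the actual C
    implementation; list_contains, Ambiguous and FakeArray are hypothetical
    names, with FakeArray standing in for a numpy array so that the example
    runs without numpy:

```python
class Ambiguous(object):
    """Result of an element-wise comparison; it refuses to become a bool."""

    def __bool__(self):
        raise ValueError("truth value is ambiguous")

    __nonzero__ = __bool__   # Python 2 name for the same hook


class FakeArray(object):
    """Stand-in for numpy.ndarray: == never returns a plain bool."""

    def __eq__(self, other):
        return Ambiguous()


def list_contains(items, x):
    """Model of 'x in items' for a list."""
    for item in items:
        if item is x:        # identity is tried first, for this element only
            return True
        if item == x:        # the implicit bool() here is what can raise
            return True
    return False


y = FakeArray()
nan = float("nan")

found_first = list_contains([y, 1], y)   # identity matches before == is tried
found_nan = list_contains([nan], nan)    # same shortcut, although nan != nan

try:
    list_contains([1, y], y)             # 1 == y is evaluated before y is reached
    raised = False
except ValueError:
    raised = True

print(found_first, found_nan, raised)
```

    This reproduces both observations from the thread: the lookup succeeds
    when the array comes first and raises when any earlier element has to be
    compared with == first.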
    That no one acknowledges this makes me feel like a conspiracy
    is afoot.
    I don't know what you think I'm not acknowledging.

    --
    Robert Kern

    "I have come to believe that the whole world is an enigma, a harmless enigma
    that is made terrible by our own mad attempt to interpret it as though it had
    an underlying truth."
    -- Umberto Eco
  • James Stroud at Dec 8, 2008 at 8:10 am

    Robert Kern wrote:
    James Stroud wrote:
    I think it skips straight to __eq__ if the element is not the first in
    the list.
    No, it doesn't skip straight to __eq__(). "y is 1" returns False, so
    (y==1) is checked. When y is a numpy array, this returns an array of
    bools. list.__contains__() tries to convert this array to a bool and
    ndarray.__nonzero__() raises the exception.

    list.__contains__() checks "is" then __eq__() for each element before
    moving on to the next element. It does not try "is" for all elements,
    then try __eq__() for all elements.
    Ok. Thanks for the explanation.
    That no one acknowledges this makes me feel like a conspiracy
    is afoot.
    I don't know what you think I'm not acknowledging.
    Sorry. That was a failed attempt at humor.

    James
  • Steven D'Aprano at Dec 8, 2008 at 3:39 am

    On Sun, 07 Dec 2008 16:23:59 +0000, Rasmus Fogh wrote:

    Just to keep you from shooting at straw men:

    I would have liked it to be part of the design contract (a convention,
    if you like) that
    1) bool(x == y) should return a boolean and never throw an error

    Can't be done without making bool a "magic function". If x==y raises an
    exception, bool() won't even be called. The only way around that would be
    for the Python compiler to recognise bool(x==y) and perform special magic.
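
A small sketch of the evaluation order in question: the argument is evaluated before the call, so a raising comparison never reaches bool(). `tracing_bool` and `Raises` are hypothetical names made up for this illustration.

```python
calls = []


def tracing_bool(value):
    """Wrapper around bool() that records every value it is handed."""
    calls.append(value)
    return bool(value)


class Raises:
    def __eq__(self, other):
        raise ValueError("cannot compare")


try:
    # The comparison is evaluated first and raises,
    # so tracing_bool is never entered.
    tracing_bool(Raises() == 1)
except ValueError:
    pass

bool_was_called = len(calls) > 0
```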

    What if you did this?

    trueorfalse = bool # I don't like George Boole
    trueorfalse( getattr([x][0].__class__, '__dict__')['__eq__'](x, y) )


    Should that have special magic performed too? Just how much work must the
    compiler put in to special-casing bool?


    2) x == x return True
    Which goes against the IEEE 754 floating-point standard.

    http://grouper.ieee.org/groups/754/

    Python used to optimize x==x and always return True. This was removed
    because it caused problems.
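
For reference, a short sketch of how NaN behaves in current CPython. The second membership test assumes that each float("nan") call produces a distinct object, which holds in CPython but is an implementation detail.

```python
nan = float("nan")

# IEEE 754: a NaN compares unequal to everything, including itself.
self_equal = (nan == nan)

# Containment checks identity before equality, so the same object is found.
in_list = (nan in [nan])

# A different NaN object is neither identical nor equal to the one stored.
other_nan = float("nan")
in_list_other = (other_nan in [nan])
```

So `x == x` can be False while `x in [x]` is still True, because the identity shortcut lives in the container and not in `==`.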


    I do *not* say that bool(x) should never throw an error. I do *not* say
    that Python should guess a return value if an __eq__ function throws an
    error,
    But to get what you want, the above is implied.

    I suppose, just barely, that you could avoid making bool() magic and just
    make if magic. When the compiler sees "if expr": it could swallow all
    exceptions inside expr and force it to evaluate to True or False. (How?
    By guessing? Randomly?) This would cause many problems, but it could be
    done, and much easier than ensuring that bool(x) always succeeds.

    only that it should have been considered a bug, or at least bad
    form, for __eq__ functions to do so.

    It's certainly *unusual* for comparisons to return non-bools, but it's
    not bad form.

    What might be a sensible behaviour (unlike your proposed wrapper)
    What do you dislike about my wrapper class? Perhaps it is fixable.


    would be the following:

    def eq(x, y):
        if x is y:
            return True
    I've already mentioned NaNs. Sentinel values also sometimes need to
    compare not equal with themselves. Forcing them to compare equal will
    cause breakage.

        else:
            try:
                return (x == y)
            except Exception:
                return False
    Why False? Why not True? If an error occurs inside __eq__, how do you
    know that the correct result was False?

    class Broken(object):
        def __eq__(self, other):
            return Treu # oops, raises NameError
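
Putting the two fragments together shows the concern: the wrapper reports "not equal" for what is really a programming error. This is only a sketch assembled from the code quoted in this post.

```python
def eq(x, y):
    """The proposed wrapper: identity, then ==, with all errors mapped to False."""
    if x is y:
        return True
    try:
        return x == y
    except Exception:
        return False


class Broken(object):
    def __eq__(self, other):
        return Treu  # typo for True; raises NameError when called


# The NameError is swallowed and silently reported as "not equal".
hidden = eq(Broken(), 42)

# The plain comparison exposes the bug instead.
try:
    Broken() == 42
    exposed = False
except NameError:
    exposed = True
```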



    --
    Steven
  • Mark Wooding at Jan 6, 2009 at 1:24 am

    Steven D'Aprano wrote:

    I've already mentioned NaNs. Sentinel values also sometimes need to
    compare not equal with themselves. Forcing them to compare equal will
    cause breakage.
    There's a conflict between such domain-specific considerations (NaNs,
    strange sentinels, SAGE's equations), and relatively natural assumptions
    about an == operator, such as it being an equivalence relation.

    I don't know how to resolve this conflict without introducing a new
    function which is (or at least strongly encourages developers to arrange
    for it to be) an equivalence relation.

    -- [mdw]
  • Steven D'Aprano at Jan 6, 2009 at 2:16 am

    On Tue, 06 Jan 2009 01:24:58 +0000, Mark Wooding wrote:

    Steven D'Aprano wrote:
    I've already mentioned NaNs. Sentinel values also sometimes need to
    compare not equal with themselves. Forcing them to compare equal will
    cause breakage.
    There's a conflict between such domain-specific considerations (NaNs,
    strange sentinels, SAGE's equations), and relatively natural assumptions
    about an == operator, such as it being an equivalence relation.
    Such assumptions only hold under particular domains though. You can't
    assume equality is an equivalence relation once you start thinking about
    arbitrary domains.

    I don't know how to resolve this conflict without introducing a new
    function which is (or at least strongly encourages developers to arrange
    for it to be) an equivalence relation.
    But there cannot be any such function which is a domain-independent
    equivalence relation, not if we're talking about arbitrarily wacky
    domains. Even something as straight-forward as "is" can't be an
    equivalence relation under a domain where identity isn't well-defined.


    --
    Steven
  • Mark Wooding at Jan 6, 2009 at 12:42 pm

    Steven D'Aprano wrote:

    Such assumptions only hold under particular domains though. You can't
    assume equality is an equivalence relation once you start thinking
    about arbitrary domains.
    From a formal mathematical point of view, equality /is/ an equivalence
    relation. If you have a relation on some domain, and it's not an
    equivalence relation, then it ain't the equality relation, and that's
    flat.
    But there cannot be any such function which is a domain-independent
    equivalence relation, not if we're talking about arbitrarily wacky
    domains.
    That looks like a claim which requires a proof to me. But it could also
    do with a definition of `domain', so I'll settle for one of those first.

    If we're dealing with sets (i.e., `domain's form a subclass of `sets')
    then the claim is clearly false, and equality (determined by comparison
    of elements) is indeed a domain-independent equivalence relation.
    Even something as straight-forward as "is" can't be an equivalence
    relation under a domain where identity isn't well-defined.
    You've completely lost me here. The Python `is' operator is (the
    characteristic function of) an equivalence relation on Python values:
    that's its definition. You could describe an extension of the `is'
    relation to a larger set of items, such that it fails to be an
    equivalence relation on that set, but you'd be (rightly) criticized for
    failing to preserve one of its two defining properties. (The other is
    that `is' makes distinctions between values which are at least as fine
    as any other method, and this property should also be extended.)

    Let me have another go.

    All Python objects are instances of `object' or of some more specific
    class. The `==' operator on `object' is (the characteristic function
    of) an equivalence relation. In fact, it's the same as `is' -- but
    `==' can be overridden by subclasses, and subclasses are permitted --
    according to the interface definition -- to coarsen the relation. In
    fact, they're permitted to make it not be an equivalence relation at all.

    I claim that this is a problem. I /agree/ that domain-specific
    predicates are useful, and can be sufficiently useful that they deserve
    the `==' name -- as well as floats and numpy, I've provided SAGE and
    sympy as examples myself. But I also believe that there are good
    reasons to want an `equivalence' operator (I'll write it as `=~', though
    I don't propose this as Python syntax -- see below) with the following
    properties:

    * `=~' is the characteristic function[1] of an equivalence relation,
    i.e., for all values x, y, z: x =~ y in (True, False); (x =~ x) ==
    True; if x =~ y then y =~ x; and if x =~ y and y =~ z then x =~ z

    * Moreover, `=~' is a coarsening of `is', i.e. for all values x, y: if
    x is y then x =~ y.
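
The listed properties can at least be spot-checked on a finite sample of values. The following is a minimal sketch; `is_equivalence_on` is a hypothetical helper, and passing it on a sample proves nothing about values outside that sample.

```python
import operator
from itertools import product


def is_equivalence_on(sample, rel):
    """Check the stated properties of rel for every combination in sample."""
    # Reflexivity; this also covers "x is y implies rel(x, y)".
    for x in sample:
        if rel(x, x) is not True:
            return False
    # Results must be real booleans, and the relation must be symmetric.
    for x, y in product(sample, repeat=2):
        result = rel(x, y)
        if result is not True and result is not False:
            return False
        if result != rel(y, x):
            return False
    # Transitivity.
    for x, y, z in product(sample, repeat=3):
        if rel(x, y) and rel(y, z) and not rel(x, z):
            return False
    return True


nan = float("nan")

ints_ok = is_equivalence_on([1, 2, 2, 3], operator.eq)
nan_ok = is_equivalence_on([1.0, nan], operator.eq)     # nan != nan
identity_ok = is_equivalence_on([1.0, nan], operator.is_)
```

On these samples `==` passes for integers, fails once a NaN is included (reflexivity), and `is` passes in both cases.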

    A valuable property might be that x =~ y if x and y are
    indistinguishable without using `is'. That would mean immediately that
    'xyz' =~ 'xy' + 'z' (regardless of interning, because strings are
    immutable). But for tuples this would imply elementwise comparison,
    which may be expensive -- and, in the case of tuples manufactured by C
    extensions, nontrivial because manufactured tuples need not be acyclic.
    On the other hand, `==' is already recursive on tuples.

    We can envisage a collection of different relations, according to which
    distinguishing methods we're willing to disallow. For example, for
    numerical types, there are actually a number of interesting relations,
    according to whether you think the answers to the following questions
    are true or false.

    * Is 1 =~ 1/1? (Here, 1 is an integer, and 1/1 is a rational number;
    both are the multiplicative identities of their respective rings.
    I'd suggest that it doesn't seem very useful to say `no' here, but
    there might be reasons why one would want type(x) is type(y) if
    x =~ y.)

    * Is 1 =~ 1.0? (This is trickier. Numerically the values are equal;
    but the former is exact and the latter inexact, and this is a good
    reason to want a separation.)

    Essentially, these are asking whether `type' is a legitimate
    distinguisher, and I think that the answer, unhelpful as it may be, is
    `sometimes'.

    A third useful distinguishing technique is mutation. Given two
    singleton lists whose respective elements compare equivalent, I can
    mutate one of them to decide whether the other is in fact the same. Is
    this something which `=~' should distinguish? Again, the answer is
    probably `sometimes'.

    To summarize: we're left with at least three different characteristics
    which an equivalence predicate might have:

    * efficient (e.g., bounded recursion depth, works on circular values);
    * neglects irrelevant (to whom?) differences of type; and
    * neglects differences due to mutability.

    A predicate used to compare set elements or hash-table keys should
    probably /respect/ mutability. (Associating hashing with this
    predicate, rather than `==', would coherently allow mutable objects such
    as lists to be used as dictionary keys, though they'd be compared by
    address. I don't actually know how useful this would be, but suspect
    that it wouldn't.)

    Oh, before I go, let me make this very clear: I am /not/ proposing a
    language change. I think the right way to address these problems is
    using existing mechanisms such as generic functions with multimethods.
    Syntax can come later if it seems sufficiently important.

    [1] I'll settle for it being a partial function, i.e., attempting to
    evaluate x =~ y might raise exceptions, e.g., if x is in some
    invalid state, or perhaps if one or both of x or y is circular,
    though it would be good to minimize such cases.

    -- [mdw]
  • Steven D'Aprano at Jan 6, 2009 at 11:10 pm

    On Tue, 06 Jan 2009 12:42:13 +0000, Mark Wooding wrote:

    Steven D'Aprano wrote:
    Such assumptions only hold under particular domains though. You can't
    assume equality is an equivalence relation once you start thinking
    about arbitrary domains.
    From a formal mathematical point of view, equality /is/ an equivalence
    relation. If you have a relation on some domain, and it's not an
    equivalence relation, then it ain't the equality relation, and that's
    flat.
    Okay, fair enough. In the formal mathematical sense, equality is always
    an equivalence relation. So there are certain domains which don't have
    equality, e.g. floating point, since nan != nan. Also Python objects,
    since x.__eq__(y) is not necessarily the same as y.__eq__(x).
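
A deliberately contrived sketch of that asymmetry; `Agreeable` and `Stubborn` are hypothetical classes. Neither is a subclass of the other, so the left operand's __eq__ is tried first.

```python
class Agreeable:
    def __eq__(self, other):
        return True


class Stubborn:
    def __eq__(self, other):
        return False


a, s = Agreeable(), Stubborn()

forward = (a == s)    # uses Agreeable.__eq__
backward = (s == a)   # uses Stubborn.__eq__
```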


    But there cannot be any such function which is a domain-independent
    equivalence relation, not if we're talking about arbitrarily wacky
    domains.
    That looks like a claim which requires a proof to me. But it could also
    do with a definition of `domain', so I'll settle for one of those first.
    I'm talking about domain in the sense of "a particular problem domain".
    That is, the model, data and operations used to solve a problem. I don't
    know that I can be more formal than that.

    To prove my claim, all you need is two domains with a mutually
    incompatible definition of equality. That's not so difficult, surely? How
    about equality of integers, versus equality of integers modulo some N?


    If we're dealing with sets (i.e., `domain's form a subclass of `sets')
    then the claim is clearly false, and equality (determined by comparison
    of elements) is indeed a domain-independent equivalence relation.
    It isn't domain-independent in my sense, because you have specified one
    specific domain, namely set equality.

    Even something as straight-forward as "is" can't be an equivalence
    relation under a domain where identity isn't well-defined.
    You've completely lost me here. The Python `is' operator is (the
    characteristic function of) an equivalence relation on Python values:
    that's its definition.
    Yes, that's because identity is well-defined in Python. I'm saying that
    if identity isn't well-defined, then neither is the 'is' operator, and
    therefore it isn't an equivalence relation. That shouldn't be
    controversial.


    All Python objects are instances of `object' or of some more specific
    class. The `==' operator on `object' is (the characteristic function
    of) an equivalence relation. In fact, it's the same as `is' -- but
    `==' can be overridden by subclasses, and subclasses are permitted --
    according to the interface definition -- to coarsen the relation. In
    fact, they're permitted to make it not be an equivalence relation at all.

    I claim that this is a problem.
    It *can* be a problem, if you insist on using == on arbitrary types while
    still expecting it to be an equivalence relation.

    If you drop the requirement that it remain an e-r, then you can apply ==
    to arbitrary types. And if you limit yourself to non-arbitrary types,
    then you can safely use (say) any strings you like, and == will remain an
    e-r.



    I /agree/ that domain-specific
    predicates are useful, and can be sufficiently useful that they deserve
    the `==' name -- as well as floats and numpy, I've provided SAGE and
    sympy as examples myself. But I also believe that there are good
    reasons to want an `equivalence' operator (I'll write it as `=~', though
    I don't propose this as Python syntax -- see below) with the following
    properties:

    * `=~' is the characteristic function[1] of an equivalence relation,
    i.e., for all values x, y, z: x =~ y in (True, False); (x =~ x) ==
    True; if x =~ y then y =~ x; and if x =~ y and y =~ z then x =~ z

    * Moreover, `=~' is a coarsening of `is', i.e. for all values x, y: if
    x is y then x =~ y.

    Ah, but you can't have such a generic e-r that applies across all problem
    domains. Consider:

    Let's denote regular, case-sensitive strings using "abc", and special,
    case-insensitive strings using i"abc". So for regular strings, equality
    is an e-r; for case-insensitive strings, equality is also an e-r (I
    trust that the truth of this is obvious). But if you try to use equality
    on *both* regular and case-insensitive strings, it fails to be an e-r:

    i"abc" =~ "ABC" returns True if you use the case-insensitive definition
    of equality, but returns False if you use the case-sensitive definition.
    There is no single definition of equality that is *simultaneously* case-
    sensitive and case-insensitive.
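
Since the i"abc" syntax is only notation, here is a sketch with an ordinary class standing in for it. `IStr` is a hypothetical name, and making it compare equal to plain strings regardless of case is exactly the assumption under discussion.

```python
class IStr:
    """Hypothetical case-insensitive string that also compares with str."""

    def __init__(self, text):
        self.text = text

    def __eq__(self, other):
        if isinstance(other, IStr):
            return self.text.lower() == other.text.lower()
        if isinstance(other, str):
            return self.text.lower() == other.lower()
        return NotImplemented


a = "abc"
b = IStr("abc")
c = "ABC"

a_eq_b = (a == b)   # str defers to the reflected IStr.__eq__
b_eq_c = (b == c)
a_eq_c = (a == c)   # plain case-sensitive comparison
```

Here a == b and b == c both hold while a == c does not, so transitivity fails as soon as the two kinds of string are mixed.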

    A valuable property might be that x =~ y if x and y are
    indistinguishable without using `is'.
    That's a little strong, because it implies that equality must look at
    *everything* about a particular object, not just whatever bits of data
    are relevant for the problem domain.

    For example, consider storing data in a dict.
    >>> D1 = {-1: 0, -2: 0}
    >>> D2 = {-2: 0}
    >>> D2[-1] = 0
    >>> D1 == D2
    True


    We certainly want D1 and D2 to be equal. But their history is different,
    and that makes their internal details different, which has detectable
    consequences:
    >>> D1
    {-2: 0, -1: 0}
    >>> D2
    {-1: 0, -2: 0}
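
The output above comes from a CPython of that era, where key order depended on hash-table internals. A sketch of the same point for Python 3.7 and later, where dicts preserve insertion order (the exact orderings are therefore different from the ones shown above):

```python
D1 = {-1: 0, -2: 0}
D2 = {-2: 0}
D2[-1] = 0

# Equality ignores how each dict was built.
equal = (D1 == D2)

# Iteration order does not: it reflects each dict's insertion history.
order_1 = list(D1)
order_2 = list(D2)
same_order = (order_1 == order_2)
```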


    The same happens with trees. Given a tree structure defined as:

    (payload, left-subtree, right-subtree)

    do you want the following two trees to be equal?

    ('b', ('a', None, None), ('c', None, None))

    ('a', None, ('b', None, ('c', None, None)))

    Unless I've made a silly mistake, not only are the payloads of the two
    trees equal, but so are the in-order representation of both. Only the
    specific order the nodes are stored in differ, and that may not be
    important for the specific problem you are trying to solve.
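
The claim about the two trees can be checked with a few lines. `in_order` is a hypothetical helper that assumes the (payload, left-subtree, right-subtree) layout given above, with None for an empty subtree.

```python
def in_order(tree):
    """Yield payloads of a (payload, left, right) tree in in-order sequence."""
    if tree is None:
        return
    payload, left, right = tree
    yield from in_order(left)
    yield payload
    yield from in_order(right)


t1 = ('b', ('a', None, None), ('c', None, None))
t2 = ('a', None, ('b', None, ('c', None, None)))

# Tuple equality is structural, so the differing shapes compare unequal.
same_structure = (t1 == t2)

# The in-order payload sequences are identical.
same_sequence = (list(in_order(t1)) == list(in_order(t2)))
```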

    There may be problem domains where the order of elements in a list (or
    tree structure) *is* important, and other problem domains where order is
    irrelevant. One single relation can't cover all such conflicting
    requirements.



    --
    Steven
  • Mark Wooding at Jan 7, 2009 at 1:23 am

    Steven D'Aprano wrote:

    To prove my claim, all you need is two domains with a mutually
    incompatible definition of equality. That's not so difficult, surely? How
    about equality of integers, versus equality of integers modulo some N?
    No, that's not an example. The integers modulo N form a ring Z/NZ of
    residue classes. Such residue classes are distinct from the integers --
    e.g., an integer 3 (say) is not the same as the set 3 + NZ = { ..., 3 - 2N,
    3 - N, 3, 3 + N, 3 + 2N, ... } -- but there is a homomorphism from Z
    to Z/NZ under which 3 + NZ is the image of 3.

    If we decide to define the == operator such that 3 == 3 + NZ and 3 + N
    == 3 + NZ then == is not an equivalence relation (in particular,
    transitivity fails). But that's just an artifact of the definition. If
    we distinguish 3 from 3 + NZ then everything is fine. 3 + NZ == (3 + N)
    + NZ correctly, but 3 != 3 + N, and all is well.

    Here, at least, the problem is not that == as an equivalence relation
    fails in some particular domain -- because in both Z and Z/NZ it can be
    a perfectly fine equivalence relation -- but that it can potentially
    fail on the boundaries between domains. Easy answer: don't mess it up
    at the boundaries.
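
A sketch of keeping the two domains apart in Python. `Residue` is a hypothetical class and N = 7 is an arbitrary modulus chosen for illustration; the class refuses to compare equal to plain integers.

```python
N = 7  # arbitrary modulus, for illustration only


class Residue:
    """The residue class value + NZ, kept distinct from the integer value."""

    def __init__(self, value):
        self.value = value % N

    def __eq__(self, other):
        if isinstance(other, Residue):
            return self.value == other.value
        return NotImplemented  # no equality across the domain boundary

    def __hash__(self):
        return hash(("Residue", self.value))


ints_equal = (3 == 3 + N)                       # integers stay distinct
classes_equal = (Residue(3) == Residue(3 + N))  # same residue class
mixed = (3 == Residue(3))                       # falls back to identity
```

Within each domain `==` remains an equivalence relation, and the mixed comparison is simply False.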

    Proposition. Let U, U' be disjoint sets, and let E, E' be equivalence
    relations on U, U' respectively. Define E^ on U union U' as E^ = E
    union E', i.e.,

    E^(x, y) iff x in U and y in U and E(x, y) or
    x in U' and y in U' and E'(x, y)

    Then E^ is an equivalence relation.

    Proof. Reflexivity and symmetry are trivial; transitivity follows from
    disjointness of U and U'.
    It *can* be a problem, if you insist on using == on arbitrary types
    while still expecting it to be an equivalence relation.
    Unfortunately, from the surrounding discussion, it seems that container
    types particularly want to be able to contain arbitrary objects, and the
    failure of == to be an equivalence relation makes this fail. The problem
    is that objects with wacky == operators are still more or less quacking
    like the more usual kinds of ducks; but they turn out to taste very
    different.
    Let's denote regular, case-sensitive strings using "abc", and special,
    case-insensitive strings using i"abc". So for regular strings, equality
    is an e-r; for case-insensitive strings, equality is also an e-r (I
    trust that the truth of this is obvious). But if you try to use equality
    on *both* regular and case-insensitive strings, it fails to be an e-r:

    i"abc" =~ "ABC" returns True if you use the case-insensitive definition
    of equality, but returns False if you use the case-sensitive definition.
    There is no single definition of equality that is *simultaneously* case-
    sensitive and case-insensitive.
    A case-sensitive string is /not the same/ as a case-insensitive string.
    One's a duck, the other's a goose. I'd claim here that i"abc" =~ "ABC"
    must be False, because i"abc" =~ "abc" must be false also! To define it
    otherwise leads to the incoherence you describe. But the above
    proposition provides an easy answer.
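
As a sketch of how the proposition applies here: take U to be plain strings with exact equality and U' to be a hypothetical `CaseFold` wrapper with case-insensitive equality. The combined relation `equiv` (also a made-up name) is False across the boundary, and the three properties can be checked by brute force on a small sample.

```python
from itertools import product


class CaseFold:
    """Hypothetical case-insensitive string; deliberately not a str subclass."""

    def __init__(self, text):
        self.text = text


def equiv(x, y):
    """E^ = E union E': each relation applies only inside its own domain."""
    if isinstance(x, str) and isinstance(y, str):
        return x == y
    if isinstance(x, CaseFold) and isinstance(y, CaseFold):
        return x.text.lower() == y.text.lower()
    return False


sample = ["abc", "ABC", CaseFold("abc"), CaseFold("ABC"), CaseFold("xyz")]

reflexive = all(equiv(x, x) for x in sample)
symmetric = all(equiv(x, y) == equiv(y, x)
                for x, y in product(sample, repeat=2))
transitive = all(equiv(x, z)
                 for x, y, z in product(sample, repeat=3)
                 if equiv(x, y) and equiv(y, z))

cross_boundary = equiv(CaseFold("abc"), "abc")
```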
    A valuable property might be that x =~ y if x and y are
    indistinguishable without using `is'.
    That's a little strong, because it implies that equality must look at
    *everything* about a particular object, not just whatever bits of data
    are relevant for the problem domain.
    Yes. That's one of the reasons that =~ isn't the same as ==.

    I've been thinking on my feet in this thread, so I haven't thought
    everything through. And as I mention below, there are /many/ useful
    equality predicates on values. As I didn't mention (but hope is
    obvious) having a massively-parametrized equality predicate is daft, and
    providing enough to suit every possible application equally so. But we
    might be able to do well enough with just one or two -- or maybe by just
    leaving things as they are.
    For example, consider storing data in a dict.
    >>> D1 = {-1: 0, -2: 0}
    >>> D2 = {-2: 0}
    >>> D2[-1] = 0
    >>> D1 == D2
    True


    We certainly want D1 and D2 to be equal.
    Do we? If we're using my `indistinguishable without using ``is'''
    criterion from above, then D1 and D2 are certainly different! To detect
    the difference, mutate one and see if the other changes:

    def distinct_dictionaries_p(D1, D2):
        """
        Decide whether D1 and D2 are the same dictionary or not.
        Not threadsafe.
        """
        magic = []
        more_magic = [magic]
        # Remember what D1 held under the probe key, if anything.
        old = D1.get('mumble', more_magic)
        D1['mumble'] = magic
        # True only if the mutation of D1 is visible through D2.
        result = D2.get('mumble', more_magic) is magic
        # Restore D1 to its previous state.
        if old is more_magic:
            del D1['mumble']
        else:
            D1['mumble'] = old
        return result

    But that criterion was a suggestion -- a way of defining a coherent
    equivalence relation on the whole of the Python value space which is
    coarser than `is' and maybe more useful. My primary purpose in
    proposing it was to stimulate discussion: what /do/ we want from
    equality predicates? We already have `is', which is too fine-grained to
    be widely useful: it distinguishes between different instances of the
    number 500000, for example, and I can't for the life of me see why
    that's a useful behaviour. (The `is' operator is a fine thing, and I
    wouldn't want it any other way: it trades away some useful semantics for
    the sake of speed, and that was the /right/ decision.)

    My criterion succeeds in distinguishing 1 from 1.0 (they have different
    types), which may be considered good. It doesn't distinguish a quiet
    NaN from another quiet NaN: that's definitely good. (It'd be bogus for
    a numeric equality operator, but we've already got one of those, so we
    don't need to define another.) But you're probably right: it's still
    too fine-grained for some purposes.
    But their history is different, and that makes their internal details
    different, which has detectable consequences:
    >>> D1
    {-2: 0, -1: 0}
    >>> D2
    {-1: 0, -2: 0}
    So in this case, `str' also works as a distinguisher. Fine.
    There may be problem domains where the order of elements in a list (or
    tree structure) *is* important, and other problem domains where order is
    irrelevant. One single relation can't cover all such conflicting
    requirements.
    Absolutely. This is why Common Lisp provides four(!) out of the box and
    it still isn't enough. Python, which provides one (`is') and a half (`=='
    when it's behaving), is actually coping remarkably well considering. But
    this /is/ causing problems, and so thinking about solutions seems reasonable.

    I'm not trying to change the language. I don't have a pet feature I
    want added. I do think the discussion is interesting and worthwhile,
    though.

    -- [mdw]
  • Steven D'Aprano at Jan 7, 2009 at 9:26 am

    On Wed, 07 Jan 2009 01:23:19 +0000, Mark Wooding wrote:

    A case-sensitive string is /not the same/ as a case-insensitive string.
    One's a duck, the other's a goose. I'd claim here that i"abc" =~ "ABC"
    must be False, because i"abc" =~ "abc" must be false also! To define it
    otherwise leads to the incoherence you describe.
    It's only incoherent if you need equality to be an equivalence relation.
    If you don't, it is perfectly reasonable to declare that i"abc" equals
    "abc".


    --
    Steven
