FAQ
Reading PEP-3000, one proposed change caught my attention. I've
checked the Python Wiki, and here it is:

-> Raise an exception when making comparisons between two
incongruent types (other than equality and inequality.)
-> Reason: such comparisons do not make sense and are especially
confusing to new users of Python.

While I agree that having comparison between values of different types
(classes?) may be confusing, it is useful to define the sorting order.
I thought that the idea (since 2.1, and even more with 2.4) is the
sort ordering would rely on the rich comparison methods.

For example: today's lists may contain objects of arbitrary types and
can be sorted; even if the actual ordering may seem arbitrary, it
works for most purposes. My question is: If Python 3.0 abolishes
comparison between arbitrary types, how will generic sorting be
handled?

My guess is that it will not be possible unless you provide a generic
__cmp__ comparison function (that was the pre-2.1 method, I think).
The underlying assumption is that heterogeneous lists are not supposed
to be sorted, unless the programmer really knows what he's doing.
(BTW, for all purposes the same can be said about the use of the new
key argument provided by 2.4).

p.s. The definitive answer to this question (if any) should be
included in PEP3000.

--
Carlos Ribeiro
Consultoria em Projetos
blog: http://rascunhosrotos.blogspot.com
blog: http://pythonnotes.blogspot.com
mail: carribeiro at gmail.com
mail: carribeiro at yahoo.com

Search Discussions

  • Batista, Facundo at Sep 21, 2004 at 5:17 pm
    [Carlos Ribeiro]

    #- For example: today's lists may contain objects of arbitrary types and
    #- can be sorted; even if the actual ordering may seem arbitrary, it
    #- works for most purposes. My question is: If Python 3.0 abolishes
    #- comparison between arbitrary types, how will generic sorting be
    #- handled?
    #-
    #- My guess is that it will not be possible unless you provide a generic
    #- __cmp__ comparison function (that was the pre-2.1 method, I think).
    #- The underlying assumption is that heterogeneous lists are
    #- not supposed
    #- to be sorted, unless the programmer really knows what he's doing.
    #- (BTW, for all purposes the same can be said about the use of the new
    #- key argument provided by 2.4).

    My guess is that if you have heterogeneous list, and want to sort them,
    you'll need to pass a function to ``sort()``, which will care about data
    types and do the right thing.

    . Facundo

    -------------- next part --------------
    An HTML attachment was scrubbed...
    URL: http://mail.python.org/pipermail/python-list/attachments/20040921/af3dcdcd/attachment.htm
  • Steven Bethard at Sep 21, 2004 at 5:24 pm

    Carlos Ribeiro <carribeiro <at> gmail.com> writes:
    For example: today's lists may contain objects of arbitrary types and
    can be sorted; even if the actual ordering may seem arbitrary, it
    works for most purposes. My question is: If Python 3.0 abolishes
    comparison between arbitrary types, how will generic sorting be
    handled?
    Could you give an example of a list that you'd like to do this to? I'm still
    having trouble imagining a list of disparate types that I call sort on...

    Thanks,

    Steve
  • Phil Frost at Sep 21, 2004 at 5:34 pm
    What about binary trees? Currently it's possible to implement a binary
    tree that is as general as a dict, but restricting comparisons to like
    types would make that impossible.
    On Tue, Sep 21, 2004 at 05:24:48PM +0000, Steven Bethard wrote:
    Carlos Ribeiro <carribeiro <at> gmail.com> writes:
    For example: today's lists may contain objects of arbitrary types and
    can be sorted; even if the actual ordering may seem arbitrary, it
    works for most purposes. My question is: If Python 3.0 abolishes
    comparison between arbitrary types, how will generic sorting be
    handled?
    Could you give an example of a list that you'd like to do this to? I'm still
    having trouble imagining a list of disparate types that I call sort on...

    Thanks,

    Steve
  • Steven Bethard at Sep 21, 2004 at 5:42 pm

    Phil Frost <indigo <at> bitglue.com> writes:
    On Tue, Sep 21, 2004 at 05:24:48PM +0000, I wrote:
    Could you give an example of a list that you'd like to do this to? I'm
    still having trouble imagining a list of disparate types that I call sort
    on...
    What about binary trees? Currently it's possible to implement a binary
    tree that is as general as a dict, but restricting comparisons to like
    types would make that impossible.
    Is there a good use case for binary trees with incompatible types at the nodes?

    I'm not sure I follow your comparison here anyway, since you can't sort a
    dict... Could you clarify?

    Thanks,

    STeve
  • Carlos Ribeiro at Sep 21, 2004 at 6:25 pm

    On Tue, 21 Sep 2004 17:42:35 +0000 (UTC), Steven Bethard wrote:
    I'm not sure I follow your comparison here anyway, since you can't sort a
    dict... Could you clarify?
    Any hashable object can be used as a key. That's the point. If you're
    using a balanced binary tree as a high performance object storage with
    bounded search time (log N), then you're going to need comparisons to
    work.

    Anyway, the discussion so far gave me an idea: sortable objects could
    implement the __key__ function to return a sortable value. Here it is:

    1) comparison would work between a few types -- None, booleans,
    numbers, and strings, in this order. It's an arbritrary but
    determinist ordering, and does not deal with things such as complex
    nmbers.

    2) other objects would implement the __key__function for comparison.
    __key__ would have to return one of the built-in, sortable types, as
    described above.

    This is a workable solution to preserve the current behavior (multiple
    objects can be sorted). However, it has a disadvantage in that it's
    not orthogonal -- but neither is restricting comparisons.

    Just to make it clear, here it is how it works now:
    a = [ 3.5, -1.0, "", (0,1), None, "z"]
    a.sort()
    a
    [None, -1.0, 3.5, '', 'z', (0, 1)]

    None comes first, then numbers, then strings, then tuples. And sort
    simply works. BTW, it even sorts tuples in a deterministic fashion --
    is it going to be lost in Python 3.0 too?

    Now that we're talking about tuples, another example:
    (0,1) < (1,0)
    True
    (0,1) < [1,0]
    False
    [0,1] < [1,0]
    True

    Lists come before sequences -- don't know if it does make sense, the
    "correct" thing for sorting seems to consider then equal as far as
    relative ordering is concerned; that would make (0,1) < [1,0] -> True.
    But then, would it make (0,0) be the same as [0,0]? Of course not --
    one is a list, the other one is a tuple. All this can be simply put
    as:

    a) Relative ordering (or partial ordering) is one thing;
    b) Magnitude comparison is another thing, that sometimes coincide with
    relative ordering;
    c) Equality/inequality is yet another thing.

    ...and we're trying to solve all three with the same set of operators.
    But please, don't try to propose anything more complex, it's not going
    to help here :-)

    --
    Carlos Ribeiro
    Consultoria em Projetos
    blog: http://rascunhosrotos.blogspot.com
    blog: http://pythonnotes.blogspot.com
    mail: carribeiro at gmail.com
    mail: carribeiro at yahoo.com
  • Phil Frost at Sep 21, 2004 at 6:29 pm

    On Tue, Sep 21, 2004 at 05:42:35PM +0000, Steven Bethard wrote:
    Phil Frost <indigo <at> bitglue.com> writes:
    On Tue, Sep 21, 2004 at 05:24:48PM +0000, I wrote:
    Could you give an example of a list that you'd like to do this to? I'm
    still having trouble imagining a list of disparate types that I call sort
    on...
    What about binary trees? Currently it's possible to implement a binary
    tree that is as general as a dict, but restricting comparisons to like
    types would make that impossible.
    Is there a good use case for binary trees with incompatible types at the nodes?

    I'm not sure I follow your comparison here anyway, since you can't sort a
    dict... Could you clarify?

    Thanks,

    STeve
    That's the point. Dicts can't be sorted, but binary trees *must* be.
    There are at least two good reasons to use a binary tree:

    - you need a mapping that is sorted by key
    - the worst case O(n) complexity of a hash table is not acceptable

    The right question here is, "is there a reason for mappings with
    heterogeneous key types?" It's not something that I do often, but it's
    something that's important to have. Dynamic typing is a good thing.
  • Steven Bethard at Sep 21, 2004 at 6:44 pm

    Phil Frost <indigo <at> bitglue.com> writes:
    That's the point. Dicts can't be sorted, but binary trees *must* be.
    Ahh, I see. You're suggesting using binary trees as an implementation of a
    mapping. Binary trees have so many uses, I wasn't really clear what you were
    suggesting they be used as. (After I sent my mail, I had (incorrectly)
    determined that you intended them as sets...)
    It's not something that I do often, but it's something that's important to
    have.
    It does come down to a question nearly identical to the one I asked before:
    (1) "is there a good use case for wanting to sort a list containing
    incompatible types?" (my previous question)
    (2) "is there a good use case for wanting to make a mapping with keys that
    have incompatible types?" (my question to you)

    To some degree, (1) has already been answered to my satisfaction by Carlos
    Ribeiro's spreadsheet example. If you could give me a real world example of
    when you'd want to do (2), I might be more convinced...

    Thanks,

    Steve
  • Alex Martelli at Sep 22, 2004 at 7:20 am
    Steven Bethard wrote:
    ...
    (2) "is there a good use case for wanting to make a mapping with keys that
    have incompatible types?" (my question to you)

    To some degree, (1) has already been answered to my satisfaction by Carlos
    Ribeiro's spreadsheet example. If you could give me a real world example of
    when you'd want to do (2), I might be more convinced...
    Does my example of 'tuples as concrete representations of expressions'
    which I posted to another subthread of this thread help? Say several
    such expressions are coming in to be evaluated over the same context
    (assignment of values to free variables), I want to memoize the
    expressions that I have already computed this time (to avoid recomputing
    in a situation where the same expression or subexpression is likely to
    recur ofter) and the natural way to do it is:
    if expr in known_expr: return known_expr[expr]
    value = known_expr[expr] = do_compute(expr)
    return value

    In this case a dict would do just as well as a BTree, but it's easy to
    think of a slight variant where expressions would not easily be made
    hashable, but might still easily use some arbitrary comparison.


    Alex
  • Steven Bethard at Sep 22, 2004 at 8:36 am

    Alex Martelli <aleaxit <at> yahoo.com> writes:

    I wrote:
    ...
    (2) "is there a good use case for wanting to make a mapping with keys that
    have incompatible types?" (my question to you)

    To some degree, (1) has already been answered to my satisfaction by Carlos
    Ribeiro's spreadsheet example. If you could give me a real world example
    of when you'd want to do (2), I might be more convinced...
    Does my example of 'tuples as concrete representations of expressions'
    which I posted to another subthread of this thread help?
    Yeah, thanks. It's still definitely not the kind of computation that 99% of
    users are going to want to do, but if you wanted to memoize such tuples and
    you also expected hashing said tuples to be more expensive than cmp'ing them,
    I can see that the mapping you replace dict with would need to be able to sort
    disparate types.

    'Course rather than wrap your expression tuples in a class to define the
    __cmp__ function, it seems like a more reasonable solution might be to make
    the arbitrary cmp decision in the mapping class (the one implemented as a
    BTree). If __cmp__ starts raising TypeErrors, your code could do something
    like:

    def insert(self, val):
    try:
    c = cmp(self.val, val)
    except TypeError:
    c = -1 # self.val and val were not of the same type
    # make some arbitrary decision here
    ... # insert according to cmp result

    Does take more work than it would now -- forces the arbitrary decision onto
    the BTree writer, instead of doing it automatically in cmp... How much more
    work really depends on how complicated your 'arbitrary' scheme is...

    Steve
  • Alex Martelli at Sep 22, 2004 at 9:03 am
    Steven Bethard wrote:
    ...
    BTree). If __cmp__ starts raising TypeErrors, your code could do something
    like:

    def insert(self, val):
    try:
    c = cmp(self.val, val)
    except TypeError:
    c = -1 # self.val and val were not of the same type
    # make some arbitrary decision here
    ... # insert according to cmp result
    Nope, this would probably lead to subtle and hard to debug errors, since
    both a<b and b<a would appear to be true for some heterogeneous values
    of a and b, violating a crucial semantic constraint of the < operator.

    The fact that some such problems were observed in Python itself was used
    as an argument to justify not doing comparisons among het types in
    Python; I argue that pushing such subtle responsibilities down to Python
    _users_ is no progress.

    An acceptable compromise might be to have a universal_compare function
    available in some module, already carefully coded to avoid the subtle
    traps. I believe a universal_lt might in fact suffice, and that it
    would only need to ensure <'s semantic constraints (specifically their
    connection with ==) on types where == "does nothing tricky"... all
    built-in types, but only _some_ user-coded types, namely the sensible
    ones. If one codes an __eq__ which randomly returns True or False, for
    example, there is no way universal_whatever can sensibly cooperate with
    it to ensure (e.g.) that a==b implies not a<b and not b<a and VV!-)


    Alex
  • Carlos Ribeiro at Sep 22, 2004 at 12:21 pm

    On Wed, 22 Sep 2004 11:03:51 +0200, Alex Martelli wrote:
    The fact that some such problems were observed in Python itself was used
    as an argument to justify not doing comparisons among het types in
    Python; I argue that pushing such subtle responsibilities down to Python
    _users_ is no progress.
    That pretty much sums my argument very well (Thanks Alex!). I think
    that sort must simply work. *If* some type of ordering between
    heterogeneous items is defined by the Python sort() behavior, then
    it's much easier for any user (novice or expert) to simply refer to
    that ordering when trying to understand how does sort() works. Look at
    what happens if this problem is moved to the user side:

    -- extra care will be needed by the part of the programmer, because
    sort() can raise TypeErrors in situations where it doesn't raise today
    (even a *very simple case*, that is to have None values in the list,
    as the ones returned by SQL queries);

    -- worse performance, because a generic __cmp__ function written is
    Python is bound to be orders of magnitude slower than the native
    implementation;

    -- in the lack of a standard, default ordering, each and every
    programmer will define its own ordergin when it comes to managing
    heterogeneous data. If the overloaded function is exported, the user
    will be left to guess about the ordering. In a way, it violates the
    TOOWTDI motto.

    Talking about real applications:

    -- SQL applications can return null values for non-initialized values.
    Try ordering that.

    -- OLE applications that return variants are bound to the same
    problem. They may return null values, or any other type, and Python
    will convert the type automatically. If these values are fed to a list
    and then sorted, again, there may be problems.

    -- The long tuples mentioned by Alex can be simply long strings.
    Internationalization text, for instance -- to compute the hash, one
    must iterate over the entire string; a balanced tree-based design will
    compare only the first few characters in most cases. Hashing strings
    is probably fast (a very tight loop, I assum), but for some size of
    string is bound to start to get slower than binary comparisons
    mentioned.

    -- A heap can't be used anymore to treat heterogeneous objects.

    And finally:

    I think that many people are worried about what are really corner
    cases -- sorting complex numbers or other complex data structures.
    That's not what worries me, because in this situation we can safely
    assume that the application is complex enough to deserve sort()
    customization. *What I'm worried about is the lack of default ordering
    for fundamental types*. Another issue that was raised is the ordering
    between strings and numbers. Well, this particular case can be
    discussed. As it is today, it's not that bad -- some people may forget
    to convert numbers to strings, or be confused by the final ordering,
    but at least *there's a standard ordering* that works for everyone
    else.

    In conclusion (I'm repeating myself again):

    -- ordering, for sort purposes, is one thing -- it's *information
    management* at work..
    -- mathemathical ordering is another thing.

    The two concepts are *very similar*, but are still *different*, as far
    as their real applications are concerned. In the case of sorting, what
    matters is to have a unique and well defined ordering that can be used
    for *information management* purposes. On the other hand, rich
    comparisons are already being used for other purposes, valid in the
    mathematical sense, that have nothing to do with ordering. Is it or
    not clear?


    p.s. It's good to have this discussion so far in advance to Py3K. This
    way we can avoid having it at Py3K alpha, with very little time and a
    small chance to win the argument against a commited patch.

    --
    Carlos Ribeiro
    Consultoria em Projetos
    blog: http://rascunhosrotos.blogspot.com
    blog: http://pythonnotes.blogspot.com
    mail: carribeiro at gmail.com
    mail: carribeiro at yahoo.com
  • Alex Martelli at Sep 22, 2004 at 12:43 pm
    Carlos Ribeiro wrote:
    ...
    Python; I argue that pushing such subtle responsibilities down to Python
    _users_ is no progress.
    That pretty much sums my argument very well (Thanks Alex!). I think
    You're welcome.
    that sort must simply work. *If* some type of ordering between
    I think an acceptable compromise would be for the user to have to do:

    import universal_comparator
    arbitrary_list.sort(cmp=universal_comparator.cmp)

    in order to ensure sort "simply works". Actually, I hope in Python 3K
    sort will take just an 'LT', not requiring a full 'CMP' (I believe the
    single operator < with the proper semantics is sufficient), but that's a
    separate issue. The point is, IMHO, that requiring universal
    comparability _is_ rare enough that "comparing anything at all" need not
    be the DEFAULT -- but is definitely not rare enough that it can be
    simply ignored or pushed down to the users. A universal comparison
    function importable from a suitable module seems about right to me.


    Alex
  • Peter Otten at Sep 22, 2004 at 9:15 am

    Steven Bethard wrote:

    'Course rather than wrap your expression tuples in a class to define the
    __cmp__ function, it seems like a more reasonable solution might be to
    make the arbitrary cmp decision in the mapping class (the one implemented
    as a BTree).
    Yes, and that would give you a lot more flexibility, as sorting by an
    attribute is almost as common as a "natural" sort (Not counting lists
    consisting entirely of items of the same type).
    If __cmp__ starts raising TypeErrors, your code could do
    something like:

    def insert(self, val):
    try:
    c = cmp(self.val, val)
    except TypeError:
    c = -1 # self.val and val were not of the same type
    # make some arbitrary decision here
    ... # insert according to cmp result
    I'd rather pass the comparison function to __init__(self, cmp=cmp) and then
    do

    def insert(self, value):
    cmp = self.cmp
    # use cmp to insert value in the appropriate place

    You could even provide a classical_compare that compares values like today's
    default. I don't think you can seriously call self.cmp() instead of cmp()
    more work. If you are concerned about the extra slot for the cmp attribute
    of your custom BTree, put the default into the class.

    Peter
  • Andrew Dalke at Sep 21, 2004 at 8:02 pm
    Phil Frost wrote:
    On Tue, Sep 21, 2004 at 05:42:35PM +0000, Steven Bethard wrote:

    Phil Frost <indigo <at> bitglue.com> writes:
    On Tue, Sep 21, 2004 at 05:24:48PM +0000, I wrote:

    Could you give an example of a list that you'd like to do this to? I'm
    still having trouble imagining a list of disparate types that I call sort
    on...
    What about binary trees? Currently it's possible to implement a binary
    tree that is as general as a dict, but restricting comparisons to like
    types would make that impossible.
    Is there a good use case for binary trees with incompatible types at the nodes?

    I'm not sure I follow your comparison here anyway, since you can't sort a
    dict... Could you clarify?

    Thanks,

    STeve

    That's the point. Dicts can't be sorted, but binary trees *must* be.
    There are at least two good reasons to use a binary tree:

    - you need a mapping that is sorted by key
    - the worst case O(n) complexity of a hash table is not acceptable

    The right question here is, "is there a reason for mappings with
    heterogeneous key types?" It's not something that I do often, but it's
    something that's important to have. Dynamic typing is a good thing.
  • Andrew Dalke at Sep 21, 2004 at 8:04 pm
    [Sorry, hit 'Send' by accident]

    Phil Frost wrote:
    The right question here is, "is there a reason for mappings with
    heterogeneous key types?" It's not something that I do often, but it's
    something that's important to have. Dynamic typing is a good thing.
    Though it's a different question. mapping keys only
    need to define __hash__ and __eq__. That's easier than
    defining an ordering.

    Andrew
    dalke at dalkescientific.com
  • Carlos Ribeiro at Sep 21, 2004 at 8:54 pm

    On Tue, 21 Sep 2004 20:04:07 GMT, Andrew Dalke wrote:
    Phil Frost wrote:
    The right question here is, "is there a reason for mappings with
    heterogeneous key types?" It's not something that I do often, but it's
    something that's important to have. Dynamic typing is a good thing.
    Though it's a different question. mapping keys only
    need to define __hash__ and __eq__. That's easier than
    defining an ordering.
    Given all the arguments, I'm changing my approach to the sorting
    problem. My proposal is that sortable objects should provide a __key__
    method. sort() already accepts a key function to be passed. As of
    today, there are 4 different ways to customize sorting behavior:

    1) passing a comparison function to sort();
    2) passing a key generation function to sort();
    3) implementing __cmp__;
    4) implementing rich comparison methods.

    If the current sorting behavior is to be changed -- as proposed for
    Py3K --, I think that relying on methods such as (3) and (4) is not a
    good idea; __cmp__ is already on the way to deprecation, and rich
    comparisons are already being extended for other uses that have
    nothing to do with sorting. I think that a __key__ function is better
    than (2) (it looks more pythonic to me). Alternative (1) -- the
    comparison function -- may still be useful for some situations, but
    it's not needed if __key__ is implemented.

    (BTW -- in Py3K, __key__ could be used to generate the hash value for
    objects that don't suply a __hash__ method. -- but that's another
    idea, just for the record)

    In my proposal, __key__can assume one of the following values:

    1) None
    2) Boolean (True or False)
    3) Number (int or float)
    4) String (Unicode, of course -- that's another beast to sort properly)
    5) Sequence (list or tuple) composed of the types listed here

    The ordering would be the same presented in the list above -- None is
    the smallest value possible, then booleans, numbers, and so on. It's a
    reasonable ordering, for most possible applications. It doesn't
    attempt to guess at overly complex values or data structures.

    There is at least one case that it's not so complex, it's relatively
    common, and it's properly handled by today's sort() and by the
    proposal above. If you retrieve a list of values from a database, some
    rows may contain null values (mapped to None). If you try to sort them
    in Py3K, you're out of luck, unless you provide an alternative sorting
    method. That's not necessary, in my opinion.

    It's a simple proposal. What do you think?


    --
    Carlos Ribeiro
    Consultoria em Projetos
    blog: http://rascunhosrotos.blogspot.com
    blog: http://pythonnotes.blogspot.com
    mail: carribeiro at gmail.com
    mail: carribeiro at yahoo.com
  • Andrew Dalke at Sep 21, 2004 at 9:29 pm

    Carlos Ribeiro wrote:
    Given all the arguments, I'm changing my approach to the sorting
    problem. My proposal is that sortable objects should provide a __key__
    method.
    class Filename:
    def __init__(self, filename):
    self.filename = filename
    def __cmp__(self, other):
    x = cmp(os.path.getsize(self.filename),
    os.path.getsize(other.filename))
    if x != 0:
    return x
    my_file = open(self.filename, "rb")
    other_file = open(other.filename, "rb")
    while 1:
    my_read = xfile.read(8192)
    other_read = yfile.read(8192)
    x = cmp(my_read, other_read)
    if x != 0:
    return x
    if not other_read:
    return 1
    if not my_read:
    return -1

    filenames = map(Filename,
    ["/etc/passwd", "/home/dalke/file.txt", ...])
    filenames.sort()
    sorted_filenames = [x.filename for x in filenames]

    How would I do that with a __key__? The only
    solution would be to read the contents of all the
    files into memory.

    Andrew
    dalke at dalkescientific.com
  • Aahz at Sep 22, 2004 at 12:46 am
    In article <mailman.3683.1095800082.5135.python-list at python.org>,
    Carlos Ribeiro wrote:
    Given all the arguments, I'm changing my approach to the sorting
    problem. My proposal is that sortable objects should provide a __key__
    method. sort() already accepts a key function to be passed. As of
    today, there are 4 different ways to customize sorting behavior:

    1) passing a comparison function to sort();
    2) passing a key generation function to sort();
    3) implementing __cmp__;
    4) implementing rich comparison methods.

    If the current sorting behavior is to be changed -- as proposed for
    Py3K --, I think that relying on methods such as (3) and (4) is not a
    good idea; __cmp__ is already on the way to deprecation, and rich
    comparisons are already being extended for other uses that have
    nothing to do with sorting.
    AFAIK, there are no plans to officially deprecate __cmp__; there are too
    many use cases for needing a three-way comparison, and when it's
    expensive to compare objects, you want to do it only once.
    --
    Aahz (aahz at pythoncraft.com) <*> http://www.pythoncraft.com/

    "A foolish consistency is the hobgoblin of little minds, adored by little
    statesmen and philosophers and divines." --Ralph Waldo Emerson
  • Alex Martelli at Sep 22, 2004 at 7:10 am

    Andrew Dalke wrote:

    Phil Frost wrote:
    The right question here is, "is there a reason for mappings with
    heterogeneous key types?" It's not something that I do often, but it's
    something that's important to have. Dynamic typing is a good thing.
    Though it's a different question. mapping keys only
    need to define __hash__ and __eq__. That's easier than
    defining an ordering.
    A dict is just one specific kind of mapping, one IMPLEMENTATION of the
    mapping interface. Currently the only built-in one, but definitely not
    the only one that can be useful. A BTree and other order-based
    mechanisms might well provide implementations of the mapping interface
    that are preferable in some circumstances (e.g., frequent need to access
    the sequence of keys in a predictable order).

    ((Performance may be better with a comparison-based implementation of a
    mapping, for example when comparisons may typically be performed WAY
    faster than computations of hash values. Say that the keys are tuples
    that tend to be extremely long AND tend to differ within the first few
    items. When such a tuple computes its hash value, it nevertheless needs
    to step through all items. But comparisons between such tuples
    typically get solved within the first few items, as soon as
    corresponding items that differ are met -- which saves the computational
    cost of continuing to step through tuples to the bitter end.))

    Prohibiting ordering comparisons between heterogenous types can make
    sense only if mappings which have keys of both types are a bad thing,
    just like Phil says. The fact that other implementations of mappings
    might still be possible (e.g., a dict) does not per se justify the
    prohibition of implementations such as BTrees, which have their own
    advantages.

    One example I have used are tuples representing expressions. One such
    tuple might be, say:

    ('+', 'foo', 'bar')

    and another might be:

    ('+', ('*', 'foo', 2), 'bar', 'baz')

    I guess these tuples, and their items, would be seen as being
    'heterogeneous' by any language mechanism -- they are 'homogeneous' only
    at a somewhat abstract level. Sure, I could wrap such tuples and each
    of their 'nodes' (items) into an instance of some darned class which
    internally holds an operator and a tuple of operands which can be
    numbers, strings (names of free variables), other such nodes. But one
    of Python's advantages used to be the ability to avoid such gyrations in
    term of language requirements -- the ability to use direct concrete
    representations when appropriate, without the language twisting your arm
    to make you wrap them up in order to be able to pass them uniformly as
    arguments, use them as keys in a dict OR BTree, etc, etc. You only did
    the wrapping up into abstractions if and when you WANTED to (much like,
    say, in Lisp or Scheme -- you start with plain lists, then at some point
    you decide that (car expression) is not a good way to access the
    operator so you switch to a more abstract (operator-of expression), &c).

    Exactly what's gained by forbidding this kind of nice optional usage is
    very murky indeed to me. Why is the "burden of proof" that it IS nice
    to have the option of representing expressions this way, for example,
    being suddenly placed on the shoulders of those who have long been doing
    so, and want to keep that option even when order comparisons are wanted
    between such representations?! Let those who argue it's dangerous and
    bad for us, to use concrete representations whenever you may need
    comparisons, prove to us why our years-long habit was insane, please...!


    Alex
  • Jeremy Bowers at Sep 21, 2004 at 6:45 pm

    On Tue, 21 Sep 2004 17:42:35 +0000, Steven Bethard wrote:
    Is there a good use case for binary trees with incompatible types at the nodes?
    See the ZODB's BTree-based storage.
  • Steven Bethard at Sep 21, 2004 at 6:55 pm

    Jeremy Bowers <jerf <at> jerf.org> writes:
    On Tue, 21 Sep 2004 17:42:35 +0000, Steven Bethard wrote:
    Is there a good use case for binary trees with incompatible types at the
    nodes?
    See the ZODB's BTree-based storage.
    So, I googled for this, and it looked to me like just a BTree implementation --
    not a use case. But since you didn't tell me what the use case was, I may
    not have googled for the right thing...

    Note that I'm not asking if they *allow* a user to store incompatible types.
    I'm asking for a case when a user *wants* to store incompatible types.

    Steve
  • Tim Peters at Sep 21, 2004 at 7:19 pm
    [Steven Bethard]
    Is there a good use case for binary trees with incompatible types
    at the nodes?

    [Jeremy Bowers]
    See the ZODB's BTree-based storage.
    If you really look at them, you'll eventually read the "Total Ordering
    and Persistence" section of the BTree docs:

    <http://zope.org/Wikis/ZODB/FrontPage/guide/node6.html#SECTION000631000000000000000>

    It's strongly recommended there that people use a single key type per
    BTree instance. Most of that part of the docs is explaining how many
    nasty ways there are to get into bad trouble by being cleverer than
    that.
  • Carlos Ribeiro at Sep 21, 2004 at 6:07 pm

    On Tue, 21 Sep 2004 17:24:48 +0000 (UTC), Steven Bethard wrote:
    Could you give an example of a list that you'd like to do this to? I'm still
    having trouble imagining a list of disparate types that I call sort on...
    Assume that you're implementing a spreadsheet like application in
    Python. The user fills a column with arbitrary data, and asks for it
    to be sorted. What is the sorting order? Excel, for instance, defines
    an ordering (it's arbitrary, but it's deterministic).

    Let me put it another way:

    sort() should work regardless of the list elements, and return a
    reasonable result, even if not strictly correct in the numerical
    sense.

    And my last argument, and probably the most controversial one, and the
    one that will see me tortured, killed and nailed to the city door for
    visitors to see what happen with people that pushes the social limits
    :-) Mathemathically speaking: sort() is not a topological sort (which
    works with partial ordering); it implements total ordering (any two
    members of the set can be compared). The set, in this particular case,
    is a Python list, that *can* contain arbitrary data. So it does not
    make sense (in my not-so-humble opinion) for sort to impose
    restrictions based on the list element type.

    (BTW, if we extend this reasoning, the same could be said for other
    types of functions that work over sets -- sum() should ignore
    non-numeric values, etc. But that's another philosophical battle)

    --
    Carlos Ribeiro
    Consultoria em Projetos
    blog: http://rascunhosrotos.blogspot.com
    blog: http://pythonnotes.blogspot.com
    mail: carribeiro at gmail.com
    mail: carribeiro at yahoo.com
  • Steven Bethard at Sep 21, 2004 at 6:29 pm

    On Tue, 21 Sep 2004 15:07:30 -0300, Carlos Ribeiro wrote:
    Assume that you're implementing a spreadsheet like application in
    Python. The user fills a column with arbitrary data, and asks for it
    to be sorted. What is the sorting order? Excel, for instance, defines
    an ordering (it's arbitrary, but it's deterministic).
    Thanks, that helps. You can always have end users asking you to sort
    things that shouldn't be sorted...

    Presumably you could change how sort works so that it first stratifies
    the list by type, and then sorts for each type. The types would then
    be ordered arbitrarily (perhaps by the id of their type?)... Might
    slow down the normal-case sort though...

    Hmm...
    The set, in this particular case,
    is a Python list, that *can* contain arbitrary data. So it does not
    make sense (in my not-so-humble opinion) for sort to impose
    restrictions based on the list element type.
    Yeah, by restricting comparisons, we'd be basically saying that sort
    is only defined for lists that take the form list<some_type>.
    (BTW, if we extend this reasoning, the same could be said for other
    types of functions that work over sets -- sum() should ignore
    non-numeric values, etc. But that's another philosophical battle)
    Currently sum is only defined for iterable<number_type>. Of course
    the vast majority of Python functions have type restrictions on their
    arguments (and will raise TypeErrors if these are violated). So I
    suspect we don't have to fall down the slippery slope and make all
    functions fully generic.

    Steve
    --
    You can wordify anything if you just verb it.
    - Bucky Katt, Get Fuzzy
  • Andrew Dalke at Sep 21, 2004 at 8:34 pm

    Steven Bethard wrote:
    Presumably you could change how sort works so that it first stratifies
    the list by type, and then sorts for each type. The types would then
    be ordered arbitrarily (perhaps by the id of their type?)... Might
    slow down the normal-case sort though...
    You've just about explained what Python does with the
    types themselves cannot be compared.

    It took a lot of work to get mostly correct. Consider
    an old bug which was seen when doing

    5 < [4]
    [4] < 3L
    3L < 4

    Is 5 < [4]? They can't be compared directly so Python
    used to compare type names. Is "int" < "list"? Yes.

    Is [4] < 3L? Those instances can't be compared but the
    type names can, so that is "list" < "long"? Yes.

    Is 3L < 4? Yes, because those two *can* be compared
    directly.

    Therefore I've just shown that 5 < 4. Any sort that
    falls back to comparisons based on type will have this
    problem because the sort ordering defined on instances
    does not need to agree with the ordering defined on
    types.

    As I recall, Python "fixed" it by using "" as the type
    name when doing a comparison that involves a numeric.
    It's still a hack and there are ways to get around it.

    Yeah, by restricting comparisons, we'd be basically saying that sort
    is only defined for lists that take the form list<some_type>.
    or where a compare function can be passed to .sort(), or
    where (in 2.4) a keys function can be passed to get the
    objects used for the comparison. Why is this a problem?

    Python's philosophy includes
    "In the face of ambiguity, refuse the temptation to guess."

    If you want sort to guess, tell it how to guess.


    Andrew
    dalke at dalkescientific.com
  • Steven Bethard at Sep 21, 2004 at 8:43 pm

    Andrew Dalke <adalke <at> mindspring.com> writes:
    Yeah, by restricting comparisons, we'd be basically saying that sort
    is only defined for lists that take the form list<some_type>.
    or where a compare function can be passed to .sort(), or
    where (in 2.4) a keys function can be passed to get the
    objects used for the comparison. Why is this a problem?
    It's not, for me at least -- I'm all for restricting sort in this way. I'm
    just trying to understand what problems other people are trying to solve that
    they think they'd need other behavior from sort, and in what ways they can
    work around the proposed limitations.

    Steve
  • Istvan Albert at Sep 21, 2004 at 6:26 pm

    Carlos Ribeiro wrote:

    sort() should work regardless of the list elements, and return a
    reasonable result, even if not strictly correct in the numerical
    sense.
    If the objects cannot be compared then there is no
    reasonable result. Getting them back in whatever order is not one.
    You're better off not sorting then. If half of your objects are
    sortable and the rest are not what should the result be?

    I don't think it is a bad idea disallowing the sort
    in those circumstances.

    Istvan.
  • Carlos Ribeiro at Sep 21, 2004 at 6:41 pm

    On Tue, 21 Sep 2004 14:26:54 -0400, Istvan Albert wrote:
    If the objects cannot be compared then there is no
    reasonable result. Getting them back in whatever order is not one.
    You're better off not sorting then. If half of your objects are
    sortable and the rest are not what should the result be?
    Well, there is a sizeable chunk of mathemathical theory dealing with
    sorting things that can't be directly compared -- google for
    topological sorting. It's used, for example, in graph theory. But
    that's off topic, I've mentioned it just to point to you that it's
    really dangerous to make assertions regarding what's reasonable or not
    with regards to sorting...


    --
    Carlos Ribeiro
    Consultoria em Projetos
    blog: http://rascunhosrotos.blogspot.com
    blog: http://pythonnotes.blogspot.com
    mail: carribeiro at gmail.com
    mail: carribeiro at yahoo.com
  • Cameron Laird at Sep 22, 2004 at 1:08 am
    In article <mailman.3672.1095792078.5135.python-list at python.org>,
    Carlos Ribeiro wrote:
    On Tue, 21 Sep 2004 14:26:54 -0400, Istvan Albert
    wrote:
    If the objects cannot be compared then there is no
    reasonable result. Getting them back in whatever order is not one.
    You're better off not sorting then. If half of your objects are
    sortable and the rest are not what should the result be?
    Well, there is a sizeable chunk of mathemathical theory dealing with
    sorting things that can't be directly compared -- google for
    topological sorting. It's used, for example, in graph theory. But
    that's off topic, I've mentioned it just to point to you that it's
    really dangerous to make assertions regarding what's reasonable or not
    with regards to sorting...
    .
    .
    .
    I thought I understood this thread, but now I'm *really* confused.
    Yes, mathematicians talk about partial orders. Mr. Ribeiro, I know
    you have good ideas. Are you saying that you do want Python to
    implement topological orders over all input sequences? In principle,
    I guess that's feasible. It strikes me as a specialty item; we're
    all probably best off to build in sorting as it is now, and leave
    toposorts to those who need them. I don't particularly think they'd
    confuse the masses. I just figure Pythonia's energy is better ap-
    plied elsewhere.
  • Carlos Ribeiro at Sep 22, 2004 at 1:53 am

    On Wed, 22 Sep 2004 01:08:05 GMT, Cameron Laird wrote:
    In article <mailman.3672.1095792078.5135.python-list at python.org>,
    Carlos Ribeiro wrote:
    On Tue, 21 Sep 2004 14:26:54 -0400, Istvan Albert
    wrote:
    If the objects cannot be compared then there is no
    reasonable result. Getting them back in whatever order is not one.
    You're better off not sorting then. If half of your objects are
    sortable and the rest are not what should the result be?
    Well, there is a sizeable chunk of mathemathical theory dealing with
    sorting things that can't be directly compared -- google for
    topological sorting. It's used, for example, in graph theory. But
    that's off topic, I've mentioned it just to point to you that it's
    really dangerous to make assertions regarding what's reasonable or not
    with regards to sorting...
    .
    .
    .
    I thought I understood this thread, but now I'm *really* confused.
    Yes, mathematicians talk about partial orders. Mr. Ribeiro, I know
    you have good ideas. Are you saying that you do want Python to
    implement topological orders over all input sequences? In principle,
    I guess that's feasible. It strikes me as a specialty item; we're
    all probably best off to build in sorting as it is now, and leave
    toposorts to those who need them. I don't particularly think they'd
    confuse the masses. I just figure Pythonia's energy is better ap-
    plied elsewhere.
    No, I was not talking seriously. I was partly joking with Albert's assertion:

    "If the objects cannot be compared then there is no reasonable result."

    ..because in topological sorting it's indeed possible to sort a set
    where some items can't be directly compared, and the result is
    reasonable. But I did not intend that to be taken as a serious
    proposition for Python.

    (in fact, I've stated that's "really dangerous to make assertions
    regarding what's reasonable or not with regards to sorting". I should
    have listened to my own advice :-) )

    --
    Carlos Ribeiro
    Consultoria em Projetos
    blog: http://rascunhosrotos.blogspot.com
    blog: http://pythonnotes.blogspot.com
    mail: carribeiro at gmail.com
    mail: carribeiro at yahoo.com
  • Andrew Dalke at Sep 21, 2004 at 8:01 pm
    Carlos Ribeiro suggested the following as a use case for
    requiring that any two Python objects be compare-able:
    Assume that you're implementing a spreadsheet like application in
    Python. The user fills a column with arbitrary data, and asks for it
    to be sorted. What is the sorting order? Excel, for instance, defines
    an ordering (it's arbitrary, but it's deterministic).
    In that case they most likely have a CellData object which
    can be compared with other CellData objects. To the list
    it's a bunch of homogenous objects.

    To sort then either the CellData objects define how to
    compare two of themselves, or there's a way to generate
    a value which used as the sort key.
    sort() should work regardless of the list elements, and return a
    reasonable result, even if not strictly correct in the numerical
    sense.
    L = [open("/etc/passwd"), {"A": 1}, urllib.urlopen("http://python.org"),
    IOException(), Tkinter.Toplevel(), PIL.Image.open("my.jpg")]

    L.sort()

    What's a reasonable sort here?

    All solutions eventually end up comparing object ids when
    everything else fails. But that breaks another reasonable
    argument which is that

    L1 = [ .. some list with items that can't be compared .. ]
    L1.sort()
    L2 = pickle.loads(pickle.dumps(L1))
    L2.sort()
    assert L1 == L2

    This doesn't work because the reconstituted list will
    have different object ids, which might be in a different order.

    What should this do?

    class NoCompare:
    def __cmp__(self, other): raise SystemExit("no compare")
    __lt__ = __gt__ = __eq__ = __neq__ = .... = __cmp

    L3 = [NoCompare(), NoCompare(), NoCompare()]
    L3.sort()


    The set, in this particular case,
    is a Python list, that *can* contain arbitrary data. So it does not
    make sense (in my not-so-humble opinion) for sort to impose
    restrictions based on the list element type.
    By that definition you require that all other list methods
    can work on all data. Consider
    class NoCompare:
    ... def __eq__(self, other): raise AssertionError("no compare")
    ...
    L = [1, 3, NoCompare(), 5]
    3 in L
    True
    5 in L
    Traceback (most recent call last):
    File "<stdin>", line 1, in ?
    File "<stdin>", line 2, in __eq__
    AssertionError: no compare
    >>>

    the "__contains__" method imposes a restriction that types
    can be compared for equality.

    (BTW, if we extend this reasoning, the same could be said for other
    types of functions that work over sets -- sum() should ignore
    non-numeric values, etc. But that's another philosophical battle)
    I think my example is more relevant. Sum() explicitly
    says it only works on numeric types and gives justification
    for why that choice was made.

    Andrew
    dalke at dalkescientific.com
  • Batista, Facundo at Sep 21, 2004 at 6:14 pm
    [Carlos Ribeiro]

    #- Assume that you're implementing a spreadsheet like application in
    #- Python. The user fills a column with arbitrary data, and asks for it
    #- to be sorted. What is the sorting order? Excel, for instance, defines
    #- an ordering (it's arbitrary, but it's deterministic).

    You could also do that passing a function to sort().


    #- sort() should work regardless of the list elements, and return a
    #- reasonable result, even if not strictly correct in the numerical
    #- sense.

    Define "reasonable result" with different datatypes.

    . Facundo
    -------------- next part --------------
    An HTML attachment was scrubbed...
    URL: http://mail.python.org/pipermail/python-list/attachments/20040921/4f497e31/attachment.html
  • Carlos Ribeiro at Sep 21, 2004 at 6:35 pm
    ----- Original Message -----
    From: Batista, Facundo <fbatista at unifon.com.ar>
    You could also do that passing a function to sort().
    I know it, and that's exactly my question -- if this is going to be
    the way to do it in Python 3000. Today (2.4) there are FOUR ways to
    make it work: passing a compare function as an argument to sort(),
    passing a key funciton as an argument to sort(), implementing a
    __cmp__ function, or implementing the rich comparison methods. If the
    main goal of Py3K is to have only one obvious way to do it, what one
    is it going to be? I think that the best way should be the simplest
    one: make sort() work whatever is passed to it in a reasonably way,
    and decide to have one preferred way to extend its behavior.
    Define "reasonable result" with different datatypes.
    Now that's a harder question :-) Any ordering for multiple types is
    going to be arbitrary -- but it still may be considered reasonable if
    it reflects our common sense when it comes to ordering. In some cases,
    it's still a matter of choice, but once decided, it's deterministic,
    and that's what it takes for sorting.

    --
    http://mail.python.org/mailman/listinfo/python-list





    --
    Carlos Ribeiro
    Consultoria em Projetos
    blog: http://rascunhosrotos.blogspot.com
    blog: http://pythonnotes.blogspot.com
    mail: carribeiro at gmail.com
    mail: carribeiro at yahoo.com
  • Andrew Dalke at Sep 21, 2004 at 8:54 pm

    Carlos Ribeiro wrote:

    I know it, and that's exactly my question -- if this is going to be
    the way to do it in Python 3000. Today (2.4) there are FOUR ways to
    make it work: passing a compare function as an argument to sort(),
    passing a key funciton as an argument to sort(), implementing a
    __cmp__ function, or implementing the rich comparison methods. If the
    main goal of Py3K is to have only one obvious way to do it, what one
    is it going to be? I think that the best way should be the simplest
    one: make sort() work whatever is passed to it in a reasonably way,
    and decide to have one preferred way to extend its behavior.
    Get rid of __cmp__.

    Then there will be:
    one native way to compare two objects to each other,
    one way to override how to compare two objects
    one way to do a Schwartzian transform

    And regarding the last, "practicality beats purity."
    Now that's a harder question :-) Any ordering for multiple types is
    going to be arbitrary -- but it still may be considered reasonable if
    it reflects our common sense when it comes to ordering. In some cases,
    it's still a matter of choice, but once decided, it's deterministic,
    and that's what it takes for sorting.
    So long as you allow user-defined comparisons then there's
    no guarantee that it's consistent, much less deterministic.

    I can define __cmp__ to do whatever I want.
    class Strange:
    ... def __cmp__(self, other):
    ... if isinstance(other, float): return -1
    ... else: return 1
    ...
    x = Strange()
    10 < x < 5.0
    True
    >>>

    or for something non-deterministic

    class WaitCompare:
    def __init__(self, val): self.val = val
    def __cmp__(self, other):
    while 1:
    page = urllib.urlopen("http://whitehouse.gov/").read()
    if page.find(self.val):
    break
    return cmp(self.val, other)

    L = [WaitCompare("Andrew Dalke"),
    WaitCompare("America's New Communist Leaders"),
    WaitCompare("Life discovered on Mercury")]
    L.sort()


    Andrew
    dalke at dalkescientific.com
  • Steven Bethard at Sep 21, 2004 at 9:11 pm

    Andrew Dalke <adalke <at> mindspring.com> writes:
    Get rid of __cmp__.
    Do you think sort should lose its cmp keyword argument too? Seems that might
    also address Carlos Ribeiro's 4-ways-to-customize-sort problem:

    Carlos Ribeiro wrote:
    As of
    today, there are 4 different ways to customize sorting behavior:

    1) passing a comparison function to sort();
    2) passing a key generation function to sort();
    3) implementing __cmp__;
    4) implementing rich comparison methods.
    There are things you can do with the cmp keyword arugment that you can't do
    with the key keyword argument... Though I'm not sure how many of these things
    that you *can* do you actually *want* to do...

    Steve
  • Alex Martelli at Sep 23, 2004 at 1:28 pm
    Steven Bethard wrote:
    ...
    today, there are 4 different ways to customize sorting behavior:

    1) passing a comparison function to sort();
    2) passing a key generation function to sort();
    3) implementing __cmp__;
    4) implementing rich comparison methods.
    There are things you can do with the cmp keyword arugment that you can't do
    with the key keyword argument... Though I'm not sure how many of these things
    that you *can* do you actually *want* to do...
    Considering that key= can be any kind of callable, I don't think it's
    true that there are things you can do with cmp= that you can't do with
    key=, taking the argument to the letter. Consider, for example
    (warning: untested...):

    def masquerade_a_cmp_as_a_key(acmp):
    class called_as_key(object):
    def __init__(self, obj): self.obj = obj
    def __cmp__(self, other): return acmp(self.obj, other.obj)
    return called_as_key

    now, anylist.sort(key=masquerade_a_cmp_as_a_key(foo)) should do just the
    same thing (albeit a tad slower) as anylist.sort(cmp=foo), I believe.


    Alex
  • Steven Bethard at Sep 23, 2004 at 3:36 pm
    Alex Martelli <aleaxit <at> yahoo.com> writes:
    <snip code masquerading cmp as key>
    now, anylist.sort(key=masquerade_a_cmp_as_a_key(foo)) should do just the
    same thing (albeit a tad slower) as anylist.sort(cmp=foo), I believe.
    Good point. It also makes the better point that trying to use key= when you
    really want cmp= would be pretty awkward. And examples like Andrew Dalke's
    file comparator (that compares file size before comparing content) would be
    more awkward to write this way. (If nothing else, he'd have to use your
    wrapper function.)

    I'm convinced. ;)

    STeve
  • Alex Martelli at Sep 23, 2004 at 12:28 pm
    Carlos Ribeiro wrote:
    ...
    You could also do that passing a function to sort().
    I know it, and that's exactly my question -- if this is going to be
    the way to do it in Python 3000. Today (2.4) there are FOUR ways to
    make it work: passing a compare function as an argument to sort(),
    passing a key funciton as an argument to sort(), implementing a
    __cmp__ function, or implementing the rich comparison methods. If the
    What you can do with key= and with cmp= are not exactly the same things.
    key= normally allows WAY faster sorts, when applicable, thanks to
    decorate-sort-undecorate... but that only really holds when the key is a
    simple structure of elementary types using builtin comparisons.
    Besides, speed isn't all: sometimes extracting a key is conceptually
    simpler, sometimes it's simpler to express the desired comparison in
    operational terms. Speed isn't everything: simplicity matters, too.

    A simplification that I think can be made is to have sorting hinge
    strictly on __lt__, the < operator. That will, however, no doubt still
    leave indirect avenues open. For example, it would be somewhat
    surprising if object didn't have a default __lt__ delegating to __cmp__,
    so it would still be possible, in your own types, to define either
    __lt__, or __cmp__. Much like the distinctions between __add__,
    __iadd__, __radd__, I suspect these conveniences WILL remain.
    main goal of Py3K is to have only one obvious way to do it, what one
    _Preferably_ only one, just like it's always been the case in Python.

    But practicality beats purity. You will still be allowed, I predict, to
    sum two numbers by a+b, b+a, sum([a,b]), operator.add(a,b), and other
    ways yet; trying to forbid some of these would be impractical.
    is it going to be? I think that the best way should be the simplest
    one: make sort() work whatever is passed to it in a reasonably way,
    Errors should not pass silently, unless explicitly silenced. MOST of
    the time, a poor hapless user trying to compare 23 with "42", just like
    one trying to sum these objects, is making a mistake. Handing such
    likely errors as if they weren't errors is NOT doing the poor hapless
    user good service. Thus, I find it quite reasonable that the _default_
    behavior of sort be that desired well over 90% of the time, for
    comparisons happening within sort just as well as for others: raise an
    exception if I'm trying to compare strings with numbers (and the like).
    For that less-than-10% of the time where I really _want_ to compare
    _anything_, passing some kind of explicit indicator such as a keyword
    parameter is just fine. And I think the right way to do it is to have
    an "universal_lt" off in a module somewhere and pass THAT when required.

    It's hard to gauge exactly how frequently I need universal comparisons
    (rather than type-specific ones), but I'd guess _maybe_ 5%, if that.
    Sacrificing error detection in 95% of the cases for the sake of removing
    the need to be explicit about the other 5% of the cases would be a very
    grave design error, in my opinion. Making that "other 5%" any more
    problematic for the user than passing a simple keyword parameter or the
    like would also be a mistake, but perhaps one of lesser importance than
    what I think you're proposing -- universal comparisons _by default_.
    and decide to have one preferred way to extend its behavior.
    A key= returning a simple structure (tuple or list), made up of
    elementary items to compare 'naturally' by built-in means, is the
    preferred way _when applicable_, for speed and simplicity, but it's not
    always so applicable; therefore, sometimes other ways to tweak (not
    necessarily 'extend'!) sorting will be better.

    Asking for "one preferred way" to apply ALL of the times is like asking
    for "one preferred arithmetic operator" -- can't you see it WILL depend
    on what it IS that you're trying to do?! If I had to nominate one
    operator as "preferred" it would be addition -- most frequently, it
    serves your typical purposes best -- but it's not all THAT infrequent to
    meet a problem that's best solved by dividing instead, is it?

    Define "reasonable result" with different datatypes.
    Now that's a harder question :-) Any ordering for multiple types is
    going to be arbitrary -- but it still may be considered reasonable if
    it reflects our common sense when it comes to ordering. In some cases,
    That's probably too difficult, since there is no guarantee that our
    common sense (were it to be determined) would be mathematically
    consistent. I can see how "by common sense" one might want a list made
    of characters to compare consistently with other sequences made of
    characters, such as tuples, strings, arrays, for example -- but it can't
    be mathematically consistent with other "common sense" ideas about
    comparing other groups of sequences.

    "Common sense" would tell most people that 23<'42'<99, but that is
    another example of where consistency probably can't be reached. Etc,
    etc.


    Alex
  • Robert Brewer at Sep 21, 2004 at 6:25 pm

    Carlos Ribeiro wrote:
    On Tue, 21 Sep 2004 17:24:48 +0000 (UTC), Steven Bethard
    wrote:
    Could you give an example of a list that you'd like to do
    this to? I'm still
    having trouble imagining a list of disparate types that I
    call sort on...

    Assume that you're implementing a spreadsheet like application in
    Python. The user fills a column with arbitrary data, and asks for it
    to be sorted. What is the sorting order? Excel, for instance, defines
    an ordering (it's arbitrary, but it's deterministic).
    ...which should tell you it's application-specific, and shouldn't be in
    the core. "In the face of ambiguity, refuse the temptation to guess."


    Robert Brewer
    MIS
    Amor Ministries
    fumanchu at amor.org
  • Delaney, Timothy C (Timothy) at Sep 21, 2004 at 10:13 pm

    Carlos Ribeiro wrote:

    Just to make it clear, here it is how it works now:
    a = [ 3.5, -1.0, "", (0,1), None, "z"]
    a.sort()
    a
    [None, -1.0, 3.5, '', 'z', (0, 1)]
    Add a complex to that list and watch the result.

    Not all lists are sortable *now*. The change in Python 3.0 is to make it
    much more obvious that you *can't* just sort a list of unknown types.

    Tim Delaney
  • David Bolen at Sep 22, 2004 at 5:56 am

    "Delaney, Timothy C (Timothy)" <tdelaney at avaya.com> writes:

    Carlos Ribeiro wrote:
    Just to make it clear, here it is how it works now:
    a = [ 3.5, -1.0, "", (0,1), None, "z"]
    a.sort()
    a
    [None, -1.0, 3.5, '', 'z', (0, 1)]
    Add a complex to that list and watch the result.
    Or any user defined type that performs operations in the comparison
    functions that might end up raising an exception or rejecting some
    comparisons.
    Not all lists are sortable *now*. The change in Python 3.0 is to make it
    much more obvious that you *can't* just sort a list of unknown types.
    While I used to treat the "sort heterogeneous lists" as an attractive
    quality of the inter-type comparisons, I've pretty much come around to
    thinking that it's not worth it, particular since as you say, with
    rich comparisons you can't guarantee such behavior any more anyway.

    But what really emphasized it for me is having seen multiple new
    Python programmers get burned by Python silently comparing strings to
    numbers (where they forgot to convert user input as a string into a
    numeric type), but not in the way they expect, which can be a subtle
    failure. I think an exception in such cases is actually more Pythonic
    since it always feels like a mini-wart when I explain it to them.
    I've wondered if the legality of inter-type comparisons was really a
    design intention, or more a consequence of the inability of the
    earlier interpreter versions to have exceptions raised during the
    comparison process.

    -- David

Related Discussions

People

Translate

site design / logo © 2022 Grokbase