Before the Python 3.5 feature freeze, I should step-up and
formally reject PEP 455 for "Adding a key-transforming
dictionary to collections".


I had completed an involved review effort a long time ago
and I apologize for the delay in making the pronouncement.


What made it an interesting choice from the outset is that the
idea of a "transformation" is an enticing concept that seems
full of possibility. I spent a good deal of time exploring
what could be done with it but found that it mostly fell short
of its promise.


There were many issues. Here are some that were at the top:


* Most use cases don't need or want the reverse lookup feature
   (what is wanted is a set of one-way canonicalization functions).
   Those that do would want to have a choice of what is saved
   (first stored, last stored, n most recent, a set of all inputs,
   a list of all inputs, nothing, etc). In database terms, it
   models a many-to-one table (the canonicalization or
   transformation function) with the one being a primary key into
   another possibly surjective table of two columns (the
   key/value store). A surjection into another surjection isn't
   inherently reversible in a useful way, nor does it seem to be a
   common way to model data.


* People are creative at coming up with use cases for the TD
   but then find that the resulting code is less clear, slower,
   less intuitive, more memory intensive, and harder to debug than
   just using a plain dict with a function call before the lookup:
   d[func(key)]. It was challenging to find any existing code
   that would be made better by the availability of the TD.


* The TD seems to be all about combining data scrubbing
   (case-folding, unicode canonicalization, type-folding, object
   identity, unit-conversion, or finding a canonical member of an
   equivalence class) with a mapping (looking-up a value for a
   given key). Those two operations are conceptually orthogonal.
   The former doesn't get easier when hidden behind a mapping API
   and the latter loses the flexibility of choosing your preferred
   mapping (an ordereddict, a persistentdict, a chainmap, etc) and
   the flexibility of establishing your own rules for whether and
   how to do a reverse lookup.
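
As a rough illustration of the plain-dict alternative described in the
points above, here is a minimal sketch. The names canonical, prices,
first_seen, and store are hypothetical and are not taken from PEP 455
or the standard library. The scrubbing step is an ordinary function,
the mapping is an ordinary dict, and the reverse lookup (here: keep the
first spelling stored) is a separate, optional policy.

```python
# Sketch only: every name below is illustrative, not part of PEP 455.

def canonical(key):
    """One-way canonicalization: strip whitespace and case-fold."""
    return key.strip().casefold()

prices = {}       # ordinary mapping keyed by the canonical form
first_seen = {}   # optional reverse lookup: canonical form -> first spelling stored

def store(key, value):
    c = canonical(key)
    prices[c] = value
    first_seen.setdefault(c, key)   # policy choice: remember only the first spelling

store("  Apple ", 1.25)
store("APPLE", 1.30)

print(prices[canonical("apple")])       # 1.3
print(first_seen[canonical("apple")])   # '  Apple ' (the first spelling stored)
```

Swapping prices for an OrderedDict or a ChainMap, or changing the
policy in store() to keep the last spelling or a list of all spellings,
only touches one line each.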




Raymond Hettinger




P.S. Besides the core conceptual issues listed above, there
are a number of smaller issues with the TD that surfaced
during design review sessions. In no particular order, here
are a few of the observations:


* It seems to require above-average skill to figure out what
   can be used as a transform function. It is more
   expert-friendly than beginner-friendly. It takes a little
   while to get used to it. It wasn't self-evident that
   transformations happen both when a key is stored and again
   when it is looked up (contrast this with key-functions for
   sorting, which are called at most once per key).


* The name, TransformDict, suggests that it might transform the
   value instead of the key or that it might transform the
   dictionary into something else. The name TransformDict is so
   general that it would be hard to discover when faced with a
   specific problem. The name also limits perception of what
   could be done with it (e.g. a function that logs accesses
   but doesn't actually change the key).


* The tool doesn't describe itself well. Looking at the
   help(), or the __repr__(), or the tooltips did not provide
   much insight or clarity. The dir() shows many of the
   _abc implementation details rather than the API itself.


* The original key is stored and if you change it, the change
   isn't stored. The _original dict is private (perhaps to
   reduce the risk of putting the TD in an inconsistent state)
   but this limits access to the stored data.


* The TD is unsuitable for bijections because the API is
   inherently biased with a rich group of operators and methods
   for forward lookup but has only one method for reverse lookup.


* The reverse feature is hard to find (getitem vs __getitem__)
   and its output pair is surprising and a bit awkward to use.
   It provides only one accessor method rather than the full
   dict API that would be given by a second dictionary. The
   API hides the fact that there are two underlying dictionaries.


* It was surprising that when d[k] failed, it failed with a
   transformation exception rather than a KeyError, violating
   the expectations of the calling code (for example, if the
   transformation function is int(), the call d["12"]
   transforms to d[12] and either succeeds in returning a value
   or in raising a KeyError, but the call d["12.0"] fails with
   a ValueError). This issue limits its substitutability
   into existing code that expects real mappings and for
   exposing to end-users as if it were a normal dictionary.


* There were other issues with dict invariants as well and
   these affected substitutability in a sometimes subtle way.
   For example, the TD does not work with __missing__().
   Also, "k in td" does not imply that "k in list(td.keys())".


* The API is at odds with wanting to access the transformations.
   You pay a transformation cost both when storing and when
   looking up, but you can't access the transformed value itself.
   For example, if the transformation is a function that scrubs
   hand entered mailing addresses and puts them into a standard
   format with standard abbreviations, you have no way of getting
   back to the cleaned-up address.


* One design reviewer summarized her thoughts like this:
   "There is a learning curve to be climbed to figure out what
   it does, how to use it, and what the applications [are].
   But, the [working out the same] examples with plain dicts
   requires only basic knowledge." -- Patricia
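
Two of the substitutability points above can be reproduced with a
small sketch. SketchTransformDict below is a hypothetical, simplified
stand-in written for this illustration. It is not the implementation
proposed in PEP 455, and it assumes the first key stored is the one
retained. With int() as the transform, a lookup can fail with the
transform's own exception (int("12.0") raises ValueError) instead of
KeyError, and "k in td" can be true while "k in list(td.keys())" is
false.

```python
from collections.abc import MutableMapping

class SketchTransformDict(MutableMapping):
    """Hypothetical, simplified stand-in for a key-transforming dict."""

    def __init__(self, transform):
        self._transform = transform
        self._original = {}   # transformed key -> original key as first stored
        self._data = {}       # transformed key -> value

    def __getitem__(self, key):
        # The transform runs on every lookup and may raise its own exception.
        return self._data[self._transform(key)]

    def __setitem__(self, key, value):
        t = self._transform(key)
        self._original.setdefault(t, key)
        self._data[t] = value

    def __delitem__(self, key):
        t = self._transform(key)
        del self._data[t]
        del self._original[t]

    def __iter__(self):
        # Iteration yields the original keys, not the transformed ones.
        return iter(self._original.values())

    def __len__(self):
        return len(self._data)

td = SketchTransformDict(int)
td["12"] = "twelve"

print(td[12])                   # twelve
print(12 in td)                 # True
print(12 in list(td.keys()))    # False, because the stored key is the string "12"

for bad_key in ("13", "12.0"):
    try:
        td[bad_key]
    except KeyError:
        print(repr(bad_key), "-> KeyError")
    except ValueError:
        print(repr(bad_key), "-> ValueError raised by the transform")
```

Calling code written against a regular dict typically catches only
KeyError, so the second failure would propagate unexpectedly.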


  • Guido van Rossum at May 14, 2015 at 2:41 pm
    Thanks for this thorough review, Raymond! Especially the user research is
    amazing.


    And thanks to Antoine for writing the PEP -- you never know how an idea
    pans out until you've tried it.


    --Guido


    On Thu, May 14, 2015 at 7:29 AM, Raymond Hettinger wrote:

    [...]





    --
    --Guido van Rossum (python.org/~guido)
  • Nick Coghlan at May 14, 2015 at 4:56 pm

    On 15 May 2015 at 00:41, Guido van Rossum wrote:
    Thanks for this thorough review, Raymond! Especially the user research is
    amazing.

    And thanks to Antoine for writing the PEP -- you never know how an idea
    pans out until you've tried it.

    Hear, hear! I thought the TransformDict idea sounded interesting when
    Antoine proposed it, but Raymond's rationale for the rejection makes a
    great deal of sense.


    Regards,
    Nick.


    --
    Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
