FAQ
I've gotten lots of feedback on the itertools module
but have not heard a peep about the new sets module.

* Are you overjoyed/outraged by the choice of | and &
as set operators (instead of + and *)?

* Is the support for sets of sets necessary for your work
and, if so, then is the implementation sufficiently
powerful?

* Is there a compelling need for additional set methods like
Set.powerset() and Set.isdisjoint(s) or are the current
offerings sufficient?

* Does the performance meet your expectations?

* Do you care that sets can only contain hashable elements?

* How about the design constraint that the argument to most
set methods must be another Set (as opposed to any iterable)?

* Are the docs clear? Can you suggest improvements?

* Are sets helpful in your daily work or does the need arise
only rarely?


User feedback is essential to determining the future direction
of sets (whether it will be implemented in C, change API,
and/or be given supporting language syntax).


Raymond Hettinger

Search Discussions

  • Troels Therkelsen at Aug 12, 2003 at 8:56 am

    In article <Jp%Za.256$jw4.238 at nwrdny03.gnilink.net>, Raymond Hettinger wrote:
    I've gotten lots of feedback on the itertools module
    but have not heard a peep about the new sets module.
    I would have to say that while I have looked at the sets module and read its
    documentation, I have not used it much (more below).
    * Are you overjoyed/outraged by the choice of | and &
    as set operators (instead of + and *)?
    I actually prefer the | & syntax because it makes more intuitive sense to
    me, as the operators work identically to how they do when using them as
    binary number operators.

    [snip]
    * Do you care that sets can only contain hashable elements?
    No.

    [snip]
    * Are the docs clear? Can you suggest improvements?
    The docs for the sets module, like most of the Python docs, are very good.
    The example helps, too.
    * Are sets helpful in your daily work or does the need arise
    only rarely?
    I rarely need sets explicitly, but sometimes need the logic that they offer.
    For example, when wanting to concantenate two lists so that the resulting list
    only has unique elements, that can be done with set logic. However, as it
    is another module written in Python, and I only need the logic I usually don't
    go through with the overhead of creating the Set classes, etc. If it was
    better integrated (ie., a set() builtin type constructor like int(), str(),
    etc) then I would feel less of a reservation against using sets. I know this
    reservation isn't founded in facts, but more in the feeling of trusting
    builtin types more than custom classes provided by a module.
    User feedback is essential to determining the future direction
    of sets (whether it will be implemented in C, change API,
    and/or be given supporting language syntax).
    Don't know if my feedback provided above helps, but you asked for it ;-)

    Regards,

    Troels Therkelsen
  • Carl Banks at Aug 12, 2003 at 9:09 am

    Raymond Hettinger wrote:
    I've gotten lots of feedback on the itertools module
    but have not heard a peep about the new sets module.

    * Are you overjoyed/outraged by the choice of | and &
    as set operators (instead of + and *)?
    I slightly favor | and &.

    * Is the support for sets of sets necessary for your work
    and, if so, then is the implementation sufficiently
    powerful?

    * Is there a compelling need for additional set methods like
    Set.powerset() and Set.isdisjoint(s) or are the current
    offerings sufficient?
    I imagine isdisjoint would be useful, although it's easy enough to use
    bool(s&t).

    * Does the performance meet your expectations?

    * Do you care that sets can only contain hashable elements?

    * How about the design constraint that the argument to most
    set methods must be another Set (as opposed to any iterable)?

    * Are the docs clear? Can you suggest improvements?
    Yeah: in the library reference, the table entry for s.union(t) should
    say "synonym of s|t" instead of repeating the description. This is
    especially true because it's not clear from a simple glance whether
    "s.union(t)" goes with "s|t" or "s&t", because it sits right between
    the two. Better yet, I would change it to a three-column table
    (operation, synonym, result).

    * Are sets helpful in your daily work or does the need arise
    only rarely?
    I haven't used sets yet (or Python 2.3), but I expect to use them a
    lot. However, I imagine my typical use would be efficient testing for
    membership. I have maybe half a dozen places where I use a dictionary
    for that now.



    --
    CARL BANKS
    "You don't run Microsoft Windows. Microsoft Windows runs you."
  • Raymond Hettinger at Aug 16, 2003 at 12:39 am
    * Are the docs clear? Can you suggest improvements?
    [Carl Banks]
    Yeah: in the library reference, the table entry for s.union(t) should
    say "synonym of s|t" instead of repeating the description. This is
    especially true because it's not clear from a simple glance whether
    "s.union(t)" goes with "s|t" or "s&t", because it sits right between
    the two. Better yet, I would change it to a three-column table
    (operation, synonym, result).
    Done!


    Raymond Hettinger
  • Jimmy Retzlaff at Aug 12, 2003 at 11:15 am

    Raymond Hettinger (vze4rx4y at verizon.net) wrote:
    * Are you overjoyed/outraged by the choice of | and &
    as set operators (instead of + and *)?
    I never even thought about it until I saw this question. Either set of
    operators makes sense to me.
    * Is the support for sets of sets necessary for your work
    and, if so, then is the implementation sufficiently
    powerful?
    I have not used a set of sets.
    * Is there a compelling need for additional set methods like
    Set.powerset() and Set.isdisjoint(s) or are the current
    offerings sufficient?
    It's been sufficient for my needs so far.
    * Does the performance meet your expectations?
    My expectations have been met for everything I've thrown at it so far.
    The biggest set I've dealt with so far was several hundred thousand
    tuples, each containing 2 floats. For massive data sets (tens of
    millions) requiring high performance I've used kjBuckets in the past. I
    haven't had a chance to compare the performance of kjBuckets and sets
    yet, but kjBuckets will be going away for me if the performance of sets
    is even close.
    * Do you care that sets can only contain hashable elements?
    Nope, everything I tend to use is either numeric/string types or tuples
    of those types.
    * How about the design constraint that the argument to most
    set methods must be another Set (as opposed to any iterable)?
    This caught me the first time or two, but it immediately seemed
    reasonable.
    * Are the docs clear? Can you suggest improvements?
    I haven't read them. I sucked sets.py out of CVS a while ago for use
    with Python 2.2.x and so I used the source as docs. I haven't even
    looked in the source for quite a while as things seem pretty obvious to
    me at this point.
    * Are sets helpful in your daily work or does the need arise
    only rarely?
    The need for them arises a couple times a week for me. I do a fair
    amount of data manipulation in SQL databases with Python. There are
    things that are natural in SQL and other things that are natural in
    Python. Having an easy to use sets type affords a greater overlap
    between SQL and Python which gives me more choices when deciding how to
    manipulate my data.

    Thanks to everyone involved for the good work.

    Jimmy
  • Istvan Albert at Aug 12, 2003 at 4:24 pm
    Raymond Hettinger wrote:

    First of all, thanks for the work on it, I need to use sets
    in my work all the time. I had written my own
    (simplistic) implementation but that adds another layer
    of headaches when distributing programs since then
    I have to distribute multiple modules.

    Sometimes I ended up with a little set function in every
    big module. Pretty silly. For me sets are a greatly useful
    addition.
    * Is the support for sets of sets necessary for your work
    and, if so, then is the implementation sufficiently
    powerful?
    One pattern that I constantly need is to remove duplicates from
    a sequence. I don't know if this an often enough used pattern to
    warrant an API change, for me it would be most useful if I could
    get the contents of a set as a sequence right away, without having to
    explicitly code it.
    * Are you overjoyed/outraged by the choice of | and & as
    set operators (instead of + and *)?
    I think that since you have have - as a difference operator it
    would make sense to also have + as a union operator. Takes nothing
    away from |. The & operator is the right one, * would not be appropriate
    IMO.
    * Do you care that sets can only contain hashable elements?
    I don't really care, on the other hand, it might be better to call the
    class HashSet, so that it conveys right away that it uses hashing
    to store the elements.
    * Are the docs clear? Can you suggest improvements?
    I wondered whether it would be better to specify the immutability
    of the class at the constructor level.

    Then there is the update method. It feels a little bit redundant
    since there is an add() method that seems to be doing the same thing
    only that add() adds only one element at a time.
    Would it be possible to have add() handle all additions, iterable or
    not, then scrap update() altogether.

    Then just by looking at the docs, it feels a little bit confusing to
    have discard() and remove() do essentially the same thing but only one
    of them raising an exception. Which one? I already forgot. I don't know
    which one I would prefer though.

    Another aspect that I did not understand, what is difference between
    update() and union_update().

    The long winded method names, such as difference_update() also feel
    redundant when one can achieve the same thing with the -= operator. I
    would drop these and instead show in the docs how to accomplish these
    with the operators. Would considerably cut down on the documentation,
    and apparent complexity.

    I'm a big fan of having the minimal number of methods as long it is
    easy to obtain the result.

    For example methods like x.issubset(y) is the same as bool(x-y) so may
    not be all that necessary, just a thought.
    * Are sets helpful in your daily work or does the need arise
    only rarely?
    I use them very often and they are extremely useful.

    thanks again,

    Istvan.
  • Terry Reedy at Aug 15, 2003 at 3:50 pm
    "Raymond Hettinger" <vze4rx4y at verizon.net> wrote in message
    news:3b__a.9694$u%2.7778 at nwrdny02.gnilink.net...
    "Istvan Albert"
    Then just by looking at the docs, it feels a little bit
    confusing to
    have discard() and remove() do essentially the same thing but only
    one
    of them raising an exception. Which one? I already forgot. I don't
    know
    which one I would prefer though.
    I agree that this is confusing -- like having both str.find and
    str.index. I would prefer one delete function with an optional param
    'silent' to switch its 'not there' response from the default (either
    True or False, according to what seems to be the more common usage) to
    the other choice. (I know, I should have read draft more carefully
    and commented last fall -- but this seems like the sort of redundancy
    that Guido wants to remove in 3.0.)

    Terry J. Reedy
  • Christos TZOTZIOY Georgiou at Aug 20, 2003 at 3:15 am
    On Tue, 12 Aug 2003 12:24:40 -0400, rumours say that Istvan Albert
    <ialbert at mailblocks.com> might have written:
    Then there is the update method. It feels a little bit redundant
    since there is an add() method that seems to be doing the same thing
    only that add() adds only one element at a time.
    Would it be possible to have add() handle all additions, iterable or
    not, then scrap update() altogether.
    If you have a Set set, then these two are quite different:

    set.add("hello")
    set.update("hello")
    --
    TZOTZIOY, I speak England very best,
    Microsoft Security Alert: the Matrix began as open source.
  • Radovan Garabik at Aug 13, 2003 at 7:24 am

    Raymond Hettinger wrote:
    I've gotten lots of feedback on the itertools module
    but have not heard a peep about the new sets module.

    * Are you overjoyed/outraged by the choice of | and &
    as set operators (instead of + and *)?
    I would prefer to have + in addition to |. I do not care
    about *, but IMHO + is so intuitive and natural that
    it is a pity not to have it (you can add lists, tuples,
    strings with +, but not sets???)
    * Is the support for sets of sets necessary for your work
    and, if so, then is the implementation sufficiently
    powerful?
    So far my code used dictionaries with values set to None,
    I expect that I will use sets soon, it will be more logical
    and the code more readable.
    * Are sets helpful in your daily work or does the need arise
    only rarely?
    daily work, production code, but so far I used lists or
    dictionaries instead.
    I am somewhat afraid of performance, until setrs are implemented
    in C, though

    --
    -----------------------------------------------------------
    Radovan Garab?k http://melkor.dnp.fmph.uniba.sk/~garabik/ |
    __..--^^^--..__ garabik @ kassiopeia.juls.savba.sk |
    -----------------------------------------------------------
    Antivirus alert: file .signature infected by signature virus.
    Hi! I'm a signature virus! Copy me into your signature file to help me spread!
  • Russell E. Owen at Aug 13, 2003 at 4:49 pm
    In article <bhcp3q$10918l$1 at ID-89407.news.uni-berlin.de>,
    garabik-news-2002-02 at kassiopeia.juls.savba.sk (Radovan Garabik) wrote:
    Raymond Hettinger wrote:
    * Is the support for sets of sets necessary for your work
    and, if so, then is the implementation sufficiently
    powerful?
    So far my code used dictionaries with values set to None,
    I expect that I will use sets soon, it will be more logical
    and the code more readable.
    Same here.

    I don't rely on sets heavily (I do have a few implemented as
    dictionaries with value=None) and am not yet ready to make my users
    upgrade to Python 2.3.

    I suspect the upgrade issue will significantly slow the incorporation of
    sets and the other new modules, but that over time they're likely to
    become quite popular. I am certainly looking forward to using sets and
    csv.

    I think it'd speed the adoption of new modules if they were explicitly
    written to be compatible with one previous generation of Python (and
    documented as such) so users could manually include them with their code
    until the current generation of Python had a bit more time to be adopted.

    I'm not saying this should be a rule, only suggesting it as a useful
    goal. Presumably it'd be easy with some modules and not worth the work
    in some cases.

    -- Russell
  • Skip Montanaro at Aug 13, 2003 at 6:07 pm
    Russell> I suspect the upgrade issue will significantly slow the
    Russell> incorporation of sets and the other new modules, but that over
    Russell> time they're likely to become quite popular. I am certainly
    Russell> looking forward to using sets and csv.

    The csv module (and the _csv module which underpins it) should work with
    2.2.3. If they don't, please file a bug report.

    Russell> I think it'd speed the adoption of new modules if they were
    Russell> explicitly written to be compatible with one previous
    Russell> generation of Python (and documented as such) so users could
    Russell> manually include them with their code until the current
    Russell> generation of Python had a bit more time to be adopted.

    That was the intention with the csv module. I wonder if some limitations to
    use of sets with 2.2.x could be gotten around by adding a __future__ import?
    Maybe itertools is also needed.

    Russell> I'm not saying this should be a rule, only suggesting it as a
    Russell> useful goal. Presumably it'd be easy with some modules and not
    Russell> worth the work in some cases.

    Yes, that's a worthwhile goal.

    Skip
  • Raymond Hettinger at Aug 14, 2003 at 5:45 am
    "Skip Montanaro" <skip at pobox.com> wrote in message
    news:mailman.1060798102.26105.python-list at python.org...
    Russell> I suspect the upgrade issue will significantly slow the
    Russell> incorporation of sets and the other new modules, but that over
    Russell> time they're likely to become quite popular. I am certainly
    Russell> looking forward to using sets and csv.

    The csv module (and the _csv module which underpins it) should work with
    2.2.3. If they don't, please file a bug report.

    Russell> I think it'd speed the adoption of new modules if they were
    Russell> explicitly written to be compatible with one previous
    Russell> generation of Python (and documented as such) so users could
    Russell> manually include them with their code until the current
    Russell> generation of Python had a bit more time to be adopted.

    That was the intention with the csv module. I wonder if some limitations to
    use of sets with 2.2.x could be gotten around by adding a __future__ import?
    Maybe itertools is also needed.
    In the documentation for the itertools module, I intensionally included
    pure python versions of each tool that make backporting easy. You
    can cut and paste the documentation into a module with
    from __future__ import generators and have a Py2.2 version of
    itertools that would enable the sets module to run just fine.

    Still, why not upgrade to Py2.3? The bug fixes were all ported to 2.2.3
    and into Py2.3 so that the essential differences are the new modules
    and some minor language improvements.


    Raymond Hettinger
  • Bob Gailer at Aug 14, 2003 at 4:32 pm
    After giving blanket approval to the docs I now add:

    I have a mission to set some new guidelines for Python documentation.
    Perhaps this is a good place to start.
    Example - currently we have:

    class Set( [iterable])
    Constructs a new empty Set object. If the optional iterable parameter is
    supplied, updates the set with elements obtained from iteration. All of the
    elements in iterable should be immutable or be transformable to an
    immutable using the protocol described in section
    <http://www.python.org/doc/current/lib/immutable-transforms.html#immutable-transforms>5.12.3.


    Problems:
    The result of Set appears to be an empty Set object. The fact that it might
    be filled is hidden in the parameter description.
    The parameter description itself is hidden in the paragraph, making it
    harder to find, especially when the reader is in a hurry.

    Some suggested guidelines to improve readability and understandability:
    1 - label each paragraph so we know what it is about
    2 - have a function paragraph that briefly but completely describes the
    function
    3 - have labeled sections for things that can be so grouped (e.g. parameters)
    4 - start the description of each thing in a new paragraph.

    Example:

    class Set( [iterable])
    function: Constructs a new empty Set object and optionally fills it.
    parameters:
    iterable [optional] if supplied, updates the set with elements
    obtained from
    iteration. All of the elements in iterable should be immutable or be
    transformable to an immutable using the protocol described in
    section
    <http://www.python.org/doc/current/lib/immutable-transforms.html#immutable-transforms>5.12.3.


    What do you think? If this layout is appealing, let's use the set docs as a
    starting point to model this approach. I for one am willing to apply this
    model to the rest ot the set docs, and help update other docs, but not all
    of them.

    BTW I also have a problem with the term "Common uses". "Common" suggests
    that these are better, or more frequent. I suggest "Some examples of
    application of sets".

    I also agree with the suggestion that operations that are synonymous be so
    indicated in the table.

    Bob Gailer
    bgailer at alum.rpi.edu
    303 442 2625

    -------------- next part --------------
    An HTML attachment was scrubbed...
    URL: http://mail.python.org/pipermail/python-list/attachments/20030814/0a86afab/attachment.html
    -------------- next part --------------

    ---
    Outgoing mail is certified Virus Free.
    Checked by AVG anti-virus system (http://www.grisoft.com).
    Version: 6.0.506 / Virus Database: 303 - Release Date: 8/1/2003
  • Raymond Hettinger at Aug 15, 2003 at 9:46 pm
    "Russell E. Owen"
    I don't rely on sets heavily (I do have a few implemented as
    dictionaries with value=None) and am not yet ready to make my users
    upgrade to Python 2.3.

    I suspect the upgrade issue will significantly slow the incorporation of
    sets and the other new modules, but that over time they're likely to
    become quite popular. I am certainly looking forward to using sets and
    csv.

    I think it'd speed the adoption of new modules if they were explicitly
    written to be compatible with one previous generation of Python (and
    documented as such) so users could manually include them with their code
    until the current generation of Python had a bit more time to be adopted.
    Wish granted!

    The sets module now will run under Py2.2.
    It should be available for download from CVS after 24 hours:
    http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/python/python/dist/src/Lib/sets.p
    y


    Raymond Hettinger
  • Chris Reedy at Aug 13, 2003 at 5:34 pm
    Raymond -

    Well now that you ask ...

    Raymond Hettinger wrote:
    I've gotten lots of feedback on the itertools module
    but have not heard a peep about the new sets module.

    * Are you overjoyed/outraged by the choice of | and &
    as set operators (instead of + and *)?
    I think that choice appeals to me more than + and * (which are already
    more overloaded than I would like). I haven't seen any suggestions that
    I liked better.
    * Is the support for sets of sets necessary for your work
    and, if so, then is the implementation sufficiently
    powerful?
    Yes I need it (desperately). Generally works as I need. However, see
    more comments below.
    * Is there a compelling need for additional set methods like
    Set.powerset() and Set.isdisjoint(s) or are the current
    offerings sufficient?
    I haven't felt the need yet. So far I've been satisfied with:

    if x & y:

    as opposed to

    if x.isdisjoint(y)
    * Does the performance meet your expectations?
    So far. However, so far, I haven't been trying meet any demanding
    performance requirements.
    * Do you care that sets can only contain hashable elements?
    This is an interesting question. In particular, I have found myself on
    more than one occasion doing the following:

    for x in interesting_objects:
    x.foo = Set()
    while something_to_do:
    somex.foo |= something_I_just_computed
    for x in interesting_objects:
    x.foo = ImmutableSet(x.foo)
    build_some_more_sets(somex.foo)

    I'm not sure whether I like having to go back and change all my sets to
    Immutable ones after I've finished the computation. (Or whether I just
    ought to make x.foo immutable all the time.) I did appreciate the
    ImmutableSet type, since it allows me to flag to myself that I don't
    expect a set to change further.
    * How about the design constraint that the argument to most
    set methods must be another Set (as opposed to any iterable)?
    In some cases I've run into that. Since I can create a set with any
    iterable I've been able to do:

    set op Set(iterable)

    I think I might be interested in using a general iterable if that would
    get me some advantage (maybe significantly faster).
    * Are the docs clear? Can you suggest improvements?
    No problems here. However, my background is math, and I've never had
    problems with documentation (I started my career learning IBM mainframe
    assembly language programming from the reference manuals) so I don't
    think I'm a good test case.
    * Are sets helpful in your daily work or does the need arise
    only rarely?
    I'm working on a project where they are critical. If it hadn't been
    supplied I would have implemented one myself. I was using the backported
    version of the set module with 2.2 before 2.3 came out.
    User feedback is essential to determining the future direction
    of sets (whether it will be implemented in C, change API,
    and/or be given supporting language syntax).


    Raymond Hettinger
    Chris



    -----= Posted via Newsfeeds.Com, Uncensored Usenet News =-----
    http://www.newsfeeds.com - The #1 Newsgroup Service in the World!
    -----== Over 80,000 Newsgroups - 16 Different Servers! =-----
  • Andrew Dalke at Aug 13, 2003 at 7:30 pm

    Raymond Hettinger:
    I've gotten lots of feedback on the itertools module
    but have not heard a peep about the new sets module.
    I've only just started to use them. Plus, I didn't realize
    there was a need for feedback.
    * Are you overjoyed/outraged by the choice of | and &
    as set operators (instead of + and *)?
    I read some mention of using "|" instead of "+", so I knew
    to use it. I would have liked +, but not *. I know the logic
    for thinking * but & doesn't have the other connotations
    * has (like [1] * 2, "a"*9)
    * Is the support for sets of sets necessary for your work
    and, if so, then is the implementation sufficiently
    powerful?
    Necessary is such a strong word. After all, I've been using
    Python for a long time without a set.

    It's been fun to use. I still haven't got the balance right, so
    I'll start with a set then have to back it out because I need
    an ordered list instead, or that I need a value so switch to
    a dict.

    I really, really like using ImmutableSet as a dictionary key.
    The other solution is
    unique_d = {} # or: dict.from_keys(data, 1)
    for x in data:
    unique_d[x] = 1
    keys = unique.keys()
    keys.sort()
    dict_key = tuple(keys)

    which makes the assumption that my objects can be sorted.
    (__lt__ is a stronger requirement than __eq__)

    * Is there a compelling need for additional set methods like
    Set.powerset() and Set.isdisjoint(s) or are the current
    offerings sufficient?
    I haven't had enough experience to say. I haven't needed the
    first, and think I just did "if not x&y" for the second. I could
    see choosing an "isdisjoint" for its more explicit description of
    intent and its better performance.

    What I did need yesterday was a way to get the elements only
    in the first set, only in the intersection, and only in the second

    def threeway(st1, st2):
    return st1-st2, st1&st2, st2-st1

    It's easy to write - I just didn't like the 3-fold comparisons of
    the elements in the sets.
    * Does the performance meet your expectations?
    Haven't gotten to the stage where I'm worried about performance.
    * Do you care that sets can only contain hashable elements?
    Not a bit. I was using dicts before.
    * How about the design constraint that the argument to most
    set methods must be another Set (as opposed to any iterable)?
    I did think about

    st2 = st1 + ["some", "other", "values"]

    but decided I didn't like it, so didn't try it out to see if it worked.
    I'm not sure what I don't like about it. I think it's because sets
    lose order, so they aren't *quite* iterator enough to justify working
    transparently with iterators. But IANG ("I am not Guido" ) and
    know that my language intuition is only grade B.
    * Are the docs clear? Can you suggest improvements?
    I haven't read the docs closely, mostly just AMK's notes, changelog,
    and discussions here. I would like if help(sets) gave more
    information, since I mostly learn interactively. OTOH, what help
    does give is too much. The signatures for all three classes are
    present, which means screen upon screen of __special__
    __methods__. If the top had a summary of operators

    x&y x.intersection(y) -- intersection of two sets
    x|y x.union(y) -- union of two sets
    ...
    or the example code from the docs then it would be more helpful.
    * Are sets helpful in your daily work or does the need arise
    only rarely?
    I'm still experimenting. Eg, will I use

    unique = dict.from_keys(data).keys()

    or

    import sets
    ...
    unique = list(sets.Set(data))

    ? The former fits in well with pre-set Pythonic thought but the
    second looks cleaner.
    User feedback is essential to determining the future direction
    of sets (whether it will be implemented in C, change API,
    and/or be given supporting language syntax).
    It took a while to get used to "st.remove(x)" - a couple times
    I did "del st[x]", which is what I would do for a dict. It's
    probably appropriate that it not be allowed, as "st[x]" doesn't
    and shouldn't work, but again, IANG so I'll just point out
    that the del felt natural to me.

    Andrew
    dalke at dalkescientific.com
  • Gary Feldman at Aug 13, 2003 at 9:16 pm

    On Tue, 12 Aug 2003 06:02:17 GMT, "Raymond Hettinger" wrote:


    * Are the docs clear? Can you suggest improvements?
    I haven't used them yet, but since I'm working my way through
    the docs in general, I thought I'd check them out and comment.

    5.12
    If the optional iterable parameter is supplied, updates the set with
    elements obtained from iteration.

    Iteration over what? Presumably "iterable." However, this being
    a reference manual, my expectations are a bias in favor of
    being complete to the point of pedantry. People shouldn't be
    forced to derive inferences like this. I think it should read
    "...elements obtained from iterating over iterable."

    Also, I'd like to see "iterable must be <some type spec>",
    though this is a general flaw in the Python doc and is perhaps
    biased by my C/C++ background where you'd never dream
    of doing a reference manual without explicitly indicating the
    types of every parameter.

    5.12.1
    so none of the following are true: a<b, a==b, or a>b

    A positive phrasing is always preferable to a negative one, and
    it's always more helpful to say what things do return rather than
    what they don't. In this case, I'd say "so each of the following
    returns false: ...."

    5.12.1
    s.remove(x) remove x from set s
    s.discard(x) removes x from set s if present

    The implication being that remove does something else if
    x isn't in s. Again, my bias towards total completeness is
    that it should say (presuming my intuition is correct)
    "remove x from set s; throws an exception if x is not a
    member of s". (Again, this is a more general problem than
    just here, though I admit that there's an awful lot of C++
    documentation that doesn't document the exceptions
    that are thrown.)

    Personally, I have hard time imagining where I'd want
    that. If I really cared, I could check beforehand, so I think
    I'd just always use discard.

    Ditto for s.pop(), i.e. what does it do if s is empty?

    s.issubset(t) test whether every element in s is in t;

    I'd prefer "Tests whether s is a subset of t," and likewise
    for issuperset. I think it's fair that the reader is familiar
    with sets, so the only real question is whether s.is...(t) means
    s is t or t is s. Or state both. But being forced to parse a
    clumsy English phrase to figure it out isn't clear enough.

    5.12.2
    engineering_management = engineers & programmers

    Shouldn't this be engineers & management? Also (and
    here's my pedantry again) if you're using the plural form
    for engineers and programmers, then it should be
    managers, not management.

    Please don't take my pedantry as criticism. It's more just
    a data point - this is the one way experienced engineer
    expects to see things.

    Gary

    PS I suppose I should mention my strongest pet peeve
    with the Python documentation, which is the practice of
    putting the member functions on a different page than
    the class overview. But that's not your issue, either.
  • Andrew Dalke at Aug 14, 2003 at 12:24 am

    Gary Feldman:
    Also, I'd like to see "iterable must be <some type spec>",
    though this is a general flaw in the Python doc and is perhaps
    biased by my C/C++ background where you'd never dream
    of doing a reference manual without explicitly indicating the
    types of every parameter.
    Python uses what is sometimes called "duck typing" (meaning,
    if it quacks like a duck...). Lots of objects are iterable - strings,
    lists, sets, dict (keys), and user-defined classes. Since you
    prefer C++, think of Python more akin to templates. Templates
    expect the objects templated on to have certain properties (can
    be "+"ed, can be deferenced, has a method named "xyz") and
    not that they have given types.
    Personally, I have hard time imagining where I'd want
    [remove]. If I really cared, I could check beforehand, so I think
    I'd just always use discard.
    I'm the other way around. I find it hard to imagine where
    I would call discard. If I want to remove an element from a
    set then I want to know right away if that element isn't there.
    It's been handy for tracking down bugs in my code.
    5.12.2
    engineering_management = engineers & programmers
    Actually, I don't like that example because there is too
    much text to read through to see the actual symbols used.
    PS I suppose I should mention my strongest pet peeve
    with the Python documentation, which is the practice of
    putting the member functions on a different page than
    the class overview. But that's not your issue, either.
    And I confess that I like to see everything on one page
    and not split up between several pages. That way I can
    use my browser's search facility.

    Andrew
    dalke at dalkescientific.com
  • Raymond Hettinger at Aug 16, 2003 at 12:48 am
    "Gary Feldman"
    * Are the docs clear? Can you suggest improvements?
    I haven't used them yet, but since I'm working my way through
    the docs in general, I thought I'd check them out and comment.
    All of the issues you found have been fixed (except for the discussion of
    what an iterable parameter means -- that will be addressed elsewhere).


    Raymond Hettinger
  • Bob Gailer at Aug 13, 2003 at 10:59 pm

    At 06:02 AM 8/12/2003 +0000, Raymond Hettinger wrote:
    * Are you overjoyed/outraged by the choice of | and &
    as set operators (instead of + and *)? Its OK.
    * Is the support for sets of sets necessary for your work
    and, if so, then is the implementation sufficiently powerful?
    Necessary? No. Desirable? Yes Powerful? Yes
    * Is there a compelling need for additional set methods like
    Set.powerset() and Set.isdisjoint(s) or are the current offerings
    sufficient? Its OK
    * Does the performance meet your expectations?
    Not tested
    * Do you care that sets can only contain hashable elements? No
    * How about the design constraint that the argument to most
    set methods must be another Set (as opposed to any iterable)?
    Be nice to support iterable also.
    * Are the docs clear? Can you suggest improvements?
    VERY. A model of how I'd like to see the rest of Python docs.
    * Are sets helpful in your daily work or does the need arise only rarely?
    Hard to say,

    Bob Gailer
    bgailer at alum.rpi.edu
    303 442 2625
    -------------- next part --------------

    ---
    Outgoing mail is certified Virus Free.
    Checked by AVG anti-virus system (http://www.grisoft.com).
    Version: 6.0.506 / Virus Database: 303 - Release Date: 8/1/2003
  • Delaney, Timothy C (Timothy) at Aug 13, 2003 at 11:35 pm

    From: Gary Feldman [mailto:gafStopSpamData at ziplink.stopallspam.net]

    Also, I'd like to see "iterable must be <some type spec>",
    though this is a general flaw in the Python doc and is perhaps
    biased by my C/C++ background where you'd never dream
    of doing a reference manual without explicitly indicating the
    types of every parameter.
    "iterable" *is* <some type spec>. If something is "iterable" it has a well-defined interface - specifically:

    iter(iterable)

    returns an "iterator", which also has a well-defined interface - specifically that it implements:

    next()
    __iter__() returning self

    Tim Delaney
  • Gary Feldman at Aug 14, 2003 at 1:08 am

    On Thu, 14 Aug 2003 09:35:24 +1000, "Delaney, Timothy C (Timothy)" wrote:


    "iterable" *is* <some type spec>. If something is "iterable" it has a well-defined interface - specifically:
    I get what you mean, but let me point out that "iterable" does not appear
    in the index of either the Language Reference or Library Reference, nor do
    I see it in the introduction. Actually, the introduction would really
    benefit from a description of all the conventions used in the manual. As
    it stands, one has to come across this usage several times to realize it's
    a convention (and that's not the way many people use reference manuals).

    What would be really cool, and probably easy, is to just make sure that
    every occurrence of "iterable" is a link to a page that describes that
    well-defined interface.

    This is still beyond the scope of the question about the Set documentation.
    I appreciate the responses, but I'm not sure whether this is the right time
    to continue this sort of discussion.

    Gary
  • Skip Montanaro at Aug 14, 2003 at 3:44 am
    Gary> I get what you mean, but let me point out that "iterable" does not
    Gary> appear in the index of either the Language Reference or Library
    Gary> Reference, nor do I see it in the introduction.

    That's a documentation bug then. The concrete notion of an "iterable"
    arrived on the Python scene relatively recently, and the docs apparently
    haven't quite caught up in that respect.

    Skip
  • Skip Montanaro at Aug 14, 2003 at 4:07 am
    Gary> I get what you mean, but let me point out that "iterable" does not
    Gary> appear in the index of either the Language Reference or Library
    Gary> Reference, nor do I see it in the introduction.

    Skip> That's a documentation bug then.

    See the patch I just submitted at

    http://python.org/sf/788509

    It suggests that a glossary is needed and includes one with a single entry
    for "iterable". Feel free to attach comments to it.

    Skip
  • Skip Montanaro at Aug 14, 2003 at 4:25 am
    Skip> See the patch I just submitted at

    Skip> http://python.org/sf/788509

    Skip> .... Feel free to attach comments to it.

    I should point out that one of the most consistent needs on the development
    side of things is for people to review bug reports and patches. Such review
    can take many forms from, "I tried the patch. I worked for me on the
    SnarfBlatt 803" to critiques of writing, enhancing content (like attaching
    more glossary entries to the above patch) or submitting code which you feel
    addresses a bug or patch submission. Don't worry if you're not a C
    programmer or a LaTeX whiz. You can focus on Python language items and
    submit doc enhancements as plain text.

    How might you keep up-to-date on new bug reports and patches? It just so
    happens that every Sunday morning at 7AM a cron job runs which summarizes
    new bug reports and patch submissions for the Python project on SourceForge.
    That report is sent to the python-dev mailing list. Here's an example:

    http://mail.python.org/pipermail/python-dev/2003-August/037561.html

    If, like me, you suffer from a little bit rot and won't remember to visit
    the python-dev mailing list archives every now and again, let me know and
    I'll add you to the distribution. If enough people are interested, I'll
    start a Mailman-managed list expressly to catch that mailing.

    Skip
  • Gary Feldman at Aug 18, 2003 at 5:01 pm

    On Wed, 13 Aug 2003 23:07:18 -0500, Skip Montanaro wrote:
    Skip> That's a documentation bug then.

    See the patch I just submitted at

    http://python.org/sf/788509
    I appreciate your doing this, and my next step (while I wait for answers to
    my recently posted question) is to get into the bug system.

    But in thinking about this, I'm still unconvinced about the original
    wording. It says "... elements obtained from iteration." Iteration, in
    this context, is still an abstract concept. It describes the how. But as
    you've already pointed out, the how is implicit in the name of the
    argument. Just from a grammatical/semantic point of view, it ought to
    describe the what, i.e. "... elements obtained from this parameter," though
    "... updates the set by iterating over this parameter" works, too.

    In other words, I find it easier to infer that iteration is being used than
    to infer the source of iteration.

    Many thanks, especially for the patience being shown,

    Gary
  • Raymond Hettinger at Aug 14, 2003 at 6:01 am
    "iterable" *is* <some type spec>. If something is "iterable" it has a
    well-defined interface - specifically:
    I get what you mean, but let me point out that "iterable" does not appear
    in the index of either the Language Reference or Library Reference, nor do
    I see it in the introduction. Actually, the introduction would really
    benefit from a description of all the conventions used in the manual. As
    it stands, one has to come across this usage several times to realize it's
    a convention (and that's not the way many people use reference manuals).

    What would be really cool, and probably easy, is to just make sure that
    every occurrence of "iterable" is a link to a page that describes that
    well-defined interface.

    This is still beyond the scope of the question about the Set documentation.
    I appreciate the responses, but I'm not sure whether this is the right time
    to continue this sort of discussion.
    It is exactly the right time and place.
    I frequently make documentation fixes
    based on the comp.lang.python discussions.


    Each fresh pair of eyes sees something new.
    Achieving excellent documentation is a journey
    and not a destination.


    Raymond Hettinger
  • Michael Hudson at Aug 14, 2003 at 1:06 pm

    "Raymond Hettinger" <vze4rx4y at verizon.net> writes:

    I've gotten lots of feedback on the itertools module
    but have not heard a peep about the new sets module.

    * Are you overjoyed/outraged by the choice of | and &
    as set operators (instead of + and *)?
    I'd actually rather sets didn't overload any operators at all, but
    appreciate that this may be a minority position.
    and & is the only sane choice, however.
    * Is the support for sets of sets necessary for your work
    and, if so, then is the implementation sufficiently
    powerful?
    I don't use them as much as I should, I suspect.
    * Is there a compelling need for additional set methods like
    Set.powerset() and Set.isdisjoint(s) or are the current
    offerings sufficient?
    I've not reached for something and not found it there yet.
    * Does the performance meet your expectations?
    My uses so far have not had even the faintest of performance demands,
    so, yes.
    * Do you care that sets can only contain hashable elements?
    Not yet.

    Cheers,
    mwh

    --
    The website looks like the Hi-Score sheet from a Bullshit Bingo
    tournament. -- Dan Holdsworth, asr
  • David Mertz at Aug 15, 2003 at 1:53 am

    * Is there a compelling need for additional set methods like
    Set.powerset() and Set.isdisjoint(s) or are the current
    offerings sufficient?
    I confess that I have not used sets for anything beyond testing. I love
    the concept, but I just haven't had the need yet (especially in
    something where I want to require 2.3).

    The mention of Set.powerset() above is quite interesting to me. It
    feels both exciting and dangerous :-).

    As we all know, the size of the powerset of S, for len(S)==N, is 2**N.
    Seems like it would be really easy to run into some long runtimes and
    memory usage. Then again, power set really is a fundamental operation
    on sets. And making users rewrite it each time is error prone.

    So I think I would advocate it, but with a fairly harsh warning in the
    documentation about complexity issues.

    Yours, David...

    --
    mertz@ | The specter of free information is haunting the `Net! All the
    gnosis | powers of IP- and crypto-tyranny have entered into an unholy
    .cx | alliance...ideas have nothing to lose but their chains. Unite
    against "intellectual property" and anti-privacy regimes!
    -------------------------------------------------------------------------
    X-Shameless-Plug: Buy Text Processing in Python: http://tinyurl.com/jskh
  • Gerrit Holl at Aug 15, 2003 at 8:49 pm

    Raymond Hettinger wrote:
    Subject: Py2.3: Feedback on Sets
    * Do you care that sets can only contain hashable elements?
    This is the only disadvantage for me.

    For the rest, I am happy about it. I am already using it a lot
    on places where I used lists before, but where a Set is much
    better (no order, no duplicates, it really *is* a set)
    User feedback is essential to determining the future direction
    of sets (whether it will be implemented in C, change API,
    and/or be given supporting language syntax).
    I really like them. I would also like to be able to do
    {elem for elem in set if foo(elem)} to construct a subset.

    Gerrit.

    --
    255. If he sublet the man's yoke of oxen or steal the seed-corn,
    planting nothing in the field, he shall be convicted, and for each one
    hundred gan he shall pay sixty gur of corn.
    -- 1780 BC, Hammurabi, Code of Law
    --
    Asperger Syndroom - een persoonlijke benadering:
    http://people.nl.linux.org/~gerrit/
    Het zijn tijden om je zelf met politiek te bemoeien:
    http://www.sp.nl/
  • John Smith at Aug 17, 2003 at 5:16 pm
    Suggestion: How about adding Set.isProperSubset() and
    Set.isProperSuperset()?

    Thanks for this wonderful module. I've been working on data mining and
    machine
    learning area using Python. Set operations are very important to me.

    "Raymond Hettinger" <vze4rx4y at verizon.net> wrote in message
    news:Jp%Za.256$jw4.238 at nwrdny03.gnilink.net...
    I've gotten lots of feedback on the itertools module
    but have not heard a peep about the new sets module.

    * Are you overjoyed/outraged by the choice of | and &
    as set operators (instead of + and *)?

    * Is the support for sets of sets necessary for your work
    and, if so, then is the implementation sufficiently
    powerful?

    * Is there a compelling need for additional set methods like
    Set.powerset() and Set.isdisjoint(s) or are the current
    offerings sufficient?

    * Does the performance meet your expectations?

    * Do you care that sets can only contain hashable elements?

    * How about the design constraint that the argument to most
    set methods must be another Set (as opposed to any iterable)?

    * Are the docs clear? Can you suggest improvements?

    * Are sets helpful in your daily work or does the need arise
    only rarely?


    User feedback is essential to determining the future direction
    of sets (whether it will be implemented in C, change API,
    and/or be given supporting language syntax).


    Raymond Hettinger


  • Raymond Hettinger at Aug 17, 2003 at 6:03 pm
    "John Smith"
    Suggestion: How about adding Set.isProperSubset() and
    Set.isProperSuperset()?
    We have them in operator form: a<b a>b
    Spelling them out did not seem to add much value.
    This is doubly true because some people read it
    as s.isProperSubsetOf(t) and others read it as
    s.hasTheProperSubset(t).


    Raymond Hettinger
    Thanks for this wonderful module. I've been working on data mining and
    machine
    learning area using Python. Set operations are very important to me.
    Great. You'll love it even more when I implement it in C.



    Raymond Hettinger
  • Beni Cherniavsky at Aug 17, 2003 at 7:37 pm

    In comp.lang.python, you wrote:
    I've gotten lots of feedback on the itertools module
    but have not heard a peep about the new sets module.

    * Are you overjoyed/outraged by the choice of | and &
    as set operators (instead of + and *)?
    ``&`` and ``|`` are good. In math too, the union/intersection operators are
    very similar to the or/and opearators. The later are overloaded as
    addition/mulitplication in boolead algebra only because the or/and operators
    are so similar and inconvenient. This overloading is quite misguided because
    these operators don't define a field or even a ring (xor/and would do that).
    * Is the support for sets of sets necessary for your work
    and, if so, then is the implementation sufficiently
    powerful?
    I used sets very heavily in my last project, including nested sets. The
    implementation is powerful enough. The concept of sets themselves became too
    weak at some moment (I needed to associate data with each element) but I got
    so used to the powerful union/intersection/etc. operations that I
    implemented__ them over dicts...

    __ http://www.technion.ac.il/~cben/python/dsets.py

    I suspect that the set/dict boundary will be crossed quite frequently by
    people; it's hard to know beforehand that you'll never want to asssociate
    values with the keys. Besides, set operators are a very powerful and
    high-level way to manipulate dicts. So perhaps it'd be a good idea to
    integrate sets with dicts more along the lines of my dsets module. So that
    dicts grow in power as well.

    The biggest lesson from my dsets module is that for set operations on dicts,
    you really want to choose how to handle collisions (both sets have the same
    key so two values contend for the result). And you want to choose it
    per-operation. It's OK if you can only do it on named methods but not on
    overloaded operators (there is no good syntax for that).

    My dsets module treats iterables (including sets) as dsets whose values are
    `None`. This direction also suggests that sets could be true dicts: when you
    omit the value in a dict initializer, it defaults to `None`. But after some
    thinking I think it would be suboptimal. One reason: it means you should to
    choose a collision function even when working with sets whose value are
    meaningless, which makes little sense. I now think it'd be better to separate
    them (by presence of the __getitem__ method) and only give meaning to
    collision functions if both arguments have values. So:

    - ``<set> & <set>`` means what it means now.
    - ``<dict> & <set>`` means subset of dict.
    - ``<dict>.intersection(<dict>, <collision_func>) means what it means in
    dsets.

    But there are more complications:

    - ``<dict> | <set>`` must mean union of sets because for keys not in <dict>
    you have no value.
    - Should ``<dict> | <dict>``, ``<dict> & <dict>``, etc. raise exception on
    collisions (as now) or simply return sets?

    I'll try to release a working design soon. Feedback most welcome.
    * Is there a compelling need for additional set methods like
    Set.powerset() and Set.isdisjoint(s) or are the current
    offerings sufficient?
    Generating the full powerset seems pretty useless. `isdisjoint` sounds like a
    good idea since it can be implemented more effeciently than ``not a & b``.
    * Does the performance meet your expectations?
    I think so. Almost half of my program's time is spent in set.py according to
    the profiler but when I optimized the sets module (and my dsets) with psyco,
    things got several percent slower. Is this a good sign ?-)
    * Do you care that sets can only contain hashable elements?
    No more than I care for dicts containing only hashable keys. How else could
    you define it?

    What I do like is the auto-copying of mutable sets when used a set items; I
    wouldn't mind something like this to be extended into other Python types
    (primarily dicts).
    * How about the design constraint that the argument to most
    set methods must be another Set (as opposed to any iterable)?
    I'd go for any iterable where it's enough and require __contains__ where
    that's needed. I think together this should cover all method's needs. I
    think it's better not to add artificail checks, so that other set-alike types
    can be used with sets (primarily mapping or set-alike types).
    * Are the docs clear? Can you suggest improvements?
    It's been a lot of time since I had to read them which is probably a good
    sign.
    * Are sets helpful in your daily work or does the need arise
    only rarely?
    Helpful. Since they were released on 2.2, I feel free to import sets whenever
    it reflects my meaning clearer than dicts, even when I don't use any fancy
    operations. I find the code more readable this way.

    And sets of sets (or ImmutableSets as dict keys) are very helpful when you
    need them.

    --
    Beni Cherniavsky <cben at tx.technion.ac.il>
  • Raymond Hettinger at Aug 18, 2003 at 1:54 am
    [Beni Cherniavsky]
    Generating the full powerset seems pretty useless. `isdisjoint` sounds like a
    good idea since it can be implemented more effeciently than ``not a & b``.
    True enough. It is easy to putogether an early-out algorithm using itertools.
    So, it can be done a C speed. The question is whether there
    are sufficient use cases to warrant expanding an already fat API.

    * Does the performance meet your expectations?
    I think so. Almost half of my program's time is spent in set.py according to
    the profiler but when I optimized the sets module (and my dsets) with psyco,
    things got several percent slower. Is this a good sign ?-)
    It's a very good sign.

    Already, most methods run at C speed thanks to itertools and dicts.
    Try directing pysco to BaseSet._update() and BaseSet.__contains__().
    Those are the two most important methods that do not run at C speed.

    * Do you care that sets can only contain hashable elements?
    No more than I care for dicts containing only hashable keys. How else could
    you define it?
    The hard way. Keep a separate ordered list for elements that define ordering
    operations but not a hash value. Insert and Search using a binary search.

    Keep a third list for elements that only support equality comparision.
    Use linear search for that one.

    This complicates the heck out of the code but would expand the range of
    things storable in sets. The only real life use case I can think of is using
    a third party package that supplied objects without a hash code.

    What I do like is the auto-copying of mutable sets when used a set items; I
    wouldn't mind something like this to be extended into other Python types
    (primarily dicts).
    The protocol would also work for dicts, but the key to getting it to work
    is installing as_immutable methods in the objects that are being stored.
    Something that touches that much code would also need compelling
    use cases to justify it. But yes, it can be done.

    * How about the design constraint that the argument to most
    set methods must be another Set (as opposed to any iterable)?
    I'd go for any iterable where it's enough and require __contains__ where
    that's needed. I think together this should cover all method's needs. I
    think it's better not to add artificail checks, so that other set-alike types
    can be used with sets (primarily mapping or set-alike types).
    Wish granted.

    I've relaxed the requirements that the arguments be a set and now any iterable
    will do. This will come out in Py2.3.1.

    * Are the docs clear? Can you suggest improvements?
    It's been a lot of time since I had to read them which is probably a good
    sign.
    Great!

    * Are sets helpful in your daily work or does the need arise
    only rarely?
    Helpful. Since they were released on 2.2, I feel free to import sets whenever
    it reflects my meaning clearer than dicts, even when I don't use any fancy
    operations. I find the code more readable this way.
    Me too.

    And sets of sets (or ImmutableSets as dict keys) are very helpful when you
    need them.
    Good. I've gotten almost no replies from people using sets of sets.


    Raymond Hettinger
  • Tim Peters at Aug 18, 2003 at 3:01 am
    [Beni Cherniavsky]
    `isdisjoint` sounds like a good idea since it can be implemented
    more effeciently than ``not a & b``.
    [Raymond Hettinger]
    True enough. It is easy to putogether an early-out algorithm using
    itertools. So, it can be done a C speed. The question is whether
    there are sufficient use cases to warrant expanding an already fat API.
    Early-out algorithms consume time to determine when the early-out condition
    obtains, and so can be a net loss (via consuming more time testing for
    early-out than is saved by getting out early). See the long comment block
    before _siftup in heapq.py for a classic case where early-out usually loses
    big. For isdisjoint() it's muddier (depends on whether your data is such
    that most sets fed to it are or aren't disjoint; if they're usually
    disjoint, code testing for early-out is a waste of time).

    ...
    * Do you care that sets can only contain hashable elements?
    No more than I care for dicts containing only hashable keys. How
    else could you define it?
    The hard way. Keep a separate ordered list for elements that
    define ordering operations but not a hash value. Insert and Search
    using a binary search.

    Keep a third list for elements that only support equality comparision.
    Use linear search for that one.

    This complicates the heck out of the code but would expand the range
    of things storable in sets. The only real life use case I can think
    of is using a third party package that supplied objects without a hash
    code.
    Part of Zope's ZODB (persistent object-oriented database) is an elaborate
    BTree package, supplying many flavors of containers built on more-or-less
    (depending on how much you squint <wink>) classical disk-based B+ trees.
    Sets of C integers and sets of Python objects are some of the container
    types supported. BTrees require a total ordering among keys, but don't use
    hash codes. Most operations on B+-tree based sets take time logarithmic in
    the number of elements; this includes insertion and deletion, where using a
    straight sorted list instead would take linear time (much worse).

    There are effective ways to use hashing schemes for disk-based containers
    too. The strongest reason for using ordered B+ trees in ZODB is that
    searches in database apps very often want to find everything in a *range* of
    key values, and traversing a B+ tree in sorted order is trivial (it's just a
    left-to-right traversal over a linked list of the leaf buckets; a pair of
    log-time searches is done first to find the range's endpoints). So, e.g.,
    where a Python dict has

    .iterkeys()

    a ZODB BTree has

    .iterkeys(min=None, max=None, excludemin=False, excludemax=False)

    A prime use case for an ordered implementation of sets is that adaptive
    strategies such as those used in 2.3's list.sort() can greatly speed union
    and intersection operations on sets of non-random data. For example, what's
    the intersection of

    {1, 2, 3, ..., 1000000}

    and

    {2000000, 2000001, 2000002, ..., 3000000}

    ?

    If sets are stored in sorted order, a quite general set intersection
    algorithm can easily deduce that the intersection is empty after just a few
    comparisons. Indeed, the inspiration for 2.3's sometimes-amazing
    list.sort() came from reading

    "Adaptive Set Intersections, Unions, and Differences" (2000)
    Erik D. Demaine, Alejandro L?pez-Ortiz, J. Ian Munro

    (easy to find via Google). BTW, there's an extensive discussion of ways to
    adaptively minimize the cost of early-out gimmicks when they're not paying
    in Python's Objects/listsort.txt.
  • Beni Cherniavsky at Aug 20, 2003 at 8:10 am

    Raymond Hettinger wrote on 2003-08-17:

    [Beni Cherniavsky]
    And sets of sets (or ImmutableSets as dict keys) are very helpful when you
    need them.
    Good. I've gotten almost no replies from people using sets of sets.
    For the record, here is the central data-type of my program
    (recognition of lines from a bitmap):

    pixel ::= (x, y) tuple
    node ::= set of pixels
    graph.nodes = {node: set of neighbour nodes} dict
    graph.edges = {set of 2 nodes: set of pixels} dict

    So I have two dicts, one mapping sets to sets of sets and the other
    mapping sets of sets to sets ;-). The the fact that sets can be
    dictionary keys is perhaps even more useful than sets being set
    elements.

    P.S. I wonder what happens if I do::

    s = Set([1, 2])
    s.add(s)

    how badly will operations on it break then? Perhaps the docs should
    mention that circular set structures are a bad idea (see the quote in
    my signature ;).

    --
    Beni Cherniavsky <cben at tx.technion.ac.il>

    A word of warning about matrices - *each column must have the same
    number of elements in it*. The world will end if you get this wrong.
    -- EQN user manual, Brian W. Kernighan & Lorinda L. Cherry
  • Christos TZOTZIOY Georgiou at Aug 22, 2003 at 7:19 am
    On Wed, 20 Aug 2003 11:10:58 +0300 (IDT), rumours say that Beni
    Cherniavsky <cben at techunix.technion.ac.il> might have written:
    P.S. I wonder what happens if I do::

    s = Set([1, 2])
    s.add(s)

    how badly will operations on it break then?
    They won't. Sets are mutable, so adding a Set to itself actually adds
    an ImmutableSet copy constructed on the fly.
    --
    TZOTZIOY, I speak England very best,
    Microsoft Security Alert: the Matrix began as open source.
  • Just at Aug 17, 2003 at 8:28 pm
    In article <mailman.1061149114.2138.python-list at python.org>,
    cben at techunix.technion.ac.il (Beni Cherniavsky) wrote:
    What I do like is the auto-copying of mutable sets when used a set items; I
    wouldn't mind something like this to be extended into other Python types
    (primarily dicts).
    Me too. I once proposed this on python-dev but there didn't seem to be
    much interest:

    http://mail.python.org/pipermail/python-dev/2003-February/033043.html

    (I must admit, the second case I mention there has more or less gone
    away in the meantime, leaving only the C implementation for sets...)

    Just
  • Carl Banks at Aug 18, 2003 at 3:20 am

    Beni Cherniavsky wrote:
    In comp.lang.python, you wrote:
    I've gotten lots of feedback on the itertools module
    but have not heard a peep about the new sets module.

    * Are you overjoyed/outraged by the choice of | and &
    as set operators (instead of + and *)?
    ``&`` and ``|`` are good. In math too, the union/intersection operators are
    very similar to the or/and opearators. The later are overloaded as
    addition/mulitplication in boolead algebra only because the or/and operators
    are so similar and inconvenient. This overloading is quite misguided because
    these operators don't define a field or even a ring (xor/and would do that).

    Well, if overloading + and * to represent a non-field or non-ring is
    misguided, then + and * for lists and strings is also misguided, since
    they don't define a field or ring either.

    Having said that, it does feel wrong to use "+" for sets to me. For
    container types, I want len(s)+len(t) == len(s+t), but this would not
    hold for sets. That is why I slightly favor & and |.


    --
    CARL BANKS http://www.aerojockey.com/software
    "You don't run Microsoft Windows. Microsoft Windows runs you."
  • Michael Hudson at Aug 18, 2003 at 1:36 pm

    Carl Banks <imbosol at aerojockey.com> writes:

    Well, if overloading + and * to represent a non-field or non-ring is
    misguided, then + and * for lists and strings is also misguided,
    since they don't define a field or ring either.
    Yup! Well, *, anyway. I'd prefer a Haskell-style ++ for
    concatenation, but + isn't totally silly (strings and lists are
    monoids, at least :-). list * int, OTOH, seems a bit smelly to me
    (though I'm not sure what a better spelling would be...).

    Cheers,
    mwh

    --
    ... Windows proponents tell you that it will solve things that
    your Unix system people keep telling you are hard. The Unix
    people are right: they are hard, and Windows does not solve
    them, ... -- Tim Bradshaw, comp.lang.lisp
  • Borcis at Aug 18, 2003 at 3:09 pm

    Michael Hudson wrote:
    + isn't totally silly (strings and lists are
    monoids, at least :-). list * int, OTOH, seems a bit smelly to me
    (though I'm not sure what a better spelling would be...).
    But strings/lists form a non-commutative monoid, so list * list
    would be better imho, ans a corresponding list ** int
  • Michael Hudson at Aug 18, 2003 at 3:08 pm

    Borcis <borcis at users.ch> writes:

    Michael Hudson wrote:
    + isn't totally silly (strings and lists are
    monoids, at least :-). list * int, OTOH, seems a bit smelly to me
    (though I'm not sure what a better spelling would be...).
    But strings/lists form a non-commutative monoid, so list * list
    would be better imho, ans a corresponding list ** int
    Hmm, you're right of course,, but for some reason I prefer +. Maybe
    it's the commutation with len()?

    Anyhoo, this is all pointless gassing :-)

    Cheers,
    mwh

    --
    My hat is lined with tinfoil for protection in the unlikely event
    that the droid gets his PowerPoint presentation working.
    -- Alan W. Frame, alt.sysadmin.recovery
  • Borcis at Aug 18, 2003 at 3:39 pm

    Michael Hudson wrote:
    + isn't totally silly (strings and lists are
    monoids, at least :-). list * int, OTOH, seems a bit smelly to me
    (though I'm not sure what a better spelling would be...).
    But strings/lists form a non-commutative monoid, so list * list
    would be better imho, and a corresponding list ** int
    Hmm, you're right of course,, but for some reason I prefer +. Maybe
    it's the commutation with len()?
    renaming it to log() would make the *2+ homomorphism natural.
    Anyhoo, this is all pointless gassing :-)
    Right.
  • David Eppstein at Aug 18, 2003 at 4:18 pm
    In article <7h3d6f3143s.fsf at pc150.maths.bris.ac.uk>,
    Michael Hudson wrote:
    Yup! Well, *, anyway. I'd prefer a Haskell-style ++ for
    concatenation, but + isn't totally silly (strings and lists are
    monoids, at least :-). list * int, OTOH, seems a bit smelly to me
    (though I'm not sure what a better spelling would be...).
    Multiplication by (nonnegative) integers is a pretty standard thing to
    do in monoids, and means close to what Python's list*int syntax does:
    add the thing to itself that many times.

    I'm not sure why multiplying a list by a negative number produces the
    empty list instead of an exception, though.

    --
    David Eppstein http://www.ics.uci.edu/~eppstein/
    Univ. of California, Irvine, School of Information & Computer Science
  • Borcis at Aug 18, 2003 at 6:27 pm

    David Eppstein wrote:
    Multiplication by (nonnegative) integers is a pretty standard thing to
    do in monoids, and means close to what Python's list*int syntax does:
    add the thing to itself that many times.
    "God is a superfluous hypothesis, I need just an initial condition"

    >
    I'm not sure why multiplying a list by a negative number produces the
    empty list instead of an exception, though.
    >

    If it is illogical for it not to raise an exception, then in an appropriate
    universe the theory is logically permitted that some other value than the
    empty list is more logical as a result, than the empty list itself.

    The-anthropy-of-the-universe-is-everrising-ly-yours-ly
  • David Eppstein at Aug 18, 2003 at 6:43 pm

    In article <3F411A93.70807 at users.ch>, Borcis wrote:

    I'm not sure why multiplying a list by a negative number produces the
    empty list instead of an exception, though.
    If it is illogical for it not to raise an exception, then in an appropriate
    universe the theory is logically permitted that some other value than the
    empty list is more logical as a result, than the empty list itself.
    The appropriate theory for this being the free group generated by all
    possible Python objects?

    --
    David Eppstein http://www.ics.uci.edu/~eppstein/
    Univ. of California, Irvine, School of Information & Computer Science
  • Bob Gailer at Aug 18, 2003 at 9:46 pm

    I'm not sure why multiplying a list by a negative number produces the
    empty list instead of an exception, though.
    Actually the list has a negative # of elements in it. It is just hard to
    show this using repr. But each of these elements uses a negative amount of
    memory, so there is more RAM available for the rest of your program. You
    just have to avoid specifying a negative # too large in magnitude, or you
    will get a memory underflow.

    Bob Gailer
    bgailer at alum.rpi.edu
    303 442 2625
    -------------- next part --------------

    ---
    Outgoing mail is certified Virus Free.
    Checked by AVG anti-virus system (http://www.grisoft.com).
    Version: 6.0.506 / Virus Database: 303 - Release Date: 8/1/2003
  • Andrew Dalke at Aug 18, 2003 at 11:37 pm

    Bob Gailer:
    You just have to avoid specifying a negative # too large in magnitude,
    or you will get a memory underflow.
    Actually, there's a limit in how much negative memory you can get
    d*(-sys.maxint-1)
    []
    d*(-sys.maxint-2)
    Traceback (most recent call last):
    File "<interactive input>", line 1, in ?
    OverflowError: long int too large to convert to int
    >>>

    Andrew
    dalke at dalkescientific.com
  • Rick Thomas at Aug 28, 2003 at 4:58 am
    list * int is by analogy with vector space operations. Vector1 * Scalar
    -> Vector2

    Rick


    Michael Hudson wrote:
    list * int, OTOH, seems a bit smelly to me
    (though I'm not sure what a better spelling would be...).
  • Seo Sanghyeon at Aug 28, 2003 at 5:29 am

    Michael Hudson wrote:
    list * int, OTOH, seems a bit smelly to me
    (though I'm not sure what a better spelling would be...).
    Rick Thomas wrote:
    list * int is by analogy with vector space operations.
    Vector1 * Scalar -> Vector2
    Huh? list * int does repetition. How does it relate to scalar
    multiplication of vectors?

    Seo Sanghyeon
  • Chad Netzer at Aug 28, 2003 at 6:14 am

    On Wed, 2003-08-27 at 22:29, Seo Sanghyeon wrote:
    Huh? list * int does repetition. How does it relate to scalar
    multiplication of vectors?
    They both preserve rank. ;)

    --

    Chad Netzer

Related Discussions

People

Translate

site design / logo © 2022 Grokbase