FAQ
I'm learning Python by reading David Beazley's "Python Essential Reference"
book and writing a few toy programs. To get a feel for hashes and sorting,
I set myself this little problem today (not homework, BTW):

Given a string containing a space-separated list of names:

names = "freddy fred bill jock kevin andrew kevin kevin jock"

produce a frequency table of names, sorted descending by frequency.
then ascending by name. For the above data, the output should be:

kevin : 3
jock : 2
andrew : 1
bill : 1
fred : 1
freddy : 1

Here's my first attempt:

names = "freddy fred bill jock kevin andrew kevin kevin jock"
freq = {}
for name in names.split():
freq[name] = 1 + freq.get(name, 0)
deco = zip([-x for x in freq.values()], freq.keys())
deco.sort()
for v, k in deco:
print "%-10s: %d" % (k, -v)

I'm interested to learn how more experienced Python folks would solve
this little problem. Though I've read about the DSU Python sorting idiom,
I'm not sure I've strictly applied it above ... and the -x hack above to
achieve a descending sort feels a bit odd to me, though I couldn't think
of a better way to do it.

I also have a few specific questions. Instead of:

for name in names.split():
freq[name] = 1 + freq.get(name, 0)

I might try:

for name in names.split():
try:
freq[name] += 1
except KeyError:
freq[name] = 1

Which is preferred?

Ditto for:

deco = zip([-x for x in freq.values()], freq.keys())

versus:

deco = zip(map(operator.neg, freq.values()), freq.keys())

Finally, I might replace:

for v, k in deco:
print "%-10s: %d" % (k, -v)

with:

print "\n".join("%-10s: %d" % (k, -v) for v, k in deco)

Any feedback on good Python style, performance tips, good books
to read, etc. is appreciated.

Thanks,
/-\


Make the switch to the world's best email. Get the new Yahoo!7 Mail now. www.yahoo7.com.au/worldsbestemail

Search Discussions

  • Fredrik Lundh at Jan 9, 2008 at 11:33 am

    Andrew Savige wrote:


    Here's my first attempt:

    names = "freddy fred bill jock kevin andrew kevin kevin jock"
    freq = {}
    for name in names.split():
    freq[name] = 1 + freq.get(name, 0)
    deco = zip([-x for x in freq.values()], freq.keys())
    deco.sort()
    for v, k in deco:
    print "%-10s: %d" % (k, -v)

    I'm interested to learn how more experienced Python folks would solve
    this little problem. Though I've read about the DSU Python sorting idiom,
    I'm not sure I've strictly applied it above ... and the -x hack above to
    achieve a descending sort feels a bit odd to me, though I couldn't think
    of a better way to do it.
    sort takes a reverse flag in recent versions, so you can do a reverse
    sort as:

    deco.sort(reverse=True)

    in older versions, just do:

    deco.sort()
    deco.reverse() # this is fast!

    also note that recent versions also provide a "sorted" function that
    returns the sorted list, and both "sort" and "sorted" now allow you to
    pass in a "key" function that's used to generate a sort key for each
    item. taking that into account, you can simply write:

    # sort items on descending count
    deco = sorted(freq.items(), key=lambda x: -x[1])

    simplifying the print statement is left as an exercise.
    I also have a few specific questions. Instead of:

    for name in names.split():
    freq[name] = 1 + freq.get(name, 0)

    I might try:

    for name in names.split():
    try:
    freq[name] += 1
    except KeyError:
    freq[name] = 1

    Which is preferred?
    for simple scripts and small datasets, always the former.

    for performance-critical production code, it depends on how often you
    expect "name" to be present in the dictionary (setting up a try/except
    is cheap, but raising and catching one is relatively costly).
    Ditto for:

    deco = zip([-x for x in freq.values()], freq.keys())

    versus:

    deco = zip(map(operator.neg, freq.values()), freq.keys())
    using zip/keys/values to emulate items is a bit questionable. if you
    need to restructure the contents of a dictionary, I usually prefer items
    (or iteritems, where suitable) and tuple indexing/unpacking in a list
    comprehension (or generator expression, where suitable).
    Finally, I might replace:

    for v, k in deco:
    print "%-10s: %d" % (k, -v)

    with:

    print "\n".join("%-10s: %d" % (k, -v) for v, k in deco)
    why?

    </F>
  • Ant at Jan 9, 2008 at 11:40 am

    I'm interested to learn how more experienced Python folks would solve
    this little problem.
    I think I'd do the following:

    from collections import defaultdict

    names = "freddy fred bill jock kevin andrew kevin kevin jock"
    freq = defaultdict(lambda: 0)

    for name in names.split():
    freq[name] += 1

    pairs = [(v, k) for k, v in freq.iteritems()]

    for v, k in reversed(sorted(pairs)):
    print "%-10s: %d" % (k, v)


    defaultdict makes the frequency accumulation neater.

    reversed(sorted(pairs)) avoids the little -v hack and makes it more
    obvious what you are doing. Of course this could also be achieved by
    doing pairs.sort() and pairs.reverse() before iterating over the pairs
    list.

    Cheers,

    --
    Ant.
  • Bruno Desthuilliers at Jan 9, 2008 at 12:27 pm

    Ant a ?crit :
    I'm interested to learn how more experienced Python folks would solve
    this little problem.
    I think I'd do the following:

    from collections import defaultdict

    names = "freddy fred bill jock kevin andrew kevin kevin jock"
    freq = defaultdict(lambda: 0)

    for name in names.split():
    freq[name] += 1

    pairs = [(v, k) for k, v in freq.iteritems()]

    for v, k in reversed(sorted(pairs)):
    print "%-10s: %d" % (k, v)


    defaultdict makes the frequency accumulation neater.

    reversed(sorted(pairs)) avoids the little -v hack and makes it more
    obvious what you are doing.
    But fails to implement the specs (emphasis is mine):
    """
    produce a frequency table of names, sorted descending by frequency.
    *then ascending by name*. For the above data, the output should be:

    kevin : 3
    jock : 2
    andrew : 1
    bill : 1
    fred : 1
    freddy : 1
    """

    With your solution, you get:

    kevin : 3
    jock : 2
    freddy : 1
    fred : 1
    bill : 1
    andrew : 1
  • Peter Otten at Jan 9, 2008 at 11:43 am

    Andrew Savige wrote:

    I'm learning Python by reading David Beazley's "Python Essential
    Reference" book and writing a few toy programs. To get a feel for hashes
    and sorting, I set myself this little problem today (not homework, BTW):

    Given a string containing a space-separated list of names:

    names = "freddy fred bill jock kevin andrew kevin kevin jock"

    produce a frequency table of names, sorted descending by frequency.
    then ascending by name. For the above data, the output should be:

    kevin : 3
    jock : 2
    andrew : 1
    bill : 1
    fred : 1
    freddy : 1

    Here's my first attempt:

    names = "freddy fred bill jock kevin andrew kevin kevin jock" freq = {}
    for name in names.split():
    freq[name] = 1 + freq.get(name, 0)
    deco = zip([-x for x in freq.values()], freq.keys()) deco.sort() for v,
    k in deco:
    print "%-10s: %d" % (k, -v)

    I'm interested to learn how more experienced Python folks would solve
    this little problem. Though I've read about the DSU Python sorting
    idiom, I'm not sure I've strictly applied it above ... and the -x hack
    above to achieve a descending sort feels a bit odd to me, though I
    couldn't think of a better way to do it.
    You can specify a reverse sort with

    deco.sort(reverse=True)

    Newer versions of Python have the whole idiom built in:
    items = freq.items()
    from operator import itemgetter
    items.sort(key=itemgetter(1), reverse=True)
    for item in items:
    ... print "%-10s: %d" % item
    ...
    kevin : 3
    jock : 2
    bill : 1
    andrew : 1
    fred : 1
    freddy : 1

    You can pass an arbitrary function as key. itemgetter(1) is equivalent to

    def key(item): return item[1]
    I also have a few specific questions. Instead of:

    for name in names.split():
    freq[name] = 1 + freq.get(name, 0)

    I might try:

    for name in names.split():
    try:
    freq[name] += 1
    except KeyError:
    freq[name] = 1

    Which is preferred?
    I have no strong opinion about that. Generally speaking try...except is
    faster when you have many hits, i. e. the except suite is rarely invoked.
    Starting with Python 2.5 you can alternatively use

    from collections import defaultdict
    freq = defaultdict(int)
    for name in names.split():
    freq[name] += 1
    Ditto for:

    deco = zip([-x for x in freq.values()], freq.keys())

    versus:

    deco = zip(map(operator.neg, freq.values()), freq.keys())
    I think the list comprehension is slightly more readable.
    Finally, I might replace:

    for v, k in deco:
    print "%-10s: %d" % (k, -v)

    with:

    print "\n".join("%-10s: %d" % (k, -v) for v, k in deco)
    Again, I find the explicit for loop more readable, but sometimes use the
    genexp, too.

    Peter
  • Andrew Savige at Jan 9, 2008 at 12:08 pm

    Fredrik Lundh wrote:

    # sort items on descending count
    deco = sorted(freq.items(), key=lambda x: -x[1])
    Neat. Is there any way to use sorted() with multiple sort keys? ...
    Given that the spec calls for sorting by _two_ keys: first by
    frequency (descending), then by name (ascending). To clarify:

    kevin : 3
    jock : 2
    andrew : 1
    bill : 1
    fred : 1
    freddy : 1

    is correct, while:

    kevin : 3
    jock : 2
    bill : 1
    andrew : 1
    fred : 1
    freddy : 1

    is incorrect because "andrew 1" must appear before "bill 1".
    Finally, I might replace:

    for v, k in deco:
    print "%-10s: %d" % (k, -v)

    with:

    print "\n".join("%-10s: %d" % (k, -v) for v, k in deco)
    why?
    No good reason, unless performance is significantly different.
    The first seems clearer to me. Just wondering what other folks think.

    Thanks,
    /-\


    Make the switch to the world's best email. Get the new Yahoo!7 Mail now. www.yahoo7.com.au/worldsbestemail
  • Fredrik Lundh at Jan 9, 2008 at 12:14 pm

    Andrew Savige wrote:

    Neat. Is there any way to use sorted() with multiple sort keys? ...
    sure! just return the "composite key" as a tuple:

    # sort on descending count, ascending name
    deco = sorted(freq.items(), key=lambda x: (-x[1], x[0]))

    (when comparing tuples, Python first compares the first item from each
    tuple. if they're equal, it continues with the second item, and so on).

    </F>
  • Bruno Desthuilliers at Jan 9, 2008 at 12:19 pm

    Andrew Savige a ?crit :
    I'm learning Python by reading David Beazley's "Python Essential Reference"
    book and writing a few toy programs. To get a feel for hashes and sorting,
    I set myself this little problem today (not homework, BTW):

    Given a string containing a space-separated list of names:

    names = "freddy fred bill jock kevin andrew kevin kevin jock"

    produce a frequency table of names, sorted descending by frequency.
    then ascending by name. For the above data, the output should be:

    kevin : 3
    jock : 2
    andrew : 1
    bill : 1
    fred : 1
    freddy : 1

    Here's my first attempt:

    names = "freddy fred bill jock kevin andrew kevin kevin jock"
    freq = {}
    for name in names.split():
    freq[name] = 1 + freq.get(name, 0)
    deco = zip([-x for x in freq.values()], freq.keys())
    deco.sort()
    for v, k in deco:
    print "%-10s: %d" % (k, -v)

    I'm interested to learn how more experienced Python folks would solve
    this little problem.
    For a one-shot Q&D script:

    names = "freddy fred bill jock kevin andrew kevin kevin jock"
    freqs = [(- names.count(name), name) for name in set(names.split())]
    print "\n".join("%-10s : %s" % (n, -f) for f, n in sorted(freqs))


    Now I might choose a very different solution for a more serious
    application, depending on detailed specs and intended use of the
    "frequency table".
    Though I've read about the DSU Python sorting idiom,
    I'm not sure I've strictly applied it above ...
    Perhaps not "strictly" since you don't really "undecorate", but that's
    another application of the same principle : provided the appropriate
    data structure, sort() (or sorted()) will do the right thing.

    and the -x hack above to
    achieve a descending sort feels a bit odd to me, though I couldn't think
    of a better way to do it.
    The "other" way would be to pass a custom comparison callback to sort,
    which would be both slower and more complicated. Your solution is IMHO
    the right thing to do here.
    I also have a few specific questions. Instead of:

    for name in names.split():
    freq[name] = 1 + freq.get(name, 0)

    I might try:

    for name in names.split():
    try:
    freq[name] += 1
    except KeyError:
    freq[name] = 1
    or a couple other solutions, including a defaultdict (python >= 2.5).
    Which is preferred?
    It's a FAQ - or it should be one. Globally: the second one tends to be
    faster when there's no exception (ie the key already exists), but slower
    when exceptions happen. So it mostly depends on what you expect your
    dataset to be.

    Now note that you don't necessarily need a dict here !-)
    Ditto for:

    deco = zip([-x for x in freq.values()], freq.keys())

    versus:

    deco = zip(map(operator.neg, freq.values()), freq.keys())
    As far as I'm concerned, I'd favor the first solution here. Reads better
    IMHO
    Finally, I might replace:

    for v, k in deco:
    print "%-10s: %d" % (k, -v)

    with:

    print "\n".join("%-10s: %d" % (k, -v) for v, k in deco)
    That's what I'd do here too - but it depends on context (ie: for huge
    datasets and/or complex formating, i'd use a for loop).
  • MRAB at Jan 9, 2008 at 11:45 pm

    On Jan 9, 12:19 pm, Bruno Desthuilliers <bruno. 42.desthuilli... at wtf.websiteburo.oops.com> wrote:
    Andrew Savige a ?crit :


    I'm learning Python by reading David Beazley's "Python Essential Reference"
    book and writing a few toy programs. To get a feel for hashes and sorting,
    I set myself this little problem today (not homework, BTW):
    Given a string containing a space-separated list of names:
    names = "freddy fred bill jock kevin andrew kevin kevin jock"
    produce a frequency table of names, sorted descending by frequency.
    then ascending by name. For the above data, the output should be:
    kevin : 3
    jock : 2
    andrew : 1
    bill : 1
    fred : 1
    freddy : 1
    Here's my first attempt:
    names = "freddy fred bill jock kevin andrew kevin kevin jock"
    freq = {}
    for name in names.split():
    freq[name] = 1 + freq.get(name, 0)
    deco = zip([-x for x in freq.values()], freq.keys())
    deco.sort()
    for v, k in deco:
    print "%-10s: %d" % (k, -v)
    I'm interested to learn how more experienced Python folks would solve
    this little problem.
    For a one-shot Q&D script:

    names = "freddy fred bill jock kevin andrew kevin kevin jock"
    freqs = [(- names.count(name), name) for name in set(names.split())]
    print "\n".join("%-10s : %s" % (n, -f) for f, n in sorted(freqs))
    [snip]
    That actually prints:

    kevin : 3
    fred : 2
    jock : 2
    andrew : 1
    bill : 1
    freddy : 1

    It says that "fred" occurs twice because of "freddy".

    names = "freddy fred bill jock kevin andrew kevin kevin jock"
    name_list = names.split()
    freqs = [(- name_list.count(name), name) for name in set(name_list)]
    print "\n".join("%-10s : %s" % (n, -f) for f, n in sorted(freqs))
  • Bruno Desthuilliers at Jan 10, 2008 at 8:41 am

    MRAB a ?crit :
    On Jan 9, 12:19 pm, Bruno Desthuilliers <bruno.
    42.desthuilli... at wtf.websiteburo.oops.com> wrote: (snip)
    That actually prints:

    kevin : 3
    fred : 2
    jock : 2
    andrew : 1
    bill : 1
    freddy : 1

    It says that "fred" occurs twice because of "freddy".
    oops ! My bad, didn't spot that one :(

    Thanks for pointing this out.
  • Peter Otten at Jan 9, 2008 at 12:22 pm

    Andrew Savige wrote:

    Fredrik Lundh wrote:
    # sort items on descending count
    deco = sorted(freq.items(), key=lambda x: -x[1])
    Neat. Is there any way to use sorted() with multiple sort keys? ...
    Given that the spec calls for sorting by _two_ keys: first by
    frequency (descending), then by name (ascending). To clarify:

    kevin : 3
    jock : 2
    andrew : 1
    bill : 1
    fred : 1
    freddy : 1

    is correct, while:

    kevin : 3
    jock : 2
    bill : 1
    andrew : 1
    fred : 1
    freddy : 1

    is incorrect because "andrew 1" must appear before "bill 1".
    You can sort twice (on the name, then on the number) or use a more complex
    key:
    for n, v in sorted(freq.items(), key=lambda (n, v): (-v, n)):
    ... print n, v
    ...
    kevin 3
    jock 2
    andrew 1
    bill 1
    fred 1
    freddy 1
    items = freq.items()
    items.sort(key=itemgetter(0))
    items.sort(key=itemgetter(1), reverse=True)
    for n, v in items:
    ... print n, v
    ...
    kevin 3
    jock 2
    andrew 1
    bill 1
    fred 1
    freddy 1

    Peter
  • Paul Hankin at Jan 9, 2008 at 11:56 pm

    On Jan 9, 12:19?pm, Bruno Desthuilliers <bruno. 42.desthuilli... at wtf.websiteburo.oops.com> wrote:
    Andrew Savige a ?crit :
    and the -x hack above to
    achieve a descending sort feels a bit odd to me, though I couldn't think
    of a better way to do it.
    The "other" way would be to pass a custom comparison callback to sort,
    which would be both slower and more complicated. Your solution is IMHO
    the right thing to do here.
    Both list.sort and sorted have a parameter 'reverse' which does a
    descending rather than ascending sort.

    eg.
    deco.sort(reverse=True)
    for v, k in deco:
    ...

    or
    for v, k in sorted(deco, reverse=True):
    ...

    --
    Paul Hankin
  • Bruno Desthuilliers at Jan 10, 2008 at 8:47 am

    Paul Hankin a ?crit :
    On Jan 9, 12:19 pm, Bruno Desthuilliers <bruno.
    42.desthuilli... at wtf.websiteburo.oops.com> wrote:
    Andrew Savige a ?crit :
    and the -x hack above to
    achieve a descending sort feels a bit odd to me, though I couldn't think
    of a better way to do it.
    The "other" way would be to pass a custom comparison callback to sort,
    which would be both slower and more complicated. Your solution is IMHO
    the right thing to do here.
    Both list.sort and sorted have a parameter 'reverse' which does a
    descending rather than ascending sort.
    Yes. But here the specs says that sort must be descending on first key
    (frequency) and ascending on the second (name), so you can't just use
    reverse sort. FWIW, a correct (and readable) solution - based on the
    'key' param - has been given by Peter Otten, but it still requires a
    callback function, so while it avoids the '-x hack', it's still more
    expensive than a plain sort.
  • Rent at Jan 11, 2008 at 10:27 am
    import collections

    names = "freddy fred bill jock kevin andrew kevin kevin jock"
    freq = collections.defaultdict(int)
    for name in names.split():
    freq[name] += 1
    keys = freq.keys()
    keys.sort(key = freq.get, reverse = True)
    for k in keys:
    print "%-10s: %d" % (k, freq[k])
    On Jan 9, 6:58 pm, Andrew Savige wrote:
    I'm learning Python by reading David Beazley's "Python Essential Reference"
    book and writing a few toy programs. To get a feel for hashes and sorting,
    I set myself this little problem today (not homework, BTW):

    Given a string containing a space-separated list of names:

    names = "freddy fred bill jock kevin andrew kevin kevin jock"

    produce a frequency table of names, sorted descending by frequency.
    then ascending by name. For the above data, the output should be:

    kevin : 3
    jock : 2
    andrew : 1
    bill : 1
    fred : 1
    freddy : 1

    Here's my first attempt:

    names = "freddy fred bill jock kevin andrew kevin kevin jock"
    freq = {}
    for name in names.split():
    freq[name] = 1 + freq.get(name, 0)
    deco = zip([-x for x in freq.values()], freq.keys())
    deco.sort()
    for v, k in deco:
    print "%-10s: %d" % (k, -v)

    I'm interested to learn how more experienced Python folks would solve
    this little problem. Though I've read about the DSU Python sorting idiom,
    I'm not sure I've strictly applied it above ... and the -x hack above to
    achieve a descending sort feels a bit odd to me, though I couldn't think
    of a better way to do it.

    I also have a few specific questions. Instead of:

    for name in names.split():
    freq[name] = 1 + freq.get(name, 0)

    I might try:

    for name in names.split():
    try:
    freq[name] += 1
    except KeyError:
    freq[name] = 1

    Which is preferred?

    Ditto for:

    deco = zip([-x for x in freq.values()], freq.keys())

    versus:

    deco = zip(map(operator.neg, freq.values()), freq.keys())

    Finally, I might replace:

    for v, k in deco:
    print "%-10s: %d" % (k, -v)

    with:

    print "\n".join("%-10s: %d" % (k, -v) for v, k in deco)

    Any feedback on good Python style, performance tips, good books
    to read, etc. is appreciated.

    Thanks,
    /-\

    Make the switch to the world's best email. Get the new Yahoo!7 Mail now.www.yahoo7.com.au/worldsbestemail
  • Mike Meyer at Jan 11, 2008 at 4:17 pm

    On 11 Jan 2008 03:50:53 -0800 Paul Rubin wrote:

    rent <rentlong at gmail.com> writes:
    keys = freq.keys()
    keys.sort(key = freq.get, reverse = True)
    for k in keys:
    print "%-10s: %d" % (k, freq[k])
    I prefer (untested):

    def snd((x,y)): return y # I wish this was built-in
    What's wrong with operator.itemgetter?
    sorted_freq = sorted(freq.iteritems(), key=snd, reverse=True)
    (still untested)

    from operator import itemgetter
    sorted_freq = sorted(freq.iteritems(), key=itemgetter(2), reverse=True)

    <mike

    --
    Mike Meyer <mwm at mired.org> http://www.mired.org/consulting.html
    Independent Network/Unix/Perforce consultant, email for more information.
  • Hrvoje Niksic at Jan 11, 2008 at 5:11 pm

    Mike Meyer <mwm at mired.org> writes:
    On 11 Jan 2008 03:50:53 -0800 Paul Rubin wrote:

    rent <rentlong at gmail.com> writes:
    keys = freq.keys()
    keys.sort(key = freq.get, reverse = True)
    for k in keys:
    print "%-10s: %d" % (k, freq[k])
    I prefer (untested):

    def snd((x,y)): return y # I wish this was built-in
    What's wrong with operator.itemgetter?
    sorted_freq = sorted(freq.iteritems(), key=snd, reverse=True)
    (still untested)

    from operator import itemgetter
    sorted_freq = sorted(freq.iteritems(), key=itemgetter(2), reverse=True)
    It should be itemgetter(1). See how easy it is to get it wrong? :-)


    (Okay, this was too easy a shot to miss out on; I actually like
    itemgetter.)

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
grouppython-list @
categoriespython
postedJan 9, '08 at 10:58a
activeJan 11, '08 at 5:11p
posts16
users10
websitepython.org

People

Translate

site design / logo © 2022 Grokbase