FAQ
Hi there!

I'm having trouble to wrap my brain around this kind of problem:

What I have :
1) list of dicts
2) list of keys that i would like to be my grouping arguments of
elements from 1)
3) list of keys that i would like do "aggregation" on the elements
of 1) with some function e.g. sum

For instance i got:
1) [ { 'g1' : 1, 'g2' : 8, 's_v1' : 5.0, 's_v2' : 3.5 },
{ 'g1' : 1, 'g2' : 9, 's_v1' : 2.0, 's_v2' : 3.0 },
{'g1' : 2, 'g2' : 8, 's_v1' : 6.0, 's_v2' : 8.0}, ... ]
2) ['g1', 'g2']
3) ['s_v1', 's_v2']

To be precise 1) is a result of a values_list method from a QuerySet
in Django; 2) is the arguments for that method; 3) those are the
annotation keys. so 1) is a result of:
qs.values_list('g1', 'g2').annotate(s_v1=Sum('v1'), s_v2=Sum('v2'))

What i want to have is:
a "big" nested dictionary with 'g1' values as 1st level keys and a
dictionary of aggregates and "subgroups" in it.

In my example it would be something like this:
{
1 : {
's_v1' : 7.0,
's_v2' : 6.5,
'g2' :{
8 : {
's_v1' : 5.0,
's_v2' : 3.5 },
9 : {
's_v1' : 2.0,
's_v2' : 3.0 }
}
},
2 : {
's_v1' : 6.0,
's_v2' : 8.0,
'g2' : {
8 : {
's_v1' : 6.0,
's_v2' : 8.0}
}
},
...
}

# notice the summed values of s_v1 and s_v2 when g1 == 1

I was looking for a solution that would let me do that kind of
grouping with variable lists of 2) and 3) i.e. having also 'g3' as
grouping element so the 'g2' dicts could also have their own
"subgroup" and be even more nested then.
I was trying something with itertools.groupby and updating nested
dicts, but as i was writing the code it started to feel too verbose to
me :/

Do You have any hints maybe? because i'm kind of stucked :/

Regards

S?awek

Search Discussions

  • Steven D'Aprano at Feb 5, 2011 at 2:27 am

    On Fri, 04 Feb 2011 15:14:24 -0800, Slafs wrote:

    Hi there!

    I'm having trouble to wrap my brain around this kind of problem:
    Perhaps you should consider backing up and staring from somewhere else
    with different input data, or changing the requirements. Just a thought.

    What I have :
    1) list of dicts
    2) list of keys that i would like to be my grouping arguments of
    elements from 1)
    3) list of keys that i would like do "aggregation" on the elements
    of 1) with some function e.g. sum

    You start with data:

    dicts = [ {'g1': 1, 'g2': 8, 's_v1': 5.0, 's_v2': 3.5},
    {'g1': 1, 'g2': 9, 's_v1': 2.0, 's_v2': 3.0},
    {'g1': 2, 'g2': 8, 's_v1': 6.0, 's_v2': 8.0} ]


    It sometimes helps me to think about data structures by drawing them out.
    In this case, you have what is effectively a two-dimensional table:

    g1 g2 s_v1 s_v2
    === === ===== ====
    1 8 5.0 3.5
    1 9 2.0 3.0
    2 8 6.0 8.0


    Nice and simple. But the result you want is a bit more complex -- it's a
    dict of dicts of dicts:

    {1: {'s_v1': 7.0, 's_v2': 6.5,
    'g2': {8: {'s_v1': 5.0, 's_v2': 3.5},
    9: {'s_v1': 2.0, 's_v2': 3.0}
    }},
    2: {'s_v1': 6.0, 's_v2': 8.0,
    'g2': {8: {'s_v1' : 6.0, 's_v2': 8.0}
    }}}


    (I quote from the Zen of Python: "Flat is better than nested." Hmmm.)

    which is equivalent to a *four* dimensional table, which is a bit hard to
    write out :)

    Here's a two-dimensional projection of a single slice with key = 1:

    s_v1 s_v2 g2
    ===== ===== =====
    7.0 6.5 | s_v1 s_v2
    ---------------
    8 | 5.0 3.5
    9 | 2.0 3.0


    Does this help you to either (1) redesign your data structures, or (2)
    work out how to go from there?

    [...]
    I was looking for a solution that would let me do that kind of grouping
    with variable lists of 2) and 3) i.e. having also 'g3' as grouping
    element so the 'g2' dicts could also have their own "subgroup" and be
    even more nested then. I was trying something with itertools.groupby and
    updating nested dicts, but as i was writing the code it started to feel
    too verbose to me :/
    I don't think groupby is the tool you want. It groups *consecutive* items
    in sequences:

    from itertools import groupby
    for key, it in groupby([1,1,1,2,3,4,3,3,3,5,1]):
    ... print(key, list(it))
    ...
    1 [1, 1, 1]
    2 [2]
    3 [3]
    4 [4]
    3 [3, 3, 3]
    5 [5]
    1 [1]


    Except for the name, I don't see any connection between this and what you
    want to do.

    The approach I would take is a top-down approach:

    dicts = [ ... ] # list of dicts, as above.
    result = {}
    for d in dicts:
    # process each dict in isolation
    temp = process(d)
    merge(result, temp)


    merge() hopefully should be straight forward, and process only needs to
    look at one dict at a time.



    --
    Steven
  • Paul Rubin at Feb 5, 2011 at 4:58 am

    Slafs <slafs.e at gmail.com> writes:
    What i want to have is:
    a "big" nested dictionary with 'g1' values as 1st level keys and a
    dictionary of aggregates and "subgroups" in it....

    I was looking for a solution that would let me do that kind of
    grouping with variable lists of 2) and 3) i.e. having also 'g3' as
    grouping element so the 'g2' dicts could also have their own
    "subgroup" and be even more nested then.
    I was trying something with itertools.groupby and updating nested
    dicts, but as i was writing the code it started to feel too verbose to
    me :/

    Do You have any hints maybe? because i'm kind of stucked :/
    I'm not sure I understood the problem and it would help if you gave
    sample data with the deeper nesting that you describe. But the
    following messy code matches the sample that you did give:

    from pprint import pprint
    from itertools import groupby

    x1 = [ { 'g1' : 1, 'g2' : 8, 's_v1' : 5.0, 's_v2' : 3.5 },
    { 'g1' : 1, 'g2' : 9, 's_v1' : 2.0, 's_v2' : 3.0 },
    { 'g1' : 2, 'g2' : 8, 's_v1' : 6.0, 's_v2' : 8.0}
    ]
    x2 = ['g1', 'g2']
    x3 = ['s_v1', 's_v2']

    def agg(xdata, group_keys, agg_keys):
    if not group_keys:
    return {}
    k0, ks = group_keys[0], group_keys[1:]
    r = {}
    def gk(d): return d[k0]
    for k, g in groupby(sorted(xdata, key=gk), gk):
    gs = list(g)
    aggs = dict((ak,sum(d[ak] for d in gs)) for ak in agg_keys)
    r[k] = aggs
    if ks:
    r[k][ks[0]] = agg(gs,group_keys[1:], agg_keys)
    return r

    pprint (agg(x1, x2, x3))
  • Slafs at Feb 5, 2011 at 10:34 am

    On 5 Lut, 05:58, Paul Rubin wrote:
    Slafs <slaf... at gmail.com> writes:
    What i want to have is:
    a "big" nested dictionary with 'g1' values as 1st level keys and a
    dictionary of aggregates and "subgroups" in it....
    I was looking for a solution that would let me do that kind of
    grouping with variable lists of 2) and 3) i.e. having also 'g3' as
    grouping element so the 'g2' dicts could also have their own
    "subgroup" and be even more nested then.
    I was trying something with itertools.groupby and updating nested
    dicts, but as i was writing the code it started to feel too verbose to
    me :/
    Do You have any hints maybe? because i'm kind of stucked :/
    I'm not sure I understood the problem and it would help if you gave
    sample data with the deeper nesting that you describe. ?But the
    following messy code matches the sample that you did give:

    ? ? from pprint import pprint
    ? ? from itertools import groupby

    ? ? x1 = [ { 'g1' : 1, 'g2' : 8, 's_v1' : 5.0, 's_v2' : 3.5 },
    ? ? ? ? ? ? ? { 'g1' : 1, 'g2' : 9, 's_v1' : 2.0, 's_v2' : 3.0 },
    ? ? ? ? ? ? ? { 'g1' : 2, 'g2' : 8, 's_v1' : 6.0, 's_v2' : 8.0}
    ? ? ? ? ? ? ? ]
    ? ? x2 = ['g1', 'g2']
    ? ? x3 = ['s_v1', 's_v2']

    ? ? def agg(xdata, group_keys, agg_keys):
    ? ? ? ? if not group_keys:
    ? ? ? ? ? ? return {}
    ? ? ? ? k0, ks = group_keys[0], group_keys[1:]
    ? ? ? ? r = {}
    ? ? ? ? def gk(d): return d[k0]
    ? ? ? ? for k, g in groupby(sorted(xdata, key=gk), gk):
    ? ? ? ? ? ? gs = list(g)
    ? ? ? ? ? ? aggs = dict((ak,sum(d[ak] for d in gs)) for ak in agg_keys)
    ? ? ? ? ? ? r[k] = aggs
    ? ? ? ? ? ? if ks:
    ? ? ? ? ? ? ? ? r[k][ks[0]] = agg(gs,group_keys[1:], agg_keys)
    ? ? ? ? return r

    ? ? pprint (agg(x1, x2, x3))
    Thank you both Steven and Paul for your replies.

    @Steven:
    Perhaps you should consider backing up and staring from somewhere else
    with different input data, or changing the requirements. Just a thought.
    I think it's not the issue. The data as you noticed i well structured
    (as a table for instance) and I don't think I can go better than that.
    I don't think groupby is the tool you want. It groups *consecutive* items
    in sequences:
    I was using groupby just like in Paul's code.

    @Paul:
    OMG. I think this is it! (getting my jaw from the floor...)
    The funny part is that I was kind of close to this solution ;). I was
    considering the use of recursion for this.

    Thank You so much!
  • Peter Otten at Feb 5, 2011 at 12:12 pm

    Slafs wrote:

    Hi there!

    I'm having trouble to wrap my brain around this kind of problem:

    What I have :
    1) list of dicts
    2) list of keys that i would like to be my grouping arguments of
    elements from 1)
    3) list of keys that i would like do "aggregation" on the elements
    of 1) with some function e.g. sum

    For instance i got:
    1) [ { 'g1' : 1, 'g2' : 8, 's_v1' : 5.0, 's_v2' : 3.5 },
    { 'g1' : 1, 'g2' : 9, 's_v1' : 2.0, 's_v2' : 3.0 },
    {'g1' : 2, 'g2' : 8, 's_v1' : 6.0, 's_v2' : 8.0}, ... ]
    2) ['g1', 'g2']
    3) ['s_v1', 's_v2']

    To be precise 1) is a result of a values_list method from a QuerySet
    in Django; 2) is the arguments for that method; 3) those are the
    annotation keys. so 1) is a result of:
    qs.values_list('g1', 'g2').annotate(s_v1=Sum('v1'), s_v2=Sum('v2'))

    What i want to have is:
    a "big" nested dictionary with 'g1' values as 1st level keys and a
    dictionary of aggregates and "subgroups" in it.

    In my example it would be something like this:
    {
    1 : {
    's_v1' : 7.0,
    's_v2' : 6.5,
    'g2' :{
    8 : {
    's_v1' : 5.0,
    's_v2' : 3.5 },
    9 : {
    's_v1' : 2.0,
    's_v2' : 3.0 }
    }
    },
    2 : {
    's_v1' : 6.0,
    's_v2' : 8.0,
    'g2' : {
    8 : {
    's_v1' : 6.0,
    's_v2' : 8.0}
    }
    },
    ...
    }

    # notice the summed values of s_v1 and s_v2 when g1 == 1

    I was looking for a solution that would let me do that kind of
    grouping with variable lists of 2) and 3) i.e. having also 'g3' as
    grouping element so the 'g2' dicts could also have their own
    "subgroup" and be even more nested then.
    I was trying something with itertools.groupby and updating nested
    dicts, but as i was writing the code it started to feel too verbose to
    me :/

    Do You have any hints maybe? because i'm kind of stucked :/

    Regards

    S?awek
    Not super-efficient, but simple:

    $ cat python sumover.py
    cat: python: No such file or directory
    data = [ { 'g1' : 1, 'g2' : 8, 's_v1' : 5.0, 's_v2' : 3.5 },
    { 'g1' : 1, 'g2' : 9, 's_v1' : 2.0, 's_v2' : 3.0 },
    {'g1' : 2, 'g2' : 8, 's_v1' : 6.0, 's_v2' : 8.0}]
    sum_over = ["s_v1", "s_v2"]
    group_by = ["g1", "g2"]

    wanted = {
    1 : {
    's_v1' : 7.0,
    's_v2' : 6.5,
    'g2' :{
    8 : {
    's_v1' : 5.0,
    's_v2' : 3.5 },
    9 : {
    's_v1' : 2.0,
    's_v2' : 3.0 }
    }
    },
    2 : {
    's_v1' : 6.0,
    's_v2' : 8.0,
    'g2' : {
    8 : {
    's_v1' : 6.0,
    's_v2' : 8.0}
    }
    },
    }

    def calc(data, group_by, sum_over):
    tree = {}
    group_by = group_by + [None]
    for item in data:
    d = tree
    for g in group_by:
    for so in sum_over:
    d[so] = d.get(so, 0.0) + item[so]
    if g:
    d = d.setdefault(g, {}).setdefault(item[g], {})
    return tree

    got = calc(data, group_by, sum_over)[group_by[0]]
    assert got == wanted
    $ python sumover.py
    $

    Untested.
  • Nn at Feb 7, 2011 at 4:52 pm

    On Feb 5, 7:12?am, Peter Otten wrote:
    Slafs wrote:
    Hi there!
    I'm having trouble to wrap my brain around this kind of problem:
    What I have :
    ? 1) list of dicts
    ? 2) list of keys that i would like to be my grouping arguments of
    elements from 1)
    ? 3) list of keys that i would like do "aggregation" on the elements
    of 1) with some function e.g. sum
    For instance i got:
    1) [ { 'g1' : 1, 'g2' : 8, 's_v1' : 5.0, 's_v2' : 3.5 },
    ? ? ? { 'g1' : 1, 'g2' : 9, 's_v1' : 2.0, 's_v2' : 3.0 },
    ? ? ? {'g1' : 2, 'g2' : 8, 's_v1' : 6.0, 's_v2' : 8.0}, ... ]
    2) ['g1', 'g2']
    3) ['s_v1', 's_v2']
    To be precise 1) is a result of a values_list method from a QuerySet
    in Django; 2) is the arguments for that method; 3) those are the
    annotation keys. so 1) is a result of:
    ? ?qs.values_list('g1', 'g2').annotate(s_v1=Sum('v1'), s_v2=Sum('v2'))
    What i want to have is:
    a "big" nested dictionary with 'g1' values as 1st level keys and a
    dictionary of aggregates and "subgroups" in it.
    In my example it would be something like this:
    {
    ? 1 : {
    ? ? ? ? ? 's_v1' : 7.0,
    ? ? ? ? ? 's_v2' : 6.5,
    ? ? ? ? ? 'g2' :{
    ? ? ? ? ? ? ? ? ? ?8 : {
    ? ? ? ? ? ? ? ? ? ? ? ? ? 's_v1' : 5.0,
    ? ? ? ? ? ? ? ? ? ? ? ? ? 's_v2' : 3.5 },
    ? ? ? ? ? ? ? ? ? ?9 : ?{
    ? ? ? ? ? ? ? ? ? ? ? ? ? 's_v1' : 2.0,
    ? ? ? ? ? ? ? ? ? ? ? ? ? 's_v2' : 3.0 }
    ? ? ? ? ? ? ? ? }
    ? ? ? ?},
    ? 2 : {
    ? ? ? ? ? ?'s_v1' : 6.0,
    ? ? ? ? ? ?'s_v2' : 8.0,
    ? ? ? ? ? ?'g2' : {
    ? ? ? ? ? ? ? ? ? ? 8 : {
    ? ? ? ? ? ? ? ? ? ? ? ? ? 's_v1' : 6.0,
    ? ? ? ? ? ? ? ? ? ? ? ? ? 's_v2' : 8.0}
    ? ? ? ? ? ?}
    ? ? ? ?},
    ...
    }
    # notice the summed values of s_v1 and s_v2 when g1 == 1
    I was looking for a solution that would let me do that kind of
    grouping with variable lists of 2) and 3) i.e. having also 'g3' as
    grouping element so the 'g2' dicts could also have their own
    "subgroup" and be even more nested then.
    I was trying something with itertools.groupby and updating nested
    dicts, but as i was writing the code it started to feel too verbose to
    me :/
    Do You have any hints maybe? because i'm kind of stucked :/
    Regards
    S?awek
    Not super-efficient, but simple:

    $ cat python sumover.py
    cat: python: No such file or directory
    data = [ { 'g1' : 1, 'g2' : 8, 's_v1' : 5.0, 's_v2' : 3.5 },
    ? ? ? ? ?{ 'g1' : 1, 'g2' : 9, 's_v1' : 2.0, 's_v2' : 3.0 },
    ? ? ? ? ?{'g1' : 2, 'g2' : 8, 's_v1' : 6.0, 's_v2' : 8.0}]
    sum_over = ["s_v1", "s_v2"]
    group_by = ["g1", "g2"]

    wanted = {
    ? 1 : {
    ? ? ? ? ? 's_v1' : 7.0,
    ? ? ? ? ? 's_v2' : 6.5,
    ? ? ? ? ? 'g2' :{
    ? ? ? ? ? ? ? ? ? ?8 : {
    ? ? ? ? ? ? ? ? ? ? ? ? ? 's_v1' : 5.0,
    ? ? ? ? ? ? ? ? ? ? ? ? ? 's_v2' : 3.5 },
    ? ? ? ? ? ? ? ? ? ?9 : ?{
    ? ? ? ? ? ? ? ? ? ? ? ? ? 's_v1' : 2.0,
    ? ? ? ? ? ? ? ? ? ? ? ? ? 's_v2' : 3.0 }
    ? ? ? ? ? ? ? ? }
    ? ? ? ?},
    ? 2 : {
    ? ? ? ? ? ?'s_v1' : 6.0,
    ? ? ? ? ? ?'s_v2' : 8.0,
    ? ? ? ? ? ?'g2' : {
    ? ? ? ? ? ? ? ? ? ? 8 : {
    ? ? ? ? ? ? ? ? ? ? ? ? ? 's_v1' : 6.0,
    ? ? ? ? ? ? ? ? ? ? ? ? ? 's_v2' : 8.0}
    ? ? ? ? ? ?}
    ? ? ? ?},

    }

    def calc(data, group_by, sum_over):
    ? ? tree = {}
    ? ? group_by = group_by + [None]
    ? ? for item in data:
    ? ? ? ? d = tree
    ? ? ? ? for g in group_by:
    ? ? ? ? ? ? for so in sum_over:
    ? ? ? ? ? ? ? ? d[so] = d.get(so, 0.0) + item[so]
    ? ? ? ? ? ? if g:
    ? ? ? ? ? ? ? ? d = d.setdefault(g, {}).setdefault(item[g], {})
    ? ? return tree

    got = calc(data, group_by, sum_over)[group_by[0]]
    assert got == wanted
    $ python sumover.py
    $

    Untested.
    Very clever. I didn't understand how it worked until I rewrote it like
    this:

    def calc(data, group_by, sum_over):
    tree = {}
    group_by = [None] + group_by
    for item in data:
    d = tree
    for g in group_by:
    if g:
    d = d.setdefault(g, {}).setdefault(item[g], {})
    for so in sum_over:
    d[so] = d.get(so, 0.0) + item[so]
    return tree


    Processing "None" in the last round of the loop was throwing me off.

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
grouppython-list @
categoriespython
postedFeb 4, '11 at 11:14p
activeFeb 7, '11 at 4:52p
posts6
users5
websitepython.org

People

Translate

site design / logo © 2022 Grokbase