Aggregating an array of objects by attribute

I have a list of dicts, each with two key/value pairs. I need to combine dicts that have the same value for the first key by summing the values of their second keys. For example:

[
    {'foo': 34, 'bar': 2}, 
    {'foo': 34, 'bar': 3}, 
    {'foo': 35, 'bar': 1}, 
    {'foo': 35, 'bar': 7}, 
    {'foo': 35, 'bar': 2}
]

      

will come out as:

[
    {'foo': 34, 'bar': 5}, 
    {'foo': 35, 'bar': 10}
]

      

I wrote the following function, which works, but looks terribly verbose, and I'm pretty sure there is a cool pythonic trick that is cleaner and more efficient.

def combine(arr):
    arr_out = []
    if arr:
        arr_out.append({'foo': arr[0]['foo'], 'bar': 0})
        for i in range(len(arr)):
            if arr[i]['foo'] == arr_out[-1]['foo']:
                arr_out[-1]['bar'] += arr[i]['bar']
            else:
                arr_out.append({'foo': arr[i]['foo'], 'bar': arr[i]['bar']})
    return arr_out

      

Anyone have any suggestions?

+3




3 answers


  • Group the bar values by their foo value and sum them:

    >>> grouper = {}
    >>> for d in data:
    ...     grouper[d["foo"]] = grouper.get(d["foo"], 0) + d["bar"]
    ... 
    >>> grouper
    {34: 5, 35: 10}
    
          

  • Then rebuild the list of dicts with a list comprehension:

    >>> [{"foo": item, "bar": grouper[item]} for item in grouper]
    [{'foo': 34, 'bar': 5}, {'foo': 35, 'bar': 10}]
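
The two steps above can be wrapped into a single function. This is only a minimal sketch: the name combine_by_key and its key/value parameters are hypothetical, not part of the answer. It assumes Python 3.7+, where dicts preserve insertion order, so groups come out in order of first appearance.

```python
def combine_by_key(data, key="foo", value="bar"):
    """Sum `value` over all dicts sharing the same `key` (hypothetical helper)."""
    grouper = {}
    for d in data:
        # Accumulate the running total for this key, starting from 0.
        grouper[d[key]] = grouper.get(d[key], 0) + d[value]
    # Rebuild the list of dicts, one per distinct key.
    return [{key: k, value: total} for k, total in grouper.items()]


data = [
    {'foo': 34, 'bar': 2},
    {'foo': 34, 'bar': 3},
    {'foo': 35, 'bar': 1},
    {'foo': 35, 'bar': 7},
    {'foo': 35, 'bar': 2},
]
print(combine_by_key(data))
```

Unlike the function in the question, this version does not need equal foo values to be adjacent in the input.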
    
          



+3




Using itertools.groupby:

>>> arr = [
...     {'foo': 34, 'bar': 2},
...     {'foo': 34, 'bar': 3},
...     {'foo': 35, 'bar': 1},
...     {'foo': 35, 'bar': 7},
...     {'foo': 35, 'bar': 2}
... ]
>>> import itertools
>>> key = lambda d: d['foo']
>>> [{'foo': key, 'bar': sum(d['bar'] for d in grp)}
...  for key, grp in itertools.groupby(sorted(arr, key=key), key=key)]
[{'foo': 34, 'bar': 5}, {'foo': 35, 'bar': 10}]

      



If the list is already sorted by foo, you can omit the call to sorted:

>>> [{'foo': key, 'bar': sum(d['bar'] for d in grp)}
...  for key, grp in itertools.groupby(arr, key=key)]
[{'foo': 34, 'bar': 5}, {'foo': 35, 'bar': 10}]
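
The sorting step matters because itertools.groupby only merges consecutive items with equal keys. The sketch below illustrates this with a small unsorted list; the function name combine_groupby and its presorted flag are hypothetical, introduced only for this comparison.

```python
import itertools


def combine_groupby(arr, presorted=False):
    """Group by 'foo' with itertools.groupby (hypothetical wrapper)."""
    key = lambda d: d['foo']
    # groupby only merges adjacent items, so sort first unless told otherwise.
    rows = arr if presorted else sorted(arr, key=key)
    return [{'foo': k, 'bar': sum(d['bar'] for d in grp)}
            for k, grp in itertools.groupby(rows, key=key)]


unsorted_arr = [
    {'foo': 35, 'bar': 1},
    {'foo': 34, 'bar': 2},
    {'foo': 35, 'bar': 7},
]
merged = combine_groupby(unsorted_arr)                    # sorted first: one entry per foo
not_merged = combine_groupby(unsorted_arr, presorted=True)  # no sort: foo 35 appears twice
print(merged)
print(not_merged)
```

Skipping the sort on input that is not actually grouped by foo silently produces duplicate entries rather than raising an error.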

      

+5




This solution uses collections.defaultdict:

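import collections


def combine(arr):
    # Missing keys start at int() == 0, so += works on first sight of a key.
    c = collections.defaultdict(int)
    for i in arr:
        c[i['foo']] += i['bar']
    # c == {34: 5, 35: 10}

    # Emit one dict per distinct foo value, ordered by foo.
    return [{'foo': k, 'bar': c[k]} for k in sorted(c)]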

      

The dictionary c is a defaultdict that maps each foo value to the sum of the corresponding bar values.
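
As a small illustration of why no explicit initialization is needed, the sketch below shows how a defaultdict(int) behaves on a key it has not seen yet. The variable names are arbitrary and the values are taken from the example data only for illustration.

```python
import collections

totals = collections.defaultdict(int)

# Reading a missing key calls int(), stores the result 0, and returns it.
first_lookup = totals[34]

# += therefore works even when the key has never been assigned.
totals[34] += 2
totals[34] += 3
totals[35] += 1

print(first_lookup)
print(dict(totals))
```

Note that merely reading a missing key inserts it, so a plain lookup on a defaultdict can grow the dictionary.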

0

