Aggregating an array of objects by attribute
I have a list of dicts, each with two key/value pairs. I need to combine dicts that have the same value for the first key by summing the values of their second keys. For example:
[
{'foo': 34, 'bar': 2},
{'foo': 34, 'bar': 3},
{'foo': 35, 'bar': 1},
{'foo': 35, 'bar': 7},
{'foo': 35, 'bar': 2}
]
will come out as:
[
{'foo': 34, 'bar': 5},
{'foo': 35, 'bar': 10}
]
I wrote the following function, which works, but looks terribly verbose, and I'm pretty sure there is a cool pythonic trick that is cleaner and more efficient.
def combine(arr):
    arr_out = []
    if arr:
        arr_out.append({'foo': arr[0]['foo'], 'bar': 0})
        for i in range(len(arr)):
            if arr[i]['foo'] == arr_out[-1]['foo']:
                arr_out[-1]['bar'] += arr[i]['bar']
            else:
                arr_out.append({'foo': arr[i]['foo'], 'bar': arr[i]['bar']})
    return arr_out
Anyone have any suggestions?
- Group the bar values by their foo value and add them up:

  >>> grouper = {}
  >>> for d in data:
  ...     grouper[d["foo"]] = grouper.get(d["foo"], 0) + d["bar"]
  ...
  >>> grouper
  {34: 5, 35: 10}

- Then rebuild the list of dicts with a list comprehension:

  >>> [{"foo": item, "bar": grouper[item]} for item in grouper]
  [{'foo': 34, 'bar': 5}, {'foo': 35, 'bar': 10}]
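
The two steps can also be wrapped in a single function. This is only a minimal sketch: the name combine_by_key and its key/value parameters are hypothetical (not from the question), and it assumes Python 3.7+, where plain dicts keep insertion order.

```python
def combine_by_key(data, key="foo", value="bar"):
    """Sum `value` over all dicts that share the same `key` (hypothetical helper)."""
    grouper = {}
    for d in data:
        # dict.get supplies 0 the first time a key is seen
        grouper[d[key]] = grouper.get(d[key], 0) + d[value]
    # Rebuild the list of dicts; groups appear in order of first occurrence
    return [{key: k, value: total} for k, total in grouper.items()]


data = [
    {'foo': 35, 'bar': 1},
    {'foo': 34, 'bar': 2},
    {'foo': 35, 'bar': 7},
    {'foo': 34, 'bar': 3},
]
result = combine_by_key(data)
print(result)  # [{'foo': 35, 'bar': 8}, {'foo': 34, 'bar': 5}]
```

Unlike the function in the question, this does not need the input to be sorted, because equal keys are matched through the dict rather than by comparing neighbours.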
Using itertools.groupby:
>>> arr = [
... {'foo': 34, 'bar': 2},
... {'foo': 34, 'bar': 3},
... {'foo': 35, 'bar': 1},
... {'foo': 35, 'bar': 7},
... {'foo': 35, 'bar': 2}
... ]
>>> import itertools
>>> key = lambda d: d['foo']
>>> [{'foo': key, 'bar': sum(d['bar'] for d in grp)}
... for key, grp in itertools.groupby(sorted(arr, key=key), key=key)]
[{'foo': 34, 'bar': 5}, {'foo': 35, 'bar': 10}]
If the list is already sorted by foo, you can omit the call to sorted:
>>> [{'foo': key, 'bar': sum(d['bar'] for d in grp)}
... for key, grp in itertools.groupby(arr, key=key)]
[{'foo': 34, 'bar': 5}, {'foo': 35, 'bar': 10}]
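
The sort matters because itertools.groupby only merges consecutive items with equal keys. A small sketch of what happens when it is skipped on unsorted input (the helper name summed and the sample data are made up for illustration):

```python
import itertools

arr = [
    {'foo': 34, 'bar': 2},
    {'foo': 35, 'bar': 1},
    {'foo': 34, 'bar': 3},
]
key = lambda d: d['foo']


def summed(items):
    # groupby starts a new group every time the key changes
    return [{'foo': k, 'bar': sum(d['bar'] for d in grp)}
            for k, grp in itertools.groupby(items, key=key)]


# Unsorted: the two foo=34 entries are not adjacent, so they stay separate
unsorted_result = summed(arr)
# Sorted first: equal keys become adjacent and are merged
sorted_result = summed(sorted(arr, key=key))

print(unsorted_result)
print(sorted_result)
```

Here unsorted_result still has three entries, while sorted_result has one entry per foo value.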
This solution uses collections.defaultdict:
import collections

def combine(arr):
    c = collections.defaultdict(int)
    for i in arr:
        c[i['foo']] += i['bar']
    # c == {34: 5, 35: 10}
    return [{'foo': k, 'bar': c[k]} for k in sorted(c)]
The dictionary c is a defaultdict that maps each "foo" value (the key) to the running sum of the matching "bar" values.
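
To see why no explicit initialisation is needed, here is a short sketch of how defaultdict(int) behaves on its own (the variable name totals and the numbers are only for illustration):

```python
import collections

totals = collections.defaultdict(int)

# Accessing a missing key calls int(), which returns 0, and stores it,
# so += works even the first time a key is seen
totals[35] += 1
totals[34] += 2
totals[34] += 3

print(dict(totals))    # {35: 1, 34: 5}
# Iterating a dict yields its keys, so sorted() gives the keys in ascending order
print(sorted(totals))  # [34, 35]
```

That is also why the result of combine comes back ordered by foo regardless of the input order.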