Summing more than two DataFrames with the same indexes in Pandas
1 answer
Option 1
Use sum
sum([df1, df2, df3, df4])
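Why this works: the built-in sum starts from 0 and adds each item in turn, and a DataFrame accepts a scalar on the left-hand side of +. A minimal sketch, using small hypothetical frames (df_a, df_b, df_c are illustration names, not from the question):

```python
import pandas as pd

# Hypothetical frames that share the same index and columns.
df_a = pd.DataFrame({"x": [1, 2], "y": [3, 4]})
df_b = pd.DataFrame({"x": [10, 20], "y": [30, 40]})
df_c = pd.DataFrame({"x": [100, 200], "y": [300, 400]})

# sum() starts from 0, so this evaluates ((0 + df_a) + df_b) + df_c.
total = sum([df_a, df_b, df_c])

# The same computation written with explicit + operators.
explicit = df_a + df_b + df_c

print(total.equals(explicit))
```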
Option 2
Use reduce
from functools import reduce
reduce(pd.DataFrame.add, [df1, df2, df3, df4])
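To make the reduce call less opaque, here is a small sketch showing that it folds the list from left to right, which is the same as chaining .add calls by hand. The frames are hypothetical stand-ins for df1 through df4:

```python
from functools import reduce

import pandas as pd

# Hypothetical frames that share the same index and columns.
df_a = pd.DataFrame({"x": [1, 2], "y": [3, 4]})
df_b = pd.DataFrame({"x": [10, 20], "y": [30, 40]})
df_c = pd.DataFrame({"x": [100, 200], "y": [300, 400]})

# reduce applies the function pairwise, carrying the running total along.
total = reduce(pd.DataFrame.add, [df_a, df_b, df_c])

# The unrolled equivalent of the reduce call above.
chained = df_a.add(df_b).add(df_c)

print(total.equals(chained))
```

If the frames might contain missing values, one possible variation is to pass a lambda that calls a.add(b, fill_value=0) instead of pd.DataFrame.add. That goes beyond what the question asks for, so treat it as an optional tweak.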
Option 3
Use pd.concat and pd.DataFrame.sum with level=1.
This only works as written when the DataFrames have a single-level index. With a MultiIndex it would take some extra work to get right. I recommend the other options.
pd.concat(dict(enumerate([df1, df2, df3, df4]))).sum(level=1)
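Note that the level argument to sum was deprecated in later pandas releases and, as far as I am aware, removed in pandas 2.0. A sketch of what should be the equivalent spelling with groupby, assuming a single-level index and using hypothetical frames:

```python
import pandas as pd

# Hypothetical frames that share the same single-level index.
df_a = pd.DataFrame({"x": [1, 2], "y": [3, 4]})
df_b = pd.DataFrame({"x": [10, 20], "y": [30, 40]})

# Passing a dict to concat turns its keys into the outer index level (level 0);
# each frame's original index becomes level 1.
stacked = pd.concat(dict(enumerate([df_a, df_b])))

# Group rows by the original index (level 1) and sum across the frames.
total = stacked.groupby(level=1).sum()

print(total)
```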
Setup
import pandas as pd
df = pd.DataFrame([[1, -1], [complex(0, 1), complex(0, -1)]])
df1, df2, df3, df4 = [df] * 4
Demo
sum([df1, df2, df3, df4])
        0        1
0  (4+0j)  (-4+0j)
1      4j      -4j
from functools import reduce
reduce(pd.DataFrame.add, [df1, df2, df3, df4])
        0        1
0  (4+0j)  (-4+0j)
1      4j      -4j
pd.concat(dict(enumerate([df1, df2, df3, df4]))).sum(level=1)
        0        1
0  (4+0j)  (-4+0j)
1      4j      -4j
Timing
small data
%timeit sum([df1, df2, df3, df4])
%timeit reduce(pd.DataFrame.add, [df1, df2, df3, df4])
%timeit pd.concat(dict(enumerate([df1, df2, df3, df4]))).sum(level=1)
1000 loops, best of 3: 591 µs per loop
1000 loops, best of 3: 456 µs per loop
100 loops, best of 3: 3.61 ms per loop
big data
df = pd.DataFrame([[1, -1], [complex(0, 1), complex(0, -1)]])
df = pd.concat([df] * 1000, ignore_index=True)
df = pd.concat([df] * 100, axis=1, ignore_index=True)
df1, df2, df3, df4 = [df] * 4
%timeit sum([df1, df2, df3, df4])
%timeit reduce(pd.DataFrame.add, [df1, df2, df3, df4])
%timeit pd.concat(dict(enumerate([df1, df2, df3, df4]))).sum(level=1)
100 loops, best of 3: 3.94 ms per loop
100 loops, best of 3: 2.9 ms per loop
1 loop, best of 3: 1min per loop
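%timeit is an IPython magic. Outside IPython, a rough equivalent can be sketched with the standard-library timeit module. The loop counts and the candidates dict below are my own choices for illustration, and the numbers will differ by machine and pandas version:

```python
import timeit
from functools import reduce

import pandas as pd

# Same small setup as above.
df = pd.DataFrame([[1, -1], [complex(0, 1), complex(0, -1)]])
frames = [df] * 4

# Hypothetical mapping of labels to the callables being compared.
candidates = {
    "sum": lambda: sum(frames),
    "reduce": lambda: reduce(pd.DataFrame.add, frames),
}

results = {}
for name, func in candidates.items():
    # Best of 3 repeats of 100 calls each, reported as seconds per call.
    best = min(timeit.repeat(func, number=100, repeat=3)) / 100
    results[name] = best
    print(f"{name}: {best * 1e6:.0f} µs per loop")
```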