Split data values into a specific number of groups and apply a function - pandas
import numpy as np
import pandas as pd
df = pd.DataFrame([1,4,1,3,2,8,3,6,3,7,3,1,2,9])
I would like to divide df into a specified number of groups and sum all the items in each group. For example, dividing df into 4 groups
[1, 4, 1, 3], [2, 8, 3, 6], [3, 7, 3, 1], [2, 9]
gives
9, 19, 14, 11
I could do df.groupby(np.arange(len(df))//4).sum(), but it won't work for larger dataframes. For example,
df1 = pd.DataFrame([1,4,1,3,2,8,3,6,3,7,3,1,2,9,1,5,3,4])
df1.groupby(np.arange(len(df1))//4).sum()
creates 5 groups instead of 4.
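The reason the group count drifts is that integer division by 4 fixes the number of rows per group, so the number of groups is ceil(len / 4), which is 5 for the 18-row df1. A minimal sketch of one way around it, assuming near-equal group sizes are acceptable: scale each row position by the wanted group count instead, label = i * k // n, which always produces exactly k labels when n >= k. The names k, size_labels and count_labels are illustrative only.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame([1, 4, 1, 3, 2, 8, 3, 6, 3, 7, 3, 1, 2, 9, 1, 5, 3, 4])

# Integer division by 4 fixes the group *size*: 18 rows -> labels 0..4.
size_labels = np.arange(len(df1)) // 4
print(size_labels.max() + 1)  # number of groups -> 5

# Scaling the position by the wanted group count fixes the *count* instead:
# label = i * k // n maps positions 0..n-1 onto 0..k-1 (assumes n >= k).
k = 4
count_labels = np.arange(len(df1)) * k // len(df1)
sums = df1.groupby(count_labels).sum()
print(sums[0].tolist())  # -> [11, 20, 22, 13]
```

Here the group sizes come out as 5, 4, 5, 4; this is a different (but equally balanced) split from the 5, 5, 4, 4 that np.array_split produces.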
You can use numpy.array_split:
import numpy as np
import pandas as pd
df = pd.DataFrame([1,4,1,3,2,8,3,6,3,7,3,1,2,9,1,5,3,4])
a = pd.Series([x.values.sum() for x in np.array_split(df, 4)])
print(a)
0    11
1    27
2    15
3    13
dtype: int64
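np.array_split works just as well on a plain NumPy array, so a variant of the same idea is to split the column's values rather than the DataFrame object itself. This is a sketch assuming a single column labelled 0, as in the question; depending on the pandas version, passing a DataFrame straight to np.array_split may emit a deprecation warning, which splitting the array avoids.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([1, 4, 1, 3, 2, 8, 3, 6, 3, 7, 3, 1, 2, 9, 1, 5, 3, 4])

# Split the column's values (a 1-D NumPy array) into 4 nearly equal chunks;
# when the length is not divisible by 4, the first chunks get one extra element.
chunks = np.array_split(df[0].to_numpy(), 4)
print([len(c) for c in chunks])  # -> [5, 5, 4, 4]

a = pd.Series([c.sum() for c in chunks])
print(a.tolist())  # -> [11, 27, 15, 13]
```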
Solution with concat and sum:
a = pd.concat(np.array_split(df, 4), keys=np.arange(4)).groupby(level=0).sum()
print(a)
    0
0  11
1  27
2  15
3  13
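The concat version builds every piece before summing. A minimal sketch that gets the same 5, 5, 4, 4 split with a single groupby is to give each row the chunk number np.array_split would assign it. split_labels is a hypothetical helper, and it assumes there are at least as many rows as groups (otherwise some groups are simply empty and missing from the result).

```python
import numpy as np
import pandas as pd

def split_labels(n_rows, n_groups):
    # Hypothetical helper: one group label per row, reproducing the chunk
    # sizes of np.array_split (the first n_rows % n_groups groups are one
    # row longer than the rest).
    q, r = divmod(n_rows, n_groups)
    sizes = [q + 1] * r + [q] * (n_groups - r)
    return np.repeat(np.arange(n_groups), sizes)

df = pd.DataFrame([1, 4, 1, 3, 2, 8, 3, 6, 3, 7, 3, 1, 2, 9, 1, 5, 3, 4])
a = df.groupby(split_labels(len(df), 4)).sum()
print(a[0].tolist())  # -> [11, 27, 15, 13]
```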
I looked through the comments, and I think you can fall back on plain Python code when the built-in pandas functions don't do what you need. For example:
import pandas as pd
def get_sum(a, chunks):
    # chunks is the number of rows per group, not the number of groups
    for k in range(0, len(a), chunks):
        yield int(a.iloc[k:k + chunks].values.sum())
df = pd.DataFrame([1,4,1,3,2,8,3,6,3,7,3,1,2,9])
group_sums = list(get_sum(df, 4))
print(group_sums)
Output:
[9, 19, 14, 11]
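The second argument of get_sum is the number of rows per group, so with the 18-row df1 it would still produce five sums. A minimal sketch of a hypothetical variant, sum_in_groups, that takes the number of groups instead and derives the boundaries the way np.array_split does:

```python
import pandas as pd

def sum_in_groups(frame, n_groups):
    # Hypothetical variant of get_sum: the caller passes the number of groups.
    # The first len(frame) % n_groups groups get one extra row.
    q, r = divmod(len(frame), n_groups)
    start = 0
    for g in range(n_groups):
        stop = start + q + (1 if g < r else 0)
        yield int(frame.iloc[start:stop].to_numpy().sum())
        start = stop

df = pd.DataFrame([1, 4, 1, 3, 2, 8, 3, 6, 3, 7, 3, 1, 2, 9])
df1 = pd.DataFrame([1, 4, 1, 3, 2, 8, 3, 6, 3, 7, 3, 1, 2, 9, 1, 5, 3, 4])
print(list(sum_in_groups(df1, 4)))  # -> [11, 27, 15, 13]
print(list(sum_in_groups(df, 4)))   # -> [9, 19, 13, 12]
```

For the 14-row df this gives [9, 19, 13, 12] rather than the question's [9, 19, 14, 11], because the rows are spread as evenly as possible (4, 4, 3, 3) instead of leaving a short last group.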