Pandas: how to add bin values to original dataframe
I am new to Pandas and I have a dataframe as shown below:
id values
1 2.1
2 0.8
3 1.0
4 3.2
I want to split the "values" column into bins, for example with a bin width of 2, and add a "counts" column that shows how many rows fall into the same bin as each row, for example:
id values counts
1 2.1 2 (since 2.1 and 3.2 both belong to the bin 2-4)
2 0.8 2
3 1.0 2
4 3.2 2
I know that the value_counts function can calculate the frequencies, but I don't know how to add them back to the original dataframe.
Any help is greatly appreciated!
Use numpy's searchsorted to define the bins and bincount to count them. It should be very fast.
import numpy as np

# The first argument, [2], defines the bin edges;
# [1, 2, 3] would have created different bins.
b = np.searchsorted([2], df['values'].values)
# Count the rows per bin, then look up each row's bin count.
df.assign(counts=np.bincount(b)[b])
id values counts
0 1 2.1 2
1 2 0.8 2
2 3 1.0 2
3 4 3.2 2
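The comment in the snippet notes that [1, 2, 3] would have created different bins. A minimal sketch of that variant, assuming the same four values as in the question (the names `values` and `edges` are introduced here for illustration only):

```python
import numpy as np

# The 'values' column from the question, as a plain array.
values = np.array([2.1, 0.8, 1.0, 3.2])

# Alternative bin edges (an illustrative choice, taken from the comment).
edges = [1, 2, 3]

# Bin index of each value; with the default side='left',
# a value equal to an edge lands in the lower bin.
b = np.searchsorted(edges, values)

# Number of rows sharing each row's bin.
counts = np.bincount(b)[b]

print(b)       # [2 0 0 3]
print(counts)  # [1 2 2 1]
```

With these edges, 0.8 and 1.0 still share a bin, while 2.1 and 3.2 end up in separate bins.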
- np.searchsorted determines where in the first array each element of the second array should be inserted to preserve the sort order. That means:
  - 2.1 must come after 2, which is position 1.
  - 0.8 must go before 2, which is position 0.
  - 1.0 must go before 2, which is position 0.
  - 3.2 must come after 2, which is position 1.
- np.bincount conveniently counts the frequency of integer bins, like the ones we just created. Indexing the per-bin counts by the bin assignments gives the equivalent of a transform with count.
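The two steps can be traced with their intermediate arrays. This is only a sketch using the question's four values; `per_bin` is a name introduced here for illustration:

```python
import numpy as np

# The 'values' column from the question.
values = np.array([2.1, 0.8, 1.0, 3.2])

# Step 1: bin assignment of each value relative to the single edge 2.
b = np.searchsorted([2], values)
print(b)        # [1 0 0 1]

# Step 2: how many values landed in bin 0 and in bin 1.
per_bin = np.bincount(b)
print(per_bin)  # [2 2]

# Step 3: look up each row's bin count via integer-array indexing.
counts = per_bin[b]
print(counts)   # [2 2 2 2]
```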
Let's use pd.cut and groupby:
For two bins:
df.assign(counts=df.groupby(pd.cut(df['values'], bins=2))['values'].transform('count'))
Or, if you want a bin width of 2:
df.assign(counts=df.groupby(pd.cut(df['values'], bins=[0,2,4]))['values'].transform('count'))
Output:
id values counts
0 1 2.1 2.0
1 2 0.8 2.0
2 3 1.0 2.0
3 4 3.2 2.0
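For reference, a self-contained sketch of the explicit-edges variant that also keeps the bin label next to each row. The extra `bin` column and the explicit `observed=True` argument are additions for illustration and are not part of the answer above; depending on the pandas version, the counts may display as integers rather than floats.

```python
import pandas as pd

# Rebuild the dataframe from the question.
df = pd.DataFrame({"id": [1, 2, 3, 4], "values": [2.1, 0.8, 1.0, 3.2]})

# Assign each value to a right-closed interval: (0, 2] or (2, 4].
bins = pd.cut(df["values"], bins=[0, 2, 4])

# Count the rows per interval and broadcast the count back to every row.
counts = df.groupby(bins, observed=True)["values"].transform("count")

# Attach both the interval and the count to a copy of the original frame.
result = df.assign(bin=bins, counts=counts)
print(result)
```

Keeping the interval column makes it easy to check which rows were grouped together.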