Pandas: how to add bin counts to the original dataframe

I am new to Pandas and I have a dataframe as shown below:

id    values   
 1       2.1
 2       0.8  
 3       1.0
 4       3.2

      

I want to split the "values" column into bins, for example with a bin width of 2, and add a "counts" column that shows how many rows fall into the same bin as each row, for example:

id     values   counts
 1        2.1       2 (since 2.1 and 3.2 both belong to the bin 2-4)
 2        0.8       2 
 3        1.0       2
 4        3.2       2

      

I know that the value_counts function can calculate the frequencies, but I don't know how to add them back to the original dataframe.

Any help is greatly appreciated!



2 answers


Use numpy's searchsorted to define the bins and bincount to count them.
It should be very fast.

import numpy as np

#         This defines the bin edges
#        [1, 2, 3] would have created
#               different bins
#                    v
b = np.searchsorted([2], df['values'].values)
df.assign(counts=np.bincount(b)[b])

   id  values  counts
0   1     2.1       2
1   2     0.8       2
2   3     1.0       2
3   4     3.2       2

      




  • np.searchsorted determines where in the first array each element of the second array should be inserted to keep the first array sorted. That means:
    • 2.1 must come after 2, which is position 1.
    • 0.8 must come before 2, which is position 0.
    • 1.0 must come before 2, which is position 0.
    • 3.2 must come after 2, which is position 1.
  • np.bincount conveniently counts the frequency of integer bins ... like the ones we just created.
  • Indexing the bin counts with the bin assignments gives every row the count of its own bin, which is the equivalent of a groupby transform with count.
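To make the intermediate arrays concrete, here is a minimal sketch of the steps above. The dataframe is rebuilt from the table in the question, and the names edges, b, and per_bin are only illustrative.

```python
import numpy as np
import pandas as pd

# Dataframe reconstructed from the question's table.
df = pd.DataFrame({"id": [1, 2, 3, 4], "values": [2.1, 0.8, 1.0, 3.2]})

# A single edge at 2 splits the data into two bins: <= 2 and > 2.
edges = [2]

# Bin index for every row: where each value would be inserted into edges.
b = np.searchsorted(edges, df["values"].values)

# Number of rows in each bin, indexed by bin number.
per_bin = np.bincount(b)

# Indexing the per-bin counts with the bin index broadcasts them back to rows.
result = df.assign(counts=per_bin[b])

print(b)        # bin index per row
print(per_bin)  # rows per bin
print(result)
```

With the default side='left', a value exactly equal to an edge (here 2.0) would land in the lower bin, which matches the right-closed intervals that pd.cut produces by default.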



Let's use pd.cut and groupby:

For two bins:

df.assign(counts=df.groupby(pd.cut(df['values'], bins=2))['values'].transform('count'))
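Note that an integer bins argument asks pd.cut for that many equal-width bins over the range of the data, not for a bin width. A small sketch, assuming the dataframe from the question, that uses retbins=True to inspect the edges pd.cut actually picks:

```python
import pandas as pd

# Dataframe reconstructed from the question's table.
df = pd.DataFrame({"id": [1, 2, 3, 4], "values": [2.1, 0.8, 1.0, 3.2]})

# retbins=True also returns the computed bin edges.
binned, bin_edges = pd.cut(df["values"], bins=2, retbins=True)

# Integer code of the bin each row fell into (0 = lower bin, 1 = upper bin).
codes = binned.cat.codes

print(bin_edges)  # three edges spanning roughly 0.8 to 3.2
print(codes.tolist())
```

The lowest edge is nudged slightly below the minimum so that the smallest value is included in the first right-closed interval.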

      

Or, if you want a bin width of 2, pass the bin edges explicitly:



df.assign(counts=df.groupby(pd.cut(df['values'], bins=[0,2,4]))['values'].transform('count'))

      

Output:

   id  values  counts
0   1     2.1     2.0
1   2     0.8     2.0
2   3     1.0     2.0
3   4     3.2     2.0
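The same approach can be written out step by step, which may make it easier to see which interval each row lands in. This is only a sketch under assumptions: the dataframe is rebuilt from the question, the name intervals is illustrative, and observed=True and the cast to int are additions here rather than part of the answer above.

```python
import pandas as pd

# Dataframe reconstructed from the question's table.
df = pd.DataFrame({"id": [1, 2, 3, 4], "values": [2.1, 0.8, 1.0, 3.2]})

# Explicit edges give the right-closed intervals (0, 2] and (2, 4].
intervals = pd.cut(df["values"], bins=[0, 2, 4])

# transform('count') returns one value per original row, so the result
# lines up with df. observed=True only keeps bins that contain rows.
counts = df.groupby(intervals, observed=True)["values"].transform("count")

# Cast to int in case the installed pandas version returns floats.
result = df.assign(counts=counts.astype(int))

print(intervals.tolist())
print(result)
```

Values outside the given edges would get no interval from pd.cut, so the edges should cover the full range of the data before casting to int.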

      
