Pandas: how to add bin values to original dataframe

Question

I am new to Pandas and I have a dataframe as shown below

id    values   
 1       2.1
 2       0.8  
 3       1.0
 4       3.2

I want to split the "values" column into bins, for example with a bin width of 2, and add a "counts" column that gives the number of rows falling into the same bin as each row, for example:

id     values   counts
 1        2.1       2 (since 2.1 and 3.2 both belong to the bin 2-4)
 2        0.8       2 
 3        1.0       2
 4        3.2       2

I know that the value_counts function can calculate the frequencies, but I don't know how to add them back to the original dataframe.

Any help is greatly appreciated!

+3

python numpy pandas

ELI June 11. 17 at 23:57


2 answers

Let's use pd.cut and groupby:

For two bins:

df.assign(counts=df.groupby(pd.cut(df['values'], bins=2))['values'].transform('count'))

Or if you want your bin size = 2:

df.assign(counts=df.groupby(pd.cut(df['values'], bins=[0,2,4]))['values'].transform('count'))

Output:

   id  values  counts
0   1     2.1     2.0
1   2     0.8     2.0
2   3     1.0     2.0
3   4     3.2     2.0
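
For anyone who wants to run this end to end, here is a minimal, self-contained sketch of the fixed-width variant. The dataframe is rebuilt from the question; the variable names `bins`, `counts` and `result`, and the explicit `observed=True` argument, are my own additions and not part of the answer above.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({"id": [1, 2, 3, 4], "values": [2.1, 0.8, 1.0, 3.2]})

# Assign each row to a bin of width 2: (0, 2] or (2, 4].
bins = pd.cut(df["values"], bins=[0, 2, 4])

# Count the rows in each bin and broadcast that count back to every row.
# observed=True restricts the grouping to bins that actually contain rows.
counts = df.groupby(bins, observed=True)["values"].transform("count")

# assign returns a new dataframe; df itself is left unchanged.
result = df.assign(counts=counts)
print(result)
```

Note that a value outside the given edges (for example 4.5 here) is not assigned to any bin by pd.cut, so the edges should cover the full range of the data.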

+3

Scott boston June 12. 17 at 12:48 am


piRSquared · Accepted Answer · 2017-06-12T02:37:12+0000

Use numpy's searchsorted to define the bins and bincount to count them.
It should be very fast.

#         This defines the bin edges
#        [1, 2, 3] would have created
#               different bins
#                    v
b = np.searchsorted([2], df['values'].values)
df.assign(counts=np.bincount(b)[b])

   id  values  counts
0   1     2.1       2
1   2     0.8       2
2   3     1.0       2
3   4     3.2       2

np.searchsorted determines where in the first array each element of the second array would have to be inserted to keep the first array sorted. That means:
- 2.1 must come after 2, which is position 1.
- 0.8 must come before 2, which is position 0.
- 1.0 must come before 2, which is position 0.
- 3.2 must come after 2, which is position 1.
np.bincount conveniently counts the frequency of integer bins, like the ones we just created.
Indexing the resulting bin counts with the bin assignments gives the equivalent of a groupby transform with 'count'.
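
The explanation above can be followed step by step with the intermediate arrays written out. This is only a sketch on the question's sample data; the names `edges`, `per_bin`, `per_row` and the second `[1, 2, 3]` example are mine, added to illustrate the comment about different bin edges.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({"id": [1, 2, 3, 4], "values": [2.1, 0.8, 1.0, 3.2]})
values = df["values"].values

# A single edge splits the data into two bins: "<= 2" and "> 2".
edges = [2]

# Position at which each value would be inserted into `edges`.
b = np.searchsorted(edges, values)        # array([1, 0, 0, 1])

# Number of rows that landed at each position.
per_bin = np.bincount(b)                  # array([2, 2])

# Index the per-bin counts by each row's position: one count per row.
per_row = per_bin[b]                      # array([2, 2, 2, 2])

result = df.assign(counts=per_row)

# With edges [1, 2, 3] the same values fall into different bins.
b3 = np.searchsorted([1, 2, 3], values)   # array([2, 0, 0, 3])
per_row3 = np.bincount(b3)[b3]            # array([1, 2, 2, 1])
```

With the default side='left', a value exactly equal to an edge is placed before it, so 2.0 would fall into the lower bin; passing side='right' to np.searchsorted flips that.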
