How to replace column value with range in pandas dataframe
I have a data frame called 'df', and I want to map each value in the age column to a stage number in another column, according to the range the value falls in:
6 <= age < 11, then 1
11 <= age < 16, then 2
16 <= age < 21, then 3
21 <= age, then 4
age
86508 12.0
86509 6.0
86510 7.0
86511 8.0
86512 10.0
86513 15.0
86514 15.0
86515 16.0
86516 20.0
86517 23.0
86518 23.0
86519 7.0
86520 18.0
Desired result:
age stage
86508 12.0 2
86509 6.0 1
86510 7.0 1
86511 8.0 1
86512 10.0 1
86513 15.0 2
86514 15.0 2
86515 16.0 2
86516 20.0 3
86517 23.0 4
86518 23.0 4
86519 7.0 1
86520 18.0 3
Thanks.
2 answers
Use pd.cut():
In [37]: df['stage'] = pd.cut(df.age, bins=[0,11,16,21,300], labels=[1,2,3,4])
In [38]: df
Out[38]:
age stage
86508 12.0 2
86509 6.0 1
86510 7.0 1
86511 8.0 1
86512 10.0 1
86513 15.0 2
86514 15.0 2
86515 16.0 2
86516 20.0 3
86517 23.0 4
86518 23.0 4
86519 7.0 1
86520 18.0 3
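Or a more general solution, provided by @ayhan, which uses np.inf as an open-ended upper bound and the integer bin codes instead of explicit labels: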
In [39]: df['stage'] = pd.cut(df.age, bins=[0, 11, 16, 21, np.inf], labels=False, right=True) + 1
In [40]: df
Out[40]:
age stage
86508 12.0 2
86509 6.0 1
86510 7.0 1
86511 8.0 1
86512 10.0 1
86513 15.0 2
86514 15.0 2
86515 16.0 2
86516 20.0 3
86517 23.0 4
86518 23.0 4
86519 7.0 1
86520 18.0 3
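Note that with right=True (the default) each bin includes its right edge, so an age of exactly 11, 16 or 21 lands in the lower stage; that is why 16.0 is stage 2 in the output above. If the boundaries should instead follow the rules exactly as written in the question (6 <= age < 11, and so on), one option is right=False. A minimal sketch, assuming the sample ages from the question; the lower edge of 6 is taken from the question's first rule, and ages below it would come out as NaN:

```python
import numpy as np
import pandas as pd

# Sample ages copied from the question.
df = pd.DataFrame(
    {"age": [12.0, 6.0, 7.0, 8.0, 10.0, 15.0, 15.0, 16.0, 20.0, 23.0, 23.0, 7.0, 18.0]},
    index=range(86508, 86521),
)

# right=False makes every bin left-closed: [6, 11), [11, 16), [16, 21), [21, inf).
# labels=False returns the 0-based bin number, so add 1 to get stages 1..4.
df["stage"] = pd.cut(
    df["age"], bins=[6, 11, 16, 21, np.inf], right=False, labels=False
) + 1
print(df)
```

With this variant 16.0 is assigned stage 3 rather than 2; which of the two is wanted depends on how the edge values are supposed to be treated.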
Using np.searchsorted
a = np.array([-np.inf, 6, 11, 16, 21, np.inf])
df.assign(stage=a.searchsorted(df.age, side='right') - 1)
age stage
86508 12.0 2
86509 6.0 1
86510 7.0 1
86511 8.0 1
86512 10.0 1
86513 15.0 2
86514 15.0 2
86515 16.0 3
86516 20.0 3
86517 23.0 4
86518 23.0 4
86519 7.0 1
86520 18.0 3
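Why this works: searchsorted returns, for each age, the position at which it would be inserted into the sorted edge array. With side='right' a value equal to an edge is placed after that edge, so the bins are left-closed and match the rules in the question (16.0 gives stage 3 here). The leading -np.inf occupies position 0, so subtracting 1 yields 0 for ages below 6 and 1 to 4 otherwise. A self-contained sketch with the question's ages; the extra boundary values at the end are hypothetical inputs added only to illustrate the edge behaviour:

```python
import numpy as np
import pandas as pd

# Sample ages copied from the question.
df = pd.DataFrame(
    {"age": [12.0, 6.0, 7.0, 8.0, 10.0, 15.0, 15.0, 16.0, 20.0, 23.0, 23.0, 7.0, 18.0]},
    index=range(86508, 86521),
)

# Sorted bin edges; -inf and inf catch everything outside 6..21.
edges = np.array([-np.inf, 6, 11, 16, 21, np.inf])

# Insertion position minus 1 is the stage number.
stage = edges.searchsorted(df["age"].to_numpy(), side="right") - 1
result = df.assign(stage=stage)
print(result)

# Hypothetical boundary values (not in the question's data).
boundary = np.array([5.0, 6.0, 11.0, 21.0])
boundary_stage = edges.searchsorted(boundary, side="right") - 1
print(boundary_stage)  # [0 1 2 4]
```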
Timing
small data
%%timeit
a = np.array([-np.inf, 6, 11, 16, 21, np.inf])
df.assign(stage=a.searchsorted(df.age, side='right') - 1)
1000 loops, best of 3: 288 µs per loop
%%timeit
df.assign(stage=pd.cut(df.age, bins=[0,11,16,21,300], labels=[1,2,3,4]))
1000 loops, best of 3: 668 µs per loop
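%%timeit is an IPython cell magic. Outside a notebook, a comparable measurement can be sketched with the standard timeit module. The frame below (100,000 random ages, named big) is an assumption made up for illustration, to see how the two approaches behave beyond the small sample; no timings are quoted for it because they depend on the machine and library versions.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical larger frame: 100,000 random ages between 6 and 99.
rng = np.random.default_rng(0)
big = pd.DataFrame({"age": rng.integers(6, 100, size=100_000).astype(float)})


def with_searchsorted():
    a = np.array([-np.inf, 6, 11, 16, 21, np.inf])
    return big.assign(stage=a.searchsorted(big["age"].to_numpy(), side="right") - 1)


def with_cut():
    return big.assign(
        stage=pd.cut(big["age"], bins=[0, 11, 16, 21, 300], labels=[1, 2, 3, 4])
    )


# Best of 3 repeats, 10 calls each, reported per call.
t_search = min(timeit.repeat(with_searchsorted, number=10, repeat=3)) / 10
t_cut = min(timeit.repeat(with_cut, number=10, repeat=3)) / 10
print(f"searchsorted: {t_search * 1e3:.3f} ms per call")
print(f"pd.cut:       {t_cut * 1e3:.3f} ms per call")
```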