Inequalities in pandas column
I have a pandas DataFrame and I would like to create a new column based on an existing column and some inequalities. For example, let
import pandas as pd
df = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6, 7], 'b': [3, 6, 4, 2, 7, 7, 1]})
so df looks like
a b
0 1 3
1 2 6
2 3 4
3 4 2
4 5 7
5 6 7
6 7 1
I would like to add a new column res that is 0 if the corresponding value in a is less than 2, 1 if it is at least 2 and less than 4, and 2 otherwise. So I would like to get
a b res
0 1 3 0
1 2 6 1
2 3 4 1
3 4 2 2
4 5 7 2
5 6 7 2
6 7 1 2
So far I have been doing it with apply, this way:
def f(x):
    if x['a'] < 2:
        return 0
    elif x['a'] >= 2 and x['a'] < 4:
        return 1
    else:
        return 2

df['res'] = df.apply(f, axis=1)
but I was wondering if there is a more direct way or some specific pandas method that might allow me to do this.
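searchsorted should give you better results. As with pd.cut, you need to specify the breakpoints. With breakpoints [2, 4] and side='right', each value is mapped to the number of breakpoints less than or equal to it, which is exactly the 0/1/2 label you want.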
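To see why side='right' matters, here is a small NumPy sketch on the values of column a from the question (the names breaks and values are just illustrative). A value that lands exactly on a breakpoint, such as 2 or 4, has to go into the higher bin; side='right' does that, while side='left' puts it in the lower one.

```python
import numpy as np

# Breakpoints must be sorted ascending: < 2 -> 0, [2, 4) -> 1, >= 4 -> 2
breaks = np.array([2, 4])
values = np.array([1, 2, 3, 4, 5, 6, 7])

# side='right': each value gets the count of breakpoints <= value
res = np.searchsorted(breaks, values, side='right')
print(res)  # [0 1 1 2 2 2 2]

# side='left': only breakpoints strictly < value are counted,
# so values equal to a breakpoint (2 and 4) fall into the lower bin
print(np.searchsorted(breaks, values, side='left'))  # [0 0 1 1 2 2 2]
```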
pandas: pd.Series.searchsorted
import numpy as np
import pandas as pd
df.assign(res=pd.Series([2, 4]).searchsorted(df.a, side='right'))
a b res
0 1 3 0
1 2 6 1
2 3 4 1
3 4 2 2
4 5 7 2
5 6 7 2
6 7 1 2
numpy: ndarray.searchsorted
df.assign(res=np.array([2, 4]).searchsorted(df.a.values, side='right'))
a b res
0 1 3 0
1 2 6 1
2 3 4 1
3 4 2 2
4 5 7 2
5 6 7 2
6 7 1 2
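The timings below also include pd.cut. A standalone sketch of that variant, assuming the df from the question: with right=False the bins are [-inf, 2), [2, 4) and [4, inf), which matches the rule in the question. pd.cut returns a categorical column, so this sketch additionally casts it to int; that cast is an extra step not present in the timed version.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6, 7], 'b': [3, 6, 4, 2, 7, 7, 1]})

# right=False makes every bin closed on the left and open on the right
binned = pd.cut(df.a, [-np.inf, 2, 4, np.inf], labels=[0, 1, 2], right=False)

# pd.cut yields a Categorical; cast to int for a plain integer column
df = df.assign(res=binned.astype(int))
print(df)
```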
Timing
%timeit df.assign(res=pd.Series([2, 4]).searchsorted(df.a, side='right'))
%timeit df.assign(res=np.array([2, 4]).searchsorted(df.a.values, side='right'))
%timeit df.assign(res=np.where(df.a < 2, 0, np.where((df.a >= 2) & (df.a < 4), 1, 2)))
%timeit df.assign(res=pd.cut(df.a, [-np.inf,2,4,np.inf], labels=[0,1,2], right=False))
1000 loops, best of 3: 443 µs per loop
1000 loops, best of 3: 337 µs per loop
1000 loops, best of 3: 1.06 ms per loop
1000 loops, best of 3: 530 µs per loop
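%timeit only works in IPython/Jupyter. A rough plain-Python equivalent using the standard timeit module might look like the sketch below. The dictionary keys are just labels, and np.where is called directly rather than through pd.np. It runs 100 calls per repeat instead of 1000 to keep it quick, and its absolute numbers will differ from the ones above depending on hardware and library versions.

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6, 7], 'b': [3, 6, 4, 2, 7, 7, 1]})

# Each candidate builds the same 'res' column as in the answer
candidates = {
    'Series.searchsorted': lambda: df.assign(
        res=pd.Series([2, 4]).searchsorted(df.a, side='right')),
    'ndarray.searchsorted': lambda: df.assign(
        res=np.array([2, 4]).searchsorted(df.a.values, side='right')),
    'nested np.where': lambda: df.assign(
        res=np.where(df.a < 2, 0, np.where((df.a >= 2) & (df.a < 4), 1, 2))),
    'pd.cut': lambda: df.assign(
        res=pd.cut(df.a, [-np.inf, 2, 4, np.inf], labels=[0, 1, 2], right=False)),
}

results = {}
for name, fn in candidates.items():
    # Best of 3 repeats of 100 calls each, reported as time per call
    best = min(timeit.repeat(fn, repeat=3, number=100)) / 100
    results[name] = best
    print(f'{name:22s} {best * 1e6:8.1f} µs per loop')
```

With only 7 rows, fixed per-call overhead likely dominates these timings, so the ranking may change on much larger frames.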