Applying a specific function to replace a column value based on criteria from another column in a data frame
Here's what I want to do:
Dataframe before:
name value apply_f
0 SEBASTIEN 9 false
1 JOHN 4 false
2 JENNY np.inf true
Apply function f: len(df['name']) to columns 'value' only if columns 'apply_f' == True
Dataframe after:
name value apply_f
0 SEBASTIEN 9 False
1 JOHN 4 False
2 JENNY 5 True
Here I have:
from pandas import *
from numpy import *
df = DataFrame( { "name": ['SEBASTIEN', 'JOHN', 'JENNY'] ,
"value": [9, 4, np.inf] ,
"apply_f": [False,False,True]} )
def f(x):
return len(x)
df['value'] = df[df['apply_f'] == True]['name'].apply(f)
but the result is not what I expected:
apply_f name value
0 False SEBASTIEN NaN
1 False JOHN NaN
2 True JENNY 5
Column replaces initial values with NaN
source to share
The reason it gets overwritten is because the indexing on the left side is by default the same as the full dataframe, if you apply the mask to the left hand using also loc
, then it only affects the lines where the condition is: / p>
In [272]:
df.loc[df['apply_f'] == True, 'value'] = df[df['apply_f'] == True]['name'].apply(lambda row: f(row))
df
Out[272]:
apply_f name value
0 False SEBASTIEN 9
1 False JOHN 4
2 True JENNY 5
The usage loc
in the above example is that let's say I used the same boolean mask semantics, which may or may not work, and caused an error in recent versions of pandas:
In[274]:
df[df['apply_f'] == True]['value'] = df[df['apply_f'] == True]['name'].apply(lambda row: f(row))
df
-c:8: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_index,col_indexer] = value instead
Out[274]:
apply_f name value
0 False SEBASTIEN 9.000000
1 False JOHN 4.000000
2 True JENNY inf
For what you are doing it would be more concise and readable to use numpy where
:
In [279]:
df['value'] = np.where(df['apply_f']==True, len(df['name']), df['value'])
df
Out[279]:
apply_f name value
0 False SEBASTIEN 9
1 False JOHN 4
2 True JENNY 3
I understand your example is to demonstrate the problem, but you can also use where
for certain situations.
source to share