Applying a specific function to replace a column value based on criteria from another column in a data frame

Here's what I want to do:

Dataframe before: 
       name         value    apply_f
0      SEBASTIEN    9        False
1      JOHN         4        False
2      JENNY        np.inf   True

Apply the function f (the length of the string in 'name') to the 'value' column, but only in rows where 'apply_f' == True.

Dataframe after: 
       name       value    apply_f
0      SEBASTIEN  9        False
1      JOHN       4        False
2      JENNY      5        True

      

Here is what I have:

import numpy as np
from pandas import DataFrame

df = DataFrame( { "name":  ['SEBASTIEN', 'JOHN', 'JENNY'] , 
                  "value": [9, 4, np.inf] , 
                  "apply_f":  [False,False,True]} )

def f(x):
    return len(x)

df['value'] = df[df['apply_f'] == True]['name'].apply(f)

      

but the result is not what I expected:

    apply_f    name         value
0   False      SEBASTIEN    NaN
1   False      JOHN         NaN
2    True      JENNY        5

      

The initial values in the column are replaced with NaN.

1 answer


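The values get overwritten because the left-hand side, df['value'], refers to the whole column, while the right-hand side only contains the masked rows; on assignment, every row missing from the right-hand side becomes NaN. If you also apply the mask on the left-hand side using loc, then the assignment only affects the rows where the condition is met: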

In [272]:

df.loc[df['apply_f'] == True, 'value'] = df[df['apply_f'] == True]['name'].apply(lambda row: f(row))
df
Out[272]:
  apply_f       name  value
0   False  SEBASTIEN      9
1   False       JOHN      4
2    True      JENNY      5
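
To see where the NaN values come from, it can help to inspect the right-hand side on its own. The following is a minimal sketch, assuming the same df and f as in the question; the names mask, rhs, broken and fixed are introduced here only for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"name": ["SEBASTIEN", "JOHN", "JENNY"],
                   "value": [9, 4, np.inf],
                   "apply_f": [False, False, True]})

def f(x):
    return len(x)

mask = df["apply_f"]

# The right-hand side only contains the rows selected by the mask.
rhs = df.loc[mask, "name"].apply(f)
print(rhs.index.tolist())

# Assigning to the whole column aligns on the index,
# so the rows missing from rhs are filled with NaN.
broken = df.copy()
broken["value"] = rhs
print(broken)

# Restricting the left-hand side with loc leaves the other rows untouched.
fixed = df.copy()
fixed.loc[mask, "value"] = rhs
print(fixed)
```

Because apply_f is already a boolean column, the mask can be used directly without the == True comparison.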

      

The reason for using loc in the above example is that chained indexing with the same boolean mask on the left-hand side may or may not work, and raises a warning in recent versions of pandas:

In [274]:
df[df['apply_f'] == True]['value'] = df[df['apply_f'] == True]['name'].apply(lambda row: f(row))
df
-c:8: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_index,col_indexer] = value instead
Out[274]:
  apply_f       name     value
0   False  SEBASTIEN  9.000000
1   False       JOHN  4.000000
2    True      JENNY       inf
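
The transcript above can be reproduced with a short script. This is a sketch under the assumption that the installed pandas version reports chained assignment as a warning rather than an exception; the warning is silenced here only to keep the demo output clean:

```python
import warnings

import numpy as np
import pandas as pd

df = pd.DataFrame({"name": ["SEBASTIEN", "JOHN", "JENNY"],
                   "value": [9, 4, np.inf],
                   "apply_f": [False, False, True]})

mask = df["apply_f"]

with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # hide the chained-assignment warning
    # df[mask] builds a new object, so the write lands on a temporary copy.
    df[mask]["value"] = df.loc[mask, "name"].str.len()

# The original frame is unchanged: row 2 still holds inf.
print(df)
```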

      



For what you are doing, it would be more concise and readable to use numpy's where:

In [279]:

df['value'] = np.where(df['apply_f']==True, df['name'].str.len(), df['value'])
df
Out[279]:
  apply_f       name  value
0   False  SEBASTIEN      9
1   False       JOHN      4
2    True      JENNY      5

      

I understand your example is to demonstrate the problem, but you can also use where for certain situations.
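
As a hedged illustration, assuming the where meant here is the pandas Series.where method (the answer does not say which), the same replacement can be written without numpy. Series.where keeps values where the condition is True, so the mask is inverted; Series.mask is the complementary method and takes the condition as is:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"name": ["SEBASTIEN", "JOHN", "JENNY"],
                   "value": [9, 4, np.inf],
                   "apply_f": [False, False, True]})

# Per-row string lengths of the 'name' column.
name_len = df["name"].str.len()

# where: keep 'value' where apply_f is False, otherwise take the name length.
via_where = df["value"].where(~df["apply_f"], name_len)

# mask: replace 'value' where apply_f is True with the name length.
via_mask = df["value"].mask(df["apply_f"], name_len)

df["value"] = via_where
print(df)
```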
