Conditional If Statement: If the value in the row contains a string ... sets another column equal to the string

EDIT MADE:

I have an "Activity" column and I want to fill the "Activity_2" column from its values using an if statement.

So Activity_2 is showing the desired output. Basically, I want to name what kind of activity is going on.

I tried to do it using my code below, but it won't run (the error is shown below the code). Any help is greatly appreciated!


    for i in df2['Activity']:
        if i contains 'email':
            df2['Activity_2'] = 'email'
        elif i contains 'conference'
            df2['Activity_2'] = 'conference'
        elif i contains 'call'
            df2['Activity_2'] = 'call'
        else:
            df2['Activity_2'] = 'task'


Error: if i contains 'email':
                ^
SyntaxError: invalid syntax

      



4 answers


Assuming you are using pandas, you can use numpy.where, which is a vectorized version of if/else, with the condition built with str.contains:



    import numpy as np

    # pd.np was removed from recent pandas releases, so import numpy directly.
    # Each nested np.where acts as the next elif; the final "task" is the else.
    df['Activity_2'] = np.where(df.Activity.str.contains("email"), "email",
                       np.where(df.Activity.str.contains("conference"), "conference",
                       np.where(df.Activity.str.contains("call"), "call", "task")))

    df

    #   Activity            Activity_2
    #0  email personA       email
    #1  attend conference   conference
    #2  send email          email
    #3  call Sam            call
    #4  random text         task
    #5  random text         task
    #6  lwantto call        call
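As the keyword list grows, the nested calls become hard to read. A minimal sketch of the same logic with numpy.select, which takes a list of conditions and a list of matching choices and uses the first condition that holds. This is a swapped-in alternative, not part of the answer above, and the sample frame is an assumption reconstructed from the output shown:

```python
import numpy as np
import pandas as pd

# Sample data assumed for illustration (mirrors the output shown above).
df = pd.DataFrame({"Activity": ["email personA", "attend conference", "send email",
                                "call Sam", "random text", "random text",
                                "lwantto call"]})

keywords = ["email", "conference", "call"]
# One boolean mask per keyword; np.select uses the first mask that is True.
conditions = [df["Activity"].str.contains(k, regex=False) for k in keywords]
df["Activity_2"] = np.select(conditions, keywords, default="task")
```

The order of the keywords list plays the role of the if/elif order: a row matching both "email" and "call" is labelled "email".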

      



This also works:



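    # Set the fallback label first; rows that match no keyword would otherwise stay NaN.
    df['Activity_2'] = 'task'
    df.loc[df['Activity'].str.contains('email'), 'Activity_2'] = 'email'
    df.loc[df['Activity'].str.contains('conference'), 'Activity_2'] = 'conference'
    df.loc[df['Activity'].str.contains('call'), 'Activity_2'] = 'call'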
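With sequential .loc assignments, a row that matches several keywords ends up with whichever label was assigned last, which is the opposite of if/elif, where the first match wins. A small sketch, assuming a throwaway frame invented for illustration, that orders the assignments from lowest to highest priority so that "email" still takes precedence:

```python
import pandas as pd

# Hypothetical sample data; the second row matches both "email" and "call".
df = pd.DataFrame({"Activity": ["send email", "email about call", "random text"]})

df["Activity_2"] = "task"  # default for rows that match nothing
# Lowest priority first: each later assignment overwrites earlier matches.
for keyword in ["call", "conference", "email"]:
    mask = df["Activity"].str.contains(keyword, regex=False)
    df.loc[mask, "Activity_2"] = keyword
```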

      



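You have invalid syntax for checking strings: Python has no "contains" operator. Use "in" to test for a substring instead:

    for i in df2['Activity']:
        if 'email' in i:
            # Note: this still assigns 'email' to the whole column, not just this row.
            df2['Activity_2'] = 'email'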
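A sketch of a row-wise variant that keeps the original if/elif structure and labels each row separately. The helper name classify and the sample data are assumptions made up for illustration:

```python
import pandas as pd

def classify(text):
    """Return the first matching activity label, or 'task' if none match."""
    if 'email' in text:
        return 'email'
    elif 'conference' in text:
        return 'conference'
    elif 'call' in text:
        return 'call'
    return 'task'

# Sample data assumed for illustration.
df2 = pd.DataFrame({'Activity': ['email personA', 'attend conference',
                                 'call Sam', 'random text']})
# apply calls classify once per value and collects the results into a new column.
df2['Activity_2'] = df2['Activity'].apply(classify)
```

This is easier to read than a vectorized expression but runs a Python function call per row, so it is likely slower on large frames.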

      



The numpy.where solution above misbehaves if the Activity column contains NaN values, because str.contains returns NaN for those rows instead of True or False. In that case I recommend the following code, which worked for me:

    import numpy as np

    # Record the missing rows before filling, so real text containing "0" is not mislabelled.
    missing = df.Activity.isna()
    temp = df.Activity.fillna("")
    df['Activity_2'] = np.where(missing, "None",
                       np.where(temp.str.contains("email"), "email",
                       np.where(temp.str.contains("conference"), "conference",
                       np.where(temp.str.contains("call"), "call", "task"))))
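str.contains also takes an na argument that sets the result for missing values, so the fill step can be skipped. A sketch under that assumption, with sample data invented for illustration; the row containing digits shows that ordinary text is not mistaken for a missing value:

```python
import numpy as np
import pandas as pd

# Sample data assumed for illustration; the last row contains digits.
df = pd.DataFrame({"Activity": ["send email", np.nan, "call Sam", "room 101"]})

activity = df["Activity"]
# na=False makes missing values count as "no match" instead of NaN.
df["Activity_2"] = np.where(activity.isna(), "None",
                   np.where(activity.str.contains("email", na=False), "email",
                   np.where(activity.str.contains("conference", na=False), "conference",
                   np.where(activity.str.contains("call", na=False), "call", "task"))))
```

Dropping the isna() branch would label missing rows "task" rather than "None", which may or may not be what you want.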

      
