Conditional If Statement: If the value in the row contains a string ... set another column equal to the string
EDIT MADE:
I have an "Activity" column filled with rows and I want to get the values in the "Activity_2" column using an if statement.
So Activity_2 is showing the desired output. Basically, I want to name what kind of activity is going on.
I tried to do it using my code below, but it won't run (see the error below). Any help is greatly appreciated!
for i in df2['Activity']:
    if i contains 'email':
        df2['Activity_2'] = 'email'
    elif i contains 'conference'
        df2['Activity_2'] = 'conference'
    elif i contains 'call'
        df2['Activity_2'] = 'call'
    else:
        df2['Activity_2'] = 'task'
Error: if i contains 'email':
^
SyntaxError: invalid syntax
Assuming you are using pandas, you can use numpy.where, a vectorized version of if/else, with each condition built using str.contains:
import numpy as np  # pd.np is deprecated and no longer available in recent pandas

df['Activity_2'] = np.where(df.Activity.str.contains("email"), "email",
                   np.where(df.Activity.str.contains("conference"), "conference",
                   np.where(df.Activity.str.contains("call"), "call", "task")))
df
# Activity Activity_2
#0 email personA email
#1 attend conference conference
#2 send email email
#3 call Sam call
#4 random text task
#5 random text task
#6 lwantto call call
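As for the SyntaxError in the question: Python has no `contains` keyword; the substring test is written with `in`. The original loop would also assign the whole column on every pass rather than a single row. A minimal row-wise sketch, assuming the sample data shown in the output above and that every value is a string (the `classify` helper is a hypothetical name, not from the original post):

```python
import pandas as pd

# Sample data reconstructed from the output shown above.
df = pd.DataFrame({"Activity": [
    "email personA", "attend conference", "send email",
    "call Sam", "random text", "random text", "lwantto call",
]})

def classify(activity):
    """Return the first keyword found in the text, or 'task' if none match.

    Hypothetical helper; assumes `activity` is a string (no NaN values).
    """
    for keyword in ("email", "conference", "call"):
        if keyword in activity:  # `in` is Python's substring test
            return keyword
    return "task"

# Apply the function to each row of the column, one value at a time.
df["Activity_2"] = df["Activity"].apply(classify)
print(df)
```

This is usually slower than the vectorized version on large frames, but it mirrors the original if/elif logic most directly.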
The solution above misbehaves if the column contains NaN values, because str.contains propagates missing values by default instead of returning True or False. In that case, I recommend the following code, which worked for me:
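import numpy as np

# Fill NaN with an empty string so str.contains returns only True/False.
temp = df.Activity.fillna("")
# Label missing rows via isna() so that real text is never mistaken for a placeholder.
df['Activity_2'] = np.where(df.Activity.isna(), "None",
                   np.where(temp.str.contains("email"), "email",
                   np.where(temp.str.contains("conference"), "conference",
                   np.where(temp.str.contains("call"), "call", "task"))))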
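A possible alternative that skips filling the NaN values first: `str.contains` accepts an `na` argument, and `numpy.select` takes a flat list of conditions instead of nested calls. This is only a sketch; the sample rows below are made up for illustration and are not from the original post:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: one missing value, and one row that
# contains a digit but none of the keywords.
df = pd.DataFrame({"Activity": ["send email", np.nan, "call Sam", "room 101 cleanup"]})

activity = df["Activity"]

# Conditions are checked in order; the first one that is True wins.
conditions = [
    activity.isna(),                                # missing value
    activity.str.contains("email", na=False),       # na=False: NaN counts as "no match"
    activity.str.contains("conference", na=False),
    activity.str.contains("call", na=False),
]
choices = ["None", "email", "conference", "call"]

# Rows matching no condition get the default label.
df["Activity_2"] = np.select(conditions, choices, default="task")
print(df)
```

Adding another category then means appending one condition and one choice rather than nesting another call.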