Python pandas column is conditioned by two other column values

Is there a way in python pandas to apply a conditional value if a particular column has a value?

For one column, I know I can use the following code to apply a test flag if the word "test" is specified in the column header.

df['Test_Flag'] = np.where(df['Title'].str.contains("test|Test"), 'Y', '')

      

But if I would like to say that if the column heading or column subtitle includes the word "test", add a test flag, how can I do that?

It clearly didn't work

df['Test_Flag'] = np.where(df['Title'|'Subtitle'].str.contains("test|Test"), 'Y', '')

      

+3


source to share


3 answers


pattern = "test|Test"
match = df['Title'].str.contains(pattern) | df['Subtitle'].str.contains(pattern)
df['Test_Flag'] = np.where(match, 'Y', '')

      



+3


source


If many columns are then simpler then create a subset of df[['Title', 'Subtitle']]

and apply

contains

because it only works with Series

and check at least one True

per row any

:

mask = df[['Title', 'Subtitle']].apply(lambda x: x.str.contains("test|Test")).any(axis=1)
df['Test_Flag'] = np.where(mask,'Y', '')

      



Example:

df = pd.DataFrame({'Title':['test','Test','e', 'a'], 'Subtitle':['b','a','Test', 'a']})
mask = df[['Title', 'Subtitle']].apply(lambda x: x.str.contains("test|Test")).any(axis=1)
df['Test_Flag'] = np.where(mask,'Y', '')
print (df)
  Subtitle Title Test_Flag
0        b  test         Y
1        a  Test         Y
2     Test     e         Y
3        a     a          

      

+3


source


Using @jezrael setting

df = pd.DataFrame(
    {'Title':['test','Test','e', 'a'],
     'Subtitle':['b','a','Test', 'a']})

      

pandas

you can stack

+ str.contains

+unstack

import re

df.stack().str.contains('test', flags=re.IGNORECASE).unstack()

  Subtitle  Title
0    False   True
1    False   True
2     True  False
3    False  False

      

Bring it all along with

truth_map = {True: 'Y', False: ''}
truth_flag = df.stack().str.contains(
    'test', flags=re.IGNORECASE).unstack().any(1).map(truth_map)
df.assign(Test_flag=truth_flag)

  Subtitle Title Test_flag
0        b  test         Y
1        a  Test         Y
2     Test     e         Y
3        a     a        

      

numpy

If performance is an issue

v = df.values.astype(str)
low = np.core.defchararray.lower(v)
flg = np.core.defchararray.find(low, 'test') >= 0
ys = np.where(flg.any(1), 'Y', '')
df.assign(Test_flag=ys)

  Subtitle Title Test_flag
0        b  test         Y
1        a  Test         Y
2     Test     e         Y
3        a     a          

      


naive time test

enter image description here

+2


source







All Articles