Pandas: selecting rows based on multiple object values within a column

I have a pandas dataframe in which one of the columns contains user information. Each entry in this column is a list of dictionaries holding user information, like this:

                                                USER                      id  
1  [{u'STATUS': u'INACTV', u'NAME': 'abc'},{u'STATUS': u'ACTV', u'NAME': 'xyz'}]  634618   
2  [{u'STATUS': u'INACTV', u'NAME': 'abc'},{u'STATUS': u'ACTV', u'NAME': 'xyz'}]  642054   
3  [{u'STATUS': u'ACTV', u'NAME': 'abc'},{u'STATUS': u'ACTV', u'NAME': 'xyz'}]  631426    

      

I only want to select rows where some dictionary has STATUS equal to ACTV and NAME equal to abc. How can I select rows based on nested data like this? In the df above, only row 3 should be selected.

+2




3 answers


You can loop through the USER column with apply and check whether any dictionary in each list satisfies the condition. This produces a boolean Series that you can use to subset the dataframe:



df[df.USER.apply(lambda lst: any(d['NAME']=='abc' and d['STATUS']=='ACTV' for d in lst))]

#                                                USER      id
#3  [{'STATUS': 'ACTV', 'NAME': 'abc'}, {'STATUS':...  631426
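
For anyone who wants to reproduce this, here is a minimal self-contained sketch. The frame is rebuilt by hand from the question, and the helper name `has_user` is made up for illustration; it uses `dict.get` on the assumption that some dictionaries might lack a key.

```python
import pandas as pd

# Example frame rebuilt from the question (row labels 1..3 as shown there).
df = pd.DataFrame(
    {
        "USER": [
            [{"STATUS": "INACTV", "NAME": "abc"}, {"STATUS": "ACTV", "NAME": "xyz"}],
            [{"STATUS": "INACTV", "NAME": "abc"}, {"STATUS": "ACTV", "NAME": "xyz"}],
            [{"STATUS": "ACTV", "NAME": "abc"}, {"STATUS": "ACTV", "NAME": "xyz"}],
        ],
        "id": [634618, 642054, 631426],
    },
    index=[1, 2, 3],
)


def has_user(users, name, status):
    """Return True if any single dict in `users` has both the NAME and the STATUS."""
    # .get() avoids a KeyError if a dictionary is missing one of the keys.
    return any(d.get("NAME") == name and d.get("STATUS") == status for d in users)


# Boolean mask: one True/False per row of the frame.
mask = df["USER"].apply(lambda users: has_user(users, "abc", "ACTV"))
result = df[mask]
print(result["id"].tolist())  # prints [631426]
```

Both conditions are tested on the same dictionary, which is why rows 1 and 2 (where abc is INACTV and only xyz is ACTV) are excluded.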

      

+3




We can unpack your column df.USER into a pd.Panel and find the rows that way. Lots of overhead. Not worth it! But cool ... maybe. I'll try again. (This needs an older pandas: pd.Panel was removed in pandas 0.25.)



pn = pd.Panel({k: pd.DataFrame(v) for k, v in df.USER.iteritems()})
cond1 = pn.loc[:, :, 'STATUS'] == 'ACTV'
cond2 = pn.loc[:, :, 'NAME'] == 'abc'

df.loc[pn.loc[(cond1 & cond2).any(), :, :].items]

                                                USER      id
2  [{'STATUS': 'ACTV', 'NAME': 'abc'}, {'STATUS':...  631426
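
On pandas versions without pd.Panel, the same "unpack, then compare column-wise" idea can be sketched with `Series.explode`. This is a substitute technique, not the Panel code above, and it assumes every list is non-empty and every dictionary has both keys. The frame is rebuilt by hand from the question.

```python
import pandas as pd

# Example frame rebuilt from the question (row labels 1..3 as shown there).
df = pd.DataFrame(
    {
        "USER": [
            [{"STATUS": "INACTV", "NAME": "abc"}, {"STATUS": "ACTV", "NAME": "xyz"}],
            [{"STATUS": "INACTV", "NAME": "abc"}, {"STATUS": "ACTV", "NAME": "xyz"}],
            [{"STATUS": "ACTV", "NAME": "abc"}, {"STATUS": "ACTV", "NAME": "xyz"}],
        ],
        "id": [634618, 642054, 631426],
    },
    index=[1, 2, 3],
)

# One row per dictionary; the original row label is repeated in the index.
exploded = df["USER"].explode()

# Expand each dictionary into STATUS / NAME columns, keeping those row labels.
users = pd.DataFrame(exploded.tolist(), index=exploded.index)

# Vectorised comparison on the flattened table (both conditions on the same dict).
match = (users["STATUS"] == "ACTV") & (users["NAME"] == "abc")

# A row of df is kept if any of its dictionaries matched.
keep = match.groupby(level=0).any()
result_flat = df.loc[keep[keep].index]
print(result_flat["id"].tolist())  # prints [631426]
```

As with the Panel version, this builds an intermediate table, so for a simple filter the apply approach is likely less overhead.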

      

+3




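You can also filter the dataframe using string comparison. This only works if USER holds the string form of each list rather than actual list objects, and because the two substrings are checked independently, it also matches rows where STATUS is ACTV in one dictionary and NAME is abc in another:

df[df['USER'].str.contains("'STATUS': u'ACTV'") & df['USER'].str.contains("'NAME': u'abc'")]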
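
A rough sketch of what the string approach does on the question's data, assuming Python 3 (so the repr has no u prefixes) and that each dictionary lists STATUS before NAME. The frame is rebuilt by hand, and the variable names `loose` and `strict` are made up for illustration.

```python
import pandas as pd

# Example frame rebuilt from the question (row labels 1..3 as shown there).
df = pd.DataFrame(
    {
        "USER": [
            [{"STATUS": "INACTV", "NAME": "abc"}, {"STATUS": "ACTV", "NAME": "xyz"}],
            [{"STATUS": "INACTV", "NAME": "abc"}, {"STATUS": "ACTV", "NAME": "xyz"}],
            [{"STATUS": "ACTV", "NAME": "abc"}, {"STATUS": "ACTV", "NAME": "xyz"}],
        ],
        "id": [634618, 642054, 631426],
    },
    index=[1, 2, 3],
)

# Turn each list into its text representation so .str methods apply.
as_text = df["USER"].map(str)

# Two independent substring checks: each may be satisfied by a different dict.
loose = (
    as_text.str.contains("'STATUS': 'ACTV'", regex=False)
    & as_text.str.contains("'NAME': 'abc'", regex=False)
)

# One pattern covering a whole dictionary: both values must sit in the same dict.
# This depends on the key order inside the repr, so it is fragile.
strict = as_text.str.contains(r"\{'STATUS': 'ACTV', 'NAME': 'abc'\}")

print(df[loose]["id"].tolist())   # all three rows match
print(df[strict]["id"].tolist())  # prints [631426]
```

The loose version selects every row here, because each row has an ACTV entry (xyz) and an abc entry somewhere in its list. The single-pattern version gives the intended row but breaks as soon as the dictionaries gain a key or change key order.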

      

+1

