Pandas: selecting rows based on multiple object values within a column
I have a pandas DataFrame in which one of the columns contains user information. Each entry in this column is a list, which in turn contains dictionaries of user information. It looks like this:
USER id
1 [{u'STATUS': u'INACTV', u'NAME': 'abc'},{u'STATUS': u'ACTV', u'NAME': 'xyz'}] 634618
2 [{u'STATUS': u'INACTV', u'NAME': 'abc'},{u'STATUS': u'ACTV', u'NAME': 'xyz'}] 642054
3 [{u'STATUS': u'ACTV', u'NAME': 'abc'},{u'STATUS': u'ACTV', u'NAME': 'xyz'}] 631426
I only want to select rows where one of the dictionaries has STATUS equal to ACTV and NAME equal to abc. How can I select rows based on nested data like this? In the DataFrame above, only row 3 should be selected.
You can loop through the USER column with apply and check whether any of the dictionaries satisfies the condition. This produces a boolean Series that you can use to subset the DataFrame:
df[df.USER.apply(lambda lst: any(d['NAME']=='abc' and d['STATUS']=='ACTV' for d in lst))]
# USER id
#3 [{'STATUS': 'ACTV', 'NAME': 'abc'}, {'STATUS':... 631426
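For anyone who wants to reproduce this, here is a minimal self-contained sketch of the same apply/any idea. The sample frame is rebuilt from the question, and has_user is a hypothetical helper name that is not part of pandas. It uses dict.get so that a dictionary missing one of the keys is treated as a non-match instead of raising KeyError, which is an assumption about how such entries should be handled.

```python
import pandas as pd

# Rebuild the sample frame from the question (index labels 1..3).
df = pd.DataFrame(
    {
        "USER": [
            [{"STATUS": "INACTV", "NAME": "abc"}, {"STATUS": "ACTV", "NAME": "xyz"}],
            [{"STATUS": "INACTV", "NAME": "abc"}, {"STATUS": "ACTV", "NAME": "xyz"}],
            [{"STATUS": "ACTV", "NAME": "abc"}, {"STATUS": "ACTV", "NAME": "xyz"}],
        ],
        "id": [634618, 642054, 631426],
    },
    index=[1, 2, 3],
)


def has_user(users, name, status):
    """Hypothetical helper: True if any dict in `users` matches both fields."""
    return any(d.get("NAME") == name and d.get("STATUS") == status for d in users)


# Boolean Series with one entry per row of df.
mask = df["USER"].apply(lambda users: has_user(users, "abc", "ACTV"))
selected = df[mask]
print(selected["id"].tolist())
```

Both conditions are checked on the same dictionary, so a row with abc/INACTV and xyz/ACTV does not match.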
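We can unpack your column df.USER into a pd.Panel and find the rows that way. Lots of overhead. Not worth it! But cool ... maybe. Here is an attempt anyway. Note that pd.Panel was deprecated and then removed from pandas (as of 0.25), so this only runs on older versions.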
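# Build one small DataFrame per row of df and stack them into a 3-D Panel,
# keyed by df's index labels.
pn = pd.Panel({k: pd.DataFrame(v) for k, v in df.USER.iteritems()})
# Compare every user's STATUS and NAME across all items at once.
cond1 = pn.loc[:, :, 'STATUS'] == 'ACTV'
cond2 = pn.loc[:, :, 'NAME'] == 'abc'
# Keep the items (rows of df) where at least one user meets both conditions.
df.loc[pn.loc[(cond1 & cond2).any(), :, :].items]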
USER id
2 [{'STATUS': 'ACTV', 'NAME': 'abc'}, {'STATUS':... 631426
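On pandas versions that no longer ship pd.Panel, the same "unpack the column" idea can be sketched with Series.explode, which gives one row per dictionary while repeating the original index label. This is an alternative technique, not the Panel code above. It assumes every list is non-empty and every dictionary has both STATUS and NAME keys, and the sample frame is rebuilt from the question.

```python
import pandas as pd

# Rebuild the sample frame from the question (index labels 1..3).
df = pd.DataFrame(
    {
        "USER": [
            [{"STATUS": "INACTV", "NAME": "abc"}, {"STATUS": "ACTV", "NAME": "xyz"}],
            [{"STATUS": "INACTV", "NAME": "abc"}, {"STATUS": "ACTV", "NAME": "xyz"}],
            [{"STATUS": "ACTV", "NAME": "abc"}, {"STATUS": "ACTV", "NAME": "xyz"}],
        ],
        "id": [634618, 642054, 631426],
    },
    index=[1, 2, 3],
)

# One row per user dict; the original index label is repeated for each dict.
exploded = df["USER"].explode()

# Expand the dicts into STATUS / NAME columns, keeping the repeated index.
users = pd.DataFrame(exploded.tolist(), index=exploded.index)

# Both conditions must hold for the same user (the same exploded row).
hit = (users["STATUS"] == "ACTV") & (users["NAME"] == "abc")

# Collapse back to one boolean per original row: True if any user matched.
mask = hit.groupby(level=0).any()

result = df[mask]
print(result["id"].tolist())
```

Like the Panel version, this carries more overhead than the plain apply approach for a single lookup, but the flat users frame can be reused if you need to run several different filters.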