Pandas: filter by values in multiple columns

I'm trying to filter a dataframe based on values within multiple columns using a single condition, while keeping other columns that I don't want to filter at all.

I've looked at these answers, with the third one being the closest, but still no luck:

Setup:

import pandas as pd

df = pd.DataFrame({
        'month':[1,1,1,2,2],
        'a':['A','A','A','A','NONE'],
        'b':['B','B','B','B','B'],
        'c':['C','C','C','NONE','NONE']
    }, columns = ['month','a','b','c'])

l = ['month','a','c']
df = df.loc[df['month'] == df['month'].max(), df.columns.isin(l)].reset_index(drop = True)

      

Current output:

   month     a     c
0      2     A  NONE
1      2  NONE  NONE

      

Desired output:

   month     a
0      2     A
1      2  NONE

      

I tried:

sub = l[1:]
df = df[(df.loc[:, sub] != 'NONE').any(axis = 1)]

      

and many other options (.all(), [sub, :], ~df.loc[...], axis = 0), but none of them worked.

Basically, I want to drop any column (from the list sub) whose values are all "NONE".

Any help is greatly appreciated.



1 answer


First, replace 'NONE' with np.nan so that dropna treats it as null. Then use loc with your boolean row mask and your subset of columns. Finally, call dropna with axis=1 and how='all' to drop every column that is entirely null.



import numpy as np

df.replace('NONE', np.nan) \
    .loc[df.month == df.month.max(), l].dropna(axis=1, how='all')

   month    a
3      2    A
4      2  NaN
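
Note that replace turns every 'NONE' into NaN, so the rows returned by this approach hold NaN rather than the original string. If the 'NONE' strings should be kept, as in the desired output, a column mask is one alternative. The sketch below is not part of the answer above; it rebuilds the setup from the question and uses hypothetical names (latest, all_none, result) purely for illustration:

```python
import pandas as pd

# Setup copied from the question.
df = pd.DataFrame({
    'month': [1, 1, 1, 2, 2],
    'a': ['A', 'A', 'A', 'A', 'NONE'],
    'b': ['B', 'B', 'B', 'B', 'B'],
    'c': ['C', 'C', 'C', 'NONE', 'NONE'],
})
l = ['month', 'a', 'c']
sub = l[1:]  # columns that may be dropped: 'a' and 'c'

# Rows for the latest month, restricted to the columns of interest.
latest = df.loc[df['month'] == df['month'].max(), l]

# One boolean per column: True when every value in that column is 'NONE'.
# axis=0 reduces down the rows.
all_none = (latest[sub] == 'NONE').all(axis=0)

# Drop the all-'NONE' columns; the remaining 'NONE' strings are untouched.
result = latest.drop(columns=all_none[all_none].index).reset_index(drop=True)
print(result)
```

Here all(axis=0) yields one boolean per column, whereas the any(axis = 1) attempt in the question yields one boolean per row and therefore filters rows rather than columns.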

      
