Pandas: filter by values in multiple columns
I'm trying to filter a DataFrame on the values in several columns using a single condition, while keeping other columns that I don't want to filter at all.
I've looked at these answers, the third being the closest, but still no luck:
- how do you filter pandas numeric frames across multiple columns
- Filtering multiple Pandas columns
- Python pandas - How to filter multiple columns by one value
Setup:
import pandas as pd
df = pd.DataFrame({
    'month': [1, 1, 1, 2, 2],
    'a': ['A', 'A', 'A', 'A', 'NONE'],
    'b': ['B', 'B', 'B', 'B', 'B'],
    'c': ['C', 'C', 'C', 'NONE', 'NONE']
}, columns=['month', 'a', 'b', 'c'])
l = ['month','a','c']
df = df.loc[df['month'] == df['month'].max(), df.columns.isin(l)].reset_index(drop = True)
Current output:
   month     a     c
0      2     A  NONE
1      2  NONE  NONE
Desired output:
   month     a
0      2     A
1      2  NONE
I tried:
sub = l[1:]
df = df[(df.loc[:, sub] != 'NONE').any(axis = 1)]
and many other variations (.all(), [sub, :], ~df.loc[...], (axis = 0)), but none of them worked.
Basically, I want to drop every column in the list sub that contains only "NONE" values.
Any help is greatly appreciated.
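The attempt with .any(axis=1) filters rows, because axis=1 reduces across each row. A minimal sketch of the column-wise version follows: it keeps the literal 'NONE' strings instead of converting them to NaN. The names out and keep are illustrative only; sub and l come from the question.

```python
import pandas as pd

df = pd.DataFrame({
    'month': [1, 1, 1, 2, 2],
    'a': ['A', 'A', 'A', 'A', 'NONE'],
    'b': ['B', 'B', 'B', 'B', 'B'],
    'c': ['C', 'C', 'C', 'NONE', 'NONE'],
}, columns=['month', 'a', 'b', 'c'])

l = ['month', 'a', 'c']
sub = l[1:]  # columns that may be dropped if they hold only 'NONE'

# Rows for the latest month, restricted to the columns in l
out = df.loc[df['month'] == df['month'].max(), l].reset_index(drop=True)

# .any() defaults to axis=0: one boolean per column,
# True if that column has at least one value other than 'NONE'
keep = (out[sub] != 'NONE').any()

# Drop only the all-'NONE' columns from sub; 'month' is never tested
out = out.drop(columns=keep[~keep].index)
print(out)
```

Because the mask is computed per column and then used to pick column labels, the rows are left untouched, which is what the question asks for.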
First, replace 'NONE' with np.nan so that dropna treats it as null. Then use loc with your boolean row mask and the subset of columns l. Finally, call dropna with axis=1 and how='all' to drop every column whose values are all null:
import numpy as np

# df is the original DataFrame from the setup (before it is reassigned)
df.replace('NONE', np.nan) \
    .loc[df.month == df.month.max(), l].dropna(axis=1, how='all')
   month    a
3      2    A
4      2  NaN
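Because 'NONE' was replaced with NaN, the result shows NaN rather than the placeholder from the desired output. A self-contained sketch of the same approach, assuming you want 'NONE' restored and a fresh 0-based index as in the desired output; the .fillna('NONE') and .reset_index(drop=True) steps are additions, not part of the answer itself.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'month': [1, 1, 1, 2, 2],
    'a': ['A', 'A', 'A', 'A', 'NONE'],
    'b': ['B', 'B', 'B', 'B', 'B'],
    'c': ['C', 'C', 'C', 'NONE', 'NONE'],
}, columns=['month', 'a', 'b', 'c'])

l = ['month', 'a', 'c']

result = (
    df.replace('NONE', np.nan)                    # treat the placeholder as missing
      .loc[df['month'] == df['month'].max(), l]   # latest month, columns in l
      .dropna(axis=1, how='all')                  # drop columns that are entirely NaN
      .fillna('NONE')                             # assumption: restore the placeholder
      .reset_index(drop=True)                     # assumption: 0-based index
)
print(result)
```

Note that dropna(axis=1, how='all') would also drop 'month' if it were ever entirely null; restricting it with the subset parameter is an option if that matters.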