Pandas: removing all columns that contain only NaN, 0, and 'NA' from a DataFrame

I have a DataFrame that looks like this:

import pandas as pd
import numpy as np

df = pd.DataFrame({'A': [1, 2.1, np.nan, 4.7, 5.6, 6.8],
                'B': [0, np.nan, np.nan, 0, 0, 0],
                'C': [0, 0, 0, 0, 0, 0.0],
                'D': [5, 5, 5, 5, 5.6, 6.8],
                'E': ['NA', 'NA', 'NA', 'NA', 'NA', 'NA'],})

      

How do I remove all columns that contain only 'NA', NaN, and 0 to get the following result?

df2 = pd.DataFrame({'A': [1, 2.1, np.nan, 4.7, 5.6, 6.8],
                'D': [5, 5, 5, 5, 5.6, 6.8],})

      

So far I know that .dropna() will get rid of all NaN values, and I tried df2 = df[~(df == 0).all(axis=1)], but it didn't work.


2 answers


>>> df
     A    B    C    D   E
0  1.0  0.0  0.0  5.0  NA
1  2.1  NaN  0.0  5.0  NA
2  NaN  NaN  0.0  5.0  NA
3  4.7  0.0  0.0  5.0  NA
4  5.6  0.0  0.0  5.6  NA
5  6.8  0.0  0.0  6.8  NA
>>> f = df.replace([0,'NA'], np.nan).apply(lambda x: any(~x.isnull()))
>>> f
A     True
B    False
C    False
D     True
E    False
dtype: bool
>>> df.loc[:,f]
     A    D
0  1.0  5.0
1  2.1  5.0
2  NaN  5.0
3  4.7  5.0
4  5.6  5.6
5  6.8  6.8
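
The same column mask can also be built without replace() and apply(). The following is a minimal sketch, assuming the df from the question and treating a cell as empty when it is NaN, 0, or the string 'NA'; the names is_empty, keep, and result are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2.1, np.nan, 4.7, 5.6, 6.8],
                   'B': [0, np.nan, np.nan, 0, 0, 0],
                   'C': [0, 0, 0, 0, 0, 0.0],
                   'D': [5, 5, 5, 5, 5.6, 6.8],
                   'E': ['NA', 'NA', 'NA', 'NA', 'NA', 'NA']})

# True where a cell is NaN, 0, or the placeholder string 'NA'
is_empty = df.isna() | df.isin([0, 'NA'])

# Reduce down each column (axis=0): one flag per column,
# True when at least one cell in the column is not empty
keep = ~is_empty.all(axis=0)

# Select columns from the original frame, so kept values are untouched
result = df.loc[:, keep]
```

This also suggests why the attempt in the question left the columns alone: all(axis=1) reduces across columns and yields one flag per row, and indexing df[...] with a boolean Series filters rows rather than columns.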

      





You can use df.isin() and any() to build a boolean mask of the columns that do not consist solely of NaN, 'NA', or 0, and then use that mask to select the corresponding columns of df:

>>> df[df.columns[(~df.isin([np.nan, 'NA', 0])).any().values]]
     A    D
0  1.0  5.0
1  2.1  5.0
2  NaN  5.0
3  4.7  5.0
4  5.6  5.6
5  6.8  6.8

Or, more succinctly: df.loc[:, (~df.isin([np.nan, 'NA', 0])).any()]
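
The choice of reduction on the negated isin() mask matters. The sketch below compares any() and all(), assuming the df from the question and a recent pandas version in which isin() matches NaN cells against np.nan; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2.1, np.nan, 4.7, 5.6, 6.8],
                   'B': [0, np.nan, np.nan, 0, 0, 0],
                   'C': [0, 0, 0, 0, 0, 0.0],
                   'D': [5, 5, 5, 5, 5.6, 6.8],
                   'E': ['NA', 'NA', 'NA', 'NA', 'NA', 'NA']})

# True where a cell is NOT one of the listed "empty" values
not_listed = ~df.isin([np.nan, 'NA', 0])

# any(): keep columns holding at least one real value
some_real = df.loc[:, not_listed.any()]

# all(): keep only columns with no listed value anywhere
only_real = df.loc[:, not_listed.all()]

print(some_real.columns.tolist())
print(only_real.columns.tolist())
```

With the df from the question, the any() selection should keep A and D, while the all() selection should keep only D, because column A contains one NaN.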
