Deleting rows in pandas where there are fewer than 3 non-null values

Question

Hi guys, I want to remove rows containing fewer than 3 non-null (here, nonzero) values, not counting the TOTAL column, from my pandas DataFrame.

At the moment I have:

    year    2001 2002 2003 2004 2005 2006 2007 TOTAL
    player  
    Emma    0     0     0    0    3    4    5    12
    Max     3     5     0    0    0    0    0    8
    Josh    1     2     4    1    2    1    0    11
    Steve   0     0     0    0    3    0    0    3
    Mike    1     0     0    0    0    0    2    3

But I want:

    year    2001 2002 2003 2004 2005 2006 2007 TOTAL
    player  
    Emma    0     0     0    0    3    4    5    12
    Josh    1     2     4    1    2    1    0    11

I was thinking about using a for loop, but I'm not sure how to implement it or whether it is the best way to solve my problem.

python-3.x pandas

aoshea May 11 '17 at 3:29

2 answers

piRSquared · Answer 1 · 2017-05-11T03:33:32+0000

pandas

I drop TOTAL and sum the number of nonzero elements for each row:

# Keep rows with more than 2 nonzero values outside TOTAL.
df[df.drop(columns='TOTAL').ne(0).sum(axis=1).gt(2)]

year    2001  2002  2003  2004  2005  2006  2007  TOTAL
player                                                 
Emma       0     0     0     0     3     4     5     12
Josh       1     2     4     1     2     1     0     11

numpy

A faster solution that works on the underlying array:

v = df.values                      # underlying NumPy array
m = (v[:, :-1] != 0).sum(1) > 2    # nonzero count per row, skipping the last column (TOTAL)
pd.DataFrame(v[m], df.index[m], df.columns)

year    2001  2002  2003  2004  2005  2006  2007  TOTAL
player                                                 
Emma       0     0     0     0     3     4     5     12
Josh       1     2     4     1     2     1     0     11
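
A variation on the NumPy approach, sketched under the assumption that TOTAL is the last column: `np.count_nonzero` builds the row mask, and the mask is passed to `df.loc` so that pandas keeps the index, columns and dtypes without rebuilding the frame by hand. I have not timed this against the version above.

```python
import numpy as np
import pandas as pd

# Year counts copied from the question; TOTAL is recomputed as the row sum.
years = [str(y) for y in range(2001, 2008)]
data = np.array([
    [0, 0, 0, 0, 3, 4, 5],  # Emma
    [3, 5, 0, 0, 0, 0, 0],  # Max
    [1, 2, 4, 1, 2, 1, 0],  # Josh
    [0, 0, 0, 0, 3, 0, 0],  # Steve
    [1, 0, 0, 0, 0, 0, 2],  # Mike
])
players = pd.Index(["Emma", "Max", "Josh", "Steve", "Mike"], name="player")
df = pd.DataFrame(data, index=players, columns=years)
df["TOTAL"] = df.sum(axis=1)

values = df.to_numpy()
# Count nonzero entries per row, leaving out the last column (TOTAL).
mask = np.count_nonzero(values[:, :-1], axis=1) > 2
result = df.loc[mask]
print(result)
```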

Allen · Answer 2 · 2017-05-11T03:36:50+0000

Setup

import numpy as np
import pandas as pd

df = pd.DataFrame({'2001': {'Emma': 0, 'Josh': 1, 'Max': 3, 'Mike': 1, 'Steve': 0},
 '2002': {'Emma': 0, 'Josh': 2, 'Max': 5, 'Mike': 0, 'Steve': 0},
 '2003': {'Emma': 0, 'Josh': 4, 'Max': 0, 'Mike': 0, 'Steve': 0},
 '2004': {'Emma': 0, 'Josh': 1, 'Max': 0, 'Mike': 0, 'Steve': 0},
 '2005': {'Emma': 3, 'Josh': 2, 'Max': 0, 'Mike': 0, 'Steve': 3},
 '2006': {'Emma': 4, 'Josh': 1, 'Max': 0, 'Mike': 0, 'Steve': 0},
 '2007': {'Emma': 5, 'Josh': 0, 'Max': 0, 'Mike': 2, 'Steve': 0},
 'TOTAL': {'Emma': 12, 'Josh': 11, 'Max': 8, 'Mike': 3, 'Steve': 3}})

Solution

df.loc[np.sum(df.iloc[:,:-1]>0, axis=1)[lambda x: x>=3].index]
Out[889]: 
      2001  2002  2003  2004  2005  2006  2007  TOTAL
Emma     0     0     0     0     3     4     5     12
Josh     1     2     4     1     2     1     0     11

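Alternatively, use groupby and filter. Each player forms a one-row group, and the threshold is 4 rather than 3 because the TOTAL column is included in the count: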

df.groupby(level=0).filter(lambda x: np.sum(x.iloc[0,:]>0)>=4)
Out[918]: 
      2001  2002  2003  2004  2005  2006  2007  TOTAL
Emma     0     0     0     0     3     4     5     12
Josh     1     2     4     1     2     1     0     11
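
Since the question title talks about non-null values: if the zeros really stand for missing data, one more option is to mask them as NaN and let `dropna` do the counting through its `thresh` parameter. This is only a sketch and rests on the assumption that 0 means "no data"; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Year counts copied from the question; TOTAL is recomputed as the row sum.
years = [str(y) for y in range(2001, 2008)]
data = np.array([
    [0, 0, 0, 0, 3, 4, 5],  # Emma
    [3, 5, 0, 0, 0, 0, 0],  # Max
    [1, 2, 4, 1, 2, 1, 0],  # Josh
    [0, 0, 0, 0, 3, 0, 0],  # Steve
    [1, 0, 0, 0, 0, 0, 2],  # Mike
])
players = pd.Index(["Emma", "Max", "Josh", "Steve", "Mike"], name="player")
df = pd.DataFrame(data, index=players, columns=years)
df["TOTAL"] = df.sum(axis=1)

# Assumption: a 0 means "no data", so replace zeros in the year columns with NaN.
year_values = df[years]
masked = year_values.where(year_values != 0)

# thresh=3 keeps only rows with at least 3 non-NaN values.
kept_index = masked.dropna(thresh=3).index
result = df.loc[kept_index]
print(result)
```

The original frame is left untouched; the NaN-masked copy is used only to decide which rows to keep.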
