Deleting rows in pandas where there are fewer than 3 non-zero values
Hi guys, I want to remove rows containing fewer than 3 non-zero values (not including the TOTAL column) from my pandas DataFrame.
At the moment I have:
year 2001 2002 2003 2004 2005 2006 2007 TOTAL
player
Emma 0 0 0 0 3 4 5 12
Max 3 5 0 0 0 0 0 8
Josh 1 2 4 1 2 1 0 11
Steve 0 0 0 0 3 0 0 3
Mike 1 0 0 0 0 0 2 3
But I want:
year 2001 2002 2003 2004 2005 2006 2007 TOTAL
player
Emma 0 0 0 0 3 4 5 12
Josh 1 2 4 1 2 1 0 11
I was thinking about using a for loop, but I'm not sure how to implement it or whether this is the best way to solve my problem.
aoshea
2 answers
pandas
I drop TOTAL and sum the number of nonzero elements in each row:
df[df.drop(columns='TOTAL').ne(0).sum(axis=1).gt(2)]
year 2001 2002 2003 2004 2005 2006 2007 TOTAL
player
Emma 0 0 0 0 3 4 5 12
Josh 1 2 4 1 2 1 0 11
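If the chained one-liner is hard to follow, here is a step-by-step sketch of the same idea. It rebuilds the example table from the question. The names year_columns, is_nonzero, nonzero_count and kept are only illustrative (they are not from the answer above), and the keyword arguments assume a reasonably recent pandas release.

```python
import pandas as pd

# Rebuild the example table from the question, with players as the index.
df = pd.DataFrame(
    [
        [0, 0, 0, 0, 3, 4, 5, 12],
        [3, 5, 0, 0, 0, 0, 0, 8],
        [1, 2, 4, 1, 2, 1, 0, 11],
        [0, 0, 0, 0, 3, 0, 0, 3],
        [1, 0, 0, 0, 0, 0, 2, 3],
    ],
    index=pd.Index(["Emma", "Max", "Josh", "Steve", "Mike"], name="player"),
    columns=["2001", "2002", "2003", "2004", "2005", "2006", "2007", "TOTAL"],
)

# Step 1: leave TOTAL out of the count.
year_columns = df.drop(columns="TOTAL")

# Step 2: True wherever a yearly value is not zero.
is_nonzero = year_columns.ne(0)

# Step 3: count the True values in each row.
nonzero_count = is_nonzero.sum(axis=1)

# Step 4: keep rows with at least 3 nonzero yearly values.
kept = df[nonzero_count >= 3]
print(kept)
```

Each intermediate can be printed on its own, which makes it easier to see why Max (2 nonzero years) and Mike (2 nonzero years) are dropped.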
numpy
A faster solution:
v = df.values
m = (v[:, :-1] != 0).sum(1) > 2
pd.DataFrame(v[m], df.index[m], df.columns)
year 2001 2002 2003 2004 2005 2006 2007 TOTAL
player
Emma 0 0 0 0 3 4 5 12
Josh 1 2 4 1 2 1 0 11
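A possible variation on the numpy approach, as a sketch only: the boolean mask can be applied straight to the original DataFrame instead of rebuilding one from the raw array, which should keep the index and column dtypes as they were. This assumes, as in the question, that TOTAL is the last column. The names values, mask and kept are illustrative.

```python
import numpy as np
import pandas as pd

# Example table from the question, with players as the index.
df = pd.DataFrame(
    [
        [0, 0, 0, 0, 3, 4, 5, 12],
        [3, 5, 0, 0, 0, 0, 0, 8],
        [1, 2, 4, 1, 2, 1, 0, 11],
        [0, 0, 0, 0, 3, 0, 0, 3],
        [1, 0, 0, 0, 0, 0, 2, 3],
    ],
    index=pd.Index(["Emma", "Max", "Josh", "Steve", "Mike"], name="player"),
    columns=["2001", "2002", "2003", "2004", "2005", "2006", "2007", "TOTAL"],
)

# Work on the underlying array; [:, :-1] skips the last column (TOTAL).
values = df.to_numpy()
mask = np.count_nonzero(values[:, :-1], axis=1) >= 3

# Index the original frame with the boolean mask.
kept = df.loc[mask]
print(kept)
```

I have not timed this against the version above, so treat any speed difference as something to measure on your own data.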
piRSquared
Setup
import numpy as np
import pandas as pd

df = pd.DataFrame({'2001': {'Emma': 0, 'Josh': 1, 'Max': 3, 'Mike': 1, 'Steve': 0},
                   '2002': {'Emma': 0, 'Josh': 2, 'Max': 5, 'Mike': 0, 'Steve': 0},
                   '2003': {'Emma': 0, 'Josh': 4, 'Max': 0, 'Mike': 0, 'Steve': 0},
                   '2004': {'Emma': 0, 'Josh': 1, 'Max': 0, 'Mike': 0, 'Steve': 0},
                   '2005': {'Emma': 3, 'Josh': 2, 'Max': 0, 'Mike': 0, 'Steve': 3},
                   '2006': {'Emma': 4, 'Josh': 1, 'Max': 0, 'Mike': 0, 'Steve': 0},
                   '2007': {'Emma': 5, 'Josh': 0, 'Max': 0, 'Mike': 2, 'Steve': 0},
                   'TOTAL': {'Emma': 12, 'Josh': 11, 'Max': 8, 'Mike': 3, 'Steve': 3}})
Solution
df.loc[np.sum(df.iloc[:,:-1]>0, axis=1)[lambda x: x>=3].index]
Out[889]:
2001 2002 2003 2004 2005 2006 2007 TOTAL
Emma 0 0 0 0 3 4 5 12
Josh 1 2 4 1 2 1 0 11
Alternatively, use groupby and filter. The threshold is 4 rather than 3 here because the TOTAL column is included in the count:
df.groupby(level=0).filter(lambda x: np.sum(x.iloc[0,:]>0)>=4)
Out[918]:
2001 2002 2003 2004 2005 2006 2007 TOTAL
Emma 0 0 0 0 3 4 5 12
Josh 1 2 4 1 2 1 0 11
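Both snippets above depend on where TOTAL sits (last column) or on it being counted as an extra positive value. A rough sketch of a reusable helper that excludes columns by name instead is below. The function name keep_rows_with_min_nonzero and its parameters are hypothetical, made up for illustration. It also tests for "not equal to 0" rather than "greater than 0", which only matters if negative values can occur.

```python
import pandas as pd


def keep_rows_with_min_nonzero(frame, min_nonzero=3, exclude=("TOTAL",)):
    """Keep rows with at least `min_nonzero` nonzero values,
    ignoring the columns named in `exclude` when counting."""
    counted = frame.drop(columns=list(exclude))
    nonzero_count = counted.ne(0).sum(axis=1)
    return frame[nonzero_count >= min_nonzero]


# Example table from the question, with players as the index.
df = pd.DataFrame(
    [
        [0, 0, 0, 0, 3, 4, 5, 12],
        [3, 5, 0, 0, 0, 0, 0, 8],
        [1, 2, 4, 1, 2, 1, 0, 11],
        [0, 0, 0, 0, 3, 0, 0, 3],
        [1, 0, 0, 0, 0, 0, 2, 3],
    ],
    index=["Emma", "Max", "Josh", "Steve", "Mike"],
    columns=["2001", "2002", "2003", "2004", "2005", "2006", "2007", "TOTAL"],
)

result = keep_rows_with_min_nonzero(df)
print(result)
```

Changing min_nonzero makes it easy to try other cutoffs without touching the counting logic.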
Allen