How to select rows that do not consist only of NaN and 0 values
This is my data file:
cols = ['Country', 'Year', 'Orange', 'Apple', 'Plump']
data = [['US', 2008, 17, 29, 19],
['US', 2009, 11, 12, 16],
['US', 2010, 14, 16, 38],
['Spain', 2008, 11, None, 33],
['Spain', 2009, 12, 19, 17],
['France', 2008, 17, 19, 21],
['France', 2009, 19, 22, 13],
['France', 2010, 12, 11, 0],
['France', 2010, 0, 0, 0],
['Italy', 2009, None, None, None],
['Italy', 2010, 15, 16, 17],
['Italy', 2010, 0, None, None],
['Italy', 2011, 42, None, None]]
I want to select the rows where Orange, Apple and Plump are not made up of only NaN, only 0, or a combination of both. Thus, the final result should be:
Country Year Orange Apple Plump
0 US 2008 17.0 29.0 19.0
1 US 2009 11.0 12.0 16.0
2 US 2010 14.0 16.0 38.0
3 Spain 2008 11.0 NaN 33.0
4 Spain 2009 12.0 19.0 17.0
5 France 2008 17.0 19.0 21.0
6 France 2009 19.0 22.0 13.0
7 France 2010 12.0 11.0 0.0
10 Italy 2010 15.0 16.0 17.0
12 Italy 2011 42.0 NaN NaN
Secondly, I want to drop the countries for which I do not have observations for all three years. The final result should therefore consist only of the US and France. How can I do this? I've tried something like:
df = df[(df['Orange'].notnull())| \
(df['Apple'].notnull()) | (df['Plump'].notnull()) | (df['Orange'] != 0 )| (df['Apple']!= 0) | (df['Plump']!= 0)]
I also tried:
df = df[((df['Orange'].notnull())| \
(df['Apple'].notnull()) | (df['Plump'].notnull())) & ((df['Orange'] != 0 )| (df['Apple']!= 0) | (df['Plump']!= 0))]
In [307]: df[~df[['Orange','Apple','Plump']].fillna(0).eq(0).all(axis=1)]
Out[307]:
Country Year Orange Apple Plump
0 US 2008 17.0 29.0 19.0
1 US 2009 11.0 12.0 16.0
2 US 2010 14.0 16.0 38.0
3 Spain 2008 11.0 NaN 33.0
4 Spain 2009 12.0 19.0 17.0
5 France 2008 17.0 19.0 21.0
6 France 2009 19.0 22.0 13.0
7 France 2010 12.0 11.0 0.0
10 Italy 2010 15.0 16.0 17.0
12 Italy 2011 42.0 NaN NaN
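To see why this one-liner works, it can help to build the mask one step at a time. The sketch below uses a small made-up frame (four illustrative rows, not the full data from the question) and intermediate names such as `filled` and `empty_row` that are chosen here only for readability. It also shows why a plain `!= 0` test is not enough on its own: `NaN != 0` evaluates to True.

```python
import numpy as np
import pandas as pd

# Small illustrative frame: one normal row, one all-zero row,
# one all-NaN row, and one row mixing 0 and NaN.
df = pd.DataFrame(
    {
        "Orange": [17, 0, np.nan, 0],
        "Apple": [29, 0, np.nan, np.nan],
        "Plump": [19, 0, np.nan, np.nan],
    }
)
fruit = ["Orange", "Apple", "Plump"]

filled = df[fruit].fillna(0)        # NaN and 0 now look the same
is_zero = filled.eq(0)              # True where the cell is 0 or was NaN
empty_row = is_zero.all(axis=1)     # True where every fruit cell is 0/NaN
result = df[~empty_row]             # keep rows with at least one real value

# NaN != 0 is True, so a plain "!= 0" test lets the all-NaN row through.
nan_passes_ne = bool((df["Orange"] != 0).iloc[2])
```

Only the first row survives in `result`; the all-zero, all-NaN and mixed rows are all flagged by `empty_row`.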
The None values will be read as NaN, so you can replace the 0s with NaN as well. After that, you can do what MaxU suggested. It will be something like:
In: df = df.replace(0,np.nan)
df = df[df[['Orange','Apple','Plump']].notnull().any(axis=1)]
Out:
Country Year Orange Apple Plump
0 US 2008 17 29 19
1 US 2009 11 12 16
2 US 2010 14 16 38
3 Spain 2008 11 NaN 33
4 Spain 2009 12 19 17
5 France 2008 17 19 21
6 France 2009 19 22 13
7 France 2010 12 11 NaN
10 Italy 2010 15 16 17
12 Italy 2011 42 NaN NaN
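One side effect of replacing on the whole frame is that zeros in the rows you keep also become NaN (row 7 above loses its 0 in Plump). If the original zeros should survive, a possible variation is to do the replacement only while computing the mask and then index the untouched frame. This is a minimal sketch on a few illustrative rows; the names `has_value` and `kept` are made up for the example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "Country": ["France", "France", "Italy", "Italy"],
        "Year": [2010, 2010, 2009, 2011],
        "Orange": [12, 0, np.nan, 42],
        "Apple": [11, 0, np.nan, np.nan],
        "Plump": [0, 0, np.nan, np.nan],
    }
)
fruit = ["Orange", "Apple", "Plump"]

# Replace 0 with NaN only in a temporary copy of the fruit columns,
# then use the resulting mask to index the original, unmodified frame.
has_value = df[fruit].replace(0, np.nan).notnull().any(axis=1)
kept = df[has_value]
```

Here `kept` holds the first and last rows, and the 0 in Plump for the first France row is still a 0 rather than NaN.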
For your second question, I understand that you want to get rid of the countries for which you do not have observations for 2008, 2009 and 2010. To do this, you can do something like:
countries = []
# groupby yields (country, sub-frame) pairs
for country, group in df.groupby('Country'):
    # keep the country only if its years are exactly 2008, 2009 and 2010
    if set(group.Year.unique()) == {2008, 2009, 2010}:
        countries.append(country)
df = df[df.Country.isin(countries)]
Which will give something like:
Country Year Orange Apple Plump
0 US 2008 17 29 19
1 US 2009 11 12 16
2 US 2010 14 16 38
5 France 2008 17 19 21
6 France 2009 19 22 13
7 France 2010 12 11 NaN
8 France 2010 NaN NaN NaN
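The same selection can probably be written without an explicit loop by using `groupby(...).filter(...)`. The sketch below assumes that "all three years" means 2008, 2009 and 2010, as in the loop above, and keeps any country whose years cover that set. Only the Country and Year columns are reproduced here, and `required_years` and `complete` are names invented for the example.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Country": ["US", "US", "US", "Spain", "Spain",
                    "France", "France", "France", "France"],
        "Year": [2008, 2009, 2010, 2008, 2009,
                 2008, 2009, 2010, 2010],
    }
)
required_years = {2008, 2009, 2010}  # assumed to be the three years of interest

# Keep only the groups whose set of years covers every required year.
complete = df.groupby("Country").filter(
    lambda g: required_years.issubset(set(g["Year"].tolist()))
)
```

`filter` returns the rows of the passing groups with their original index, so Spain (which lacks 2010) is dropped while the US and France rows are kept.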
Finally, you can apply both solutions at the same time:
df[df[['Orange','Apple','Plump']].notnull().any(axis=1) & df.Country.isin(countries)]
This gives:
Country Year Orange Apple Plump
0 US 2008 17 29 19
1 US 2009 11 12 16
2 US 2010 14 16 38
5 France 2008 17 19 21
6 France 2009 19 22 13
7 France 2010 12 11 NaN