How to select rows that do not consist only of NaN and 0 values

Question


This is my data:

cols = ['Country', 'Year', 'Orange', 'Apple', 'Plump']

data = [['US', 2008, 17, 29, 19],
        ['US', 2009, 11, 12, 16],
        ['US', 2010, 14, 16, 38],
        ['Spain', 2008, 11, None, 33],
        ['Spain', 2009, 12, 19, 17],
        ['France', 2008, 17, 19, 21],
        ['France', 2009, 19, 22, 13],
        ['France', 2010, 12, 11, 0],
        ['France', 2010, 0, 0, 0],
        ['Italy', 2009, None, None, None],
        ['Italy', 2010, 15, 16, 17],
        ['Italy', 2010, 0, None, None],
        ['Italy', 2011, 42, None, None]]
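The snippets in the answers assume `df` is a DataFrame built from these lists. A minimal sketch, assuming pandas is installed; the explicit float cast and the helper list `fruit` are additions for illustration, so that every `None` becomes `NaN` regardless of how pandas infers the column types:

```python
import pandas as pd

cols = ['Country', 'Year', 'Orange', 'Apple', 'Plump']
data = [['US', 2008, 17, 29, 19],
        ['US', 2009, 11, 12, 16],
        ['US', 2010, 14, 16, 38],
        ['Spain', 2008, 11, None, 33],
        ['Spain', 2009, 12, 19, 17],
        ['France', 2008, 17, 19, 21],
        ['France', 2009, 19, 22, 13],
        ['France', 2010, 12, 11, 0],
        ['France', 2010, 0, 0, 0],
        ['Italy', 2009, None, None, None],
        ['Italy', 2010, 15, 16, 17],
        ['Italy', 2010, 0, None, None],
        ['Italy', 2011, 42, None, None]]

df = pd.DataFrame(data, columns=cols)

# Measurement columns; casting to float turns any remaining None into NaN
fruit = ['Orange', 'Apple', 'Plump']
df[fruit] = df[fruit].astype(float)
print(df)
```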

I want to select rows where the Orange, Apple and Plump values are not composed of only None (NaN), only 0, or a combination of both. Thus, the final result should be:

   Country  Year  Orange  Apple  Plump  
0       US  2008    17.0   29.0   19.0  
1       US  2009    11.0   12.0   16.0  
2       US  2010    14.0   16.0   38.0  
3    Spain  2008    11.0    NaN   33.0  
4    Spain  2009    12.0   19.0   17.0  
5   France  2008    17.0   19.0   21.0 
6   France  2009    19.0   22.0   13.0  
7   France  2010    12.0   11.0    0.0  
10   Italy  2010    15.0   16.0   17.0  
12   Italy  2011    42.0    NaN    NaN

Secondly, I want to drop countries for which I do not have observations for all three years. Therefore, the final result should consist only of the US and France. How could I get them? I've tried something like:

df = df[(df['Orange'].notnull())| \
            (df['Apple'].notnull()) | (df['Plump'].notnull()) | (df['Orange'] != 0 )| (df['Apple']!= 0) | (df['Plump']!= 0)]

Also I tried:

df = df[((df['Orange'].notnull())| \
                (df['Apple'].notnull()) | (df['Plump'].notnull())) & ((df['Orange'] != 0 )| (df['Apple']!= 0) | (df['Plump']!= 0))]
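Both attempts test "not null" and "not 0" column by column and only then combine the results, so in the second attempt a row such as Italy 2010 (`0, NaN, NaN`) slips through: its `Orange` is not null, and its `Apple` is `NaN`, which compares as not equal to 0. The check has to be made per cell (non-null and non-zero) before asking whether any fruit column passes. A sketch, assuming `df` is built from the lists above; `fruit` and `has_value` are illustrative names:

```python
import pandas as pd

cols = ['Country', 'Year', 'Orange', 'Apple', 'Plump']
data = [['US', 2008, 17, 29, 19],
        ['US', 2009, 11, 12, 16],
        ['US', 2010, 14, 16, 38],
        ['Spain', 2008, 11, None, 33],
        ['Spain', 2009, 12, 19, 17],
        ['France', 2008, 17, 19, 21],
        ['France', 2009, 19, 22, 13],
        ['France', 2010, 12, 11, 0],
        ['France', 2010, 0, 0, 0],
        ['Italy', 2009, None, None, None],
        ['Italy', 2010, 15, 16, 17],
        ['Italy', 2010, 0, None, None],
        ['Italy', 2011, 42, None, None]]
df = pd.DataFrame(data, columns=cols)
fruit = ['Orange', 'Apple', 'Plump']  # helper list introduced for brevity
df[fruit] = df[fruit].astype(float)

# Second attempt from the question: the two conditions are evaluated independently
attempt = df[fruit].notnull().any(axis=1) & df[fruit].ne(0).any(axis=1)
print(df[attempt].index.tolist())  # row 11 (Italy 2010: 0, NaN, NaN) is wrongly kept

# Fix: a cell counts as data only if it is both non-null and non-zero,
# and a row is kept when at least one of its fruit cells counts as data
has_value = (df[fruit].notna() & df[fruit].ne(0)).any(axis=1)
print(df[has_value])
```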

python pandas

edyvedy13 Apr 27 '17 at 22:36

2 answers

The None values will be read as NaN, so you can replace the 0s with NaN as well. After that, you can do what MaxU suggested. It will be something like this (note that the replacement also turns the legitimate 0 in row 7 into NaN):

In: import numpy as np
    df = df.replace(0, np.nan)
    df = df[df[['Orange', 'Apple', 'Plump']].notnull().any(axis=1)]
Out:
   Country  Year  Orange  Apple  Plump
0       US  2008      17     29     19
1       US  2009      11     12     16
2       US  2010      14     16     38
3    Spain  2008      11    NaN     33
4    Spain  2009      12     19     17
5   France  2008      17     19     21
6   France  2009      19     22     13
7   France  2010      12     11    NaN
10   Italy  2010      15     16     17
12   Italy  2011      42    NaN    NaN

For your second question, I understand that you want to get rid of the countries for which you do not have observations for each of 2008, 2009 and 2010. To do this, you can do something like:

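countries = []
for country, group in df.groupby('Country'):
    # keep a country only if all of 2008, 2009 and 2010 appear among its rows;
    # a set test avoids comparing arrays of different lengths element by element
    if {2008, 2009, 2010}.issubset(group['Year']):
        countries.append(country)
df = df[df['Country'].isin(countries)]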

Which will give something like:

  Country  Year  Orange  Apple  Plump
0      US  2008      17     29     19
1      US  2009      11     12     16
2      US  2010      14     16     38
5  France  2008      17     19     21
6  France  2009      19     22     13
7  France  2010      12     11    NaN
8  France  2010     NaN    NaN    NaN
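The same country filter can be written without an explicit loop using `DataFrameGroupBy.filter`, which keeps or drops each group as a whole. A sketch, assuming "all three years" means that 2008, 2009 and 2010 each appear at least once per country (the name `required_years` is illustrative); applied to the full data set, it returns the same rows as the output above:

```python
import pandas as pd

cols = ['Country', 'Year', 'Orange', 'Apple', 'Plump']
data = [['US', 2008, 17, 29, 19],
        ['US', 2009, 11, 12, 16],
        ['US', 2010, 14, 16, 38],
        ['Spain', 2008, 11, None, 33],
        ['Spain', 2009, 12, 19, 17],
        ['France', 2008, 17, 19, 21],
        ['France', 2009, 19, 22, 13],
        ['France', 2010, 12, 11, 0],
        ['France', 2010, 0, 0, 0],
        ['Italy', 2009, None, None, None],
        ['Italy', 2010, 15, 16, 17],
        ['Italy', 2010, 0, None, None],
        ['Italy', 2011, 42, None, None]]
df = pd.DataFrame(data, columns=cols)

required_years = {2008, 2009, 2010}

# filter() calls the lambda once per country and keeps all of that
# country's rows only when the lambda returns True
complete = df.groupby('Country').filter(
    lambda g: required_years.issubset(g['Year']))
print(complete)
```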

Finally, you can apply both solutions at the same time:

df[df[['Orange','Apple','Plump']].notnull().any(axis=1) & df.Country.isin(countries)]

Receiving:

  Country  Year  Orange  Apple  Plump
0      US  2008      17     29     19
1      US  2009      11     12     16
2      US  2010      14     16     38
5  France  2008      17     19     21
6  France  2009      19     22     13   
7  France  2010      12     11    NaN
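Both steps can also be wrapped in one function that leaves the original values untouched: unlike `df.replace(0, np.nan)`, a per-cell mask keeps the genuine 0 in row 7, which matches the output the question asks for. A sketch; the function name `clean` and the constants are illustrative, and, as in the answer, the country check looks at the unfiltered rows:

```python
import pandas as pd

FRUIT = ['Orange', 'Apple', 'Plump']
REQUIRED_YEARS = {2008, 2009, 2010}


def clean(df):
    """Drop rows whose fruit values are all NaN/0, then countries missing a year."""
    # Step 1: keep a row if at least one fruit cell is neither NaN nor 0
    has_value = (df[FRUIT].notna() & df[FRUIT].ne(0)).any(axis=1)
    # Step 2: keep a country if every required year appears among its rows
    complete_countries = [country for country, group in df.groupby('Country')
                          if REQUIRED_YEARS.issubset(group['Year'])]
    in_complete_country = df['Country'].isin(complete_countries)
    return df[has_value & in_complete_country]


cols = ['Country', 'Year', 'Orange', 'Apple', 'Plump']
data = [['US', 2008, 17, 29, 19],
        ['US', 2009, 11, 12, 16],
        ['US', 2010, 14, 16, 38],
        ['Spain', 2008, 11, None, 33],
        ['Spain', 2009, 12, 19, 17],
        ['France', 2008, 17, 19, 21],
        ['France', 2009, 19, 22, 13],
        ['France', 2010, 12, 11, 0],
        ['France', 2010, 0, 0, 0],
        ['Italy', 2009, None, None, None],
        ['Italy', 2010, 15, 16, 17],
        ['Italy', 2010, 0, None, None],
        ['Italy', 2011, 42, None, None]]
df = pd.DataFrame(data, columns=cols)
df[FRUIT] = df[FRUIT].astype(float)

result = clean(df)
print(result)
```

If a year should only count when it has usable data, compute `complete_countries` from `df[has_value]` instead; for this data set both variants give the same six rows.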

VictorGGl Apr 27 '17 at 23:33

MaxU · Accepted Answer · 2017-04-27T23:07:12+0000

In [307]: df[~df[['Orange','Apple','Plump']].fillna(0).eq(0).all(axis=1)]
Out[307]:
   Country  Year  Orange  Apple  Plump
0       US  2008    17.0   29.0   19.0
1       US  2009    11.0   12.0   16.0
2       US  2010    14.0   16.0   38.0
3    Spain  2008    11.0    NaN   33.0
4    Spain  2009    12.0   19.0   17.0
5   France  2008    17.0   19.0   21.0
6   France  2009    19.0   22.0   13.0
7   France  2010    12.0   11.0    0.0
10   Italy  2010    15.0   16.0   17.0
12   Italy  2011    42.0    NaN    NaN
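This one-liner reads from the inside out: `fillna(0)` makes missing values look like zeros, `eq(0).all(axis=1)` flags rows where every fruit value is 0, and `~` keeps the rest. Because `fillna` returns a new frame, the original values, including the 0 in row 7, are not modified. The same expression split into named steps (the intermediate names are illustrative):

```python
import pandas as pd

cols = ['Country', 'Year', 'Orange', 'Apple', 'Plump']
data = [['US', 2008, 17, 29, 19],
        ['US', 2009, 11, 12, 16],
        ['US', 2010, 14, 16, 38],
        ['Spain', 2008, 11, None, 33],
        ['Spain', 2009, 12, 19, 17],
        ['France', 2008, 17, 19, 21],
        ['France', 2009, 19, 22, 13],
        ['France', 2010, 12, 11, 0],
        ['France', 2010, 0, 0, 0],
        ['Italy', 2009, None, None, None],
        ['Italy', 2010, 15, 16, 17],
        ['Italy', 2010, 0, None, None],
        ['Italy', 2011, 42, None, None]]
df = pd.DataFrame(data, columns=cols)
fruit = ['Orange', 'Apple', 'Plump']
df[fruit] = df[fruit].astype(float)

filled = df[fruit].fillna(0)          # NaN -> 0, so "missing" and "zero" look alike
all_empty = filled.eq(0).all(axis=1)  # True where every fruit value is 0 or was NaN
result = df[~all_empty]               # keep the complement
print(result)
```

By De Morgan's law this is the same condition as keeping rows where at least one fruit cell is both non-null and non-zero.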
