In Python, is there a direct way to filter the pd.dataframe parameter by 2 ranges of column values?

I got a simple dataframe:

df
Out[102]: 
         0        1
0   nfp_zb  0.04325
1   ftb_zb  0.05645
2   ftb_cl  0.09055
3     cl_2  0.12865
4   ftb_gc  0.13385
5     cl_1  0.22795
6     cl_3  0.26985
7     es_3  0.37955
8     es_2  0.39450
9     zb_3  0.42170
10    es_1  0.45170
11  nfp_es  0.47190
12    zb_2  0.50130
13  nfp_cl  0.53170
14  nfp_gc  0.74260
15    gc_2  0.76640
16    gc_3  0.80915
17    zb_1  0.83010
18    gc_1  0.89795

      

All I am trying to do is select values ​​that are above threshold a and less than threshold b, where the two value ranges are NON OVERLAPPING. Imagine (over 85% and under 15%). Obvioulsy both terms are independent. So I do it like this:

def filter(df):
    df['filter'] = ""
    df.loc[df[1] > 0.85, 'filter'] = 1
    df.loc[df[1] < 0.15, 'filter'] = 1
    df = df[df['filter'] == 1]
    del df['filter']
    return df

      

And I am getting the correct answer:

filter(df)
Out[104]: 
         0        1 
0   nfp_zb  0.04325       
1   ftb_zb  0.05645      
2   ftb_cl  0.09055      
3     cl_2  0.12865      
4   ftb_gc  0.13385      
18    gc_1  0.89795   

      

However, I would like to know if there is a direct way to do this without creating a custom formula. Perhaps using groupby ....

thanks for the help

+3


source to share


3 answers


You can simply put all the conditions in an accessory .loc

separated by an operator, or:

df.loc[(df['1'] > 0.85) | (df['1'] < 0.15), :]
Out[19]: 
         0        1
0   nfp_zb  0.04325
1   ftb_zb  0.05645
2   ftb_cl  0.09055
3     cl_2  0.12865
4   ftb_gc  0.13385
18    gc_1  0.89795

      



The suggestions people provided in the other answers should work equally well, you just need to flip the inequality and use or instead.

+3


source


You probably want to use logical masking.



mask1 = df['1'] > .85
mask2 = df['1'] < .15

filtered = df[mask1 | mask2]

      

+3


source


You can try df.query which was added in pandas v0.13

import pandas as pd
df = pd.read_clipboard()
df

         A        B
0   nfp_zb  0.04325
1   ftb_zb  0.05645
2   ftb_cl  0.09055
3     cl_2  0.12865
4   ftb_gc  0.13385
5     cl_1  0.22795
6     cl_3  0.26985
7     es_3  0.37955
8     es_2  0.39450
9     zb_3  0.42170
10    es_1  0.45170
11  nfp_es  0.47190
12    zb_2  0.50130
13  nfp_cl  0.53170
14  nfp_gc  0.74260
15    gc_2  0.76640
16    gc_3  0.80915
17    zb_1  0.83010
18    gc_1  0.89795

df.query('B > 0.85 or B < 0.15')

      

+3


source







All Articles