In Python, is there a direct way to filter a pandas DataFrame by two ranges of column values?
I have a simple dataframe:
df
Out[102]:
0 1
0 nfp_zb 0.04325
1 ftb_zb 0.05645
2 ftb_cl 0.09055
3 cl_2 0.12865
4 ftb_gc 0.13385
5 cl_1 0.22795
6 cl_3 0.26985
7 es_3 0.37955
8 es_2 0.39450
9 zb_3 0.42170
10 es_1 0.45170
11 nfp_es 0.47190
12 zb_2 0.50130
13 nfp_cl 0.53170
14 nfp_gc 0.74260
15 gc_2 0.76640
16 gc_3 0.80915
17 zb_1 0.83010
18 gc_1 0.89795
All I am trying to do is select the rows whose value is either above threshold a or below threshold b, where the two ranges do NOT overlap (imagine over 85% or under 15%). Obviously the two conditions are independent. So I do it like this:
def filter(df):
    df['filter'] = ""
    df.loc[df[1] > 0.85, 'filter'] = 1
    df.loc[df[1] < 0.15, 'filter'] = 1
    df = df[df['filter'] == 1]
    del df['filter']
    return df
And I am getting the correct answer:
filter(df)
Out[104]:
0 1
0 nfp_zb 0.04325
1 ftb_zb 0.05645
2 ftb_cl 0.09055
3 cl_2 0.12865
4 ftb_gc 0.13385
18 gc_1 0.89795
However, I would like to know if there is a direct way to do this without writing a custom function. Perhaps using groupby?
thanks for the help
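You can simply put all the conditions in a single .loc indexer, combined with the or operator (|). Here the column label is the string '1'; if your labels are integers, as in the question, write df[1] instead: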
df.loc[(df['1'] > 0.85) | (df['1'] < 0.15), :]
Out[19]:
0 1
0 nfp_zb 0.04325
1 ftb_zb 0.05645
2 ftb_cl 0.09055
3 cl_2 0.12865
4 ftb_gc 0.13385
18 gc_1 0.89795
The suggestions in the other answers should work equally well; you just need to flip the inequalities and combine them with or instead.
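For anyone who wants to try this without the original data, here is a minimal self-contained sketch of the same mask. The six-row sample and the names low, high, outside and outside_alt are made up for illustration; the column labels are assumed to be the integers 0 and 1, as in the question. It also shows an equivalent form that negates Series.between, which may read more naturally when you think of the filter as "everything outside the band".

```python
import pandas as pd

# Hypothetical sample: a few rows taken from the question's data,
# with integer column labels 0 and 1.
df = pd.DataFrame({
    0: ["nfp_zb", "cl_2", "cl_1", "es_1", "zb_1", "gc_1"],
    1: [0.04325, 0.12865, 0.22795, 0.45170, 0.83010, 0.89795],
})

low, high = 0.15, 0.85  # thresholds from the question

# Keep rows whose value is below `low` OR above `high`.
# Each comparison needs its own parentheses because | binds
# more tightly than < and >.
outside = df[(df[1] < low) | (df[1] > high)]

# Equivalent for non-missing values: negate an inclusive
# "inside the band" test. Note that NaN rows would pass this
# version but not the one above.
outside_alt = df[~df[1].between(low, high)]

print(outside)
```

Both forms return a filtered copy and leave df unchanged, so no helper column is needed.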
You can try df.query, which was added in pandas v0.13:
import pandas as pd
df = pd.read_clipboard()
df
A B
0 nfp_zb 0.04325
1 ftb_zb 0.05645
2 ftb_cl 0.09055
3 cl_2 0.12865
4 ftb_gc 0.13385
5 cl_1 0.22795
6 cl_3 0.26985
7 es_3 0.37955
8 es_2 0.39450
9 zb_3 0.42170
10 es_1 0.45170
11 nfp_es 0.47190
12 zb_2 0.50130
13 nfp_cl 0.53170
14 nfp_gc 0.74260
15 gc_2 0.76640
16 gc_3 0.80915
17 zb_1 0.83010
18 gc_1 0.89795
df.query('B > 0.85 or B < 0.15')
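If you want to run this without the clipboard, the sketch below builds a small made-up sample first. It assumes the frame starts with the integer labels 0 and 1 from the question and renames them to A and B, since query expressions are easiest to write when column names are valid Python identifiers. The variables low and high are hypothetical names; the @ prefix lets the expression refer to local variables instead of hard-coded numbers.

```python
import pandas as pd

# Hypothetical sample: a few rows from the question's data,
# starting with integer column labels as in the question.
raw = pd.DataFrame({
    0: ["nfp_zb", "cl_2", "cl_1", "es_1", "zb_1", "gc_1"],
    1: [0.04325, 0.12865, 0.22795, 0.45170, 0.83010, 0.89795],
})

# Rename to identifier-style names so they can be used in the query string.
df = raw.rename(columns={0: "A", 1: "B"})

low, high = 0.15, 0.85  # thresholds from the question

# @name refers to a Python variable in the calling scope.
result = df.query("B < @low or B > @high")

print(result)
```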