Matching rows with Boolean logic in a pandas DataFrame
I have created a pandas DataFrame and would like to filter it using Boolean logic. What I want is closer to Excel's index comparison function than to simple filtering. I have already researched many other threads.
- When I apply my filter, the DataFrame returns no matching rows. Why is nothing returned even though my conditions allow several values?
- If I added a fifth column, say column 'D' filled with random values from 100 to 1000 in steps of 100, what logic would I use to conditionally find the maximum values of column D only? That is, when several rows match, can I make the DataFrame return only the row with the highest value in one specific column?
Any advice is appreciated. Thank you in advance.
import pandas as pd
df = pd.DataFrame({
'Step': [1,1,1,1,1,1,2,2,2,2,2,2],
'A': [4,5,6,7,4,5,6,7,4,5,6,7],
'B': [10,20,30,40,10,20,30,40,10,20,30,40],
'C': [0,0.5,1,1.5,2,2.5,0,0.5,1,1.5,2.0,2.5]
})
columns = ['Step','A','B','C']
df=df[columns]
new_df=df[(df.Step == 1) & (df.A == 4|5|6|7) & (df.B == 10|20|30|40)]
new_df
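You can use boolean indexing with isin. Your original filter returns nothing because in Python | binds more tightly than ==, so df.A == 4|5|6|7 is evaluated as df.A == (4|5|6|7), which is df.A == 7, and df.B == 10|20|30|40 becomes df.B == 62, a value no row has: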
new_df=df[(df.Step == 1) & (df.A.isin([4,5,6,7])) & (df.B.isin([10,20,30,40]))]
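A quick way to see why the original filter matches nothing is to evaluate the two OR chains on their own and compare both filters on the question's data. This is a small sketch; the names broken and fixed are only illustrative.

```python
import pandas as pd

# The question's DataFrame
df = pd.DataFrame({
    'Step': [1,1,1,1,1,1,2,2,2,2,2,2],
    'A': [4,5,6,7,4,5,6,7,4,5,6,7],
    'B': [10,20,30,40,10,20,30,40,10,20,30,40],
    'C': [0,0.5,1,1.5,2,2.5,0,0.5,1,1.5,2.0,2.5],
})

# `|` binds tighter than `==`, so these are plain integer bitwise ORs
print(4 | 5 | 6 | 7)      # 7
print(10 | 20 | 30 | 40)  # 62

# The original filter therefore tests A == 7 and B == 62; no row has B == 62
broken = df[(df.Step == 1) & (df.A == 4|5|6|7) & (df.B == 10|20|30|40)]
print(len(broken))  # 0

# isin checks element-wise membership in the list instead
fixed = df[(df.Step == 1) & (df.A.isin([4, 5, 6, 7])) & (df.B.isin([10, 20, 30, 40]))]
print(fixed.index.tolist())  # [0, 1, 2, 3, 4, 5]
```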
For the second question, DataFrame.nlargest seems to be what you need. It returns the n rows with the largest values in the given column:
import numpy as np
import pandas as pd

np.random.seed(789)
df = pd.DataFrame({
'Step': [1,1,1,1,1,1,2,2,2,2,2,2],
'A': [4,5,6,7,4,5,6,7,4,5,6,7],
'B': [10,20,30,40,10,20,30,40,10,20,30,40],
'C': [0,0.5,1,1.5,2,2.5,0,0.5,1,1.5,2.0,2.5],
# random multiples of 100 between 100 and 900
'D': np.random.choice(np.arange(100,1000,100), size=12)
})
print(df)
    A   B    C    D  Step
0   4  10  0.0  400     1
1   5  20  0.5  300     1
2   6  30  1.0  200     1
3   7  40  1.5  400     1
4   4  10  2.0  500     1
5   5  20  2.5  900     1
6   6  30  0.0  500     2
7   7  40  0.5  200     2
8   4  10  1.0  900     2
9   5  20  1.5  100     2
10  6  30  2.0  200     2
11  7  40  2.5  200     2
new_df = df[(df.Step == 1) & (df.A.isin([4,5,6,7])) & (df.B.isin([10,20,30,40]))].nlargest(1, 'D')
print(new_df)
   A   B    C    D  Step
5  5  20  2.5  900     1
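Note that nlargest(1, 'D') returns a single row even if several filtered rows share the maximum D. If you want every tied row, one option is to compare against the maximum of the filtered column; nlargest also accepts keep='all' in recent pandas versions. The D values below are hard-coded hypothetical numbers with a deliberate tie (not the random values above), so the result is reproducible.

```python
import pandas as pd

df = pd.DataFrame({
    'Step': [1, 1, 1, 1, 1, 1, 2, 2],
    'A': [4, 5, 6, 7, 4, 5, 6, 7],
    'B': [10, 20, 30, 40, 10, 20, 30, 40],
    # Hypothetical D values: rows 1 and 5 tie for the maximum within Step 1,
    # row 6 is larger but is excluded by the Step filter
    'D': [400, 900, 200, 400, 500, 900, 950, 100],
})

mask = (df.Step == 1) & df.A.isin([4, 5, 6, 7]) & df.B.isin([10, 20, 30, 40])
filtered = df[mask]

# Keep every filtered row whose D equals the filtered maximum
top_rows = filtered[filtered.D == filtered.D.max()]
print(top_rows.index.tolist())  # [1, 5]

# nlargest with keep='all' also retains rows tied with the n-th largest value
top_rows_nl = filtered.nlargest(1, 'D', keep='all')
print(sorted(top_rows_nl.index.tolist()))  # [1, 5]
```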
Alternatively, you can use the DataFrame.query() method:
In [7]: new_df = df.query("Step==1 and A in [4,5,6,7] and B in [10,20,30,40]")
In [8]: new_df
Out[8]:
   Step  A   B    C
0     1  4  10  0.0
1     1  5  20  0.5
2     1  6  30  1.0
3     1  7  40  1.5
4     1  4  10  2.0
5     1  5  20  2.5
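If the allowed values live in Python lists, query() can refer to local variables with the @ prefix, and because it returns a DataFrame it chains with nlargest as well. A minimal sketch; allowed_a and allowed_b are hypothetical names, and nlargest is applied to C here only because the question's DataFrame has no D column.

```python
import pandas as pd

df = pd.DataFrame({
    'Step': [1,1,1,1,1,1,2,2,2,2,2,2],
    'A': [4,5,6,7,4,5,6,7,4,5,6,7],
    'B': [10,20,30,40,10,20,30,40,10,20,30,40],
    'C': [0,0.5,1,1.5,2,2.5,0,0.5,1,1.5,2.0,2.5],
})

# Hypothetical variable names; `@` lets query() reference local Python variables
allowed_a = [4, 5, 6, 7]
allowed_b = [10, 20, 30, 40]

new_df = df.query("Step == 1 and A in @allowed_a and B in @allowed_b")
print(new_df.index.tolist())  # [0, 1, 2, 3, 4, 5]

# query() returns a DataFrame, so it can be chained, e.g. with nlargest
top = new_df.nlargest(1, 'C')
print(top.index.tolist())  # [5]
```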