np.where with multiple pandas boolean operators
I know there are many questions about chained logical operators using np.where.
I have 2 data frames:
df1
A B C D E F Postset
0 1 2 3 4 5 6 yes
1 1 2 3 4 5 6 no
2 1 2 3 4 5 6 yes
df2
A B C D E F Preset
0 1 2 3 4 5 6 yes
1 1 2 3 4 5 6 yes
2 1 2 3 4 5 6 yes
I want to compare the rows in each frame for uniqueness. For this I need to check that all values are equal across multiple selected columns.
From this question: if I check columns A, B, C, D, E and F, I can do:
np.where((df1.A != df2.A) | (df1.B != df2.B) | (df1.C != df2.C) | (df1.D != df2.D) | (df1.E != df2.E) | (df1.F != df2.F))
Which correctly gives:
(array([], dtype=int64),)
i.e. the values in every column are equal across both data frames.
This is great for a small dataframe, but my real dataframe has a large number of columns that I have to check. The np.where condition is too long to write out accurately.
Instead, I would like to put my columns in a list:
columns_check_list = ['A','B','C','D','E','F']
And use np.where to check all columns automatically.
This obviously doesn't work, but it's the kind of form I'm looking for. Something like:
check = np.where([df[column) != df[column] | for column in columns_check_list])
How can I achieve this?
Notes:
- I have many columns.
- The format of my data is fixed.
- The values in the columns can be either strings or floats.
It seems you need all to check whether all values are True per row, or any to check whether at least one value per row is True:
mask= ~(df1[columns_check_list] == df2[columns_check_list]).all(axis=1).values
print (mask)
[False False False]
Or more readable, thanks to IanS :
mask= (df1[columns_check_list] != df2[columns_check_list]).any(axis=1).values
print (mask)
[False False False]
It is also possible to compare the underlying NumPy arrays:
mask= (df1[columns_check_list].values != df2[columns_check_list].values).any(axis=1)
print (mask)
[False False False]
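To connect the mask back to the np.where call from the question, here is a minimal sketch. The frames are a shortened, hypothetical variant of the question's data: only columns A to C, with one value in df2 changed on purpose so that there is a mismatch to find.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data modelled on the question; the 9 in df2.B is
# deliberately different so that row 1 shows up as a mismatch.
df1 = pd.DataFrame({"A": [1, 1, 1], "B": [2, 2, 2], "C": [3, 3, 3],
                    "Postset": ["yes", "no", "yes"]})
df2 = pd.DataFrame({"A": [1, 1, 1], "B": [2, 9, 2], "C": [3, 3, 3],
                    "Preset": ["yes", "yes", "yes"]})

columns_check_list = ["A", "B", "C"]

# True for each row where at least one of the checked columns differs
mask = (df1[columns_check_list] != df2[columns_check_list]).any(axis=1).values

# np.where converts the boolean mask into row positions
mismatch_rows = np.where(mask)[0]

print(mask)
print(mismatch_rows)
```

Selecting the columns first also keeps Postset and Preset out of the comparison, so the differing column names do not get in the way.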
You can use np.logical_or.reduce on the comparison values:
>>> import numpy as np
>>> np.logical_or.reduce((df1 != df2).values, axis=1) # along rows
array([False, False, False], dtype=bool) # each value represents a row
You may need to exclude columns before doing the comparison:
(df1[include_columns_list] != df2[include_columns_list]).values
or after (this requires both frames to have identical column labels):
(df1 != df2)[include_columns_list].values
Besides np.logical_or there is also np.bitwise_or, but if you are dealing with booleans (and the comparison returns an array of booleans), the two are equivalent.
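As a small illustration of that equivalence, the sketch below reduces a hand-written boolean array (hypothetical values standing in for a comparison result) with both ufuncs and with the plain any method:

```python
import numpy as np

# Hypothetical comparison result: 3 rows, 2 checked columns
diff = np.array([[False, False],
                 [True, False],
                 [False, True]])

# Reduce along axis=1 so that each output value describes one row
rows_logical = np.logical_or.reduce(diff, axis=1)
rows_bitwise = np.bitwise_or.reduce(diff, axis=1)
rows_any = diff.any(axis=1)

print(rows_logical)
print(rows_bitwise)
print(rows_any)
```

All three give the same per-row result on boolean input; on integer input np.bitwise_or would combine bits instead, so the equivalence only holds for booleans.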