np.where with multiple pandas boolean operators

I know there are many questions about chained logical operators using np.where.

I have 2 data frames:

df1
   A  B  C  D  E  F Postset
0  1  2  3  4  5  6     yes
1  1  2  3  4  5  6      no
2  1  2  3  4  5  6     yes

df2
   A  B  C  D  E  F Preset
0  1  2  3  4  5  6    yes
1  1  2  3  4  5  6    yes
2  1  2  3  4  5  6    yes

      

I want to compare the uniqueness of the rows in each frame. For this I need to check that all values are equal across multiple selected columns.

From this question: if I check columns A, B, C, D, E and F, I can do:

np.where((df1.A != df2.A) | (df1.B != df2.B) | (df1.C != df2.C) | (df1.D != df2.D) | (df1.E != df2.E) | (df1.F != df2.F))

      

Which correctly gives:

(array([], dtype=int64),)

      

i.e. the values in all columns are independently equal for both data frames.

This is fine for a small dataframe, but my real dataframe has a large number of columns that I have to check. The np.where condition is too long to write out accurately.

Instead, I would like to put my columns in a list:

columns_check_list = ['A','B','C','D','E','F'] 

      

And use my np.where statement to check all columns automatically.

This obviously doesn't work, but it's the type of form I'm looking for. Something like:

check = np.where([df1[column] != df2[column] | for column in columns_check_list])

      

How can I achieve this?

Notes:

  • I have many columns.
  • The format of my data is fixed.
  • The values in the columns can be either strings or floats.
2 answers


You seem to need all to check whether all values are True per row, or any if at least one value per row is True:

mask = ~(df1[columns_check_list] == df2[columns_check_list]).all(axis=1).values
print(mask)
[False False False]

      

Or, more readably, thanks to IanS:

mask = (df1[columns_check_list] != df2[columns_check_list]).any(axis=1).values
print(mask)
[False False False]
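To connect the boolean mask back to the np.where call in the question, here is a minimal sketch. It rebuilds frames shaped like the ones in the question; the helper name differing_rows and the modified frame df3 are hypothetical and only serve the illustration.

```python
import numpy as np
import pandas as pd

# Frames shaped like the ones in the question (illustrative data only).
df1 = pd.DataFrame({"A": [1, 1, 1], "B": [2, 2, 2], "C": [3, 3, 3],
                    "D": [4, 4, 4], "E": [5, 5, 5], "F": [6, 6, 6],
                    "Postset": ["yes", "no", "yes"]})
df2 = df1.drop(columns="Postset").assign(Preset=["yes", "yes", "yes"])
columns_check_list = ["A", "B", "C", "D", "E", "F"]


def differing_rows(left, right, columns):
    """Hypothetical helper: positions of rows where any listed column differs."""
    mask = (left[columns] != right[columns]).any(axis=1).to_numpy()
    return np.where(mask)[0]


# Identical values in the checked columns: no row positions are reported.
same = differing_rows(df1, df2, columns_check_list)

# Change one value in a copy to see a row position reported.
df3 = df2.copy()
df3.loc[1, "C"] = 99
changed = differing_rows(df1, df3, columns_check_list)
```

Both frames are assumed to share the same index, since pandas only compares identically labeled frames.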

      

It is also possible to compare the numpy arrays:

mask = (df1[columns_check_list].values != df2[columns_check_list].values).any(axis=1)
print(mask)
[False False False]
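One caveat worth keeping in mind, since the columns may hold floats: NaN compares as not equal to itself, so a row with a missing value in the same position in both frames is flagged as different by a plain != comparison. The sketch below shows one possible way to treat such cells as equal; the frames left and right and the variable names are made up for the illustration.

```python
import numpy as np
import pandas as pd

# Illustrative frames: row 1 has NaN in both, row 2 really differs.
left = pd.DataFrame({"A": [1.0, np.nan, 3.0], "B": ["x", "y", "z"]})
right = pd.DataFrame({"A": [1.0, np.nan, 4.0], "B": ["x", "y", "z"]})
cols = ["A", "B"]

# Plain comparison: NaN != NaN is True, so row 1 is flagged.
naive_mask = (left[cols] != right[cols]).any(axis=1).to_numpy()

# Treat cells that are missing in both frames as equal.
both_missing = left[cols].isna() & right[cols].isna()
nan_aware_mask = ((left[cols] != right[cols]) & ~both_missing).any(axis=1).to_numpy()
```

Whether two missing values should count as equal depends on the data, so this is a choice to make rather than a fix that always applies.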

      



You can use np.logical_or.reduce on the comparison values:

>>> import numpy as np
>>> np.logical_or.reduce((df1 != df2).values, axis=1)  # along rows
array([False, False, False], dtype=bool)               # each value represents a row

      

You may need to exclude columns before doing the comparison:

(df1[include_columns_list] != df2[include_columns_list]).values

      



or after:

(df1 != df2)[include_columns_list].values
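Note that the two variants are not always interchangeable. pandas only compares identically labeled frames, so with frames like the ones in the question (one has a Postset column, the other Preset) the whole-frame comparison raises a ValueError and the columns have to be selected first. A small sketch with made-up data, where the flag name is hypothetical:

```python
import pandas as pd

# Two frames whose last column has a different label, as in the question.
df1 = pd.DataFrame({"A": [1, 1], "B": [2, 2], "Postset": ["yes", "no"]})
df2 = pd.DataFrame({"A": [1, 1], "B": [2, 5], "Preset": ["yes", "yes"]})
include_columns_list = ["A", "B"]

# Selecting after the comparison fails: the column labels do not match.
try:
    (df1 != df2)[include_columns_list]
    whole_frame_comparison_failed = False
except ValueError:
    whole_frame_comparison_failed = True

# Selecting before the comparison works, because both selections
# share the same index and column labels.
before = (df1[include_columns_list] != df2[include_columns_list]).values
```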

      

Besides np.logical_or there is also np.bitwise_or, but if you are dealing with booleans (and the comparison returns an array of booleans), these are equivalent.
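The equivalence on boolean input can be checked with a small sketch; the array diff below is made-up data standing in for the result of a comparison.

```python
import numpy as np

# Stand-in for the boolean result of a frame comparison (3 rows, 3 columns).
diff = np.array([[False, False, False],
                 [False, True, False],
                 [True, True, False]])

# Row-wise OR, three ways; on boolean input all give the same result.
by_logical = np.logical_or.reduce(diff, axis=1)
by_bitwise = np.bitwise_or.reduce(diff, axis=1)
by_any = diff.any(axis=1)

# On integers the two ufuncs differ: one tests truthiness, the other ORs bits.
logical_on_ints = bool(np.logical_or(1, 2))
bitwise_on_ints = int(np.bitwise_or(1, 2))
```

The last two lines show why the equivalence only holds for booleans: on the integers 1 and 2, np.logical_or yields True while np.bitwise_or yields 3.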
