Python: Element-wise comparison of arrays of the same shape

I have n matrices of the same size and want to see how many cells are equal to each other across all matrices. Code:

import numpy as np
a = np.array([[1,2,3],[4,5,6],[7,8,9]])
b = np.array([[5,6,7], [4,2,6], [7, 8, 9]])
c = np.array([[2,3,4],[4,5,6],[1,2,5]])

# My intuition is below, but it is wrong (it raises a ValueError)
a == b == c


How do I get Python to return the value 2 (cells 2,1 and 2,3 are the same in all 3 matrices) or an array of [[False, False, False], [True, False, True], [False, False, False]]?
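For reference, the chained form fails because Python evaluates a == b == c as (a == b) and (b == c), and `and` asks for the truth value of a whole boolean array, which NumPy refuses to decide for arrays with more than one element. A minimal sketch of the failure and of the element-wise alternative, using the arrays from the question:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
b = np.array([[5, 6, 7], [4, 2, 6], [7, 8, 9]])
c = np.array([[2, 3, 4], [4, 5, 6], [1, 2, 5]])

# Python expands a == b == c to (a == b) and (b == c); `and` calls bool()
# on the boolean array a == b, which NumPy rejects for multi-element arrays.
try:
    a == b == c
    chained_failed = False
except ValueError as err:
    chained_failed = True
    print("chained comparison fails:", err)

# Combining the two element-wise comparisons with & keeps everything element-wise.
mask = (a == b) & (b == c)
print(mask)
```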



3 answers


You can do:

(a == b) & (b==c)

[[False False False]
 [ True False  True]
 [False False False]]
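The question also asks for the count 2. Because True counts as 1, np.count_nonzero (or np.sum) on the mask gives it, and np.argwhere lists where the matches are, using NumPy's 0-based indices rather than the 1-based cell numbers in the question. A short sketch with the question's arrays:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
b = np.array([[5, 6, 7], [4, 2, 6], [7, 8, 9]])
c = np.array([[2, 3, 4], [4, 5, 6], [1, 2, 5]])

mask = (a == b) & (b == c)

count = np.count_nonzero(mask)   # True counts as 1, so this is the number of matching cells
positions = np.argwhere(mask)    # 0-based (row, column) of each matching cell

print(count)               # → 2
print(positions.tolist())  # → [[1, 0], [1, 2]]
```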

      

For n elements, for example in a list such as x=[a, b, c, a, b, c], you can do:

r = x[0] == x[1]
for temp in x[2:]:
    r &= x[0]==temp

      

The result is now in r.
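The same accumulation can also be written without an explicit loop using the ufunc method np.logical_and.reduce, which ANDs a sequence of boolean arrays together. This is an alternative technique, not the answer's own code; a sketch assuming x is a list of arrays that all have the same shape:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
b = np.array([[5, 6, 7], [4, 2, 6], [7, 8, 9]])
c = np.array([[2, 3, 4], [4, 5, 6], [1, 2, 5]])
x = [a, b, c, a, b, c]

# The answer's loop: compare each array with x[0] and AND the results into r in place.
r = x[0] == x[1]
for temp in x[2:]:
    r &= x[0] == temp

# Alternative (not the answer's code): reduce all comparisons with x[0] using logical AND
# along the first axis, which indexes the comparisons.
r_reduce = np.logical_and.reduce([x[0] == y for y in x[1:]])

print(r)
print(np.array_equal(r, r_reduce))  # → True
```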

If the structure is already in a 3D numpy array, one can also use:

np.amax(x,axis=2)==np.amin(x,axis=2)

The idea behind this line is that, although it would be ideal to have an equal function that takes an axis argument, there is none, so this line relies on the fact that if amin == amax along the axis, then all elements along that axis are equal.
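A sketch of the min/max idea on the question's arrays, stacking them with np.dstack so that the third axis indexes the arrays. As an alternative (not from the answer), broadcasting a comparison against the first slice and then calling .all(axis=2) behaves like the missing "equal along an axis" function:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
b = np.array([[5, 6, 7], [4, 2, 6], [7, 8, 9]])
c = np.array([[2, 3, 4], [4, 5, 6], [1, 2, 5]])

x = np.dstack((a, b, c))  # shape (3, 3, 3); x[:, :, k] is the k-th input array

# All values along axis 2 are equal exactly when their maximum equals their minimum.
same_minmax = np.amax(x, axis=2) == np.amin(x, axis=2)

# Alternative: compare every slice with the first one by broadcasting
# (shape (3, 3, 1) against (3, 3, 3)), then require all True along axis 2.
same_bcast = (x == x[:, :, :1]).all(axis=2)

print(same_minmax)
print(np.array_equal(same_minmax, same_bcast))  # → True
```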




If the arrays to be compared are not already in a 3D numpy array (and won't be later), the list loop is quick and easy. Although I generally agree with avoiding Python loops over NumPy arrays, this seems like a case where a Python loop is easier and faster (see below), since the loop runs along only one axis and the comparisons are easy to accumulate in place. Here's a timing test:

import numpy as np
from timeit import timeit

def f0(x):  # Python loop over a list, accumulating comparisons in place
    r = x[0] == x[1]
    for y in x[2:]:
        r &= x[0] == y
    return r

def f1(x):  # from @Divakar: stack, diff, then check for all zeros
    return ~np.any(np.diff(np.dstack(x), axis=2), axis=2)

def f2(x):  # stack, then compare max and min along the stacking axis
    x = np.dstack(x)
    return np.amax(x, axis=2) == np.amin(x, axis=2)

# speed test
for n, size, reps in ((1000, 3, 1000), (10, 1000, 100)):
    x = [np.ones((size, size)) for i in range(n)]
    print(n, size, reps)
    for name in ("f0", "f1", "f2"):
        t = timeit(name + "(x)", "from __main__ import x, f0, f1, f2", number=reps)
        print(name + ": ", t)
    print()

1000 3 1000
f0:  1.14673900604  # loop
f1:  3.93413209915  # diff
f2:  3.93126702309  # min max

10 1000 100
f0:  2.42633581161  # loop
f1:  27.1066679955  # diff
f2:  25.9518558979  # min max

      

If the arrays are already in a single 3D numpy array (as with x = np.dstack(x) in the example above), then modifying the functions above accordingly, together with the min==max approach, gives:

def g0(x):   # loop over the slices along axis 2
    r = x[:,:,0] == x[:,:,1]
    for iy in range(2, x.shape[2]):
        r &= x[:,:,0] == x[:,:,iy]
    return r

def g1(x):   # from @Divakar
    return ~np.any(np.diff(x, axis=2), axis=2)

def g2(x):
    return np.amax(x, axis=2) == np.amin(x, axis=2)


which gives:

1000 3 1000
g0:  3.9761030674      # loop
g1:  0.0599548816681   # diff
g2:  0.0313589572906   # min max

10 1000 100
g0:  10.7617051601     # loop
g1:  10.881870985      # diff
g2:  9.66712999344     # min max

      

Note that for a list of large arrays f0 ~= 2.4, while for a pre-built array g0, g1, g2 ~= 10, so if the input arrays are large, keeping them separately in a list is the fastest approach, by about 4x. I find this a bit surprising and guess it might be due to cache effects (or bad code?), but I'm not sure anyone really cares, so I'll stop here.
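The cache guess above can be probed, though not proven, by looking at memory layout: np.dstack puts the input arrays on the last axis, so in the (C-ordered) result each x[:, :, k] slice is strided rather than contiguous, while stacking on the first axis with np.stack(..., axis=0) keeps each input array as one contiguous block. A sketch that only inspects strides and flags; whether this layout is what slows g0, g1, g2 down is an assumption, not something the timings prove:

```python
import numpy as np

# Smaller than the timing test (10 arrays of 100 x 100 float64) so it runs quickly.
arrays = [np.ones((100, 100)) for _ in range(10)]

last = np.dstack(arrays)          # shape (100, 100, 10): arrays along the last axis
first = np.stack(arrays, axis=0)  # shape (10, 100, 100): arrays along the first axis

# In `last`, neighbouring elements of one input array are 10 float64 values (80 bytes) apart.
print(last[:, :, 0].strides, last[:, :, 0].flags["C_CONTIGUOUS"])
# In `first`, each input array occupies one contiguous block of memory.
print(first[0].strides, first[0].flags["C_CONTIGUOUS"])
```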



Stack the arrays along the third axis with np.dstack and take differences along that axis with np.diff, so that identical values show up as zeros. Then find the cells where all differences are zero with ~np.any. That gives a one-line solution like this:

~np.any(np.diff(np.dstack((a,b,c)),axis=2),axis=2)




Example run -

In [39]: a
Out[39]: 
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [40]: b
Out[40]: 
array([[5, 6, 7],
       [4, 2, 6],
       [7, 8, 9]])

In [41]: c
Out[41]: 
array([[2, 3, 4],
       [4, 5, 6],
       [1, 2, 5]])

In [42]: ~np.any(np.diff(np.dstack((a,b,c)),axis=2),axis=2)
Out[42]: 
array([[False, False, False],
       [ True, False,  True],
       [False, False, False]], dtype=bool)
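To see where the zeros come from, the intermediate np.diff result can be inspected on the same arrays: for cell (1, 0) (0-based) the stacked values are 4, 4, 4, so both differences are 0, while cell (0, 0) holds 1, 5, 2 and gives 4 and -3. A short sketch:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
b = np.array([[5, 6, 7], [4, 2, 6], [7, 8, 9]])
c = np.array([[2, 3, 4], [4, 5, 6], [1, 2, 5]])

stacked = np.dstack((a, b, c))  # shape (3, 3, 3)
d = np.diff(stacked, axis=2)    # shape (3, 3, 2): b - a and c - b for each cell

print(d[0, 0])  # → [ 4 -3]
print(d[1, 0])  # → [0 0]

# A cell is equal across all arrays when none of its differences is non-zero.
mask = ~np.any(d, axis=2)
print(np.count_nonzero(mask))  # → 2
```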

      



Try the following:

z1 = a == b
z2 = a == c
z  = np.logical_and(z1,z2)
print("count:", np.sum(z))


You can do it in one expression:

count = np.sum( np.logical_and(a == b, a == c) )

      
