Finding identical rows and columns in a numpy array

I have a boolean array of n x n elements and I want to check whether any row is identical to another. If there are identical rows, I want to check whether the corresponding columns are identical too.

Here's an example:

import numpy as np

A = np.array([[0, 1, 0, 0, 0, 1],
              [0, 0, 0, 1, 0, 1],
              [0, 1, 0, 0, 0, 1],
              [1, 0, 1, 0, 1, 1],
              [1, 1, 1, 0, 0, 0],
              [0, 1, 0, 1, 0, 1]])

      

I would like the program to detect that the first and third rows are identical, and then check whether the first and third columns are identical, which in this case they are.



3 answers


You can use np.array_equal():

for i in range(len(A)):  # generate every index pair with i < j
    for j in range(i + 1, len(A)):
        if np.array_equal(A[i], A[j]):  # compare rows
            if np.array_equal(A[:, i], A[:, j]):  # compare columns
                print(i, j)

      



Or using itertools.combinations():

import itertools

for i, j in itertools.combinations(range(len(A)), 2):  # every index pair with i < j
    # report the pair only if both the rows and the columns match
    if np.array_equal(A[i], A[j]) and np.array_equal(A[:, i], A[:, j]):
        print((i, j))
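
The same pairwise comparison can also be written without Python loops by using broadcasting. This is only a sketch: the function name `equal_pairs_broadcast` is made up for illustration, it assumes a square array as in the question, and it builds an n x n x n temporary, so it is only reasonable for small n.

```python
import numpy as np

def equal_pairs_broadcast(arr):
    """Return (i, j) pairs, i < j, whose rows match and whose columns match.

    Hypothetical helper; assumes arr is square (n x n).
    """
    arr = np.asarray(arr)
    # rows_eq[i, j] is True when row i equals row j (all pairs compared at once)
    rows_eq = (arr[:, None, :] == arr[None, :, :]).all(axis=2)
    # cols_eq[i, j] is True when column i equals column j
    cols_eq = (arr.T[:, None, :] == arr.T[None, :, :]).all(axis=2)
    # keep the strict upper triangle so each pair appears once and i == j is skipped
    i, j = np.nonzero(np.triu(rows_eq & cols_eq, k=1))
    return list(zip(i.tolist(), j.tolist()))

A = np.array([[0, 1, 0, 0, 0, 1],
              [0, 0, 0, 1, 0, 1],
              [0, 1, 0, 0, 0, 1],
              [1, 0, 1, 0, 1, 1],
              [1, 1, 1, 0, 0, 0],
              [0, 1, 0, 1, 0, 1]])

print(equal_pairs_broadcast(A))  # [(0, 2)]
```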

      



Start with the usual way of applying np.unique to the rows of a 2D array, and use it to return the index pairs of identical rows:

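def unique_pairs(arr):
    # view each row as a single opaque (void) item so np.unique compares whole rows
    uview = np.ascontiguousarray(arr).view(np.dtype((np.void, arr.dtype.itemsize * arr.shape[1])))
    uvals, uidx = np.unique(uview, return_inverse=True)
    uidx = uidx.ravel()  # the inverse may come back 2D on newer NumPy versions
    # keep only the unique rows that occur exactly twice
    pos = np.where(np.bincount(uidx) == 2)[0]

    # for each such row, collect the two indices where it occurs
    pairs = [np.where(uidx == p)[0] for p in pos]

    return np.array(pairs)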

      

Then we can do the following:



row_pairs = unique_pairs(A)
col_pairs = unique_pairs(A.T)

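for pair in row_pairs:
    # report a row pair only if the same index pair also appears among the column pairs
    if np.any(np.all(pair == col_pairs, axis=1)):
        print(pair)  # prints [0 2] for the example A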

      

Of course, there are many optimizations left, but the main idea is to use np.unique. How efficient this method is compared with the others depends largely on how you define "small" arrays.
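
On NumPy 1.13 and later, np.unique accepts an `axis` argument, so the void-view trick is not strictly needed. The following is a rough sketch of the same idea under that assumption; the function name `identical_row_and_column_pairs` is invented here, and unlike the `bincount(...) == 2` filter above it also reports pairs from rows that occur more than twice.

```python
import numpy as np

def identical_row_and_column_pairs(arr):
    """Return (i, j) pairs, i < j, where rows i, j match and columns i, j match.

    Hypothetical helper; assumes arr is square (n x n).
    """
    arr = np.asarray(arr)
    # label every row (and every column) with the index of its unique value
    _, row_labels = np.unique(arr, axis=0, return_inverse=True)
    _, col_labels = np.unique(arr.T, axis=0, return_inverse=True)
    # flatten, because the shape of the inverse has varied between NumPy versions
    row_labels = np.ravel(row_labels)
    col_labels = np.ravel(col_labels)
    # same[i, j] is True when rows i, j share a label and columns i, j share a label
    same = (row_labels[:, None] == row_labels[None, :]) & \
           (col_labels[:, None] == col_labels[None, :])
    i, j = np.nonzero(np.triu(same, k=1))  # strict upper triangle: i < j only
    return list(zip(i.tolist(), j.tolist()))

A = np.array([[0, 1, 0, 0, 0, 1],
              [0, 0, 0, 1, 0, 1],
              [0, 1, 0, 0, 0, 1],
              [1, 0, 1, 0, 1, 1],
              [1, 1, 1, 0, 0, 0],
              [0, 1, 0, 1, 0, 1]])

print(identical_row_and_column_pairs(A))  # [(0, 2)]
```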

+2


source


Since you said performance isn't critical, here's a not-so-numpythonic brute-force solution:

>>> for i1, row1 in enumerate(A):
...     offset = i1 + 1  # skip rows already compared
...     for i2, row2 in enumerate(A[offset:], start=offset):
...         # rows must match, and so must the columns with the same indices
...         if (row1 == row2).all() and (A.T[i1] == A.T[i2]).all():
...             print(i1, i2)
...
0 2

      

This compares O(n^2) pairs of rows, and each comparison is itself O(n). I use the transposed array A.T to check that the corresponding columns are also equal.
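
If the number of pairwise comparisons ever becomes a concern, one possible variation is to group identical rows through a dictionary first and only compare columns inside each group. This is a sketch, not a tuned solution: the helper names `group_identical_rows` and `matching_pairs` are made up, and using `tobytes()` as the key assumes boolean or integer data of a single dtype (for floats, 0.0 and -0.0 would produce different keys).

```python
from collections import defaultdict
from itertools import combinations

import numpy as np

def group_identical_rows(arr):
    """Return lists of row indices that share identical contents (hypothetical helper)."""
    groups = defaultdict(list)
    for idx, row in enumerate(np.asarray(arr)):
        # the raw bytes of a row serve as a hashable key for its contents
        groups[row.tobytes()].append(idx)
    return [idxs for idxs in groups.values() if len(idxs) > 1]

def matching_pairs(arr):
    """Return (i, j) pairs, i < j, with identical rows and identical columns."""
    arr = np.asarray(arr)
    pairs = []
    for idxs in group_identical_rows(arr):
        # only rows already known to be identical need their columns checked
        for i, j in combinations(idxs, 2):
            if np.array_equal(arr[:, i], arr[:, j]):
                pairs.append((i, j))
    return pairs

A = np.array([[0, 1, 0, 0, 0, 1],
              [0, 0, 0, 1, 0, 1],
              [0, 1, 0, 0, 0, 1],
              [1, 0, 1, 0, 1, 1],
              [1, 1, 1, 0, 0, 0],
              [0, 1, 0, 1, 0, 1]])

print(matching_pairs(A))  # [(0, 2)]
```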
