How to calculate the Euclidean distance between each pair of rows of a NumPy array

I have a NumPy array, for example:

import numpy as np
a = np.array([[1,0,1,0],
             [1,1,0,0],
             [1,0,1,0],
             [0,0,1,1]])

      

I would like to calculate the Euclidean distance between each pair of rows.

from scipy.spatial import distance
for i in range(0,a.shape[0]):
    d = [np.sqrt(np.sum((a[i]-a[j])**2)) for j in range(i+1,a.shape[0])]
    print(d)

      

[1.4142135623730951, 0.0, 1.4142135623730951]

[1.4142135623730951, 2.0]

[1.4142135623730951]

[]

Is there a better, more Pythonic way to do this? I have to run this code on a huge NumPy array.



3 answers


And for completeness, einsum is often used for distance calculations.



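import numpy as np

a = np.array([[1,0,1,0],
              [1,1,0,0],
              [1,0,1,0],
              [0,0,1,1]])

# Add a middle axis so that a - b broadcasts to shape (n, n, d)
b = a.reshape(a.shape[0], 1, a.shape[1])

# Sum the squared differences over the last axis, then take the square root
np.sqrt(np.einsum('ijk, ijk->ij', a-b, a-b))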

array([[ 0.        ,  1.41421356,  0.        ,  1.41421356],
       [ 1.41421356,  0.        ,  1.41421356,  2.        ],
       [ 0.        ,  1.41421356,  0.        ,  1.41421356],
       [ 1.41421356,  2.        ,  1.41421356,  0.        ]])
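The full matrix above contains every distance twice plus the zero diagonal, while the question only prints each pair once. A minimal sketch of how the unique pairs could be pulled out, assuming `np.triu_indices` is an acceptable way to select the strict upper triangle (the variable names here are illustrative, not from the original answer). Note that the broadcast difference has shape (n, n, d), so for a truly huge array this intermediate may not fit in memory.

```python
import numpy as np

a = np.array([[1, 0, 1, 0],
              [1, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 0, 1, 1]])

# diff[i, j, :] is a[j] - a[i]; broadcasting gives shape (n, n, d)
diff = a[np.newaxis, :, :] - a[:, np.newaxis, :]

# Sum of squared differences over the last axis, then square root
full = np.sqrt(np.einsum('ijk,ijk->ij', diff, diff))

# Keep only the pairs with i < j (strict upper triangle), in row-major order
i_idx, j_idx = np.triu_indices(a.shape[0], k=1)
unique_pairs = full[i_idx, j_idx]
print(unique_pairs)
```

The resulting order is (0,1), (0,2), (0,3), (1,2), (1,3), (2,3), which is the order in which the loop in the question prints its values.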

      



For something more "elegant", you can always use scikit-learn's pairwise Euclidean distances:

from sklearn.metrics.pairwise import euclidean_distances
euclidean_distances(a,a)

      



This gives the same result as a single array:

array([[ 0.        ,  1.41421356,  0.        ,  1.41421356],
       [ 1.41421356,  0.        ,  1.41421356,  2.        ],
       [ 0.        ,  1.41421356,  0.        ,  1.41421356],
       [ 1.41421356,  2.        ,  1.41421356,  0.        ]])
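If scikit-learn is not available, the same matrix can be approximated with NumPy alone. The scikit-learn documentation describes `euclidean_distances` in terms of the expansion ||x - y||^2 = ||x||^2 - 2 x·y + ||y||^2; the sketch below applies that identity directly. `pairwise_euclidean` is a hypothetical helper written for this illustration, not the library's implementation, and it assumes the input fits in memory as a float array.

```python
import numpy as np

def pairwise_euclidean(x):
    """Hypothetical helper: all-pairs Euclidean distances via the dot-product identity."""
    x = np.asarray(x, dtype=float)
    sq_norms = np.einsum('ij,ij->i', x, x)      # squared norm of every row
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (x @ x.T)
    np.maximum(sq_dists, 0.0, out=sq_dists)     # clip tiny negatives caused by round-off
    return np.sqrt(sq_dists)

a = np.array([[1, 0, 1, 0],
              [1, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 0, 1, 1]])

d = pairwise_euclidean(a)
print(d)
```

Unlike the broadcast difference, this only allocates n-by-n arrays, which matters when the number of columns is large. The trade-off is that the expansion can lose some precision for rows that are nearly identical.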

      



I used itertools.combinations together with np.linalg.norm of the difference vector (this is the Euclidean distance):

import numpy as np
import itertools
a = np.array([[1,0,1,0],
              [1,1,0,0],
              [1,0,1,0],
              [0,0,1,1]])

print([np.linalg.norm(x[0]-x[1]) for x in itertools.combinations(a, 2)])

      

To understand this, take a look at this example from the docs: combinations('ABCD', 2) gives AB AC AD BC BD CD. In your case A, B, C and D are the rows of your matrix a, and the term x[0]-x[1] appearing in the code above is the difference vector of two rows of a.
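To make the pairing explicit, the same idea can be run over row indices instead of the rows themselves, so that each distance is labeled with the pair it belongs to. This is only a sketch; the names `pairs` and `distances` are introduced here for illustration.

```python
import itertools
import numpy as np

a = np.array([[1, 0, 1, 0],
              [1, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 0, 1, 1]])

# combinations over the row indices yields every pair (i, j) with i < j exactly once
pairs = list(itertools.combinations(range(len(a)), 2))

# Map each index pair to the Euclidean distance between the two rows
distances = {(i, j): float(np.linalg.norm(a[i] - a[j])) for i, j in pairs}

for (i, j), dist in distances.items():
    print(f"rows {i} and {j}: {dist:.4f}")
```

This is still a Python-level loop over n(n-1)/2 pairs, so for a huge array it is likely to be much slower than the vectorized approaches.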
