How to calculate the Euclidean distance between each pair of rows of a NumPy array
I have a NumPy array, for example:
import numpy as np
a = np.array([[1,0,1,0],
[1,1,0,0],
[1,0,1,0],
[0,0,1,1]])
I would like to calculate the Euclidean distance between each pair of rows.
from scipy.spatial import distance
for i in range(0,a.shape[0]):
    d = [np.sqrt(np.sum((a[i]-a[j])**2)) for j in range(i+1,a.shape[0])]
    print(d)
[1.4142135623730951, 0.0, 1.4142135623730951]
[1.4142135623730951, 2.0]
[1.4142135623730951]
[]
Is there a better, more Pythonic way to do this? I have to run this code on a huge NumPy array.
And for completeness, einsum is often used for distance calculations.
a = np.array([[1,0,1,0],
[1,1,0,0],
[1,0,1,0],
[0,0,1,1]])
b = a.reshape(a.shape[0], 1, a.shape[1])
np.sqrt(np.einsum('ijk, ijk->ij', a-b, a-b))
array([[ 0. , 1.41421356, 0. , 1.41421356],
[ 1.41421356, 0. , 1.41421356, 2. ],
[ 0. , 1.41421356, 0. , 1.41421356],
[ 1.41421356, 2. , 1.41421356, 0. ]])
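If only the unique pairs (i < j) are wanted, as in the question's loop output, the strict upper triangle of this matrix can be extracted. This is a minimal sketch, assuming the full n-by-n matrix fits in memory; the names diff, dist, iu and pairs are illustrative and not part of the answer above.

```python
import numpy as np

a = np.array([[1, 0, 1, 0],
              [1, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 0, 1, 1]])

# Broadcast (1, n, k) against (n, 1, k) to get every row difference: shape (n, n, k)
diff = a[None, :, :] - a[:, None, :]

# Sum the squared differences over the last axis, then take the square root
dist = np.sqrt(np.einsum('ijk,ijk->ij', diff, diff))

# Indices of the strict upper triangle: each pair (i, j) with i < j exactly once
iu = np.triu_indices(a.shape[0], k=1)
pairs = dist[iu]
print(pairs)
```

Note that the intermediate difference array holds n * n * k elements, so memory use grows quadratically with the number of rows; on a huge array this may be the limiting factor.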
For something more "elegant", you can always use scikit-learn's pairwise Euclidean distance function:
from sklearn.metrics.pairwise import euclidean_distances
euclidean_distances(a,a)
This gives the same distances, returned as a single array:
array([[ 0. , 1.41421356, 0. , 1.41421356],
[ 1.41421356, 0. , 1.41421356, 2. ],
[ 0. , 1.41421356, 0. , 1.41421356],
[ 1.41421356, 2. , 1.41421356, 0. ]])
I used itertools.combinations together with np.linalg.norm of the difference vector (this is the Euclidean distance):
import numpy as np
import itertools
a = np.array([[1,0,1,0],
[1,1,0,0],
[1,0,1,0],
[0,0,1,1]])
print([np.linalg.norm(x[0]-x[1]) for x in itertools.combinations(a, 2)])
To understand this, take a look at this example from the docs: combinations('ABCD', 2) gives AB AC AD BC BD CD. In your case A, B, C and D are the rows of your matrix a, so the term x[0]-x[1] appearing in the above code is the difference vector between two rows of a.
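The pair order produced by itertools.combinations (AB AC AD BC BD CD) is the same order SciPy uses for the condensed output of pdist, so the two can be compared directly. The following is only a sketch, assuming SciPy is installed (the question already imports scipy.spatial.distance); the names by_hand, condensed and full are illustrative.

```python
import itertools

import numpy as np
from scipy.spatial.distance import pdist, squareform

a = np.array([[1, 0, 1, 0],
              [1, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 0, 1, 1]])

# Distance for every unordered pair of rows, computed one pair at a time
by_hand = np.array([np.linalg.norm(x - y)
                    for x, y in itertools.combinations(a, 2)])

# The same pairs in the same order, computed by SciPy without a Python loop
condensed = pdist(a, metric='euclidean')

# Optional: expand the condensed vector into the full symmetric n-by-n matrix
full = squareform(condensed)

print(np.allclose(by_hand, condensed))
```

The condensed form stores only n * (n - 1) / 2 values, which may matter for a huge array; squareform is only needed when the full matrix is actually required.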