How do I do this in NumPy?

I have an array X of 3D coordinates of N points (N × 3) and want to calculate the Euclidean distance between each pair of points.

I can do this by iterating over the points and comparing each squared distance to a threshold.

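import numpy as np

# `position` is a placeholder: the original expression for a vertex's coordinates was lost
coords = np.array([v.position for v in vertices])
for vertice in vertices:
    # squared distance from this vertex to every vertex, compared to the threshold
    tests = np.sum((coords - vertice.position) ** 2, 1) < threshold
    closest = [v for v, t in zip(vertices, tests) if t]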


Can this be done in one operation? I studied linear algebra 10 years ago and can't work out how to do it.

It should probably be a 3D array (point a, point b, axis) that is then summed over the axis dimension.


edit: Found a solution on my own, but it doesn't work with large datasets.

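    # `position` is a placeholder: the original expression for a vertex's coordinates was lost
    coords = np.array([v.position for v in vertices])
    big = np.repeat(np.array([coords]), len(coords), 0)  # shape (N, N, 3)
    big_same = np.swapaxes(big, 0, 1)
    # sum over the coordinate axis (axis 2) to get an (N, N) matrix of squared distances
    tests = np.sum((big - big_same) ** 2, 2) < thr_square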

    for v, test_vector in zip(vertices, tests):
        v.closest = self.filter(vertices, test_vector)




3 answers

Use scipy.spatial.distance. If X is an n × 3 array of points, you can get the n × n matrix of distances like this:


from scipy.spatial import distance
D = distance.squareform(distance.pdist(X))


Then the point closest to point i is the one whose index comes second when the indices of row i of D are ordered by increasing distance. (Taking element [1] skips the value on the diagonal, which will be returned first.)
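
A minimal sketch of that lookup, assuming the nearest neighbour is found by argsorting row i of D. The sample points and the names `order` and `nearest` are illustrative and not taken from the answer:

```python
import numpy as np
from scipy.spatial import distance

# Hypothetical sample data: four points in 3D.
X = np.array([
    [0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 3.0, 0.0],
    [5.0, 5.0, 5.0],
])

# pdist returns the condensed pairwise distances; squareform expands them
# into the full symmetric n x n matrix with zeros on the diagonal.
D = distance.squareform(distance.pdist(X))

i = 0
# argsort orders the indices of row i by increasing distance. Position 0 is
# normally i itself (distance 0), so position 1 is the nearest other point.
order = np.argsort(D[i])
nearest = order[1]
```

If the data contains duplicate points, position 0 is not guaranteed to be i itself, so it may be safer to mask the diagonal before sorting.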



I'm not really sure what you are asking here. If you are calculating the Euclidean distance between each pair of points in a set of N points, it would make sense to me to represent the results as a lookup matrix. So for N points, you get an N × N symmetric matrix. Element (3, 5) will represent the distance between points 3 and 5, while element (2, 2) will be the distance between point 2 and itself (zero). This is how I would do it for random points:

import numpy as np

N = 5

# N random points in 3D
coords = np.array([np.random.rand(3) for _ in range(N)])
dist = np.zeros((N, N))

for i in range(N):
    for j in range(i, N):
        # fill both halves of the symmetric matrix
        dist[i, j] = np.linalg.norm(coords[i] - coords[j])
        dist[j, i] = dist[i, j]

print(dist)
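
The properties described above (symmetric matrix, zero diagonal) can be checked with a loop-free sketch. This is only one possible vectorized form, assuming NumPy broadcasting and `np.linalg.norm` with an `axis` argument; the seed and the name `diff` are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
coords = rng.random((5, 3))  # 5 random points in 3D

# (N, 1, 3) - (1, N, 3) broadcasts to an (N, N, 3) array holding the
# coordinate differences for every pair of points.
diff = coords[:, None, :] - coords[None, :, :]

# Taking the norm over the last axis collapses it to an (N, N) distance matrix.
dist = np.linalg.norm(diff, axis=-1)

# The matrix is symmetric and has zeros on the diagonal.
is_symmetric = np.allclose(dist, dist.T)
has_zero_diagonal = np.allclose(np.diag(dist), 0.0)
```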




If xyz is an array with your coordinates, then the following code will calculate the distance matrix (it is fast as long as you have enough memory to store the N^2 distances):

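import numpy as np
xyz = np.random.uniform(size=(1000, 3))
# for each axis, broadcast a column against a row to get all pairwise differences
distances = (sum([(xyz[:, i][:, None] - xyz[:, i][None, :]) ** 2 for i in range(3)])) ** .5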
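
When the full N × N matrix does not fit in memory, one option is to process the points in row blocks. The following is a rough sketch under that assumption, not part of the answer above: `neighbors_within` and `chunk_size` are hypothetical names, and the threshold test on squared distances mirrors the question's `thr_square` comparison.

```python
import numpy as np

def neighbors_within(xyz, threshold, chunk_size=256):
    """For each point, return the indices of all points closer than threshold.

    Hypothetical helper: rows are processed in blocks so that only a
    (chunk_size, N) block of squared distances is held in memory at once.
    """
    xyz = np.asarray(xyz, dtype=float)
    thr_square = threshold ** 2
    result = []
    for start in range(0, len(xyz), chunk_size):
        block = xyz[start:start + chunk_size]
        # (chunk, 1, 3) - (1, N, 3) broadcasts to (chunk, N, 3)
        diff = block[:, None, :] - xyz[None, :, :]
        # squared distances from each point in the block to every point
        sq_dist = np.sum(diff ** 2, axis=2)
        for row in sq_dist < thr_square:
            result.append(np.flatnonzero(row))
    return result

# Usage with three made-up points: the first two are within 2.0 of each other.
pts = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [10.0, 0.0, 0.0]])
close = neighbors_within(pts, 2.0, chunk_size=2)
```

Each point appears in its own neighbour list, since its distance to itself is zero; filter that index out if it is not wanted.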



