How do I do this in NumPy?
I have an array of 3D coordinates for N points (shape N × 3) and want to calculate the Euclidean distance between each pair of points.
I can do this by iterating over the points and comparing each point's squared distances to a threshold:
import numpy as np
coords = np.array([v.xyz for v in vertices])
for vertex in vertices:
    # squared distance from this vertex to every point
    tests = np.sum((coords - vertex.xyz) ** 2, 1) < threshold
    closest = [v for v, t in zip(vertices, tests) if t]
Can this be done in one operation? My linear algebra is 10 years old and I can't find a way to do it.
It should probably be a 3D array (point a, point b, axis) that is then summed along the axis dimension.
edit: Found a solution on my own, but it doesn't work with large datasets.
coords = np.array([v.xyz for v in vertices])
big = np.repeat(np.array([coords]), len(coords), 0)  # shape (N, N, 3)
big_same = np.swapaxes(big, 0, 1)
# sum over the xyz axis (axis 2) to get an (N, N) array of squared distances
tests = np.sum((big - big_same) ** 2, 2) < thr_square
for v, test_vector in zip(vertices, tests):
    v.closest = self.filter(vertices, test_vector)
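For large datasets, one option that avoids building the full N × N array is a k-d tree. This is a different technique from the broadcasting approach above, shown here only as a minimal sketch, assuming SciPy is available. The function name `neighbors_within` and the random `coords` are made up for illustration, and `threshold` here is a plain distance rather than a squared one.

```python
import numpy as np
from scipy.spatial import cKDTree

def neighbors_within(coords, threshold):
    """For each point, return the indices of the other points within `threshold`."""
    tree = cKDTree(coords)
    # One list of indices per query point; each list includes the point itself.
    hits = tree.query_ball_point(coords, r=threshold)
    # Drop the self-match from every list.
    return [[j for j in idx if j != i] for i, idx in enumerate(hits)]

rng = np.random.default_rng(0)
coords = rng.random((1000, 3))  # stand-in for [v.xyz for v in vertices]
closest = neighbors_within(coords, 0.1)
```

Memory use then scales with the number of neighbor pairs actually found instead of with N squared.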
Use scipy.spatial.distance. If X is an n × 3 array of points, you can get the n × n matrix of distances with:
from scipy.spatial import distance
D = distance.squareform(distance.pdist(X))
Then the point closest to point i is the one with index np.argsort(D[i])[1]. (The [1] skips the value on the diagonal, which is returned first.)
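As a small worked example of the above, here is a sketch on four hand-picked points. The `threshold` value and the variable names `nearest`, `within`, and `neighbors` are assumptions for illustration, not part of the original answer. It also assumes there are no duplicate points, since a duplicate at distance zero could sort ahead of the point itself.

```python
import numpy as np
from scipy.spatial import distance

X = np.array([[0.0, 0.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 3.0, 0.0],
              [1.0, 0.5, 0.0]])
threshold = 1.5  # hypothetical cutoff distance

# Full symmetric distance matrix, zeros on the diagonal.
D = distance.squareform(distance.pdist(X))

# Nearest other point for every row at once (column 0 is the point itself).
nearest = np.argsort(D, axis=1)[:, 1]  # -> [1, 3, 3, 1]

# All points closer than the threshold, excluding each point itself.
within = D < threshold
np.fill_diagonal(within, False)
neighbors = [np.flatnonzero(row) for row in within]
```

Note that the comparison here is against a plain distance, not a squared one as in the question.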
I'm not really sure what you are asking here. If you are calculating the Euclidean distance between each pair of points in a set of N points, it would make sense to me to represent the results as a lookup matrix. So for N points, you get an N × N symmetric matrix. Element (3, 5) represents the distance between points 3 and 5, while element (2, 2) is the distance between point 2 and itself (zero). This is how I would do it for random points:
import numpy as np
N = 5
coords = np.array([np.random.rand(3) for _ in range(N)])
dist = np.zeros((N, N))
for i in range(N):
    for j in range(i, N):
        # fill both halves of the symmetric matrix
        dist[i, j] = np.linalg.norm(coords[i] - coords[j])
        dist[j, i] = dist[i, j]
print(dist)
If xyz is an array with your coordinates, then the following code will calculate the distance matrix (it is fast as long as you have enough memory to store N^2 distances):
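import numpy as np
xyz = np.random.uniform(size=(1000, 3))
# broadcast each coordinate column against itself, then sum the squared differences
distances = (sum([(xyz[:, i][:, None] - xyz[:, i][None, :]) ** 2 for i in range(3)])) ** .5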
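If the N^2 distances do not fit in memory, the same broadcasting idea can be applied one block of rows at a time. The following is only a sketch under that assumption. The function name `neighbors_chunked`, the `chunk` size, and the choice to return index arrays are all made up for illustration.

```python
import numpy as np

def neighbors_chunked(xyz, threshold, chunk=256):
    """For each point, indices of the other points closer than `threshold`,
    computed in blocks of `chunk` rows to cap peak memory."""
    thr_sq = threshold ** 2
    result = []
    for start in range(0, len(xyz), chunk):
        block = xyz[start:start + chunk]              # (b, 3)
        diff = block[:, None, :] - xyz[None, :, :]    # (b, N, 3)
        close = np.sum(diff ** 2, axis=-1) < thr_sq   # (b, N) boolean
        # Clear the self-matches: row i of the block is point start + i.
        rows = np.arange(len(block))
        close[rows, start + rows] = False
        result.extend(np.flatnonzero(row) for row in close)
    return result

xyz = np.random.uniform(size=(1000, 3))
neighbors = neighbors_chunked(xyz, 0.1)
```

The temporary arrays are then about chunk × N × 3 values instead of N × N × 3, at the cost of a Python-level loop over the blocks.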