How do I find the closest matching vectors between two coordinate matrices?
I have the following problem in Python that I need to solve:
For two coordinate matrices (NumPy ndarrays) A and B, find for every coordinate vector a in A the corresponding coordinate vector b in B such that the Euclidean distance ||a-b|| is minimized. The coordinate matrices A and B can have a different number of coordinate vectors (i.e., a different number of rows).
The method should return a matrix of coordinate vectors C, where the i-th vector c in C is the vector from B that minimizes the Euclidean distance to the i-th coordinate vector a in A.
For example, let's say
A = np.array([[1,1], [3,4]])
and B = np.array([[1,2], [3,6], [8,1]])
The Euclidean distances between the vector [1,1] in A and the vectors in B are:
1, 5.385165, 7
So the first vector in C will be [1,2].
Similarly, the distances between the vector [3,4] in A and the vectors in B are:
2.828427, 2, 5.830952
So the second and last vector in C will be [3,6].
So C = [[1,2], [3,6]]
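The arithmetic in the example above can be checked with a short sketch like the following. It uses only np.linalg.norm on the A and B from the example; the name dists_by_row is made up for illustration and is not part of any library.

```python
import numpy as np

A = np.array([[1, 1], [3, 4]])
B = np.array([[1, 2], [3, 6], [8, 1]])

# For each row a of A, compute the Euclidean distance to every row of B.
# B - a broadcasts a across the rows of B; the norm is taken per row.
dists_by_row = [np.linalg.norm(B - a, axis=1) for a in A]

for a, dists in zip(A, dists_by_row):
    # Print the vector from A, its distances to B, and the closest row of B.
    print(a, np.round(dists, 6), B[np.argmin(dists)])
```

This is only meant to confirm the numbers listed above, not as an efficient solution; it loops over the rows of A in Python.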
How do I code this correctly in Python?
You can use cdist from scipy.spatial.distance to compute the pairwise Euclidean distances efficiently, then use np.argmin along each row to get the indices of the minimum values, and use those indices to index into B for the final output. Here's the implementation -
import numpy as np
from scipy.spatial.distance import cdist
C = B[np.argmin(cdist(A,B),1)]
Example run -
In [99]: A
Out[99]:
array([[1, 1],
       [3, 4]])
In [100]: B
Out[100]:
array([[1, 2],
       [3, 6],
       [8, 1]])
In [101]: B[np.argmin(cdist(A,B),1)]
Out[101]:
array([[1, 2],
       [3, 6]])
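If it helps to see what the one-liner does step by step, here is a rough sketch that splits it into its intermediate results. The function name closest_vectors and the extra return values are my own additions for illustration; only cdist and np.argmin come from the answer above.

```python
import numpy as np
from scipy.spatial.distance import cdist


def closest_vectors(A, B):
    """Hypothetical helper: for each row of A, return the nearest row of B."""
    # Pairwise Euclidean distances, shape (len(A), len(B)).
    dists = cdist(A, B)
    # Column index of the smallest distance in each row; on ties,
    # np.argmin returns the first minimal index.
    nearest = np.argmin(dists, axis=1)
    # Fancy indexing picks the matching rows of B, one per row of A.
    return B[nearest], nearest, dists


A = np.array([[1, 1], [3, 4]])
B = np.array([[1, 2], [3, 6], [8, 1]])

C, nearest, dists = closest_vectors(A, B)
print(dists.shape)  # one row per vector in A, one column per vector in B
print(nearest)      # indices into B
print(C)            # the closest vectors themselves
```

Returning the indices as well as C can be handy if you also need to know which rows of B were matched. Note that cdist builds the full len(A) x len(B) distance matrix, so memory use grows with the product of the two row counts.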