Python: How to find two equal / close values between two separate arrays?
Let's say we have two arrays of equal length:
arr1 = (21, 2, 3, 5, 13)
arr2 = (10, 4.5, 9, 12, 20)
Which value from arr1 is equal or closest to a value from arr2?
Looking at these two lists, we can easily tell that the closest numbers are 4.5 and 5. I tried to implement a function that returns the two closest values from the two lists, but it is hardly a solution. It is not reliable, and you can easily check that it stops working when we change the arrays slightly, like this:
arr1 = (21, 2, 3, 5, 13)
arr2 = (10, 4.5, 9, 12, 18)
With these arrays the function returns 13 and 18 instead of 5 and 4.5.
Here is the function:
def get_nearest(arr1, arr2):
    lr = [[0, 0, 0]]
    for x1 in arr1:
        for x2 in arr2:
            r = (x1 / x2 % (x1 + x2))
            print(x1, x2, r)
            if r <= 1 and r >= lr[0][2]:
                lr.pop()
                lr.append([x1, x2, r])
    return lr
Can you think of a better option?
Is speed an issue? Do you care about ties? If not, how about something simple like this:
from itertools import product
sorted(product(arr1, arr2), key=lambda t: abs(t[0]-t[1]))[0]
For both
arr1 = (21, 2, 3, 5, 13)
arr2 = (10, 4.5, 9, 12, 20)
and
arr1 = (21, 2, 3, 5, 13)
arr2 = (10, 4.5, 9, 12, 18)
this gives
(5, 4.5)
Explanation:
product(arr1, arr2) yields the same pairs as [(a1, a2) for a1 in arr1 for a2 in arr2]. In other words, it lists all N**2 pairs of numbers:
[(21, 10), (21, 4.5), ..., (13, 12), (13, 20)]
Then we sort them by absolute difference (|a1 - a2|) using sorted. Passing the keyword argument key tells sorted to use lambda t: abs(t[0] - t[1]) as the sorting criterion. The pair with the smallest absolute difference ends up at index 0 of the sorted list, so we can grab it by appending [0] at the end.
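The answer asks whether ties matter. Because sorted is stable, taking [0] returns only the first of several equally close pairs, in product order. If you need all of them, here is a minimal sketch. closest_pairs is a hypothetical helper, and ties are detected by exact equality of the differences, so floating-point rounding can hide a tie.

```python
from itertools import product


def closest_pairs(arr1, arr2):
    """Return every (a1, a2) pair whose |a1 - a2| equals the smallest difference.

    Hypothetical helper: ties are detected with exact equality, so float
    rounding can split pairs that are "equal" on paper.
    """
    pairs = list(product(arr1, arr2))  # all len(arr1) * len(arr2) pairs
    if not pairs:
        raise ValueError("both arrays must be non-empty")
    best = min(abs(a1 - a2) for a1, a2 in pairs)
    return [(a1, a2) for a1, a2 in pairs if abs(a1 - a2) == best]


print(closest_pairs((21, 2, 3, 5, 13), (10, 4.5, 9, 12, 20)))  # [(5, 4.5)]
print(closest_pairs((1, 5), (3,)))  # [(1, 3), (5, 3)]
```

The second call shows a tie: both 1 and 5 are at distance 2 from 3, and sorted(...)[0] would have returned only (1, 3).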
Edit:
As Peter suggested in the comments, you can pass key=func to min and max as well. This is faster because min scans the pairs once instead of sorting all of them. Try this instead:
from itertools import product
min(product(arr1, arr2), key=lambda t: abs(t[0] - t[1]))  # returns the pair itself: (5, 4.5)
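If you need the positions of the closest elements rather than their values, the same min/key idea works over pairs of indices. This is a minimal sketch, and closest_indices is a hypothetical name. On ties, min keeps the first minimal pair it encounters.

```python
from itertools import product


def closest_indices(arr1, arr2):
    """Return (i, j) such that |arr1[i] - arr2[j]| is minimal.

    Hypothetical helper: on ties, min() keeps the first pair in product order.
    Raises ValueError (from min) if either array is empty.
    """
    return min(
        product(range(len(arr1)), range(len(arr2))),
        key=lambda ij: abs(arr1[ij[0]] - arr2[ij[1]]),
    )


arr1 = (21, 2, 3, 5, 13)
arr2 = (10, 4.5, 9, 12, 18)
i, j = closest_indices(arr1, arr2)
print(i, j, arr1[i], arr2[j])  # 3 1 5 4.5
```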
This is the fastest algorithm I could write. It runs in O(n log n) time, which is much faster than the naive O(n**2) approach presented in the other answers. It sorts both arrays first (the most time-consuming part). Then it walks through them in a single pass, always advancing the smaller of the two current items and tracking the smallest difference seen so far. This pass takes at most 2 * n steps in the worst case:
def closest_array_items(a1, a2):
    if not a1 or not a2:
        raise ValueError('Empty array')
    # sort both inputs and walk each one with its own iterator
    a1, a2 = iter(sorted(a1)), iter(sorted(a2))
    i1, i2 = next(a1), next(a2)
    min_dif = float('inf')
    while True:
        dif = abs(i1 - i2)
        if dif < min_dif:
            min_dif = dif
            pair = i1, i2
        if not min_dif:
            break  # exact match, nothing can be closer
        # advance whichever side is currently smaller
        if i1 > i2:
            try:
                i2 = next(a2)
            except StopIteration:
                break
        else:
            try:
                i1 = next(a1)
            except StopIteration:
                break
    return pair
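A different O(n log n) technique, not the one used in this answer, is to sort only arr2 and binary-search each element of arr1 in it with the standard library's bisect module. The closest value in arr2 must be one of the two neighbours of the insertion point. This is a sketch under those assumptions, and closest_by_bisect is a hypothetical name.

```python
from bisect import bisect_left


def closest_by_bisect(a1, a2):
    """Return (x, y) with x from a1, y from a2 and |x - y| as small as possible.

    Sorts a2 once (O(m log m)), then does one binary search per element
    of a1 (O(n log m)). On ties, the first pair found is kept.
    """
    if not a1 or not a2:
        raise ValueError('Empty array')
    s2 = sorted(a2)
    best, best_diff = None, None
    for x in a1:
        k = bisect_left(s2, x)  # first index with s2[k] >= x
        # only the neighbours around the insertion point can be closest to x
        for y in s2[max(k - 1, 0):k + 1]:
            diff = abs(x - y)
            if best is None or diff < best_diff:
                best, best_diff = (x, y), diff
    return best


print(closest_by_bisect((21, 2, 3, 5, 13), (10, 4.5, 9, 12, 18)))  # (5, 4.5)
```

Compared with the two-iterator walk above, this version leaves a1 unsorted. Sorting a2 alone pays off when a1 is much shorter than a2.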
>>> arr1 = (21, 2, 3, 5, 13)
>>> arr2 = (10, 4.5, 9, 12, 20)
>>> result = []
>>> for a1 in arr1:
...     for a2 in arr2:
...         if a1 > a2:
...             result.append([a1, a2, a1 - a2])
...         else:
...             result.append([a1, a2, a2 - a1])
...
>>> sorted(result, key=lambda i: i[-1])[0][:2]
[5, 4.5]
An easy way is to compute the difference for every pair of elements from the two arrays, sort the pairs by that difference, and take the first one. As a one-liner:
>>> sorted([[a1,a2,a1-a2] if(a1>a2) else [a1,a2,a2-a1] for a1 in arr1 for a2 in arr2], key=lambda i:i[-1])[0][:2]
[5, 4.5]
Here's a NumPy function that solves this problem in about 0.01 s for two vectors of lengths 1000 and 2000:
import numpy as np


def get_closest_elements(arr_1, arr_2):
    """
    The function finds the two closest elements in two arrays
    Returns
    -------
    idx_1 : ndarray of int
        index (or indices, on ties) of the element in arr_1
    idx_2 : ndarray of int
        index (or indices, on ties) of the element in arr_2
    min_diff : float
        minimal difference between the arrays
    """
    # accept lists/tuples as well as NumPy arrays
    arr_1, arr_2 = np.asarray(arr_1), np.asarray(arr_2)
    # get a len(arr_1) x len(arr_2) array with all pairwise differences
    diff_arr = arr_1[:, np.newaxis] - arr_2
    # get absolute value
    diff_arr = np.abs(diff_arr)
    # get minimum difference
    min_diff = np.min(diff_arr)
    # get the indexes for the elements of interest in arr_1 and arr_2
    idx_1, idx_2 = np.where(diff_arr == min_diff)
    return idx_1, idx_2, min_diff
# apply function
x = np.array([21, 2, 3, 5, 13])
y = np.array([10, 4.5, 9, 12, 20])
# n = 1000
# x = np.random.rand(n)
# y = np.random.rand(2*n)
idx_1, idx_2, min_diff = get_closest_elements(x, y)
print("x{} - y{} = {}".format(idx_1, idx_2, min_diff))