Pandas: finding points at maximum distance
I am trying to find pairs of points (x, y) at the maximum distance from each other. I thought the easiest way would be to create a DataFrame and go through each point one by one, calculating if there are points with coordinates (x, y) at a distance r from a given point (x_0, y_0). Then divide the total number of pairs found by 2.
%pylab inline
import pandas as pd
def find_nbrs(low, high, num, max_d):
x = random.uniform(low, high, num)
y = random.uniform(low, high, num)
points = pd.DataFrame({'x':x, 'y':y})
tot_nbrs = 0
for i in arange(len(points)):
x_0 = points.x[i]
y_0 = points.y[i]
pt_nbrz = points[((x_0 - points.x)**2 + (y_0 - points.y)**2) < max_d**2]
tot_nbrs += len(pt_nbrz)
plot (pt_nbrz.x, pt_nbrz.y, 'r-')
plot (points.x, points.y, 'b.')
return tot_nbrs
print find_nbrs(0, 1, 50, 0.1)
-
First of all, it doesn't always find the right pairs (I see points at the specified distance that are not marked).
-
If I write
plot(..., 'or')
, it highlights all the points. This means itpt_nbrz = points[((x_0 - points.x)**2 + (y_0 - points.y)**2) < max_d**2]
returns at least one (x, y). What for? Should it return an empty array if the comparison is False? -
How can I do all of the above more elegantly in Pandas? For example, without having to scroll through each element.
source to share
The functionality you are looking for is included in scipy's spatial distance module .
Here's an example of how you could use it. The real magic is in squareform(pdist(points))
.
from scipy.spatial.distance import pdist, squareform
import numpy as np
import matplotlib.pyplot as plt
points = np.random.uniform(-.5, .5, (1000,2))
# Compute the distance between each different pair of points in X with pdist.
# Then, just for ease of working, convert to a typical symmetric distance matrix
# with squareform.
dists = squareform(pdist(points))
poi = points[4] # point of interest
dist_min = .1
close_points = dists[4] < dist_min
print("There are {} other points within a distance of {} from the point "
"({:.3f}, {:.3f})".format(close_points.sum() - 1, dist_min, *poi))
There are 27 other points within a distance of 0.1 from the point (0.194, 0.160)
For visualization purposes:
f,ax = plt.subplots(subplot_kw=
dict(aspect='equal', xlim=(-.5, .5), ylim=(-.5, .5)))
ax.plot(points[:,0], points[:,1], 'b+ ')
ax.plot(poi[0], poi[1], ms=15, marker='s', mfc='none', mec='g')
ax.plot(points[close_points,0], points[close_points,1],
marker='o', mfc='none', mec='r', ls='') # draw all points within distance
t = np.linspace(0, 2*np.pi, 512)
circle = dist_min*np.vstack([np.cos(t), np.sin(t)]).T
ax.plot((circle+poi)[:,0], (circle+poi)[:,1], 'k:') # Add a visual check for that distance
plt.show()
source to share