Pandas: finding points at maximum distance

I am trying to find pairs of points (x, y) at the maximum distance from each other. I thought the easiest way would be to create a DataFrame and go through each point one by one, calculating if there are points with coordinates (x, y) at a distance r from a given point (x_0, y_0). Then divide the total number of pairs found by 2.

%pylab inline
import pandas as pd

def find_nbrs(low, high, num, max_d):
    x = random.uniform(low, high, num)
    y = random.uniform(low, high, num)
    points = pd.DataFrame({'x':x, 'y':y})

    tot_nbrs = 0

    for i in arange(len(points)):
        x_0 = points.x[i]
        y_0 = points.y[i]

        pt_nbrz = points[((x_0 - points.x)**2 + (y_0 - points.y)**2) < max_d**2]
        tot_nbrs += len(pt_nbrz)
        plot (pt_nbrz.x, pt_nbrz.y, 'r-')

    plot (points.x, points.y, 'b.')
    return tot_nbrs

print find_nbrs(0, 1, 50, 0.1)

      

  • First of all, it doesn't always find the right pairs (I see points at the specified distance that are not marked).

  • If I write plot(..., 'or')

    , it highlights all the points. This means it pt_nbrz = points[((x_0 - points.x)**2 + (y_0 - points.y)**2) < max_d**2]

    returns at least one (x, y). What for? Should it return an empty array if the comparison is False?

  • How can I do all of the above more elegantly in Pandas? For example, without having to scroll through each element.

+3


source to share


1 answer


The functionality you are looking for is included in scipy's spatial distance module .

Here's an example of how you could use it. The real magic is in squareform(pdist(points))

.

from scipy.spatial.distance import pdist, squareform
import numpy as np
import matplotlib.pyplot as plt

points = np.random.uniform(-.5, .5, (1000,2))

# Compute the distance between each different pair of points in X with pdist.
# Then, just for ease of working, convert to a typical symmetric distance matrix
# with squareform.
dists = squareform(pdist(points))

poi = points[4] # point of interest
dist_min = .1
close_points = dists[4] < dist_min

print("There are {} other points within a distance of {} from the point "
    "({:.3f}, {:.3f})".format(close_points.sum() - 1, dist_min, *poi))

      

There are 27 other points within a distance of 0.1 from the point (0.194, 0.160)



For visualization purposes:

f,ax = plt.subplots(subplot_kw=
    dict(aspect='equal', xlim=(-.5, .5), ylim=(-.5, .5)))
ax.plot(points[:,0], points[:,1], 'b+ ')
ax.plot(poi[0], poi[1], ms=15, marker='s', mfc='none', mec='g')
ax.plot(points[close_points,0], points[close_points,1],
    marker='o', mfc='none', mec='r', ls='')  # draw all points within distance

t = np.linspace(0, 2*np.pi, 512)
circle = dist_min*np.vstack([np.cos(t), np.sin(t)]).T
ax.plot((circle+poi)[:,0], (circle+poi)[:,1], 'k:') # Add a visual check for that distance
plt.show()

      

enter image description here

+7


source







All Articles