Get the row of a pandas DataFrame closest to the center of gravity of its coordinates

I have a pandas DataFrame:

id     x_value  y_value
100     1       2
200     3       1
300     5       3
400     3       6
500     3.2     3.5
600     4.5     3

      

I want to find the midpoint (centroid) of these coordinate pairs and return the row whose coordinates are closest to it.

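import pandas as pd

def get_centroid(df):
    # Value in x_value closest to the mean of x (chosen independently of y)
    lat_mean = df['x_value'].mean()
    lat_mean = df['x_value'].iloc[(df['x_value'] - lat_mean).abs().argsort()[:1]].tolist()[0]
    # Value in y_value closest to the mean of y (chosen independently of x)
    long_mean = df['y_value'].mean()
    long_mean = df['y_value'].iloc[(df['y_value'] - long_mean).abs().argsort()[:1]].tolist()[0]
    return [lat_mean, long_mean]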

      

But this approach is wrong: it picks the nearest x and the nearest y independently, so the result is not necessarily an actual (x, y) pair from the DataFrame.
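To see the problem concretely, here is a small sketch that mimics the per-column idea using `idxmin` instead of the `argsort` slicing above. On this data, x picks 3.2 (row 500) and y picks 3.0 (rows 300 and 600), and the combined point (3.2, 3.0) is not a row of the DataFrame.

```python
import pandas as pd

df = pd.DataFrame(
    {"x_value": [1.0, 3.0, 5.0, 3.0, 3.2, 4.5],
     "y_value": [2.0, 1.0, 3.0, 6.0, 3.5, 3.0]},
    index=pd.Index([100, 200, 300, 400, 500, 600], name="id"),
)

# Nearest value in each column to that column's mean, chosen independently
x_near = float(df.loc[(df["x_value"] - df["x_value"].mean()).abs().idxmin(), "x_value"])
y_near = float(df.loc[(df["y_value"] - df["y_value"].mean()).abs().idxmin(), "y_value"])
pair = (x_near, y_near)

# Check whether that (x, y) combination actually occurs as a row
is_row = bool(((df["x_value"] == x_near) & (df["y_value"] == y_near)).any())
print(pair, is_row)  # (3.2, 3.0) False
```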

Is there any other way to do this?



2 answers


The center is simply the mean of x and the mean of y, which you can get with

df.mean()

x_value    3.283333
y_value    3.083333
dtype: float64
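For reference, the centroid is just the per-column arithmetic mean; with the six rows above the numbers work out as:

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{1 + 3 + 5 + 3 + 3.2 + 4.5}{6} = \frac{19.7}{6} \approx 3.2833
\qquad
\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i = \frac{2 + 1 + 3 + 6 + 3.5 + 3}{6} = \frac{18.5}{6} \approx 3.0833
```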

      

This gives you the index label of the row with the smallest squared distance from the mean:

df.sub(df.mean()).pow(2).sum(1).idxmin()

500
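The one-liner chains four steps. A sketch spelling them out with intermediate names (the variable names are only for illustration):

```python
import pandas as pd

df = pd.DataFrame(
    {"x_value": [1.0, 3.0, 5.0, 3.0, 3.2, 4.5],
     "y_value": [2.0, 1.0, 3.0, 6.0, 3.5, 3.0]},
    index=pd.Index([100, 200, 300, 400, 500, 600], name="id"),
)

center = df.mean()                    # per-column mean: the centroid
offsets = df.sub(center)              # (x - x_mean, y - y_mean) for every row
sq_dist = offsets.pow(2).sum(axis=1)  # squared Euclidean distance per row
closest_id = sq_dist.idxmin()         # index label (id) of the smallest value
print(closest_id)  # 500
```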

      

and this gives you the row itself:



df.loc[[df.sub(df.mean()).pow(2).sum(1).idxmin()]]

     x_value  y_value
id                   
500      3.2      3.5
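The double brackets in `df.loc[[...]]` matter: a list of labels returns a one-row DataFrame, while a single label returns a Series (which is what the last step of the second answer below does). A small sketch of both:

```python
import pandas as pd

df = pd.DataFrame(
    {"x_value": [1.0, 3.0, 5.0, 3.0, 3.2, 4.5],
     "y_value": [2.0, 1.0, 3.0, 6.0, 3.5, 3.0]},
    index=pd.Index([100, 200, 300, 400, 500, 600], name="id"),
)

closest_id = df.sub(df.mean()).pow(2).sum(axis=1).idxmin()

as_frame = df.loc[[closest_id]]  # list of labels -> one-row DataFrame, id kept as index
as_series = df.loc[closest_id]   # single label -> Series whose name is the id
```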

      


Setup

import pandas as pd

df = pd.DataFrame({
    'x_value': [1.0, 3.0, 5.0, 3.0, 3.2, 4.5],
    'y_value': [2.0, 1.0, 3.0, 6.0, 3.5, 3.0]
}, index=pd.Index([100, 200, 300, 400, 500, 600], name='id'))
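If `id` is an ordinary column (as in the question's printout) rather than the index, `df.mean()` would also average the ids, and the id differences would dominate the distance. A sketch that restricts the computation to the coordinate columns, assuming they are named `x_value` and `y_value`:

```python
import pandas as pd

# The question's layout: 'id' as an ordinary column, not the index
df = pd.DataFrame({
    "id": [100, 200, 300, 400, 500, 600],
    "x_value": [1.0, 3.0, 5.0, 3.0, 3.2, 4.5],
    "y_value": [2.0, 1.0, 3.0, 6.0, 3.5, 3.0],
})

coords = df[["x_value", "y_value"]]                     # only the coordinate columns
sq_dist = coords.sub(coords.mean()).pow(2).sum(axis=1)  # squared distance per row
closest = df.loc[[sq_dist.idxmin()]]                    # full row, including its id
print(closest)
```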

      



If you are looking for the smallest Euclidean distance, you can compute each row's distance from the center and take the minimum:

>>> import pandas as pd
>>> import numpy as np
>>> 
>>> df = pd.DataFrame([{'id': 100, 'x': 1.0, 'y': 2.0}, {'id': 200, 'x': 3.0, 'y': 1.0}, {'id': 300, 'x': 5.0, 'y': 3.0}, {'id': 400, 'x': 3.0, 'y': 6.0}, {'id': 500, 'x': 3.2, 'y': 3.5}, {'id': 600, 'x': 4.5, 'y': 3.0}])

>>> df = df.set_index('id')
>>> df
       x    y
id           
100  1.0  2.0
200  3.0  1.0
300  5.0  3.0
400  3.0  6.0
500  3.2  3.5
600  4.5  3.0
>>> center_x, center_y = df.mean()
>>> np.sqrt((center_x - df['x'])**2 + (center_y - df['y'])**2)
id
100    2.527295
200    2.102512
300    1.718688
400    2.930396
500    0.424918
600    1.219517
dtype: float64
>>> (np.sqrt((center_x - df['x'])**2 + (center_y - df['y'])**2)).idxmin()
500
>>> df.loc[(np.sqrt((center_x - df['x'])**2 + (center_y - df['y'])**2)).idxmin()]
x    3.2
y    3.5
Name: 500, dtype: float64
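The same Euclidean computation can be done in NumPy on the raw coordinate array; a sketch using `np.linalg.norm` along `axis=1`. Note that `argmin` returns a position rather than an id, so the row is selected with `iloc`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"x": [1.0, 3.0, 5.0, 3.0, 3.2, 4.5],
     "y": [2.0, 1.0, 3.0, 6.0, 3.5, 3.0]},
    index=pd.Index([100, 200, 300, 400, 500, 600], name="id"),
)

pts = df[["x", "y"]].to_numpy()              # shape (n, 2)
center = pts.mean(axis=0)                    # centroid (x_mean, y_mean)
dist = np.linalg.norm(pts - center, axis=1)  # Euclidean distance per row
closest = df.iloc[[int(dist.argmin())]]      # argmin is positional, so use iloc
print(closest)
```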

      



As far as I can tell, this is the same method as the other answer, but less concise.
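The two answers agree because the square root is monotonically increasing, so minimizing the squared distance and the Euclidean distance selects the same row; skipping `np.sqrt` just saves a step. A quick check on the example data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"x": [1.0, 3.0, 5.0, 3.0, 3.2, 4.5],
     "y": [2.0, 1.0, 3.0, 6.0, 3.5, 3.0]},
    index=pd.Index([100, 200, 300, 400, 500, 600], name="id"),
)

sq = df.sub(df.mean()).pow(2).sum(axis=1)  # squared distance (first answer)
euclid = np.sqrt(sq)                       # Euclidean distance (second answer)
same = sq.idxmin() == euclid.idxmin()      # both pick the same id
print(same, sq.idxmin())
```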
