Get the row of a pandas DataFrame closest to the center of gravity of its coordinates
I have a pandas DataFrame:
id x_value y_value
100 1 2
200 3 1
300 5 3
400 3 6
500 3.2 3.5
600 4.5 3
I want to find the midpoint (centroid) of these coordinate pairs and return the row of data that is closest to it.
def get_centroid(df):
    # x value closest to the mean of x
    lat_mean = df['x_value'].mean()
    lat_mean = df['x_value'].iloc[(df['x_value'] - lat_mean).abs().argsort()[:1]].tolist()[0]
    # y value closest to the mean of y (chosen independently of x)
    long_mean = df['y_value'].mean()
    long_mean = df['y_value'].iloc[(df['y_value'] - long_mean).abs().argsort()[:1]].tolist()[0]
    return [lat_mean, long_mean]
But this approach is wrong because x and y are picked independently, so the result is not necessarily an actual (x, y) pair from the DataFrame.
Is there any other way to do this?
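To make the problem concrete, here is a small sketch that applies the same per-column logic to the sample data. The DataFrame construction and the helper name nearest_value are assumptions for illustration, not part of the original code.

```python
import pandas as pd

# Sample data rebuilt from the table in the question (assumed layout).
df = pd.DataFrame({
    "id": [100, 200, 300, 400, 500, 600],
    "x_value": [1.0, 3.0, 5.0, 3.0, 3.2, 4.5],
    "y_value": [2.0, 1.0, 3.0, 6.0, 3.5, 3.0],
})

def nearest_value(col):
    # Hypothetical helper: value in one column closest to that column's mean.
    pos = (col - col.mean()).abs().to_numpy().argmin()
    return col.iloc[pos]

# x and y are chosen independently, as in get_centroid.
pair = [nearest_value(df["x_value"]), nearest_value(df["y_value"])]

# Check whether that (x, y) pair exists as a row in the data.
is_row = ((df["x_value"] == pair[0]) & (df["y_value"] == pair[1])).any()
print(pair, is_row)
```

On this data the x value comes from row 500 and the y value from row 300, so the returned pair does not correspond to any single row.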
2 answers
The center is simply the mean of x and the mean of y, which you can get with
df.mean()
x_value 3.283333
y_value 3.083333
dtype: float64
This gives you the index label of the row with the smallest squared distance from the mean
df.sub(df.mean()).pow(2).sum(1).idxmin()
500
This will give you the row itself
df.loc[[df.sub(df.mean()).pow(2).sum(1).idxmin()]]
x_value y_value
id
500 3.2 3.5
Setup
import pandas as pd

df = pd.DataFrame({
    'x_value': [1.0, 3.0, 5.0, 3.0, 3.2, 4.5],
    'y_value': [2.0, 1.0, 3.0, 6.0, 3.5, 3.0]
}, index=pd.Index([100, 200, 300, 400, 500, 600], name='id'))
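If you want this as a reusable function, a minimal sketch could look like the following. The function name closest_to_centroid and its cols parameter are my own choices for illustration; the body is the same squared-distance computation as above, restricted to the coordinate columns so that extra columns do not affect the result.

```python
import pandas as pd

def closest_to_centroid(df, cols=("x_value", "y_value")):
    """Return the single row of df nearest to the mean point of cols."""
    pts = df[list(cols)]
    # Squared Euclidean distance of every row from the column means.
    sq_dist = pts.sub(pts.mean()).pow(2).sum(axis=1)
    # idxmin returns the index label of the smallest distance; the double
    # brackets keep the result a one-row DataFrame rather than a Series.
    return df.loc[[sq_dist.idxmin()]]

df = pd.DataFrame(
    {"x_value": [1.0, 3.0, 5.0, 3.0, 3.2, 4.5],
     "y_value": [2.0, 1.0, 3.0, 6.0, 3.5, 3.0]},
    index=pd.Index([100, 200, 300, 400, 500, 600], name="id"),
)

row = closest_to_centroid(df)
print(row)
```

Note that idxmin returns only the first label when two rows are equally close, so ties are broken by row order.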
If you are looking for the smallest Euclidean distance, you can calculate each row's distance to the center and choose the minimum:
>>> import pandas as pd
>>> import numpy as np
>>>
>>> df = pd.DataFrame([{'y': 2.0, 'x': 1.0, 'id': 100}, {'y': 1.0, 'x': 3.0, 'id': 200}, {'y': 3.0, 'x': 5.0, 'id': 300}, {'y': 6.0, 'x': 3.0, 'id': 400}, {'y': 3.5, 'x': 3.2, 'id': 500}, {'y': 3.0, 'x': 4.5, 'id': 600}])
>>> df = df.set_index('id')
>>> df
x y
id
100 1.0 2.0
200 3.0 1.0
300 5.0 3.0
400 3.0 6.0
500 3.2 3.5
600 4.5 3.0
>>> center_x, center_y = df.mean()
>>> np.sqrt((center_x - df['x'])**2 + (center_y - df['y'])**2)
id
100 2.527295
200 2.102512
300 1.718688
400 2.930396
500 0.424918
600 1.219517
dtype: float64
>>> (np.sqrt((center_x - df['x'])**2 + (center_y - df['y'])**2)).idxmin()
500
>>> df.loc[(np.sqrt((center_x - df['x'])**2 + (center_y - df['y'])**2)).idxmin()]
x 3.2
y 3.5
Name: 500, dtype: float64
As far as I can tell, this is the same method as the other answer, but less concise.
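The two approaches agree because the square root is monotonic: the row that minimizes the squared distance also minimizes the distance itself. A short sketch to check this on the sample data, using np.hypot as a slightly more compact way to write the distance (the variable names here are assumptions for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"x": [1.0, 3.0, 5.0, 3.0, 3.2, 4.5],
     "y": [2.0, 1.0, 3.0, 6.0, 3.5, 3.0]},
    index=pd.Index([100, 200, 300, 400, 500, 600], name="id"),
)

center = df.mean()

# Euclidean distance of each row from the center.
dist = np.hypot(df["x"] - center["x"], df["y"] - center["y"])

# Squared distance, skipping the square root.
sq_dist = df.sub(center).pow(2).sum(axis=1)

nearest_by_dist = dist.idxmin()
nearest_by_sq = sq_dist.idxmin()
print(nearest_by_dist, nearest_by_sq)  # both are 500 for this data
```

Skipping the square root saves a little work on large frames, but either form selects the same row.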