Trajectory Clustering / Aggregation with Python

I'm working with geolocated social media posts and clustering their locations (latitude / longitude) using DBSCAN. In my dataset, many users have posted multiple times, which allows me to derive their trajectories (a time-ordered sequence of positions from place to place). Example:

3945641 [[38.9875, -76.94], [38.91711157, -77.02435118], [38.8991, -77.029], [38.8991, -77.029], [38.88927534, -77.04858468]]

      

I have created trajectories for my entire dataset. As the next step, I want to cluster or aggregate the trajectories to identify areas with dense movement between locations. Any ideas on how to approach trajectory clustering / aggregation in Python?

Here is some code I worked with to create trajectories as strings / JSON dicts:

import pandas as pd
import numpy as np
import ujson as json
import time

# Import data
data = pd.read_csv('filepath.csv', delimiter=',', engine='python')
#print(len(data), "rows")
#print(data)

# Create DataFrame
df = pd.DataFrame(data, columns=['user_id','timestamp','latitude','longitude','cluster_labels'])
#print(df.head())

# Get a list of unique user_id values
uniqueIds = np.unique(data['user_id'].values)

# Get the ordered (by timestamp) coordinates for each user_id
# (the loop variable is named user_id to avoid shadowing the built-in id)
output = [[user_id, data.loc[data['user_id']==user_id].sort_values(by='timestamp')[['latitude','longitude']].values.tolist()] for user_id in uniqueIds]

# Save outputs as csv
outputs = pd.DataFrame(output)
#print outputs
outputs.to_csv('filepath_out.csv', index=False, header=False)

# Save outputs as JSON (optional; uncomment to use)
#outputDict = {}
#for i in output:
#    outputDict[i[0]] = i[1]

#with open('filepath.json','w') as f:
#    json.dump(outputDict, f, sort_keys=True, indent=4, ensure_ascii=False)

      

EDIT

I came across the Python package NetworkX and have been considering generating network graphs from my clusters rather than clustering trajectory lines / segments. Any opinions on clustering trajectories vs. turning the clusters into a graph to identify dense movement between locations?

Below is an example of some clusters: [image: Cluster example]
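To make the graph idea concrete, here is a minimal sketch that counts directed moves between consecutive clusters, assuming each trajectory has already been reduced to a time-ordered list of DBSCAN cluster labels. The `trajectories` dict, the sample labels, and the `count_transitions` helper are hypothetical names used only for illustration; they are not part of my original code.

```python
from collections import Counter

# Hypothetical input: user_id -> time-ordered DBSCAN cluster labels (-1 = noise).
trajectories = {
    3945641: [0, 1, 2, 2, 3],
    3945642: [0, 1, 2, -1, 3],
    3945643: [1, 2, 3],
}

def count_transitions(trajectories, noise_label=-1):
    """Count directed moves between consecutive, distinct clusters."""
    edge_counts = Counter()
    for labels in trajectories.values():
        # Drop noise points so a move is counted between the clusters around them.
        visited = [label for label in labels if label != noise_label]
        for src, dst in zip(visited, visited[1:]):
            if src != dst:  # staying in the same cluster is not a movement
                edge_counts[(src, dst)] += 1
    return edge_counts

edge_counts = count_transitions(trajectories)

# (source, target, weight) triples, heaviest edges first.
weighted_edges = [(src, dst, n) for (src, dst), n in edge_counts.most_common()]
```

Triples in this shape could then be loaded into a NetworkX `DiGraph` with `add_weighted_edges_from`, so that the edge weight reflects how often users moved between two clusters. Whether noise points should be skipped or should break the trajectory is a design choice; the sketch skips them.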


1 answer


To answer my own question more than a year later: I've come up with a couple of solutions that solved this (and similar problems), albeit without Python (which I was hoping for). The first is the method I described in an answer on GIS Stack Exchange, which uses ArcGIS and several of its built-in tools for line density analysis (https://gis.stackexchange.com/questions/42224/creating-polyline-based-heatmap-from-gps-track/270524#270524). It takes GPS points, creates lines, segments the lines, and then groups them. The second method uses SQL (primarily ST_MakeLine) on a Postgres / PostGIS / CARTO database to create lines, ordered by ascending timestamp and grouped by user (e.g. https://carto.com/blog/jets-and-datelines/). You can then count the number of line occurrences (assuming the points are grouped around well-defined centroids, as in my original question above) and treat this count as a cluster (e.g. "Python / NetworkX: adding weights to edges by edge frequency", https://carto.com/blog/alteryx-and-carto-to-explore-london-bike-data/).
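For anyone who still wants to stay in Python, the second method can be roughly approximated with the standard library alone. This is only a sketch of the idea, not the CARTO query: it assumes every point has already been snapped to its cluster centroid, and the `rows` sample data and the `make_lines` helper are hypothetical names introduced for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical rows: (user_id, timestamp, centroid), with each point already
# snapped to the centroid of its cluster.
rows = [
    (1, 20, (38.9171, -77.0244)),
    (1, 10, (38.9875, -76.9400)),
    (2, 5, (38.9875, -76.9400)),
    (2, 9, (38.9171, -77.0244)),
    (3, 1, (38.8991, -77.0290)),
    (3, 2, (38.9875, -76.9400)),
]

def make_lines(rows):
    """Build one line per user: its centroids ordered by ascending timestamp."""
    points_by_user = defaultdict(list)
    for user_id, timestamp, centroid in rows:
        points_by_user[user_id].append((timestamp, centroid))
    return {
        user_id: tuple(c for _, c in sorted(points, key=lambda p: p[0]))
        for user_id, points in points_by_user.items()
    }

lines = make_lines(rows)

# Identical lines (same centroids in the same order) are counted together.
line_counts = Counter(lines.values())
```

Here users 1 and 2 travel the same line, so that line gets a count of 2 while user 3's line gets a count of 1. Exact matching only works because the points sit on shared centroids; raw GPS coordinates would almost never repeat exactly.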


