Trajectory Clustering / Aggregation with Python

I'm working with geolocated social media posts and clustering their locations (latitude / longitude) using DBSCAN. In my dataset, many users have posted multiple times, which lets me derive each user's trajectory (a time-ordered sequence of positions from place to place). Example:

3945641 [[38.9875, -76.94], [38.91711157, -77.02435118], [38.8991, -77.029], [38.8991, -77.029], [38.88927534, -77.04858468]]


I have created trajectories for my entire dataset, and as the next step I want to cluster or aggregate them to identify areas with dense movement between locations. Any ideas on how to approach trajectory clustering / aggregation in Python?

Here is the code I used to create the trajectories and save them as CSV / JSON:

import pandas as pd
import numpy as np
import ujson as json
import time

# Import Data
data = pd.read_csv('filepath.csv', delimiter=',', engine='python')
#print len(data),"rows"
#print data

# Create DataFrame with only the columns of interest
df = pd.DataFrame(data, columns=['user_id','timestamp','latitude','longitude','cluster_labels'])
#print df.head()

# Get a list of unique user_id values
uniqueIds = np.unique(df['user_id'].values)

# Get the ordered (by timestamp) coordinates for each user_id
# (uid rather than id, to avoid shadowing the built-in id())
output = [[uid, df.loc[df['user_id']==uid].sort_values(by='timestamp')[['latitude','longitude']].values.tolist()] for uid in uniqueIds]

# Save outputs as csv
outputs = pd.DataFrame(output)
#print outputs
outputs.to_csv('filepath_out.csv', index=False, header=False)

# Save outputs as JSON
#outputDict = {}
#for i in output:
# outputDict[i[0]]=i[1]

#with open('filepath.json','w') as f:
#    json.dump(outputDict, f, sort_keys=True, indent=4, ensure_ascii=False)



I came across the Python package NetworkX and have been considering generating network graphs from my clusters rather than clustering trajectory lines / segments. Any opinions on clustering trajectories versus turning the clusters into a graph to identify dense movement between locations?
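For the graph idea, here is a minimal sketch. It assumes each post already carries a DBSCAN cluster label (the cluster_labels column above) and that noise points are labelled -1. Each user's time-ordered labels are collapsed into transitions between distinct clusters, and each transition is counted. The function name transition_weights and the sample sequences are hypothetical.

```python
from collections import Counter
from itertools import groupby

def transition_weights(label_sequences, noise_label=-1):
    """Count directed cluster-to-cluster transitions across all users."""
    weights = Counter()
    for labels in label_sequences:
        # Drop noise points, then collapse runs of the same cluster.
        kept = (label for label in labels if label != noise_label)
        visited = [key for key, _ in groupby(kept)]
        # Each consecutive pair of distinct clusters is one directed edge.
        weights.update(zip(visited, visited[1:]))
    return weights

# Hypothetical time-ordered cluster labels for three users.
sequences = [
    [0, 0, 1, 2],
    [0, 1, -1, 1, 2],
    [2, 1, 0],
]
edges = transition_weights(sequences)
for (src, dst), count in edges.most_common():
    print(src, "->", dst, count)
```

The resulting (source, target, weight) triples could then be loaded into a NetworkX graph with DiGraph.add_weighted_edges_from, where the heaviest edges would mark the densest movement between clusters.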

Below is an example of some clusters: [image: Cluster example]



1 answer

In an attempt to answer my own 1+ year old question, I've come up with a couple of solutions that solved this (and similar questions), albeit without Python (which I was hoping for). The first is the method I described to a user on GIS StackExchange, using ArcGIS and several of its built-in line density analysis tools ( -heatmap-from- gps-track / 270524 # 270524 ). It takes GPS points, creates lines, segments the lines, and then groups them. The second method uses SQL (primarily ST_MakeLine) and a Postgres / PostGIS / CARTO database to create lines, ordered by ascending timestamp and grouped by user (e.g. ). You can then count the number of occurrences of each line (assuming the points are grouped around well-defined centroids, as in my original question above) and treat that count as a cluster (e.g. Python / NetworkX: adding weights to edges by edge frequency, https : // ).
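The counting step can also be approximated in plain Python. This is a rough sketch, not the SQL method itself. It assumes that rounding coordinates to two decimal places (roughly 1 km of latitude) is an acceptable stand-in for snapping points to cluster centroids. The names segment_counts and precision are hypothetical. The first trajectory is the example from the question, and the second is made up.

```python
from collections import Counter

def segment_counts(trajectories, precision=2):
    """Count how many times each undirected line segment is travelled."""
    counts = Counter()
    for points in trajectories:
        # Snap each point to a grid cell (assumed stand-in for a centroid).
        snapped = [(round(lat, precision), round(lon, precision))
                   for lat, lon in points]
        for a, b in zip(snapped, snapped[1:]):
            if a == b:
                continue  # consecutive posts from the same place: no movement
            # Sort the endpoints so A->B and B->A count as the same line.
            counts[tuple(sorted((a, b)))] += 1
    return counts

trajectories = [
    [[38.9875, -76.94], [38.91711157, -77.02435118], [38.8991, -77.029],
     [38.8991, -77.029], [38.88927534, -77.04858468]],
    # Hypothetical second user travelling one of the same segments in reverse.
    [[38.8991, -77.029], [38.91711157, -77.02435118]],
]
counts = segment_counts(trajectories)
for segment, n in counts.most_common():
    print(segment, n)
```

Segments with the highest counts are the most heavily travelled lines. Treating direction as irrelevant is a design choice here, and the sort could be dropped if A to B and B to A should be counted separately.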


