How to apply a specific function to each trajectory in a pandas DataFrame
I have a function that works with 2D arrays. The function angle calculates the angles between consecutive vectors.
It takes "directions" as its parameter, which is a 2-dimensional array with 2 columns, one with x values and the other with y values.
directions itself is obtained by applying np.diff() to a 2D array of points.
import matplotlib.pyplot as plt
import numpy as np
import os
import rdp
def angle(dir):
    """
    Returns the angles between vectors.
    Parameters:
    dir is a 2D-array of shape (N,M) representing N vectors in M-dimensional space.
    The return value is a 1D-array of values of shape (N-1,), with each value between 0 and pi.
    0 implies the vectors point in the same direction
    pi/2 implies the vectors are orthogonal
    pi implies the vectors point in opposite directions
    """
    dir2 = dir[1:]
    dir1 = dir[:-1]
    return np.arccos((dir1*dir2).sum(axis=1)/(np.sqrt((dir1**2).sum(axis=1)*(dir2**2).sum(axis=1))))
tolerance = 70
min_angle = np.pi*0.22
filename = os.path.expanduser('~/tmp/bla.data')
points = np.genfromtxt(filename).T
print(len(points))
x, y = points.T
# Use the Ramer-Douglas-Peucker algorithm to simplify the path
# http://en.wikipedia.org/wiki/Ramer-Douglas-Peucker_algorithm
# Python implementation: https://github.com/sebleier/RDP/
simplified = np.array(rdp.rdp(points.tolist(), tolerance))
print(len(simplified))
sx, sy = simplified.T
# compute the direction vectors on the simplified curve
directions = np.diff(simplified, axis=0)
theta = angle(directions)
# Select the index of the points with the greatest theta
# Large theta is associated with greatest change in direction.
idx = np.where(theta>min_angle)[0]+1
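To make the meaning of theta and the +1 offset in idx concrete, here is a minimal sketch on a hand-made path. The toy_points array is a hypothetical example, not my real data, and angle is repeated only so the snippet runs on its own.

```python
import numpy as np

def angle(dir):
    # Same formula as above: angle between each pair of consecutive vectors
    dir2 = dir[1:]
    dir1 = dir[:-1]
    return np.arccos((dir1 * dir2).sum(axis=1) /
                     np.sqrt((dir1 ** 2).sum(axis=1) * (dir2 ** 2).sum(axis=1)))

# Hypothetical toy path: one step right, one step up, one step left
toy_points = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=float)

toy_directions = np.diff(toy_points, axis=0)  # 3 direction vectors
toy_theta = angle(toy_directions)             # 2 angles, one per interior point

min_angle = np.pi * 0.22
# theta[i] is the turn between directions i and i+1, which meet at point i+1,
# hence the +1 to convert back to an index into the points array
toy_idx = np.where(toy_theta > min_angle)[0] + 1
print(toy_theta, toy_idx)
```

Four points give three direction vectors and two angles, so idx can only ever refer to interior points of a path.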
I want to apply the above code to a pandas.DataFrame with trajectory data.
Below is a sample df. The sx, sy values that share the same id and subid are counted as one trajectory. For example, rows 0-3 all have subid 2 and id 11, so they are points on one trajectory. Rows 4-6 form another trajectory, and so on. In other words, a new trajectory starts whenever the value of subid or id changes.
   id  subid simplified_points  sx  sy
0  11      2             (3,4)   3   4
1  11      2             (5,6)   5   6
2  11      2             (7,8)   7   8
3  11      2             (9,9)   9   9
4  11      3           (10,12)  10  12
5  11      3           (12,14)  12  14
6  11      3           (13,15)  13  15
7  12      9           (18,20)  18  20
8  12      9           (22,24)  22  24
9  12      9           (25,27)  25  27
The above data frame was obtained after applying the rdp algorithm. simplified_points is the result of the rdp algorithm, and it is further unpacked into the two columns sx and sy.
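For reference, the unpacking step looks roughly like the sketch below. It assumes that simplified_points holds (x, y) tuples; the demo frame is a hypothetical two-row stand-in for the real df.

```python
import pandas as pd

# Hypothetical stand-in for the real data; simplified_points holds (x, y) tuples
demo = pd.DataFrame({
    "id": [11, 11],
    "subid": [2, 2],
    "simplified_points": [(3, 4), (5, 6)],
})

# Expand each tuple into two new columns, keeping the original row index
demo[["sx", "sy"]] = pd.DataFrame(demo["simplified_points"].tolist(),
                                  index=demo.index)
print(demo)
```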
The problem is computing directions for each of these trajectories and then getting theta and idx. Since the above code was written for a single path stored in a 2D array, I can't work out how to apply it to a pandas DataFrame.
Please suggest a way to apply the above code to each trajectory in df.
You can use pandas.DataFrame.groupby.apply() to work with each (id, subid) group separately, with something like:
Code:
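import numpy as np
import pandas as pd

def theta(group):
    # Differences between consecutive points within one trajectory
    dx = pd.Series(group.sx.diff(), name='dx')
    dy = pd.Series(group.sy.diff(), name='dy')
    # Heading of each segment, in radians
    theta = pd.Series(np.arctan2(dy, dx), name='theta')
    return pd.concat([dx, dy, theta], axis=1)
df2 = df.groupby(['id', 'subid']).apply(theta)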
Test code:
from io import StringIO

import pandas as pd

df = pd.read_fwf(StringIO(u"""
    id  subid simplified_points  sx  sy
    11      2             (3,4)   3   4
    11      2             (5,6)   5   6
    11      2             (7,8)   7   8
    11      2             (9,9)   9   9
    11      3           (10,12)  10  12
    11      3           (12,14)  12  14
    11      3           (13,15)  13  15
    12      9           (18,20)  18  20
    12      9           (22,24)  22  24
    12      9           (25,27)  25  27"""),
    header=1)
df2 = df.groupby(['id', 'subid']).apply(theta)
# Append dx, dy and theta to the original frame by position
df = pd.concat([df, pd.DataFrame(df2.values, columns=df2.columns)], axis=1)
print(df)
Results:
   id  subid simplified_points  sx  sy   dx   dy     theta
0  11      2             (3,4)   3   4  NaN  NaN       NaN
1  11      2             (5,6)   5   6  2.0  2.0  0.785398
2  11      2             (7,8)   7   8  2.0  2.0  0.785398
3  11      2             (9,9)   9   9  2.0  1.0  0.463648
4  11      3           (10,12)  10  12  NaN  NaN       NaN
5  11      3           (12,14)  12  14  2.0  2.0  0.785398
6  11      3           (13,15)  13  15  1.0  1.0  0.785398
7  12      9           (18,20)  18  20  NaN  NaN       NaN
8  12      9           (22,24)  22  24  4.0  4.0  0.785398
9  12      9           (25,27)  25  27  3.0  3.0  0.785398
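Note that np.arctan2(dy, dx) gives the heading of each segment, which is not the same quantity as the angle between consecutive segments that angle() in the question returns. If the turning angle and idx are what is needed, one possible approach is to loop over the groups and reuse the question's function, as in the sketch below. The helper name turning_points and the np.clip guard against rounding error are my own additions, and idx here is a position within each trajectory rather than a row label of df.

```python
import numpy as np
import pandas as pd

def angle(dir):
    """Angles between consecutive row vectors (question's formula plus a clip)."""
    dir2 = dir[1:]
    dir1 = dir[:-1]
    cos = (dir1 * dir2).sum(axis=1) / np.sqrt(
        (dir1 ** 2).sum(axis=1) * (dir2 ** 2).sum(axis=1))
    # Assumption: clip so rounding error cannot push the cosine outside [-1, 1]
    return np.arccos(np.clip(cos, -1.0, 1.0))

def turning_points(group, min_angle):
    """Hypothetical helper: theta and idx for a single trajectory."""
    pts = group[["sx", "sy"]].to_numpy(dtype=float)
    directions = np.diff(pts, axis=0)
    theta = angle(directions)
    idx = np.where(theta > min_angle)[0] + 1
    return theta, idx

# Same sample data as in the question
df = pd.DataFrame({
    "id":    [11, 11, 11, 11, 11, 11, 11, 12, 12, 12],
    "subid": [2, 2, 2, 2, 3, 3, 3, 9, 9, 9],
    "sx":    [3, 5, 7, 9, 10, 12, 13, 18, 22, 25],
    "sy":    [4, 6, 8, 9, 12, 14, 15, 20, 24, 27],
})

min_angle = np.pi * 0.22
# One (theta, idx) pair per (id, subid) trajectory
results = {key: turning_points(group, min_angle)
           for key, group in df.groupby(["id", "subid"])}

for key, (theta, idx) in results.items():
    print(key, theta, idx)
```

A plain dictionary is used here because theta has two fewer entries than the trajectory has points, so the result does not line up row by row with df the way dx and dy do.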