How do I execute a function on a group of rows in a pandas dataframe?
I am trying to implement an algorithm. Let's say the algorithm is available as a function "xyz".
The function is specifically designed to work with trajectory data, i.e. (x, y).
The function takes two arguments:
the first argument is a list of tuples of (x, y) points,
and the second is a constant value.
This can be illustrated as follows:
line = [(0,0),(1,0),(2,0),(2,1),(2,2),(1,2),(0,2),(0,1),(0,0)]
xyz(line, 5.0) #calling the function
Output:
[(0, 0), (2, 0), (2, 2), (0, 2), (0, 0)]
This can be easily implemented when there is only one line. But I have a huge dataframe:
id x y x,y
0 1 0 0 (0,0)
1 1 1 0 (1,0)
2 1 2 0 (2,0)
3 1 2 1 (2,1)
4 1 2 2 (2,2)
5 1 1 2 (1,2)
6 2 1 3 (1,3)
7 2 1 4 (1,4)
8 2 2 3 (2,3)
9 2 1 2 (1,2)
10 3 2 5 (2,5)
11 3 3 3 (3,3)
12 3 1 9 (1,9)
13 3 4 6 (4,6)
In the data frame above, the rows with the same "id" form the points of one separate path / line. I want to apply the function above to each of these lines.
Three different trajectories with ids 1, 2, 3 can be observed in df. Trajectory 1 has its x, y values in rows 0-5, trajectory 2 has its points in rows 6-9, etc.
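For anyone who wants to experiment, the sample frame can be rebuilt roughly like this. It is only a sketch: the variable names ids, xs and ys are mine, and the "x,y" column is assumed to hold plain Python tuples.

```python
import pandas as pd

# Values copied from the table above; ids, xs and ys are illustrative names.
ids = [1] * 6 + [2] * 4 + [3] * 4
xs = [0, 1, 2, 2, 2, 1, 1, 1, 2, 1, 2, 3, 1, 4]
ys = [0, 0, 0, 1, 2, 2, 3, 4, 3, 2, 5, 3, 9, 6]

df = pd.DataFrame({"id": ids, "x": xs, "y": ys})

# Assumption: the "x,y" column stores one (x, y) tuple per row.
df["x,y"] = list(zip(xs, ys))

# Number of points per trajectory id.
points_per_id = df.groupby("id").size()
```

Here points_per_id shows how many rows belong to each trajectory, which is a quick way to check that the grouping key is what you expect.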
How do I apply the "xyz" function to each of these lines? And since the output of this function is again a list of tuples of x, y coordinates, how do I store this list? Note: the output list can contain any number of tuples.
I think you need groupby with apply:
print (df.groupby('id')['x,y'].apply(lambda x: xyz(x, 5.0)))
Or:
print (df.groupby('id')['x,y'].apply(xyz, 5.0))
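Both forms should give the same result, because apply forwards extra positional arguments to the function. A minimal sketch on a cut-down frame follows. The xyz here is a hypothetical stand-in that keeps only the first and last point and ignores the constant; it is not the real algorithm.

```python
import pandas as pd

def xyz(points, epsilon):
    # Hypothetical stand-in for the real algorithm: keeps only the first and
    # last point and ignores epsilon. Converting to a list first lets it
    # accept either a list or a pandas Series of tuples.
    pts = list(points)
    return [pts[0], pts[-1]]

# Cut-down sample with two trajectories.
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2],
    "x,y": [(0, 0), (1, 0), (2, 0), (1, 3), (1, 4)],
})

# Form 1: wrap the call in a lambda.
with_lambda = df.groupby("id")["x,y"].apply(lambda s: xyz(s, 5.0))

# Form 2: let apply pass 5.0 through as the second positional argument.
with_args = df.groupby("id")["x,y"].apply(xyz, 5.0)

print(with_args)
```

In both cases the function receives one Series of tuples per id, so whatever you pass in must cope with a Series or convert it first.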
Sample with the rdp function - tolist must be added, otherwise you get KeyError: -1:
print (df.groupby('id')['x,y'].apply(lambda x: rdp(x.tolist(), 5.0)))
#alternative with list
#print (df.groupby('id')['x,y'].apply(lambda x: rdp(list(x), 5.0)))
id
1 [(0, 0), (1, 2)]
2 [(1, 3), (1, 2)]
3 [(2, 5), (4, 6)]
Name: x,y, dtype: object
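On the storage part of the question: the result is a Series indexed by id whose values are lists of any length, so one option is to keep it as it is or convert it to a dict, and another is to flatten it back into one row per point. A rough sketch of both follows, using a hypothetical stand-in keep_endpoints in place of xyz / rdp so that it runs without extra packages. The names paths and long_df are mine.

```python
import pandas as pd

def keep_endpoints(points, epsilon):
    # Hypothetical stand-in for xyz / rdp: returns only the first and last
    # point of the list and ignores epsilon.
    return [points[0], points[-1]]

# Cut-down sample with two trajectories.
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 2],
    "x,y": [(0, 0), (1, 0), (1, 2), (1, 3), (1, 4), (1, 2)],
})

# One list of tuples per id, as in the output above.
simplified = df.groupby("id")["x,y"].apply(
    lambda s: keep_endpoints(s.tolist(), 5.0)
)

# Option 1: a plain dict mapping id -> list of (x, y) tuples.
paths = simplified.to_dict()

# Option 2: one row per remaining point, with separate x and y columns.
exploded = simplified.explode()
long_df = pd.DataFrame(
    exploded.tolist(), index=exploded.index, columns=["x", "y"]
).reset_index()

print(paths)
print(long_df)
```

The dict is convenient if you pass each path on to other code. The long frame keeps everything in pandas, and lists of different lengths simply become different numbers of rows.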