How do I execute a function on a group of rows in a pandas dataframe?

Question

I am trying to implement an algorithm. Let's say the algorithm is available as a function "xyz".

The function is specifically designed to work with trajectory data, i.e. (x, y).

The function takes two arguments: the first is a list of (x, y) tuples, and the second is a constant value.

This can be illustrated as follows:

 line = [(0,0),(1,0),(2,0),(2,1),(2,2),(1,2),(0,2),(0,1),(0,0)]
 xyz(line, 5.0) #calling the function

Output:

 [(0, 0), (2, 0), (2, 2), (0, 2), (0, 0)]

This can be easily implemented when there is only one line. But I have a huge dataframe:

     id      x     y    x,y
  0  1       0     0    (0,0)
  1  1       1     0    (1,0)
  2  1       2     0    (2,0)
  3  1       2     1    (2,1)
  4  1       2     2    (2,2)
  5  1       1     2    (1,2)
  6  2       1     3    (1,3)
  7  2       1     4    (1,4)
  8  2       2     3    (2,3)
  9  2       1     2    (1,2)
 10  3       2     5    (2,5)
 11  3       3     3    (3,3)
 12  3       1     9    (1,9)
 13  3       4     6    (4,6)

In the above dataframe, the rows with the same "id" form the points of one separate path / line. I want to apply the above function to each of these lines.

Three different trajectories with ids 1, 2, 3 can be observed in df. Trajectory 1 has its (x, y) points in rows 0-5, trajectory 2 in rows 6-9, and so on.

How do I apply the "xyz" function to each of these lines? And since the output of this function is again a list of (x, y) tuples, how do I store this list? Note: the output list can contain any number of tuples.

python pandas dataframe

Liza Apr 28 17 at 5:35 am

1 answer

jezrael · Accepted Answer · 2017-04-28T05:38:55+0000

I think you need groupby with apply:

print (df.groupby('id')['x,y'].apply(lambda x: xyz(x, 5.0)))

Or:

print (df.groupby('id')['x,y'].apply(xyz, 5.0))
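To try this end to end, here is a minimal sketch. The dataframe is rebuilt from the table in the question. The xyz below is only a hypothetical stand-in (a simple distance filter that always keeps the first and last point), not the asker's actual algorithm, so swap in the real function.

```python
import math
import pandas as pd

# Rebuild the sample dataframe from the question.
df = pd.DataFrame({
    "id": [1] * 6 + [2] * 4 + [3] * 4,
    "x": [0, 1, 2, 2, 2, 1, 1, 1, 2, 1, 2, 3, 1, 4],
    "y": [0, 0, 0, 1, 2, 2, 3, 4, 3, 2, 5, 3, 9, 6],
})
# One (x, y) tuple per row; tolist() gives plain Python ints.
df["x,y"] = list(zip(df["x"].tolist(), df["y"].tolist()))


def xyz(points, epsilon):
    """Hypothetical stand-in for the real algorithm.

    Keeps the first and last point, plus any interior point that lies
    at least `epsilon` away from the previously kept point.
    """
    # list() makes points[-1] positional; on a Series with an integer
    # index, [-1] is a label lookup and raises KeyError: -1.
    points = list(points)
    if len(points) < 3:
        return points
    kept = [points[0]]
    for p in points[1:-1]:
        last = kept[-1]
        if math.hypot(p[0] - last[0], p[1] - last[1]) >= epsilon:
            kept.append(p)
    kept.append(points[-1])
    return kept


# One call of xyz per id; the result is a Series of lists indexed by id.
result = df.groupby("id")["x,y"].apply(lambda s: xyz(s.tolist(), 5.0))
print(result)
```

Each group arrives in the lambda as a Series holding that trajectory's tuples, which is why it is converted to a list before being handed to the function.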

Sample with the rdp function. Each group must first be converted with tolist, otherwise you get KeyError: -1:

print (df.groupby('id')['x,y'].apply(lambda x: rdp(x.tolist(), 5.0)))
#alternative with list
#print (df.groupby('id')['x,y'].apply(lambda x: rdp(list(x), 5.0)))
id
1    [(0, 0), (1, 2)]
2    [(1, 3), (1, 2)]
3    [(2, 5), (4, 6)]
Name: x,y, dtype: object
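As for storing the result: the Series of lists above can be kept as it is, turned into a dict, or flattened into one row per point. A rough sketch of the last two options, assuming pandas 0.25 or newer for explode. The names simplified, as_dict and points are just illustrative, and the Series is typed in by hand from the output above so the snippet runs on its own.

```python
import pandas as pd

# The per-id result from above, re-created by hand for a standalone example.
simplified = pd.Series(
    {1: [(0, 0), (1, 2)], 2: [(1, 3), (1, 2)], 3: [(2, 5), (4, 6)]},
    name="x,y",
)
simplified.index.name = "id"

# Option 1: a plain dict mapping id -> list of (x, y) tuples.
as_dict = simplified.to_dict()

# Option 2: a long dataframe with one row per point.
long = simplified.explode()  # one tuple per row, id repeated in the index
points = pd.DataFrame(
    long.tolist(), index=long.index, columns=["x", "y"]
).reset_index()
print(points)
```

The long form copes with lists of different lengths, since every point simply becomes its own row with its id.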
