Selecting a specific row from a groupby object in Python
id marks year
1 18 2013
1 25 2012
3 16 2014
2 16 2013
1 19 2013
3 25 2013
2 18 2014
Suppose I now group the data above by the id value with this Python command:
grouped = file.groupby(file.id)
I would like to get a new file that keeps only one line from each group: the line with the latest year, i.e. the highest year value within that group.
Please let me know which command to use. What I have tried returns only a boolean expression, but I want the whole line with the latest year.
1 answer
I pieced this together from the following question: "Python: Get the row that has the maximum value in groups using groupby"
So basically we can group by the "id" column, call transform on the "year" column to get each group's maximum year for every row, and build a boolean index that is True where a row's year equals the maximum year for its "id". Indexing df with that mask returns the whole rows:
In [103]:
df[df.groupby('id')['year'].transform('max') == df['year']]
Out[103]:
   id  marks  year
0   1     18  2013
2   3     16  2014
4   1     19  2013
6   2     18  2014
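Note that id 1 appears twice in the output above because two of its rows tie for the latest year (2013). The following is a minimal, self-contained sketch that rebuilds the sample data from the question and shows both the transform approach and one possible way to keep exactly one row per id. The variable names (latest_year, all_latest, one_per_id) are made up for illustration, and using idxmax to break ties by taking the first matching row is an assumption about what the asker wants, not something stated in the question.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id":    [1, 1, 3, 2, 1, 3, 2],
    "marks": [18, 25, 16, 16, 19, 25, 18],
    "year":  [2013, 2012, 2014, 2013, 2013, 2013, 2014],
})

# For every row, the maximum year found within that row's id group.
latest_year = df.groupby("id")["year"].transform("max")

# Boolean mask approach from the answer: keeps every row that ties
# for the latest year, so id 1 contributes two rows here.
all_latest = df[df["year"] == latest_year]

# Assumed variant: idxmax gives the index label of the first maximum
# in each group, so this keeps exactly one row per id.
one_per_id = df.loc[df.groupby("id")["year"].idxmax()]

print(all_latest)
print(one_per_id)
```

If a different tie-break is wanted (for example the highest marks among the latest-year rows), sorting the frame before taking one row per group would be one option.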