Selecting a specific row from a groupby object in Python

id    marks  year 
1     18      2013
1     25      2012
3     16      2014
2     16      2013
1     19      2013
3     25      2013
2     18      2014

      

Suppose I now group the above data by id with this command:
    grouped = file.groupby(file.id)

I would like to get a new DataFrame that keeps, for each group, only the row with the latest year, i.e. the highest year value within that group.

Please let me know the command. I tried using apply, but it only returns a boolean expression. I want the whole row with the latest year.



1 answer


I pieced this together using the following: Python: Get the row that has the maximum value in groups using groupby

So basically we can group by the "id" column, call transform on the "year" column to get each group's maximum year aligned with the original rows, and then build a boolean index that is True where a row's year equals the maximum year for its "id":



In [103]: df[df.groupby('id')['year'].transform('max') == df['year']]
Out[103]:
   id  marks  year
0   1     18  2013
2   3     16  2014
4   1     19  2013
6   2     18  2014
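
For reference, here is a self-contained sketch of the same approach, built on the sample data from the question. The variable names (max_year, latest, one_per_id) are only illustrative. The idxmax variant at the end is an assumed extra that is not part of the answer above; it may help if exactly one row per id is wanted, since the boolean-index approach keeps ties (id 1 has two rows for 2013).

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id":    [1, 1, 3, 2, 1, 3, 2],
    "marks": [18, 25, 16, 16, 19, 25, 18],
    "year":  [2013, 2012, 2014, 2013, 2013, 2013, 2014],
})

# Broadcast each group's maximum year back onto every row of that group.
max_year = df.groupby("id")["year"].transform("max")

# Keep the rows whose year equals their group's maximum (ties are kept).
latest = df[df["year"] == max_year]
print(latest)

# Assumed variation (not in the original answer): exactly one row per id,
# taking the first row in each group that reaches the maximum year.
one_per_id = df.loc[df.groupby("id")["year"].idxmax()]
print(one_per_id)
```

Which variant fits depends on whether duplicate latest-year rows within a group should all be returned or collapsed to one.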

      
