How can I keep only the rows with the maximum value in a column for each group of items?

I have the following table:

Item number | crit_A | crit_B|
------------|--------|-------|
     1      |  100   |  20   |
     1      |   10   | 100   |
     1      |   50   |  50   |
     2      |   10   | 100   |
     2      |   90   |  10   |
     2      |   90   |  10   |

I would like a pandas DataFrame operation that returns only the first and fifth rows, i.e. the rows where crit_A is the maximum for the given item number:

Item number | crit_A | crit_B|
------------|--------|-------|
     1      |  100   |  20   |
     2      |   90   |  10   |

Note: when crit_A has several equal maximum values for a given item, I only need one of those rows to be returned.
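To make the tie concrete, here is a minimal sketch, assuming the table above is loaded as a DataFrame named `df` with exactly these column names. It uses a different technique from the answer below (a groupby `transform` plus `drop_duplicates`) and shows that item 2 has two rows tying at crit_A = 90, of which only the first is kept:

```python
import pandas as pd

# The table from the question (column names assumed to match exactly)
df = pd.DataFrame({
    "Item number": [1, 1, 1, 2, 2, 2],
    "crit_A": [100, 10, 50, 10, 90, 90],
    "crit_B": [20, 100, 50, 100, 10, 10],
})

# For every row, the maximum crit_A within that row's item group
group_max = df.groupby("Item number")["crit_A"].transform("max")

# All rows that tie for their group's maximum: index labels 0, 4 and 5
ties = df[df["crit_A"] == group_max]

# Keep only the first tied row per item: index labels 0 and 4
one_per_item = ties.drop_duplicates(subset="Item number", keep="first")
print(one_per_item)
```

Dropping the `drop_duplicates` step would return every tied row instead, in case all of them are ever needed.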

The following is not what I'm looking for:

res_82_df.groupby(['Item number']).max()

This doesn't work: it groups by Item number, but it returns the maximum of every column independently, so the resulting rows need not exist in the original data. I could also pick an arbitrary threshold for crit_A and filter on it, but that approach is not reliable either, because I would always have to inspect the data and make estimates.
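To illustrate the problem, a minimal sketch (again assuming the table is loaded as `df` with these column names) shows that `max()` takes the maximum of each column separately, so item 1 comes back as crit_A = 100, crit_B = 100, a combination that appears in no original row:

```python
import pandas as pd

df = pd.DataFrame({
    "Item number": [1, 1, 1, 2, 2, 2],
    "crit_A": [100, 10, 50, 10, 90, 90],
    "crit_B": [20, 100, 50, 100, 10, 10],
})

# max() is applied to each column independently within every group,
# so crit_A and crit_B can come from different rows
per_column_max = df.groupby("Item number").max()
print(per_column_max)
# Item 1 -> crit_A=100, crit_B=100; item 2 -> crit_A=90, crit_B=100
```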

How can this be done effectively?

Note: my question is indeed a duplicate of the one linked above. The answer here, however, is much more succinct and does exactly what I ask.


1 answer


I would do it like this:



In [107]: df.loc[df.groupby('Item number')['crit_A'].idxmax()]
Out[107]:
   Item number  crit_A  crit_B
0            1     100      20
4            2      90      10
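For reference, a self-contained version of the same approach, assuming the DataFrame is built from the question's table with these exact column names. `idxmax()` returns, per item, the index label of the row holding the largest crit_A (the first such row on ties), and `.loc` then pulls out those complete rows, so crit_B stays paired with crit_A:

```python
import pandas as pd

df = pd.DataFrame({
    "Item number": [1, 1, 1, 2, 2, 2],
    "crit_A": [100, 10, 50, 10, 90, 90],
    "crit_B": [20, 100, 50, 100, 10, 10],
})

# Index label of the max-crit_A row per item; on ties the first
# occurrence wins (label 4, not 5, for item 2)
best_labels = df.groupby("Item number")["crit_A"].idxmax()

# Select those whole rows by label
result = df.loc[best_labels]
print(result)
```

Because `.loc` looks rows up by label, this relies on the index being unique; with duplicate labels, calling `df.reset_index(drop=True)` first avoids selecting extra rows.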
