How to simply keep the rows with the maximum value in a column for elements of the same type?

Question

I have the following table:

Item number | crit_A | crit_B|
------------|--------|-------|
     1      |  100   |  20   |
     1      |   10   | 100   |
     1      |   50   |  50   |
     2      |   10   | 100   |
     2      |   90   |  10   |
     2      |   90   |  10   |

I would like a pandas DataFrame operation that returns only the first and fifth rows, i.e. the rows where crit_A is the maximum for each item:

Item number | crit_A | crit_B|
------------|--------|-------|
     1      |  100   |  20   |
     2      |   90   |  10   |

Note: when several rows tie for the maximum crit_A of a given item, I only need one of them returned.

The following is not what I'm looking for:

res_82_df.groupby(['Item number']).max()

This doesn't work because, although it groups by Item number, it returns the maximum of every column independently, so the output mixes values from different rows. I could also pick an arbitrary threshold on crit_A and filter on it, but that is not reliable either, because I would have to inspect the data and guess a suitable value every time.
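To illustrate the problem, here is a minimal sketch that rebuilds the table above (the name `df` is an assumption; the original frame is `res_82_df`) and shows that the per-column maximum can produce a row that does not exist in the data:

```python
import pandas as pd

# Table from the question, rebuilt by hand (df is a stand-in name).
df = pd.DataFrame({
    "Item number": [1, 1, 1, 2, 2, 2],
    "crit_A": [100, 10, 50, 10, 90, 90],
    "crit_B": [20, 100, 50, 100, 10, 10],
})

# max() is applied to each column separately within each group.
col_max = df.groupby("Item number").max()
print(col_max)

# For item 1 this pairs crit_A=100 with crit_B=100,
# although no single row holds both values.
row_exists = ((df["crit_A"] == 100) & (df["crit_B"] == 100)).any()
print(row_exists)
```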

How can this be done effectively?

Note: my question is indeed a duplicate of the one linked above. The answer here, however, is different and much more succinct, and it does what I ask.

python pandas dataframe

Thornhale May 10 '17 at 18:12

1 answer

MaxU · Accepted Answer · 2017-05-10T18:19:12+0000

I would do it like this:

In [107]: df.loc[df.groupby('Item number')['crit_A'].idxmax()]
Out[107]:
   Item number  crit_A  crit_B
0            1     100      20
4            2      90      10
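For anyone who wants to reproduce this, here is a self-contained sketch, assuming the frame is rebuilt by hand from the table in the question. `idxmax` returns the index label of the first maximum in each group, which is why the tie for item 2 resolves to row 4 rather than row 5. The second variant, sorting and then dropping duplicates, is an alternative that is not part of the answer above and is shown only for comparison.

```python
import pandas as pd

# Table from the question, rebuilt by hand for illustration.
df = pd.DataFrame({
    "Item number": [1, 1, 1, 2, 2, 2],
    "crit_A": [100, 10, 50, 10, 90, 90],
    "crit_B": [20, 100, 50, 100, 10, 10],
})

# Index label of the first row holding the maximum crit_A per item.
idx = df.groupby("Item number")["crit_A"].idxmax()

# Select those rows; all other columns come from the same row.
result = df.loc[idx]
print(result)

# Alternative (not from the answer): sort by crit_A descending,
# keep the first row seen for each item, then restore row order.
alt = (
    df.sort_values("crit_A", ascending=False, kind="mergesort")
      .drop_duplicates("Item number")
      .sort_index()
)
print(alt)
```

Both variants keep whole rows intact, so crit_B always comes from the row that holds the maximum crit_A.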
