Find the maximum value for each label in a large file, using Python

I have a large file with 2.2 million lines.

Value Label
4       1
6       1
2       2
6       2
3       2
5       3
8       3
7       3
1       4
5       4
2       5
4       5 
1       5

      

I want to know the fastest way to get the following output, where "Max" stores the maximum value for each label:

Label   Max
  1      6
  2      6
  3      8
  4      5
  5      4

      

I have implemented this with plain 'for' and 'while' loops in Python, but it takes hours. I expect pandas has something to solve this problem.



1 answer


Call max on the groupby object:

In [116]:

df.groupby('Label').max()
Out[116]:
       Value
Label       
1          6
2          6
3          8
4          5
5          4
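
The session above assumes df already exists. As a rough sketch of how it might be built, assuming the file is whitespace-delimited with a "Value Label" header row as in the question (the load_table helper and the SAMPLE string are made up for illustration; for the real file you would pass its path instead of the StringIO object):

```python
import io

import pandas as pd

# Sample rows copied from the question; stands in for the real 2.2-million-line file.
SAMPLE = """Value Label
4       1
6       1
2       2
6       2
3       2
5       3
8       3
7       3
1       4
5       4
2       5
4       5
1       5
"""


def load_table(source):
    """Read a whitespace-delimited table (hypothetical helper, not part of pandas)."""
    # sep=r"\s+" treats any run of spaces or tabs as one delimiter.
    return pd.read_csv(source, sep=r"\s+")


df = load_table(io.StringIO(SAMPLE))

# Selecting the "Value" column first returns a Series indexed by Label.
max_per_label = df.groupby("Label")["Value"].max()
print(max_per_label)
```

The grouping runs as a single vectorized pass over the column instead of a Python-level loop per label, which is where most of the speedup over nested 'for' and 'while' loops should come from.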

      



If you want to restore the Label column from the index, call reset_index:

In [117]:

df.groupby('Label').max().reset_index()
Out[117]:
   Label  Value
0      1      6
1      2      6
2      3      8
3      4      5
4      5      4
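
If you want the column to be named "Max" as in the question, one possible variation is to keep Label as a regular column with as_index=False and rename the result. This is only a sketch; the small DataFrame below is rebuilt from the question's sample rows so the snippet runs on its own:

```python
import pandas as pd

# Sample rows from the question, rebuilt by hand so this snippet is self-contained.
df = pd.DataFrame({
    "Value": [4, 6, 2, 6, 3, 5, 8, 7, 1, 5, 2, 4, 1],
    "Label": [1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5],
})

result = (
    df.groupby("Label", as_index=False)["Value"]  # keep Label as a column, not the index
      .max()
      .rename(columns={"Value": "Max"})           # match the header asked for in the question
)
print(result.to_string(index=False))
```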

      
