Find the maximum value for each label in a large file using Python
I have a large file with 2.2 million lines.
Value Label
4 1
6 1
2 2
6 2
3 2
5 3
8 3
7 3
1 4
5 4
2 5
4 5
1 5
I want to know the fastest way to get the following output, where "Max" stores the maximum value for each label:
Label Max
1 6
2 6
3 8
4 5
5 4
I implemented this with plain 'for' and 'while' loops in Python, but it takes hours. I expect pandas has something to solve this problem.
1 answer
Call max on the groupby object:
In [116]:
df.groupby('Label').max()
Out[116]:
Value
Label
1 6
2 6
3 8
4 5
5 4
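For completeness, here is a minimal end-to-end sketch of the same idea, starting from the raw text. It assumes the file is whitespace-separated with a "Value Label" header row, as in the question; the sample data is loaded from an in-memory string, and the names sample and max_per_label are only illustrative. For the real file you would pass its path to read_csv instead.

```python
import io

import pandas as pd

# Sample rows copied from the question. For the real 2.2-million-line file,
# pass the file path to read_csv instead of this in-memory buffer.
sample = """Value Label
4 1
6 1
2 2
6 2
3 2
5 3
8 3
7 3
1 4
5 4
2 5
4 5
1 5
"""

# Assumption: columns are separated by whitespace and the first row is a header.
df = pd.read_csv(io.StringIO(sample), sep=r"\s+")

# Maximum Value within each Label; the result is a Series indexed by Label.
max_per_label = df.groupby("Label")["Value"].max()
print(max_per_label)
```

Selecting the Value column before calling max returns a Series rather than a one-column DataFrame, which is convenient when only one column is being aggregated.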
If you want to restore the Label column from the index, call reset_index:
In [117]:
df.groupby('Label').max().reset_index()
Out[117]:
Label Value
0 1 6
1 2 6
2 3 8
3 4 5
4 5 4
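As an alternative sketch, passing as_index=False to groupby keeps Label as a regular column in one step, and rename can produce the "Max" column name asked for in the question. The DataFrame below is rebuilt from the question's sample data so the snippet runs on its own; the name result is only illustrative.

```python
import pandas as pd

# Sample data from the question, built directly as a DataFrame.
df = pd.DataFrame({
    "Value": [4, 6, 2, 6, 3, 5, 8, 7, 1, 5, 2, 4, 1],
    "Label": [1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5],
})

# as_index=False keeps Label as an ordinary column instead of the index,
# so no separate reset_index call is needed.
result = (
    df.groupby("Label", as_index=False)["Value"]
      .max()
      .rename(columns={"Value": "Max"})  # match the "Label Max" layout requested
)
print(result)
```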