How do I create a pandas DataFrame containing the top 20% of values in a column?

Question

There is a pandas dataframe:

import pandas as pd

df = pd.DataFrame({'c1':['a','b','c','d','e','f','g','h','i','j'],
                   'c2':[10,12,23,4,18,98,11,23,33,99]})


    c1  c2
0   a   10
1   b   12
2   c   23
3   d   4
4   e   18
5   f   98
6   g   11
7   h   23
8   i   33
9   j   99

I want to create a new dataframe that only contains the top 20% of the rows according to the values in column c2, in this case:

output:

   c1   c2
0   f   98
1   j   99

python pandas

freefrog · Jul 27 '17 at 12:37 am

4 answers

Psidom · Answer 1 · 2017-07-27T00:41:02+0000

You can use the quantile method to calculate the 80th-percentile threshold and keep the values above it:

df[df.c2.gt(df.c2.quantile(0.8))]

#  c1   c2
#5  f   98
#9  j   99
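
As a rough illustration of where that threshold comes from, the sketch below recomputes it on the question's data. It assumes the default linear interpolation of quantile; the names threshold and top are only illustrative and do not appear in the answer.

```python
import pandas as pd

df = pd.DataFrame({'c1': list('abcdefghij'),
                   'c2': [10, 12, 23, 4, 18, 98, 11, 23, 33, 99]})

# With the default linear interpolation, the 0.8 quantile falls between the
# 8th and 9th smallest values (33 and 98), so it is roughly 46 here.
threshold = df['c2'].quantile(0.8)

# Keep only the rows strictly above the threshold.
top = df[df['c2'] > threshold]

print(threshold)
print(top)
```

Because the comparison is strict, a row whose value equals the threshold exactly would be dropped; use ge instead of gt if such rows should be kept.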

Or use nlargest:

df.nlargest(int(len(df) * 0.2), 'c2')
#  c1   c2
#9  j   99
#5  f   98
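
One thing to watch with nlargest is tied values at the cut-off. The sketch below is a small made-up example (the scores frame is hypothetical, not from the question) and assumes a pandas version in which nlargest accepts keep='all'.

```python
import pandas as pd

# Hypothetical data with a tie for the largest value.
scores = pd.DataFrame({'c1': list('abcde'),
                       'c2': [1, 2, 3, 5, 5]})

n = int(len(scores) * 0.2)   # int(1.0) == 1

# Default keep='first': exactly n rows, so one of the tied rows is dropped.
first_only = scores.nlargest(n, 'c2')

# keep='all': every row tied with the last selected value is kept,
# so the result can contain more than n rows.
with_ties = scores.nlargest(n, 'c2', keep='all')

print(first_only)
print(with_ties)
```

Which behaviour is right depends on whether "top 20%" should mean a fixed number of rows or every row at or above the cut-off value.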

Alexander · Answer 2 · 2017-07-27T01:22:34+0000

In the interest of diversity ...

top_percentage = 0.2
df.sort_values('c2').tail(int(len(df) * top_percentage))
# Output:
#    c1  c2
# 5  f  98
# 9  j  99

Bhushan mehta · Answer 3 · 2017-07-27T00:53:58+0000

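# Sort ascending so the largest values end up in the last rows
df = df.sort_values(by=['c2'], ascending=True)
# Position where the top 20% starts (the first 80% of rows are skipped)
split_len = int(0.8 * len(df))
# Note: this overwrites df with only the top rows
df = df.iloc[split_len:]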
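
The two sort-based answers round differently when the number of rows is not a multiple of five. The sketch below shows this on a hypothetical seven-row frame (df7 and the result names are made up for illustration); it also leaves the original frame untouched instead of reassigning it.

```python
import pandas as pd

# Hypothetical frame with 7 rows, so 20% is not a whole number of rows.
df7 = pd.DataFrame({'c2': [5, 1, 7, 3, 9, 2, 8]})

ordered = df7.sort_values('c2')   # returns a new frame; df7 is unchanged

# tail(int(len * 0.2)): int(1.4) == 1, so one row is kept.
tail_rows = ordered.tail(int(len(df7) * 0.2))

# iloc[int(0.8 * len):]: int(5.6) == 5, so rows 5 and 6 are kept (two rows).
iloc_rows = ordered.iloc[int(0.8 * len(df7)):]

print(tail_rows)
print(iloc_rows)
```

Neither rounding is wrong; the first rounds the number of kept rows down, the second rounds the number of skipped rows down, so pick whichever matches your definition of "top 20%".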

piRSquared · Answer 4 · 2017-07-27T01:27:03+0000

Using the pct=True parameter of the pd.Series.rank method:

df[df.c2.rank(pct=True).gt(.8)]

  c1  c2
5  f  98
9  j  99
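
To make the percentile ranks visible, here is a minimal sketch on the question's data. It assumes the default method='average' for ties; the pct column and the top name are added only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'c1': list('abcdefghij'),
                   'c2': [10, 12, 23, 4, 18, 98, 11, 23, 33, 99]})

# rank(pct=True) divides each rank by the number of rows, so values lie in (0, 1].
# With the default method='average', the two tied 23s share the same rank.
pct = df['c2'].rank(pct=True)

# Show the ranks next to the data without modifying df.
print(df.assign(pct=pct))

# Keep rows whose percentile rank is strictly above 0.8.
top = df[pct.gt(0.8)]
print(top)
```

The row with c2 equal to 33 has a percentile rank of 0.8 and is therefore excluded by the strict comparison.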
