How do I assign a quartile label to each value of a specific column in a pandas dataframe?
I am working on a dataframe in Python. How can I label every row according to the quartile (e.g. q1, q2, q3, q4) that its value in a particular column, "rate", falls into? Here the interval is the range of "rate", so [-0, 0.913056] is the entire range. I want to indicate which quantile of that range the "rate" value of each row falls in.
    name                              rate
 0  3POWER ENERGY GROUP INC      -0.000000
 1  808 RENEWABLE ENERGY CORP    -0.112192
 2  YORK WATER CO                 0.774955
 3  ZTO EXPRESS (CAYM) INC -ADR   0.086352
 4  AEP GENERATING CO             0.850960
 5  AEP TEXAS CENTRAL CO          0.600301
 6  AIR T INC                     0.254511
 7  ALABAMA GAS CORP              0.611631
 8  ALABAMA POWER CO              0.913056
 9  ALLEGIANT TRAVEL CO           0.227421
10  COMCAST CORP                  0.012037
11  HAWAIIAN ELECTRIC CO          0.670980
12  HAWAIIAN ELECTRIC INDS        0.775778
I want a df like this:
    name                              rate  quartile
 0  3POWER ENERGY GROUP INC      -0.000000  q1
 1  808 RENEWABLE ENERGY CORP    -0.112192  q1
 2  YORK WATER CO                 0.774955  q3
 3  ZTO EXPRESS (CAYM) INC -ADR   0.086352  q1
 4  AEP GENERATING CO             0.850960  q4
 5  AEP TEXAS CENTRAL CO          0.600301  q3
 6  AIR T INC                     0.254511  q2
 7  ALABAMA GAS CORP              0.611631  q3
 8  ALABAMA POWER CO              0.913056  q4
 9  ALLEGIANT TRAVEL CO           0.227421  q2
10  COMCAST CORP                  0.012037  q1
11  HAWAIIAN ELECTRIC CO          0.670980  q4
12  HAWAIIAN ELECTRIC INDS        0.775778  q4
1 answer
You need qcut:
import pandas as pd
# Split 'rate' into four quantile-based bins and label them q1..q4
df['quartile'] = pd.qcut(df['rate'], 4, labels=['q1','q2','q3','q4'])
print(df)
    name                              rate  quartile
 0  3POWER ENERGY GROUP INC      -0.000000  q1
 1  808 RENEWABLE ENERGY CORP    -0.112192  q1
 2  YORK WATER CO                 0.774955  q3
 3  ZTO EXPRESS (CAYM) INC -ADR   0.086352  q1
 4  AEP GENERATING CO             0.850960  q4
 5  AEP TEXAS CENTRAL CO          0.600301  q2
 6  AIR T INC                     0.254511  q2
 7  ALABAMA GAS CORP              0.611631  q3
 8  ALABAMA POWER CO              0.913056  q4
 9  ALLEGIANT TRAVEL CO           0.227421  q2
10  COMCAST CORP                  0.012037  q1
11  HAWAIIAN ELECTRIC CO          0.670980  q3
12  HAWAIIAN ELECTRIC INDS        0.775778  q4
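If you want to see where the cut points actually fall, a self-contained sketch along these lines may help. It rebuilds the sample data from the question by hand (the DataFrame construction and the names `quartiles` and `edges` are my own additions, not part of the original post) and asks qcut to return the bin edges as well:

```python
import pandas as pd

# Sample data copied from the question; built by hand for illustration.
df = pd.DataFrame({
    "name": [
        "3POWER ENERGY GROUP INC", "808 RENEWABLE ENERGY CORP", "YORK WATER CO",
        "ZTO EXPRESS (CAYM) INC -ADR", "AEP GENERATING CO", "AEP TEXAS CENTRAL CO",
        "AIR T INC", "ALABAMA GAS CORP", "ALABAMA POWER CO", "ALLEGIANT TRAVEL CO",
        "COMCAST CORP", "HAWAIIAN ELECTRIC CO", "HAWAIIAN ELECTRIC INDS",
    ],
    "rate": [
        -0.000000, -0.112192, 0.774955, 0.086352, 0.850960, 0.600301, 0.254511,
        0.611631, 0.913056, 0.227421, 0.012037, 0.670980, 0.775778,
    ],
})

# retbins=True makes qcut also return the quantile edges it used.
quartiles, edges = pd.qcut(
    df["rate"], q=4, labels=["q1", "q2", "q3", "q4"], retbins=True
)
df["quartile"] = quartiles

print(df)
print(edges)                           # five edges delimiting the four bins
print(df["quartile"].value_counts())   # how many rows landed in each bin
```

Note that qcut bins by quantiles, so each label gets roughly the same number of rows and the intervals have different widths. If equal-width intervals over the range of "rate" are what you are after, pd.cut is the related function to look at.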