Pandas pivot_table group column by values
I am trying to use numeric values as columns in a Pandas pivot_table. The problem is that, since almost every number is unique, the resulting pivot_table is not very useful as a way to aggregate my data.
Here's what I have so far (fake data example):
import pandas as pd
df = pd.DataFrame({'Country': ['US', 'Brazil', 'France', 'Germany'],
'Continent': ['Americas', 'Americas', 'Europe', 'Europe'],
'Population': [321, 207, 80, 66]})
pd.pivot_table(df, index='Continent', columns='Population', aggfunc='count')
Here is a picture of the resulting pivot_table.
How can I group my values into ranges based on my columns?
In other words, how can I count all countries with populations of ... <100, 100-200, >300?
1 answer
Use pd.cut to bin Population into labeled ranges (this needs numpy for np.inf):
import numpy as np
df = df.assign(PopGroup=pd.cut(df.Population, bins=[0, 100, 200, 300, np.inf], labels=['<100', '100-200', '200-300', '>300']))
Output:
  Continent  Country  Population PopGroup
0  Americas       US         321     >300
1  Americas   Brazil         207  200-300
2    Europe   France          80     <100
3    Europe  Germany          66     <100
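One thing to watch with these bins: by default pd.cut treats each interval as closed on the right, so a population of exactly 100 lands in the '<100' group. The sketch below uses a small made-up Series (pop is hypothetical, not the question's data) to compare the default with right=False, which closes the intervals on the left instead.

```python
import numpy as np
import pandas as pd

# Hypothetical values chosen to sit on or near the bin edges.
pop = pd.Series([99, 100, 101, 300])
bins = [0, 100, 200, 300, np.inf]
labels = ['<100', '100-200', '200-300', '>300']

# Default (right=True): intervals are (0, 100], (100, 200], (200, 300], (300, inf]
right_closed = pd.cut(pop, bins=bins, labels=labels)

# right=False: intervals are [0, 100), [100, 200), [200, 300), [300, inf)
left_closed = pd.cut(pop, bins=bins, labels=labels, right=False)

print(pd.DataFrame({'pop': pop,
                    'right_closed': right_closed,
                    'left_closed': left_closed}))
```

Which variant is appropriate depends on whether a value on the boundary should count toward the lower or the upper range.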
pd.pivot_table(df, index='Continent', columns='PopGroup', values=['Country'], aggfunc='count')
Output:
          Country
PopGroup  200-300 <100 >300
Continent
Americas      1.0  NaN  1.0
Europe        NaN  2.0  NaN
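If zeros are preferable to NaN for the empty cells, something like the following self-contained sketch may help. It rebuilds the question's example data and passes fill_value=0 to pivot_table. The observed=True argument is my own addition, not part of the answer above; it limits the columns to bins that actually occur in the data. The name counts is just illustrative.

```python
import numpy as np
import pandas as pd

# Same fake data as in the question.
df = pd.DataFrame({'Country': ['US', 'Brazil', 'France', 'Germany'],
                   'Continent': ['Americas', 'Americas', 'Europe', 'Europe'],
                   'Population': [321, 207, 80, 66]})

# Bin the populations into labeled ranges.
df['PopGroup'] = pd.cut(df['Population'],
                        bins=[0, 100, 200, 300, np.inf],
                        labels=['<100', '100-200', '200-300', '>300'])

# Count countries per continent and population range.
# fill_value=0 replaces the NaN cells; observed=True (an assumption on my
# part) keeps only the population ranges present in the data.
counts = pd.pivot_table(df, index='Continent', columns='PopGroup',
                        values='Country', aggfunc='count',
                        fill_value=0, observed=True)
print(counts)
```

Passing values='Country' as a plain string rather than a list also drops the extra 'Country' level from the column header.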