Splitting histograms in Pandas

Question

I am reading a CSV file with pandas and making simple histograms like this:

import sys
import pandas as pd

df = pd.read_csv(sys.argv[1], header=0)
hFare = df['Fare'].dropna().hist(bins=[0,10,20,30,45,60,75,100,600], label="All")
hSurFare = df[df.Survived==1]['Fare'].dropna().hist(bins=[0,10,20,30,45,60,75,100,600], label="Survivors")

What I would like is the bin-by-bin ratio of the two histograms. Is there an easy way to do this?

python numpy pandas

jagartner 12 Aug 14 at 23:36

1 answer

Marius · Accepted Answer · 2014-08-13T00:09:18+0000

First, we'll create some sample data. In the future, if you ask a question about pandas, your best bet is to include example data that people can easily copy-paste into their Python console:

import pandas as pd
import numpy as np
df = pd.DataFrame({'Fare': np.random.uniform(0, 600, 400), 
                   'Survived': np.random.randint(0, 2, 400)})

Then use pd.cut to bin the data just like you did in your histogram:

df['fare_bin'] = pd.cut(df['Fare'], bins=[0,10,20,30,45,60,75,100,600])
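To illustrate how pd.cut assigns values to intervals, here is a minimal sketch on a few made-up fares (the values and the shortened bin list are assumptions chosen for illustration). By default the intervals are closed on the right, so a fare exactly on an edge falls into the lower bin, and a fare of exactly 0 falls outside the first bin:

```python
import pandas as pd

# Hypothetical fares, chosen to sit on and around the bin edges
fares = pd.Series([5.0, 10.0, 10.5, 30.0, 0.0])
bins = [0, 10, 20, 30]

# Default is right=True: intervals are (0, 10], (10, 20], (20, 30]
binned = pd.cut(fares, bins=bins)
print(binned)
```

If zero fares should be counted, passing include_lowest=True to pd.cut makes the first interval include its left edge.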

Look at the total count and the number of survivors in each bin (you could probably do this as separate columns, but I'm just doing it the quick way):

df.groupby('fare_bin').apply(lambda g: (g.shape[0], g.loc[g['Survived'] == 1, :].shape[0]))

Out[34]: 
fare_bin
(0, 10]           (7, 4)
(10, 20]          (9, 6)
(100, 600]    (326, 156)
(20, 30]          (5, 4)
(30, 45]         (12, 6)
(45, 60]        (15, 11)
(60, 75]         (13, 7)
(75, 100]        (13, 6)
dtype: object
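The separate-columns variant mentioned above might look like the following sketch. It assumes Survived holds only 0 and 1, so that its sum equals the number of survivors; the column names total, survivors and ratio are made up for this example, and the random data will not match the output shown above:

```python
import numpy as np
import pandas as pd

# Sample data in the same shape as above (seeded so the run is repeatable)
rng = np.random.default_rng(0)
df = pd.DataFrame({'Fare': rng.uniform(0, 600, 400),
                   'Survived': rng.integers(0, 2, 400)})

bins = [0, 10, 20, 30, 45, 60, 75, 100, 600]
df['fare_bin'] = pd.cut(df['Fare'], bins=bins)

# One row per bin: 'size' counts all rows, 'sum' counts rows with Survived == 1
counts = (df.groupby('fare_bin', observed=False)['Survived']
            .agg(total='size', survivors='sum'))

# Fraction of passengers in each bin who survived (NaN for empty bins)
counts['ratio'] = counts['survivors'] / counts['total']
print(counts)
```

Keeping the counts as real columns makes it easy to compute the ratio in either direction or to plot it directly.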

Then write a quick function to get the ratio:

def get_ratio(g):
    # Ratio of all passengers to survivors within one fare bin
    try:
        return float(g.shape[0]) / g.loc[g['Survived'] == 1, :].shape[0]
    except ZeroDivisionError:
        # No survivors in this bin
        return np.nan

df.groupby('fare_bin').apply(get_ratio)

Out[30]: 
fare_bin
(0, 10]       1.750000
(10, 20]      1.500000
(100, 600]    2.089744
(20, 30]      1.250000
(30, 45]      2.000000
(45, 60]      1.363636
(60, 75]      1.857143
(75, 100]     2.166667
dtype: float64
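An alternative that stays closer to the original two histograms is to compute the bin counts with np.histogram and divide the arrays. This is only a sketch on random sample data, and it computes survivors divided by all passengers, which is the inverse of the ratio above. Note that np.histogram uses half-open bins [a, b) with the last bin closed on both sides, while pd.cut defaults to (a, b], so values exactly on an edge can land in different bins:

```python
import numpy as np
import pandas as pd

# Sample data in the same shape as above (seeded so the run is repeatable)
rng = np.random.default_rng(0)
df = pd.DataFrame({'Fare': rng.uniform(0, 600, 400),
                   'Survived': rng.integers(0, 2, 400)})

bins = [0, 10, 20, 30, 45, 60, 75, 100, 600]

# Per-bin counts for all passengers and for survivors only
all_counts, _ = np.histogram(df['Fare'].dropna(), bins=bins)
sur_counts, _ = np.histogram(df.loc[df['Survived'] == 1, 'Fare'].dropna(), bins=bins)

# Bin-by-bin ratio; bins with no passengers stay NaN instead of dividing by zero
ratio = np.full(len(all_counts), np.nan)
np.divide(sur_counts, all_counts, out=ratio, where=all_counts > 0)
print(ratio)
```

Swapping the first two arguments of np.divide (and testing sur_counts > 0 instead) gives the all-to-survivors ratio computed by get_ratio.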
