Pandas: How to calculate employee turnover?

Question

I want to calculate the turnover rate of a group of people using Pandas. The group size can change, but I want to know the percentage of people who leave each year.

It is easier to explain with an example. Here is my data:

  teachers  year
0     John  2007
1     Paul  2007
2     Mary  2007

3     John  2008
4     Paul  2008
5     Abel  2008
6     Watt  2008

7     John  2009
8     Mary  2009

I want to arrive at this dataset:

year turnover 
2008 .33333 
2009 .75

Between 2007 and 2008, Mary left (1 of 3). Between 2008 and 2009, Paul, Abel and Watt left (3 of 4). This measure has a built-in bias: if the group shrinks, the turnover rate will be higher.

+3

numpy pandas

neves Apr 07 17 at 10:56

3 answers

Here is one possibility:

from io import StringIO

import numpy as np
import pandas as pd

data = pd.read_table(StringIO(
"""  teachers  year
0     John  2007
1     Paul  2007
2     Mary  2007
3     John  2008
4     Paul  2008
5     Abel  2008
6     Watt  2008
7     John  2009
8     Mary  2009"""
), sep=r'\s+', index_col=0)  # sep=r'\s+' replaces the deprecated delim_whitespace=True

data['presence'] = 1
teacher_presence = data.groupby(['teachers', 'year']).count().unstack(1).fillna(0)
teacher_presence.columns = teacher_presence.columns.droplevel(0)

teacher_remain = teacher_presence.iloc[:, 1:] * teacher_presence.iloc[:, :-1].values
turnover = 1 - teacher_remain.sum() / teacher_presence.iloc[:, :-1].sum().values
turnover.name = 'turnover'

print(turnover)

Result:

year
2008    0.333333
2009    0.750000
Name: turnover, dtype: float64
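
To see why the multiplication works, it may help to look at the intermediate tables. The sketch below is not the answer's exact code: it builds the same teacher-by-year presence table with pd.crosstab instead of groupby/count/unstack, and the names df, presence, prev, cur and remain are chosen here for illustration.

```python
import pandas as pd

# The example data from the question, typed in directly.
df = pd.DataFrame({
    "teachers": ["John", "Paul", "Mary",
                 "John", "Paul", "Abel", "Watt",
                 "John", "Mary"],
    "year": [2007, 2007, 2007, 2008, 2008, 2008, 2008, 2009, 2009],
})

# Rows are teachers, columns are years, 1 means "present that year".
presence = pd.crosstab(df["teachers"], df["year"])

prev = presence.iloc[:, :-1]   # columns 2007, 2008
cur = presence.iloc[:, 1:]     # columns 2008, 2009

# The product is 1 only where a teacher is present in both consecutive years.
# .values drops the column labels so the two blocks line up by position.
remain = cur * prev.values

print(presence)
print(remain.sum())            # teachers retained going into each year
print(prev.sum().values)       # headcount of the previous year
```

Turnover is then one minus retained divided by the previous headcount, which is what the last two lines of the answer compute.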

+1

jdehesa Apr 07 17 at 12:16

You can also convert teachers to a set after grouping and then perform set operations.

In [72]: t = df.groupby('year')['teachers'].apply(lambda x: set(x.values.tolist()))

In [73]: t
Out[73]:
year
2007          {John, Paul, Mary}
2008    {John, Abel, Paul, Watt}
2009                {John, Mary}
Name: teachers, dtype: object

In [76]: t.combine(t.shift(), lambda a, b: len(b-a) / len(b) if isinstance(b, set) else np.nan).dropna()
Out[76]:
year
2008    0.333333
2009        0.75
Name: teachers, dtype: object
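
Note that the result above has dtype object. A self-contained variant of the same set idea that yields a float Series might look like the sketch below. It assumes df is the question's data as a DataFrame, and the names rosters, prev and rates are introduced here for illustration.

```python
import pandas as pd

# The example data from the question, typed in directly.
df = pd.DataFrame({
    "teachers": ["John", "Paul", "Mary",
                 "John", "Paul", "Abel", "Watt",
                 "John", "Mary"],
    "year": [2007, 2007, 2007, 2008, 2008, 2008, 2008, 2009, 2009],
})

# One set of teacher names per year.
rosters = df.groupby("year")["teachers"].apply(lambda s: set(s))

# Previous year's roster aligned to the current year (first entry is NaN).
prev = rosters.shift()

# For each year after the first: share of last year's roster that is gone.
rates = pd.Series(
    {year: len(prev[year] - rosters[year]) / len(prev[year])
     for year in rosters.index[1:]},
    name="turnover",
)
rates.index.name = "year"

print(rates)
```

Building the Series from plain Python floats is what keeps the dtype numeric, so the result can be used directly in further arithmetic or plotting.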

0

gzc Apr 07 17 at 15:58

piRSquared · Accepted Answer · 2017-04-07T14:11:53+0000

Plan

I'm going to set the index with 'year' and 'teachers', so I assign a dummy value x=1 ahead of time.
I want 'year' to be my index, so I unstack to put 'teachers' in the columns. I use the parameter fill_value=0 to fill in zeros where a teacher was not present in a given year.
Using diff and checking whether the result is -1 identifies a departure event. sum(1) totals the departures for each year.
d1.sum(1).shift() counts all teachers for the previous year.
Divide to get the turnover.

d1 = pd.Series(1, [df.year, df.teachers]).unstack(fill_value=0)
d1.diff().eq(-1).sum(1).div(d1.sum(1).shift(), 0).dropna()

year
2008    0.333333
2009    0.750000
dtype: float64
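
The plan can be traced one step at a time. The sketch below spells out the same computation with named intermediates; it assumes df holds the question's data, builds the table with assign/set_index rather than the pd.Series constructor, and the names left, prev_headcount and turnover are introduced here for illustration.

```python
import pandas as pd

# The example data from the question, typed in directly.
df = pd.DataFrame({
    "teachers": ["John", "Paul", "Mary",
                 "John", "Paul", "Abel", "Watt",
                 "John", "Mary"],
    "year": [2007, 2007, 2007, 2008, 2008, 2008, 2008, 2009, 2009],
})

# Step 1: dummy x=1, index on (year, teachers), then move teachers to columns.
# The result has one row per year and one 0/1 column per teacher.
d1 = df.assign(x=1).set_index(["year", "teachers"])["x"].unstack(fill_value=0)

# Step 2: a year-over-year change of -1 means "present last year, absent now".
left = d1.diff().eq(-1).sum(axis=1)

# Step 3: headcount of the previous year, aligned to the current year.
prev_headcount = d1.sum(axis=1).shift()

# Step 4: departures divided by previous headcount; the first year has no
# previous year, so it comes out as NaN and is dropped.
turnover = (left / prev_headcount).dropna()

print(d1)
print(turnover)
```

Writing axis=1 by keyword instead of the positional 1 makes it easier to see that both sums run across the teacher columns.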

As @jrjc pointed out in the comments, my first line is just a crosstab. With that in mind, we can reduce the code to:

d1 = pd.crosstab(df.year, df.teachers)
d1.diff().eq(-1).sum(1).div(d1.sum(1).shift(), 0).dropna()

As one line, using pipe:

pd.crosstab(df.year, df.teachers).pipe(
    lambda c: c.diff().eq(-1).sum(1).div(c.sum(1).shift(),0).dropna()
)
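
If this is needed in more than one place, the crosstab version could be wrapped in a small helper. The function below is a hypothetical sketch, not part of pandas: the name turnover_rate and its parameters are made up here, and the clip(upper=1) step is an added assumption to guard against the same person being listed twice in one period.

```python
import pandas as pd


def turnover_rate(frame, period_col, member_col):
    """Share of the previous period's members who are absent in each period.

    Hypothetical helper; column names are passed in by the caller.
    """
    # Period-by-member table; clip so duplicate rows still count as 1.
    presence = pd.crosstab(frame[period_col], frame[member_col]).clip(upper=1)
    # -1 between consecutive rows marks a departure.
    left = presence.diff().eq(-1).sum(axis=1)
    # Divide by the previous period's headcount and drop the first period.
    rate = left.div(presence.sum(axis=1).shift()).dropna()
    return rate.rename("turnover")


# The question's data, with one deliberately duplicated row (John, 2007).
df = pd.DataFrame({
    "teachers": ["John", "John", "Paul", "Mary",
                 "John", "Paul", "Abel", "Watt",
                 "John", "Mary"],
    "year": [2007, 2007, 2007, 2007, 2008, 2008, 2008, 2008, 2009, 2009],
})

result = turnover_rate(df, "year", "teachers")
print(result)
```

One caveat: diff compares consecutive rows of the table, not consecutive calendar years, so a year missing from the data entirely is silently skipped over.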
