Pandas: How to calculate employee turnover?
I want to calculate the turnover rate of a group of people using Pandas. The group size can change, but I want to know the percentage of people who leave each year.
It is easier to explain with an example. Here is my data:
teachers year
0 John 2007
1 Paul 2007
2 Mary 2007
3 John 2008
4 Paul 2008
5 Abel 2008
6 Watt 2008
7 John 2009
8 Mary 2009
I want to arrive at this dataset:
year turnover
2008 .33333
2009 .75
Between 2007 and 2008, Mary left (1 of 3 teachers); between 2008 and 2009, Paul, Abel and Watt left (3 of 4 teachers). This measure has some bias: if the group shrinks, the turnover rate will be higher.
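For reference, here is a minimal sketch that builds the example data and checks the expected numbers by hand with plain Python sets. The name df matches what the answers below assume; by_year and turnover are illustrative names only.

```python
import pandas as pd

# The example data from the question
df = pd.DataFrame({
    "teachers": ["John", "Paul", "Mary", "John", "Paul",
                 "Abel", "Watt", "John", "Mary"],
    "year": [2007, 2007, 2007, 2008, 2008, 2008, 2008, 2009, 2009],
})

# Teachers present in each year, as plain Python sets
by_year = {year: set(group) for year, group in df.groupby("year")["teachers"]}

# Turnover for a year = (people from the previous year who are gone)
#                       / (size of the previous year's group)
years = sorted(by_year)
turnover = {
    cur: len(by_year[prev] - by_year[cur]) / len(by_year[prev])
    for prev, cur in zip(years, years[1:])
}
print(turnover)
```

Someone who leaves and later returns (Mary here) counts as a departure in the year she is missing and simply as present again afterwards.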
Plan
- I am going to set the index with 'year' and 'teachers', so I assign a dummy value x=1 ahead of time.
- I want 'year' as my index, so I unstack to put 'teachers' in the columns. I use the parameter fill_value=0 to fill in zeros where a teacher was not present in a given year.
- Using diff and checking whether the result equals -1 flags each departure. sum(1) counts the departures in each year.
- d1.sum(1).shift() counts all teachers in the previous year.
- Divide to get the turnover.
d1 = pd.Series(1, [df.year, df.teachers]).unstack(fill_value=0)
d1.diff().eq(-1).sum(1).div(d1.sum(1).shift(), 0).dropna()
year
2008 0.333333
2009 0.750000
dtype: float64
As @jrjc pointed out in the comments, my first line is just a crosstab. With that in mind, we can reduce the code to:
d1 = pd.crosstab(df.year, df.teachers)
d1.diff().eq(-1).sum(1).div(d1.sum(1).shift(), 0).dropna()
Or as one line using pipe:
pd.crosstab(df.year, df.teachers).pipe(
lambda c: c.diff().eq(-1).sum(1).div(c.sum(1).shift(),0).dropna()
)
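To make the plan at the top of this answer concrete, here is a sketch that follows it literally (assign a dummy column, set the index, unstack) and keeps each intermediate result in its own variable. It rebuilds df from the question's data; the names left and prev_size are illustrative and not part of the original code.

```python
import pandas as pd

df = pd.DataFrame({
    "teachers": ["John", "Paul", "Mary", "John", "Paul",
                 "Abel", "Watt", "John", "Mary"],
    "year": [2007, 2007, 2007, 2008, 2008, 2008, 2008, 2009, 2009],
})

# Step 1-2: dummy value 1, index on (year, teachers), teachers into columns.
# Each cell is 1 if the teacher was present that year, else 0.
d1 = (
    df.assign(x=1)
      .set_index(["year", "teachers"])["x"]
      .unstack(fill_value=0)
)

# Step 3: a change of -1 from one year to the next means the teacher left
left = d1.diff().eq(-1).sum(axis=1)

# Step 4: head count of the previous year
prev_size = d1.sum(axis=1).shift()

# Step 5: departures divided by the previous head count;
# the first year has no previous year, so it is dropped
turnover = left.div(prev_size).dropna()

print(d1)
print(turnover)
```

Splitting the chain this way makes it easy to inspect d1 and left when the result looks wrong for real data.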
Here is another possibility:
from io import StringIO
import numpy as np
import pandas as pd
data = pd.read_table(StringIO(
""" teachers year
0 John 2007
1 Paul 2007
2 Mary 2007
3 John 2008
4 Paul 2008
5 Abel 2008
6 Watt 2008
7 John 2009
8 Mary 2009"""
), sep=r'\s+', index_col=0)  # sep=r'\s+' replaces delim_whitespace=True, which newer pandas deprecates
data['presence'] = 1
teacher_presence = data.groupby(['teachers', 'year']).count().unstack(1).fillna(0)
teacher_presence.columns = teacher_presence.columns.droplevel(0)
teacher_remain = teacher_presence.iloc[:, 1:] * teacher_presence.iloc[:, :-1].values
turnover = 1 - teacher_remain.sum() / teacher_presence.iloc[:, :-1].sum().values
turnover.name = 'turnover'
print(turnover)
Result:
year
2008 0.333333
2009 0.750000
Name: turnover, dtype: float64
You can also convert the teachers to a set after grouping by year and then use set operations.
In [72]: t = df.groupby('year')['teachers'].apply(lambda x: set(x.values.tolist()))
In [73]: t
Out[73]:
year
2007 {John, Paul, Mary}
2008 {John, Abel, Paul, Watt}
2009 {John, Mary}
Name: teachers, dtype: object
In [76]: t.combine(t.shift(), lambda a, b: len(b-a) / len(b) if isinstance(b, set) else np.nan).dropna()
Out[76]:
year
2008 0.333333
2009 0.75
Name: teachers, dtype: object
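Note that the result above has dtype object. As a minimal sketch of the same set-based idea that yields a float Series instead, assuming df holds the question's data (the names sets and turnover are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "teachers": ["John", "Paul", "Mary", "John", "Paul",
                 "Abel", "Watt", "John", "Mary"],
    "year": [2007, 2007, 2007, 2008, 2008, 2008, 2008, 2009, 2009],
})

# One set of teacher names per year, indexed by year
sets = df.groupby("year")["teachers"].apply(set)

# Pair each year's set with the previous year's set and compute
# (teachers who left) / (size of the previous year's group)
turnover = pd.Series(
    [len(prev - cur) / len(prev)
     for prev, cur in zip(sets.iloc[:-1], sets.iloc[1:])],
    index=sets.index[1:],
    name="turnover",
)
print(turnover)
```

Building the Series from a list of floats avoids the object dtype and the isinstance check needed for the NaN produced by shift().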