Pandas: How to calculate employee turnover?

I want to calculate the turnover rate of a group of people using Pandas. The group size can change, but I want to know the percentage of people who leave each year.

It is easier to explain with an example. Here is my data:

  teachers  year
0     John  2007
1     Paul  2007
2     Mary  2007

3     John  2008
4     Paul  2008
5     Abel  2008
6     Watt  2008

7     John  2009
8     Mary  2009

      

I want to arrive at this dataset:

year turnover 
2008 .33333 
2009 .75

      

Between 2007 and 2008, Mary left (1 of 3 teachers); between 2008 and 2009, Paul, Abel and Watt left (3 of 4). This measure has a built-in bias: if the group shrinks, the turnover rate will be higher.



3 answers


Plan

  • I am going to set the index with 'year' and 'teachers', so I assign a dummy value of 1 ahead of time.
  • I want only 'year' in my index, so I unstack to put 'teachers' in the columns. I use the parameter fill_value=0 to fill in zeros where a teacher was not present in a given year.
  • Use diff and check where the result equals -1 (present the previous year, absent this year). sum(1) counts those departures for each year.
  • d1.sum(1).shift() counts all teachers for the previous year.
  • Divide to get the turnover.



d1 = pd.Series(1, [df.year, df.teachers]).unstack(fill_value=0)
d1.diff().eq(-1).sum(1).div(d1.sum(1).shift(), 0).dropna()

year
2008    0.333333
2009    0.750000
dtype: float64
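To see what each step of the plan produces, the intermediates can be computed separately. This is a minimal sketch that rebuilds the question's data as a DataFrame named df; the variable names left and previous are illustrative and not part of the original answer.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "teachers": ["John", "Paul", "Mary", "John", "Paul", "Abel", "Watt", "John", "Mary"],
    "year": [2007, 2007, 2007, 2008, 2008, 2008, 2008, 2009, 2009],
})

# Presence matrix: one row per year, one column per teacher, 1 = present
d1 = pd.crosstab(df.year, df.teachers)

# A diff of -1 marks a teacher present the previous year and absent now
left = d1.diff().eq(-1).sum(axis=1)

# Headcount of the previous year (NaN for the first year)
previous = d1.sum(axis=1).shift()

# Departures divided by the previous headcount; the first year has no baseline
turnover = left.div(previous).dropna()
print(turnover)
```

This assumes each teacher appears at most once per year. With duplicate rows the presence counts can exceed 1, and the -1 test no longer identifies departures reliably.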

      




As @jrjc pointed out in the comments, my first line is just a crosstab. With that in mind, we can reduce the code to:

d1 = pd.crosstab(df.year, df.teachers)
d1.diff().eq(-1).sum(1).div(d1.sum(1).shift(), 0).dropna()

      




As a one-liner using pipe:

pd.crosstab(df.year, df.teachers).pipe(
    lambda c: c.diff().eq(-1).sum(1).div(c.sum(1).shift(),0).dropna()
)

      



Here is one possibility:

from io import StringIO

import numpy as np
import pandas as pd

data = pd.read_table(StringIO(
"""  teachers  year
0     John  2007
1     Paul  2007
2     Mary  2007
3     John  2008
4     Paul  2008
5     Abel  2008
6     Watt  2008
7     John  2009
8     Mary  2009"""
), sep=r'\s+', index_col=0)  # sep=r'\s+' replaces delim_whitespace=True, which is deprecated in recent pandas

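# Mark each (teacher, year) row as present, then pivot to a teacher x year 0/1 matrix
data['presence'] = 1
teacher_presence = data.groupby(['teachers', 'year']).count().unstack(1).fillna(0)
teacher_presence.columns = teacher_presence.columns.droplevel(0)

# A teacher remains if present in both the current and the previous year
teacher_remain = teacher_presence.iloc[:, 1:] * teacher_presence.iloc[:, :-1].values
# Turnover = 1 - (teachers remaining / previous year's headcount)
turnover = 1 - teacher_remain.sum() / teacher_presence.iloc[:, :-1].sum().values
turnover.name = 'turnover'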

print(turnover)

      



Result:

year
2008    0.333333
2009    0.750000
Name: turnover, dtype: float64

      



You can also convert teachers to a set after grouping and then perform set operations.

In [72]: t = df.groupby('year')['teachers'].apply(lambda x: set(x.values.tolist()))

In [73]: t
Out[73]:
year
2007          {John, Paul, Mary}
2008    {John, Abel, Paul, Watt}
2009                {John, Mary}
Name: teachers, dtype: object

In [76]: t.combine(t.shift(), lambda a, b: len(b-a) / len(b) if isinstance(b, set) else np.nan).dropna()
Out[76]:
year
2008    0.333333
2009        0.75
Name: teachers, dtype: object
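Note that the combine result above has dtype object. If a float result is preferred, the same set-difference idea can be written as an explicit loop over consecutive years. This is only a sketch, assuming the question's data in a DataFrame named df; the names rates and turnover are illustrative.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "teachers": ["John", "Paul", "Mary", "John", "Paul", "Abel", "Watt", "John", "Mary"],
    "year": [2007, 2007, 2007, 2008, 2008, 2008, 2008, 2009, 2009],
})

# One set of teacher names per year, ordered by year
t = df.groupby("year")["teachers"].apply(set)

years = list(t.index)
rates = {}
for prev_year, year in zip(years, years[1:]):
    prev, curr = t.loc[prev_year], t.loc[year]
    # Teachers in the previous year's set who are missing from the current one
    rates[year] = len(prev - curr) / len(prev)

turnover = pd.Series(rates, name="turnover", dtype=float)
turnover.index.name = "year"
print(turnover)
```

The loop compares each year with the previous year that actually appears in the data, so a gap in the years would be treated as a single step.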

      
