Recursively manipulate employee and manager data to create an organization tree hierarchy in R
I usually analyze data in an "organizational tree" format to understand the frequency of actions under a particular leader in an organization. I need to build a wide hierarchy from data that has two columns: employee name and manager (supervisor) name.
----------
df <- data.frame("Employee"=c("Bill","James","Amy","Jen","Henry"),
"Supervisor"=c("Jen","Jen","Steve","Amy","Amy"))
df
# Employee Supervisor
# 1 Bill Jen
# 2 James Jen
# 3 Amy Steve
# 4 Jen Amy
# 5 Henry Amy
I want to end up with a wide data frame that describes the organization chart, starting with the CEO (the top-level employee):
# Employee H1 H2 H3
# 1 Bill Steve Amy Jen
# 2 James Steve Amy Jen
# 3 Amy Steve NA NA
# 4 Jen Steve Amy NA
# 5 Henry Steve Amy NA
After much research, the data.tree package seems to offer the most help. How can I accomplish this?
Try the following:
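library(data.table)
setDT(df)                                   # convert to a data.table in place
setnames(df, 'Supervisor', 'Supervisor.1')  # first level of management
j <- 1
# Keep climbing while some level-j supervisor is also listed as an employee
while (df[, any(get(paste0('Supervisor.', j)) %in% Employee)]) {
  # Self-join: look up each level-j supervisor's own supervisor
  df[df, on = paste0('Supervisor.', j, '==Employee'),
     paste0('Supervisor.', j + 1) := i.Supervisor.1]
  j <- j + 1
}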
> df
# Employee Supervisor.1 Supervisor.2 Supervisor.3
# 1: Bill Jen Amy Steve
# 2: James Jen Amy Steve
# 3: Amy Steve NA NA
# 4: Jen Amy Steve NA
# 5: Henry Amy Steve NA
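Each pass of the `while` loop is a self-join: every level-`j` supervisor is looked up in the `Employee` column, and that person's own supervisor becomes the next column. As a rough base-R sketch (not part of the answer above), the first pass is equivalent to a `match()` lookup:

```r
df <- data.frame(Employee   = c("Bill", "James", "Amy", "Jen", "Henry"),
                 Supervisor = c("Jen", "Jen", "Steve", "Amy", "Amy"),
                 stringsAsFactors = FALSE)

# First climb: find each supervisor's row in Employee and take their supervisor.
# match() gives NA for Steve, who is never listed as an employee.
df$Supervisor.2 <- df$Supervisor[match(df$Supervisor, df$Employee)]
df
```

Repeating the lookup on the newest column, until it contains no one who is also an employee, is exactly what the loop's stopping condition checks.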
To reorder each row so that the top-level manager comes first (with any NAs moved to the end):
# Per row: non-NA supervisors in reverse order (top manager first), then the NAs
df <- cbind(df[, 1], t(apply(df[, -1], 1, function(r) c(rev(r[!is.na(r)]), r[is.na(r)]))))
> df
# Employee V1 V2 V3
# 1: Bill Steve Amy Jen
# 2: James Steve Amy Jen
# 3: Amy Steve NA NA
# 4: Jen Steve Amy NA
# 5: Henry Steve Amy NA
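For comparison, the whole transformation can also be written recursively in base R, which is closer to what the question title asks for. This is a minimal sketch, assuming the reporting lines contain no cycles and that the person who never appears in `Employee` (here Steve) is the top of every chain; `chain_above()` is a hypothetical helper, not a library function. It also names the columns `H1`, `H2`, … as in the desired output:

```r
df <- data.frame(Employee   = c("Bill", "James", "Amy", "Jen", "Henry"),
                 Supervisor = c("Jen", "Jen", "Steve", "Amy", "Amy"),
                 stringsAsFactors = FALSE)

# Hypothetical helper: management chain above `emp`, top manager first.
# Recursion stops at someone who is not listed as an employee (the CEO).
# Assumes the reporting lines contain no cycles.
chain_above <- function(emp, data) {
  boss <- data$Supervisor[match(emp, data$Employee)]
  if (is.na(boss)) return(character(0))
  c(chain_above(boss, data), boss)
}

chains <- lapply(df$Employee, chain_above, data = df)
depth  <- max(lengths(chains))

# Pad every chain with NA to the same length and stack them into a matrix
wide <- do.call(rbind, lapply(chains, function(x) c(x, rep(NA_character_, depth - length(x)))))
colnames(wide) <- paste0("H", seq_len(depth))

result <- data.frame(Employee = df$Employee, wide, stringsAsFactors = FALSE)
result
```

Each call climbs one supervisor at a time, so the recursion depth equals the number of management levels, which is small for typical org charts.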
If you don't insist on that exact output format but want to work with the hierarchy itself, data.tree is a great choice. Here are some examples:
library(data.tree)
df <- data.frame("Employee"=c("Bill","James","Amy","Jen","Henry"),
"Supervisor"=c("Jen","Jen","Steve","Amy","Amy"))
dt <- FromDataFrameNetwork(df)
# here is your org chart:
print(dt)
Let's find Jen's subordinates (the leaves below her) along with their level in the hierarchy:
Get(FindNode(dt, 'Jen')$leaves, 'level')
This returns:
Bill James
4 4
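The `level` returned by data.tree counts the root as level 1, so each employee sits one level below their supervisor. A hypothetical base-R helper, `level_of()`, sketches the same calculation recursively; it looks at Jen's direct reports, which in this small example are also the leaves below her:

```r
df <- data.frame(Employee   = c("Bill", "James", "Amy", "Jen", "Henry"),
                 Supervisor = c("Jen", "Jen", "Steve", "Amy", "Amy"),
                 stringsAsFactors = FALSE)

# Hypothetical helper: the CEO (not listed as an employee) is level 1,
# everyone else is one level below their supervisor. Assumes no cycles.
level_of <- function(person, data) {
  boss <- data$Supervisor[match(person, data$Employee)]
  if (is.na(boss)) 1L else 1L + level_of(boss, data)
}

# Jen's direct reports and their levels
reports <- df$Employee[df$Supervisor == "Jen"]
lv <- vapply(reports, level_of, integer(1), data = df)
lv  # c(Bill = 4L, James = 4L)
```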
Just for fun, let's add salaries (the values are assigned in traversal order: Steve, Amy, Jen, Bill, James, Henry):
dt$Set(salary = c(100000, 80000, 60000, 40000, 35000, 70000))
Print each salary together with the subordinates' salaries summed by `Aggregate`:
print(dt, 'salary', sal_subordinates = function(node) Aggregate(node, 'salary', sum))
This prints:
          levelName salary sal_subordinates
1 Steve               100000            80000
2  °--Amy              80000           130000
3      ¦--Jen          60000            75000
4      ¦   ¦--Bill     40000            40000
5      ¦   °--James    35000            35000
6      °--Henry        70000            70000
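As the printed output shows, `Aggregate()` here sums only the direct reports' salaries (Steve's 80000 is Amy's salary alone), and leaves report their own salary. If you need the total over the whole subtree instead, a base-R sketch using a hypothetical `all_below()` helper could look like this; note that it gives leaves a total of 0 rather than their own salary:

```r
df <- data.frame(Employee   = c("Bill", "James", "Amy", "Jen", "Henry"),
                 Supervisor = c("Jen", "Jen", "Steve", "Amy", "Amy"),
                 stringsAsFactors = FALSE)

# Same salaries as in the data.tree example (Steve, the CEO, has no row in df)
salary <- c(Steve = 100000, Amy = 80000, Jen = 60000,
            Bill = 40000, James = 35000, Henry = 70000)

# Hypothetical helper: everyone below `boss`, directly or indirectly
all_below <- function(boss, data) {
  direct <- data$Employee[data$Supervisor == boss]
  c(direct, unlist(lapply(direct, all_below, data = data)))
}

# Sum over direct reports only (matches Aggregate() for non-leaf nodes)
direct_sum <- sapply(names(salary), function(p) sum(salary[df$Employee[df$Supervisor == p]]))

# Sum over the whole subtree below each person
total_sum <- sapply(names(salary), function(p) sum(salary[all_below(p, df)]))
total_sum
```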
The data.tree vignettes contain many more examples of working with hierarchical data and aggregation.