Aggregate values for 2 variables
I have a dataframe that looks something like this.
AgeBracket  No of People  No of Jobs
18-25       2             5
18-25       2             2
26-34       4             6
35-44       4             0
26-34       2             3
35-44       1             7
45-54       3             2
From this I want to combine the data so that it looks like this:
AgeBracket  1Person  2People  3People  4People
18-25       0        3.5      0        0
26-34       0        3        0        6
35-44       7        0        0        0
45-54       0        0        2        0
So the rows (Y-axis) are the age brackets and the columns (top row) are the number of people, while each cell shows the average number of jobs for that age bracket and that number of people.
I am guessing it has something to do with aggregation, but cannot find anything like this on any site.
Assuming df is your data.frame, you can aggregate with the mean function using base R, though I think the data.table approach that Imo suggests is faster:
agg <- aggregate(No.of.Jobs ~ AgeBracket + No.of.People,data=df,mean)
fin <- reshape2::dcast(agg,AgeBracket ~ No.of.People)
fin[is.na(fin)] <- 0
names(fin) <- c("AgeBracket",paste0("People",1:4))
As suggested by @Imo the one-liner could be as follows:
reshape2::dcast(df, AgeBracket ~ No.of.People, value.var="No.of.Jobs", fun.aggregate=mean, fill=0)
We just need to rename the columns after that.
Output:
AgeBracket People1 People2 People3 People4
1 18-25 0 3.5 0 0
2 26-34 0 3.0 0 6
3 35-44 7 0.0 0 0
4 45-54 0 0.0 2 0
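For anyone who wants to check the numbers without reshape2, here is a minimal base R sketch of the same idea using tapply. The data.frame below is rebuilt by hand from the question, and the column names No.of.People and No.of.Jobs are assumed to match the code above. It returns a matrix rather than a data.frame, so treat it as a cross-check, not a drop-in replacement.

```r
# Sample data typed in from the question (column names are an assumption)
df <- data.frame(
  AgeBracket   = c("18-25", "18-25", "26-34", "35-44", "26-34", "35-44", "45-54"),
  No.of.People = c(2, 2, 4, 4, 2, 1, 3),
  No.of.Jobs   = c(5, 2, 6, 0, 3, 7, 2)
)

# Mean number of jobs for every AgeBracket x No.of.People combination;
# combinations that never occur come back as NA
res <- with(df, tapply(No.of.Jobs, list(AgeBracket, No.of.People), mean))

# Replace the empty cells with 0, as in the desired output
res[is.na(res)] <- 0

# Prefix the column names so they are not bare integers
colnames(res) <- paste0("People", colnames(res))
res
```

Note that a cell can also be 0 because the mean really is 0 (35-44 with 4 people here), so filling with 0 hides the difference between "no rows" and "zero jobs".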
Here is the data.table method using dcast (df is assumed to be a data.table here, with columns named People and Jobs).
library(data.table)
setnames(dcast(df, AgeBracket ~ People, value.var="Jobs", fun.aggregate=mean, fill=0),
c("AgeBracket", paste0(sort(unique(df$People)), "Person")))[]
Here dcast reshapes the data to wide format, putting each number of people in a separate variable. fun.aggregate is used to calculate the mean number of jobs in each AgeBracket-People cell, and fill is set to 0 for empty cells. setnames is used to rename the variables, since the default names are the integers, and [] at the end is used to print the result.
AgeBracket 1Person 2Person 3Person 4Person
1: 18-25 0 3.5 0 0
2: 26-34 0 3.0 0 6
3: 35-44 7 0.0 0 0
4: 45-54 0 0.0 2 0
This can be split over two lines, which is probably more readable.
# reshape wide and calculate means
df.wide <- dcast(df, AgeBracket ~ People, value.var="Jobs", fun.aggregate=mean, fill=0)
# rename variables
setnames(df.wide, c("AgeBracket", paste0(names(df.wide)[-1], "Person")))
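To try the two-line version end to end, something like the following sketch should work. The sample data is typed in from the question, the column names People and Jobs are assumed to match the code above, and dt and dt.wide are just illustrative names. If you start from a plain data.frame, setDT(df) would convert it to a data.table by reference first.

```r
library(data.table)

# Sample data from the question, built directly as a data.table
# (column names People and Jobs are an assumption)
dt <- data.table(
  AgeBracket = c("18-25", "18-25", "26-34", "35-44", "26-34", "35-44", "45-54"),
  People     = c(2, 2, 4, 4, 2, 1, 3),
  Jobs       = c(5, 2, 6, 0, 3, 7, 2)
)

# Reshape wide and calculate means; empty cells are filled with 0
dt.wide <- dcast(dt, AgeBracket ~ People, value.var = "Jobs",
                 fun.aggregate = mean, fill = 0)

# Rename variables: keep AgeBracket, append "Person" to the integer names
setnames(dt.wide, c("AgeBracket", paste0(names(dt.wide)[-1], "Person")))
dt.wide
```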