Aggregate values for 2 variables

I have a dataframe that looks something like this.

AgeBracket    No of People     No of Jobs
18-25               2               5
18-25               2               2
26-34               4               6
35-44               4               0
26-34               2               3 
35-44               1               7
45-54               3               2

      

From this I want to combine the data so that it looks like this:

AgeBracket     1Person    2People    3People    4People
18-25             0          3.5        0          0
26-34             0           3         0          6
35-44             7           0         0          0
45-54             0           0         2          0

      

So the rows are the age brackets and the columns (top row) are the number of people, while each cell shows the average number of jobs for that combination of age bracket and number of people.
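To make the answers reproducible, here is a minimal sketch that builds the sample data above in R. The column names No.of.People and No.of.Jobs are an assumption chosen to match the first answer; the second answer assumes shorter names (People and Jobs).

```r
# Sample data from the question. Column names are assumed to match the
# first answer (the second answer uses People and Jobs instead).
df <- data.frame(
  AgeBracket   = c("18-25", "18-25", "26-34", "35-44", "26-34", "35-44", "45-54"),
  No.of.People = c(2, 2, 4, 4, 2, 1, 3),
  No.of.Jobs   = c(5, 2, 6, 0, 3, 7, 2)
)
str(df)
```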

I am guessing it has something to do with aggregation, but cannot find anything like this on any site.



2 answers


Assuming df is your data.frame, you can compute the means with aggregate in base R and then reshape the result with reshape2::dcast, although I think the data.table way, as Imo suggests, is faster:

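# mean number of jobs for each AgeBracket / No.of.People combination
agg <- aggregate(No.of.Jobs ~ AgeBracket + No.of.People, data=df, FUN=mean)
# reshape to wide format: one column per number of people
fin <- reshape2::dcast(agg, AgeBracket ~ No.of.People, value.var="No.of.Jobs")
# combinations with no rows come out as NA; show them as 0
fin[is.na(fin)] <- 0
names(fin) <- c("AgeBracket",paste0("People",1:4))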

      

As @Imo suggested, this can also be done in one line:

reshape2::dcast(df, AgeBracket ~ No.of.People, value.var="No.of.Jobs", fun.aggregate=mean, fill=0)

      



We then only need to rename the columns.

Output:

 AgeBracket People1 People2 People3 People4
1      18-25       0     3.5       0       0
2      26-34       0     3.0       0       6
3      35-44       7     0.0       0       0
4      45-54       0     0.0       2       0
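If you want to avoid reshape2 entirely, a base-R-only alternative (not part of the original answer, just a sketch) is tapply, which applies mean to No.of.Jobs within every AgeBracket by No.of.People combination and returns a matrix. The column names are assumed to be the same as in the code above.

```r
# Sample data (column names assumed to match the code above)
df <- data.frame(
  AgeBracket   = c("18-25", "18-25", "26-34", "35-44", "26-34", "35-44", "45-54"),
  No.of.People = c(2, 2, 4, 4, 2, 1, 3),
  No.of.Jobs   = c(5, 2, 6, 0, 3, 7, 2)
)

# Mean jobs per AgeBracket x No.of.People cell; empty combinations become NA
m <- tapply(df$No.of.Jobs, list(df$AgeBracket, df$No.of.People), mean)
m[is.na(m)] <- 0                          # show empty cells as 0, like fill = 0
colnames(m) <- paste0("People", colnames(m))

# Turn the matrix back into a data.frame with AgeBracket as a column
fin <- data.frame(AgeBracket = rownames(m), m, row.names = NULL)
fin
```

The values should match the output above; the main difference is that tapply builds the wide table directly, so no separate reshape step is needed.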

      



Here is a data.table method using dcast:

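library(data.table)
# convert df to a data.table in place so that data.table's dcast method is used;
# this assumes the columns are named AgeBracket, People and Jobs
setDT(df)

setnames(dcast(df, AgeBracket ~ People, value.var="Jobs", fun.aggregate=mean, fill=0),
         c("AgeBracket", paste0(sort(unique(df$People)), "Person")))[]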

Here dcast reshapes the data to wide format, putting each number of people in its own column. fun.aggregate computes the average number of jobs in each AgeBracket-People cell, and fill sets empty cells to 0.

setnames renames the columns, since by default they are just the People values (1, 2, 3, 4), and the [] at the end prints the result.



   AgeBracket 1Person 2Person 3Person 4Person
1:      18-25       0     3.5       0       0
2:      26-34       0     3.0       0       6
3:      35-44       7     0.0       0       0
4:      45-54       0     0.0       2       0

      

This can be split over two lines, which is probably more readable:

# reshape wide and calculate means
df.wide <- dcast(df, AgeBracket ~ People, value.var="Jobs", fun.aggregate=mean, fill=0)
# rename variables
setnames(df.wide, c("AgeBracket", paste0(names(df.wide)[-1], "Person")))
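One thing to keep in mind with fill=0: the 0 for 35-44 with 4 people is a real average (that group has one row with 0 jobs), while the other zeros only mean there were no rows at all. A quick base-R check, not part of the original answer, is to count the rows behind each cell; passing fill=NA to dcast instead would leave the empty cells as NA. The People and Jobs column names below are the ones this answer assumes.

```r
# Sample data with the column names this answer assumes (People, Jobs)
df <- data.frame(
  AgeBracket = c("18-25", "18-25", "26-34", "35-44", "26-34", "35-44", "45-54"),
  People     = c(2, 2, 4, 4, 2, 1, 3),
  Jobs       = c(5, 2, 6, 0, 3, 7, 2)
)

# Number of rows behind each AgeBracket x People cell of the wide table
cnt <- table(df$AgeBracket, df$People)
print(cnt)

# 35-44 / 4 people is backed by one row with 0 jobs, so its 0 is a real mean;
# 35-44 / 2 people has no rows, so its 0 comes only from fill = 0
cnt["35-44", "4"]
cnt["35-44", "2"]
```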

      
