Using aggregate with variable names for column names
I have the following dataframe:
a <- c(1,1,4)
b <- c(1,0,2)
c <- data.frame(a=a,b=b)
c
# a b
#1 1 1
#2 1 0
#3 4 2
I would like to aggregate a data frame c like this:
aggregate(b~a,FUN=mean,data=c)
# a b
#1 1 0.5
#2 4 2.0
However, my main problem is that the column name will be stored in a variable:
d <- 'a'
If I use this variable d, which holds the column name, directly in the formula, I get an error:
aggregate(b~d,FUN=mean,data=c)
#Error in model.frame.default(formula = b ~ d, data = c) : variable lengths differ (found for 'd')
This works, but then I get silly column names. I would like to avoid the extra step of renaming columns:
aggregate(c[,'b']~c[,d],FUN=mean,data=c)
# c[, d] c[, "b"]
#1 1 0.5
#2 4 2.0
How do I pass in the variable and also get the correct column names on the first try? (There may be no way to do this.)
You may try
aggregate(c['b'], c[d], FUN=mean)
# a b
# 1 1 0.5
# 2 4 2.0
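Because this form indexes the data frame with character strings, the same call should extend to several grouping columns held in a character vector. Here is a minimal sketch of that; the data frame df and the variables group_cols and value_col are made up for illustration and are not part of the question:

```r
# Hypothetical example data (not from the question)
df <- data.frame(
  g1  = c("x", "x", "y", "y"),
  g2  = c(1, 1, 1, 2),
  val = c(1, 3, 5, 7)
)

# Column names stored in variables
group_cols <- c("g1", "g2")
value_col  <- "val"

# Single-bracket indexing keeps the column names, so the
# result is labelled g1, g2 and val without any renaming
res <- aggregate(df[value_col], df[group_cols], FUN = mean)
print(res)
```

Single-bracket indexing (df["val"] rather than df[, "val"]) is what preserves the names, since it returns a one-column data frame instead of a bare vector.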
Another option, if you are using the formula method, would be to use setNames:
setNames(aggregate(b~get(d), FUN=mean, data=c), colnames(c))
# a b
#1 1 0.5
#2 4 2.0
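Note that colnames(c) gives the right labels here only because the data frame has exactly these two columns, in this order. If your data has more columns, a variant that spells out the names from the variable may be safer. This is a sketch under that assumption; df and its extra column z are hypothetical:

```r
# Hypothetical data with an extra column that is not aggregated
df <- data.frame(a = c(1, 1, 4), b = c(1, 0, 2), z = c(9, 8, 7))
d <- "a"

# get(d) looks up the column named by d inside the formula;
# the result's first column is labelled "get(d)", so we rename
# it explicitly with the value of d and the response name
res <- setNames(aggregate(b ~ get(d), FUN = mean, data = df), c(d, "b"))
print(res)
```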
If you are not tied to aggregate(...) in base R, here is a data.table solution:
library(data.table)
setDT(c)[,list(b=mean(b)),by=d,with=TRUE]
# a b
# 1: 1 0.5
# 2: 4 2.0
You can use cbind to set the names inside aggregate. This method also shows that you can leave out the data argument. So, following your original plan, you can do:
aggregate(cbind(b = c[, "b"]) ~ cbind(a = c[, "a"]), FUN = mean)
# a b
# 1 1 0.5
# 2 4 2.0
The way I solved it was to build the formula argument with paste0:
aggregate(formula(paste0("b ~ ", d)), data = c, FUN = mean)
This way, you can easily pass in as many column-name variables as you like and build formulas as complex as you need.
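As a variation on the same idea, base R's reformulate() builds the formula from character vectors without manual string pasting. A minimal sketch with two grouping columns follows; the data frame df and the variable group_cols are hypothetical names chosen for illustration:

```r
# Hypothetical example data (not from the question)
df <- data.frame(
  a = c(1, 1, 4, 4),
  g = c("u", "v", "u", "u"),
  b = c(1, 0, 2, 4)
)

# Grouping column names held in a character vector
group_cols <- c("a", "g")

# reformulate() joins the term labels with "+" and attaches the
# response, giving the formula b ~ a + g
f <- reformulate(group_cols, response = "b")

# The result keeps the original column names a, g and b
res <- aggregate(f, data = df, FUN = mean)
print(res)
```

Either approach hands aggregate an ordinary formula, so the output columns are named after the real columns rather than after the expression used to look them up.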