Using aggregate with variable names for column names

Question

I have the following dataframe:

a <- c(1,1,4)
b <- c(1,0,2)
c <- data.frame(a=a,b=b)
c
#  a  b
#1 1  1
#2 1  0
#3 4  2

I would like to aggregate the data frame c like this:

aggregate(b~a,FUN=mean,data=c)
#  a   b
#1 1 0.5
#2 4 2.0

However, my main problem is that the column name will be stored in a variable, like so:

d <- 'a'

If I try to use this d variable, which contains the column name, in the formula, I obviously get an error:

aggregate(b~d,FUN=mean,data=c)
#Error in model.frame.default(formula = b ~ d, data = c) : variable lengths differ (found for 'd')

This works, but then I get silly column names. I would like to avoid the extra step of renaming columns:

aggregate(c[,'b']~c[,d],FUN=mean,data=c)
#  c[, d] c[, "b"]
#1    1      0.5
#2    4      2.0

How do I pass in the variable and also get the correct column names on the first try? (There may be no way to do this.)

+3

r aggregate dataframe

Michal 01 dec. 14 at 18:47

4 answers

If you are not tied to aggregate(...) in base R, here is a data.table solution.

library(data.table)
setDT(c)[,list(b=mean(b)),by=d,with=TRUE]
#    a   b
# 1: 1 0.5
# 2: 4 2.0

+3

jlhoward 01 dec. 14 at 20:47

You can use cbind to set the names inside aggregate. This method also shows that you can leave out the data argument. So, following your original approach, you can do

aggregate(cbind(b = c[, "b"]) ~ cbind(a = c[, "a"]), FUN = mean)
#   a   b
# 1 1 0.5
# 2 4 2.0

+1

Rich scriven 01 dec. 14 at 19:27

The way I solved it was to build the formula argument with paste:

aggregate(formula(paste0("b ~ ", d)), data = c, FUN = mean)

This way, you can easily pass in as many column-name variables as you like and build formulas as complex as you need.
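
As a variation on the pasted formula, base R's reformulate() builds the same kind of formula directly from character vectors, which avoids assembling the string by hand. This is a minimal sketch, not part of the original answer; the data frame df, its second grouping column g, and the variable names group_cols and value_col are assumptions made up for illustration.

```r
# Hypothetical data: the question's columns plus an extra grouping column g
df <- data.frame(a = c(1, 1, 4), g = c("x", "x", "y"), b = c(1, 0, 2))

group_cols <- c("a", "g")  # grouping column names held in a variable
value_col  <- "b"          # response column name held in a variable

# reformulate() turns the character vectors into the formula b ~ a + g
f <- reformulate(group_cols, response = value_col)

# The result keeps the real column names, so no renaming step is needed
res <- aggregate(f, data = df, FUN = mean)
print(res)
```

Because the formula contains the actual column names rather than an expression such as c[, d], the output columns are named after the grouping and response columns.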

0

Serenthia May 23 '17 at 15:53

akrun · Accepted Answer · 2014-12-01T19:10:24+0000

You may try

aggregate(c['b'], c[d], FUN=mean)
#   a   b
# 1 1 0.5
# 2 4 2.0

Another option, if you are using the formula method, would be to use setNames:

setNames(aggregate(b~get(d), FUN=mean, data=c), colnames(c))
#  a   b
#1 1 0.5
#2 4 2.0
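
If the column names always arrive as strings, the non-formula call from this answer, aggregate(c['b'], c[d], FUN=mean), can be wrapped in a small function. The sketch below is one possible way to do that; the helper name agg_by, its arguments, and the data frame name dat are hypothetical and do not come from the answer.

```r
# Hypothetical helper: aggregate the columns named in `value`
# by the columns named in `group` (both character vectors).
agg_by <- function(df, value, group, fun = mean) {
  # Single-bracket indexing returns data frames, so the column names
  # are carried through to the result
  aggregate(df[value], by = df[group], FUN = fun)
}

dat <- data.frame(a = c(1, 1, 4), b = c(1, 0, 2))
d <- "a"  # grouping column name stored in a variable, as in the question

res <- agg_by(dat, "b", d)
print(res)
```

Unlike the setNames variant, this does not depend on the result's column order matching colnames of the input.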
