Removing a loop in R

I have a loop that I would like to get rid of, but I cannot figure out how to do it. Let's say I have a data frame:

tmp = data.frame(Gender = rep(c("Male", "Female"), each = 6), 
                 Ethnicity = rep(c("White", "Asian", "Other"), 4),
                 Score = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12))

      

Then I want to calculate the average Score for each level of the Gender and Ethnicity columns, which should give:

$Female
[1] 9.5

$Male
[1] 3.5

$Asian
[1] 6.5

$Other
[1] 7.5

$White
[1] 5.5

      

It's easy enough to do, but I don't want to use loops - I'm going for speed. So I currently have the following:

for(i in c("Gender", "Ethnicity"))
    print(lapply(split(tmp$Score, tmp[, i]), function(x) mean(x)))

      

Obviously this uses a loop, and that is where I am stuck.

There might be an existing function that does this that I don't know about. I looked at aggregate, but I don't think it does what I want.



6 answers


You can nest apply functions.



sapply(c("Gender", "Ethnicity"),
       function(i) {
         print(lapply(split(tmp$Score, tmp[, i]), function(x) mean(x)))
       })

      



You can sapply() over the names of tmp, excluding Score, and then use by() (or aggregate()):

> sapply(setdiff(names(tmp),"Score"),function(xx)by(tmp$Score,tmp[,xx],mean))
$Gender
tmp[, xx]: Female
[1] 9.5
------------------------------------------------------------ 
tmp[, xx]: Male
[1] 3.5

$Ethnicity
tmp[, xx]: Asian
[1] 6.5
------------------------------------------------------------ 
tmp[, xx]: Other
[1] 7.5
------------------------------------------------------------ 
tmp[, xx]: White
[1] 5.5

      



However, this uses a loop internally, so it won't be any faster.
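If speed is the real goal, one option that none of the answers here mention is base R's rowsum(), which computes grouped sums in compiled code. The following is only a sketch under that assumption: group_means is a hypothetical helper name, and whether it is actually faster on your data would need to be measured.

```r
tmp <- data.frame(Gender = rep(c("Male", "Female"), each = 6),
                  Ethnicity = rep(c("White", "Asian", "Other"), 4),
                  Score = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12))

# Hypothetical helper: group means as (group sums) / (group counts)
group_means <- function(x, g) {
  sums <- rowsum(x, g)    # one-column matrix, one row per group
  counts <- table(g)      # number of observations per group
  # Index the counts by the row names so both are in the same order
  sums[, 1] / as.vector(counts[rownames(sums)])
}

gender_means <- group_means(tmp$Score, tmp$Gender)
ethnicity_means <- group_means(tmp$Score, tmp$Ethnicity)
```

Each call returns a named numeric vector with one element per level. There is still one call per grouping column, but no R-level iteration over the rows or groups.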



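Using dplyr and tidyr:

 library(dplyr)
 library(tidyr)
 # Convert the grouping columns to character so gather() can stack them
 tmp[,1:2] <- lapply(tmp[,1:2], as.character)
 tmp %>% 
     gather(Var1, Var2, Gender:Ethnicity) %>%  # long format: one row per (column, level)
     unite(Var, Var1, Var2) %>%                # build labels such as "Gender_Male"
     group_by(Var) %>% 
     summarise(Score=mean(Score))              # mean Score per label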

  #              Var Score
  #1 Ethnicity_Asian   6.5
  #2 Ethnicity_Other   7.5
  #3 Ethnicity_White   5.5
  #4   Gender_Female   9.5
  #5     Gender_Male   3.5
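The tidyr documentation marks gather() as superseded by pivot_longer(). A rough equivalent of the pipeline above could look like the sketch below. The column names Variable and Level are my own choice, not part of the original answer, and the variable and its level are kept as two separate columns instead of being united into one label.

```r
library(dplyr)
library(tidyr)

tmp <- data.frame(Gender = rep(c("Male", "Female"), each = 6),
                  Ethnicity = rep(c("White", "Asian", "Other"), 4),
                  Score = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12),
                  stringsAsFactors = FALSE)

means <- tmp %>%
  # Stack the two grouping columns: one row per (Variable, Level, Score)
  pivot_longer(cols = c(Gender, Ethnicity),
               names_to = "Variable", values_to = "Level") %>%
  group_by(Variable, Level) %>%
  # Mean Score for each level of each grouping column
  summarise(Score = mean(Score), .groups = "drop")
```

The result is a five-row tibble, one row per level, which is usually easier to filter or plot than a flat list.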

      



You can use the code:

c(tapply(tmp$Score,tmp$Gender,mean),tapply(tmp$Score,tmp$Ethnicity,mean))
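The combined tapply() call returns a named numeric vector, not a list. If the list layout shown in the question is needed, wrapping the result in as.list() should be enough. A small sketch, where flat and as_list are illustrative names:

```r
tmp <- data.frame(Gender = rep(c("Male", "Female"), each = 6),
                  Ethnicity = rep(c("White", "Asian", "Other"), 4),
                  Score = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12))

# Named numeric vector: Gender levels first, then Ethnicity levels
flat <- c(tapply(tmp$Score, tmp$Gender, mean),
          tapply(tmp$Score, tmp$Ethnicity, mean))

# Convert to a named list, one element per level
as_list <- as.list(flat)
```

Note that this relies on the level names being unique across the two columns. If two columns shared a level name, the names in the result would collide.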

      



Try the reshape2 package.

require(reshape2)

#demo
melted<-melt(tmp)
casted.gender<-dcast(melted,Gender~variable,mean) #for mean of each gender
casted.eth<-dcast(melted,Ethnicity~variable,mean) #for mean of each ethnicity

#now, combining to do for all variables at once
variables<-colnames(tmp)[-length(colnames(tmp))]

casting<-function(var.name){
    return(dcast(melted,melted[,var.name]~melted$variable,mean))
}

lapply(variables, FUN=casting)

      

output:

[[1]]
  melted[, var.name] Score
1             Female   9.5
2               Male   3.5

[[2]]
  melted[, var.name] Score
1              Asian   6.5
2              Other   7.5
3              White   5.5

      



You should probably reconsider the output you are generating. A single list that mixes all the ethnicity and gender levels together is probably not the best way to graph, analyze, or present your data. Your best bet is to write two lines of code instead, perhaps using tapply

tapply(tmp$Score, tmp$Gender, mean)
tapply(tmp$Score, tmp$Ethnicity, mean)

      

or aggregate

aggregate(Score ~ Gender, tmp, mean)
aggregate(Score ~ Ethnicity, tmp, mean)

      

You might then also want to look at the interactions, even though you suggested that aggregate doesn't do what you want.

with(tmp, tapply(Score, list(Gender, Ethnicity), mean))
aggregate(Score ~ Gender + Ethnicity, tmp, mean)

      

This not only keeps the distinct ideas represented by the two variables separate in the presentation, it also makes your R commands more expressive, because they reflect the fact that the data encodes those variables separately.

If your real task is to iterate over a series of variables, any of these can be put in a loop, but I would assume that you still want the result to be a list of vectors or data.frames and not one flat list.
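As a sketch of that idea, the tapply() call can be applied to every grouping column at once, which yields one named vector of means per variable. The names group_cols and means_by_var are illustrative, and the code assumes every column other than Score is a grouping column:

```r
tmp <- data.frame(Gender = rep(c("Male", "Female"), each = 6),
                  Ethnicity = rep(c("White", "Asian", "Other"), 4),
                  Score = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12))

# Assumption: all columns except Score are grouping variables
group_cols <- setdiff(names(tmp), "Score")

# One element per grouping column, each holding the mean Score per level
means_by_var <- lapply(tmp[group_cols],
                       function(g) tapply(tmp$Score, g, mean))
```

Here means_by_var$Gender and means_by_var$Ethnicity can each be inspected or plotted on their own. lapply() is still a loop internally, so this is about structure and readability, not speed.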
