Removing a loop in R

Question

I have a loop that I would like to get rid of, but I cannot figure out how. Let's say I have a data frame:

tmp = data.frame(Gender = rep(c("Male", "Female"), each = 6), 
                 Ethnicity = rep(c("White", "Asian", "Other"), 4),
                 Score = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12))

Then I want to calculate the mean Score for each level of the Gender and Ethnicity columns, which should give:

$Female
[1] 9.5

$Male
[1] 3.5

$Asian
[1] 6.5

$Other
[1] 7.5

$White
[1] 5.5

It's easy enough to do, but I don't want to use loops because I'm going for speed. So far I have the following:

for(i in c("Gender", "Ethnicity"))
    print(lapply(split(tmp$Score, tmp[, i]), function(x) mean(x)))

Obviously this still uses a loop, and that is where I am stuck.

There might be an existing function for this that I don't know about. I looked at aggregate(), but I don't think it does what I want.

+3

r

nathaneastwood 24 Sep 14 at 14:50

6 answers

You can sapply() over the names of tmp, excluding Score, and then use by() (or aggregate()) on each:

> sapply(setdiff(names(tmp),"Score"),function(xx)by(tmp$Score,tmp[,xx],mean))
$Gender
tmp[, xx]: Female
[1] 9.5
------------------------------------------------------------ 
tmp[, xx]: Male
[1] 3.5

$Ethnicity
tmp[, xx]: Asian
[1] 6.5
------------------------------------------------------------ 
tmp[, xx]: Other
[1] 7.5
------------------------------------------------------------ 
tmp[, xx]: White
[1] 5.5

However, sapply() still loops internally, so this won't be any faster.
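If the goal really is speed, note that iterating over two or three grouping columns is cheap; it is the per-group computation that grows with the data. A minimal sketch, assuming you only need group means: base R's rowsum() computes grouped sums in compiled code, and dividing by grouped counts gives the means. The helper name grouped_mean() is hypothetical.

```r
tmp <- data.frame(Gender = rep(c("Male", "Female"), each = 6),
                  Ethnicity = rep(c("White", "Asian", "Other"), 4),
                  Score = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12))

# Hypothetical helper: mean of x within each level of g, without split()/lapply()
grouped_mean <- function(x, g) {
  sums   <- rowsum(x, g)                  # one row per group, groups sorted
  counts <- rowsum(rep(1, length(x)), g)  # group sizes, same row order
  setNames(as.vector(sums / counts), rownames(sums))
}

# Still iterates over the (few) grouping columns, but each call is vectorized
res <- sapply(c("Gender", "Ethnicity"),
              function(v) grouped_mean(tmp$Score, tmp[[v]]),
              simplify = FALSE)
res
```

On twelve rows this makes no measurable difference; it is only worth considering on large data, and worth benchmarking there against split() + lapply().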

+3

Stephan Kolassa 24 Sep 14 at 14:57

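Using dplyr and tidyr:

 library(dplyr)
 library(tidyr)
 # Convert the grouping columns to character so gather() can stack them without dropping factor attributes
 tmp[,1:2] <- lapply(tmp[,1:2], as.character)
 tmp %>% 
     gather(Var1, Var2, Gender:Ethnicity) %>%   # long format: one row per (row, grouping column)
     unite(Var, Var1, Var2) %>%                 # combine column name and level, e.g. "Gender_Male"
     group_by(Var) %>% 
     summarise(Score=mean(Score))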

  #              Var Score
  #1 Ethnicity_Asian   6.5
  #2 Ethnicity_Other   7.5
  #3 Ethnicity_White   5.5
  #4   Gender_Female   9.5
  #5     Gender_Male   3.5
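The same stack-and-label idea can be sketched in base R without dplyr or tidyr, assuming Gender and Ethnicity are the only grouping columns: repeat Score once per grouping column, build labels such as Gender_Male, and call tapply() once. The names cols, labels, scores and means are illustrative.

```r
tmp <- data.frame(Gender = rep(c("Male", "Female"), each = 6),
                  Ethnicity = rep(c("White", "Asian", "Other"), 4),
                  Score = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12))

cols <- c("Gender", "Ethnicity")

# One label per (row, grouping column), e.g. "Gender_Male", "Ethnicity_White"
labels <- paste(rep(cols, each = nrow(tmp)),
                unlist(tmp[cols], use.names = FALSE),
                sep = "_")

# Repeat the scores once per grouping column so they line up with the labels
scores <- rep(tmp$Score, times = length(cols))

# A single tapply() call over the stacked data
means <- tapply(scores, labels, mean)
means
```

Prefixing each level with its column name keeps levels apart if two variables happen to share a level name.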

+2

akrun 24 Sep 14 at 14:57

You can concatenate two tapply() calls:

c(tapply(tmp$Score,tmp$Gender,mean),tapply(tmp$Score,tmp$Ethnicity,mean))
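The two tapply() calls can be generalized to any set of grouping columns by applying tapply() over the columns and flattening with unlist(); the resulting names combine column and level (for example Gender.Female). This is a sketch; cols is an illustrative name, and it still iterates over the columns.

```r
tmp <- data.frame(Gender = rep(c("Male", "Female"), each = 6),
                  Ethnicity = rep(c("White", "Asian", "Other"), 4),
                  Score = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12))

cols <- c("Gender", "Ethnicity")

# tapply() per grouping column, then flatten into one named vector
res <- unlist(lapply(tmp[cols], function(g) tapply(tmp$Score, g, mean)))
res
```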

+2

anonR 24 Sep 14 at 14:59

Try the reshape2 package:

require(reshape2)

#demo
melted<-melt(tmp)
casted.gender<-dcast(melted,Gender~variable,mean) #for mean of each gender
casted.eth<-dcast(melted,Ethnicity~variable,mean) #for mean of each ethnicity

#now, combining to do for all variables at once
variables<-colnames(tmp)[-length(colnames(tmp))]

casting<-function(var.name){
    return(dcast(melted,melted[,var.name]~melted$variable,mean))
}

lapply(variables, FUN=casting)

Output:

[[1]]
  melted[, var.name] Score
1             Female   9.5
2               Male   3.5

[[2]]
  melted[, var.name] Score
1              Asian   6.5
2              Other   7.5
3              White   5.5

+1

tohweizhong 24 Sep 14 at 15:25

You should probably reconsider the output you are generating. A single list that mixes the ethnicity and gender levels together is probably not the best way to graph, analyze, or present your data. Your best bet is to break it down and write two lines of code instead, perhaps using tapply:

tapply(tmp$Score, tmp$Gender, mean)
tapply(tmp$Score, tmp$Ethnicity, mean)

or aggregate:

aggregate(Score ~ Gender, tmp, mean)
aggregate(Score ~ Ethnicity, tmp, mean)

You might then want to look at the interactions as well, even though you suggested that aggregate doesn't do what you want:

with(tmp, tapply(Score, list(Gender, Ethnicity), mean))
aggregate(Score ~ Gender + Ethnicity, tmp, mean)

This not only keeps the concepts represented by each variable separate in your analysis and presentation, but also makes your R commands more expressive: they reflect the fact that the data encode those variables separately.

If your real task is to iterate over a series of variables, any of these can be put in a loop, but I would assume you still want the result to be a list of vectors or data frames rather than a single flat list.
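Looping over a series of variables while keeping one data frame per variable could look like the following minimal sketch, which builds each formula with reformulate() and passes it to aggregate(); cols and res are illustrative names.

```r
tmp <- data.frame(Gender = rep(c("Male", "Female"), each = 6),
                  Ethnicity = rep(c("White", "Asian", "Other"), 4),
                  Score = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12))

cols <- c("Gender", "Ethnicity")

# One data frame per grouping variable: Score ~ Gender, then Score ~ Ethnicity
res <- setNames(
  lapply(cols, function(v) aggregate(reformulate(v, response = "Score"),
                                     data = tmp, FUN = mean)),
  cols
)
res
```

Because the formula is built from the column name, each data frame keeps a proper grouping column (Gender, Ethnicity) instead of a header such as melted[, var.name].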

0

John 24 Sep 14 at 19:35

arvi1000 · Accepted Answer · 2014-09-24T14:58:03+0000

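You can nest the apply functions, replacing the for loop with sapply():

# Returns a named list (Gender, Ethnicity); the inner results differ in length, so sapply() cannot simplify
sapply(c("Gender", "Ethnicity"),
       function(i) lapply(split(tmp$Score, tmp[, i]), mean))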
