Apply the function to all lines except the current one (dplyr)

I am learning the dplyr

R package and still love it very much. One thing I need to do is create a new column whose value for each row will be the result of applying the function to all rows except the current one (possibly a subset by group), but I can't figure out how.

A contrived example would be

library(datasets)
library(dplyr)

data(mtcars)
x <- mtcars %>% mutate(name=rownames(mtcars)) %>% filter(cyl==4) %>% select(name,cyl,mpg)

# This is what I want to do more elegantly
x$othermpg <- NA
for (i in 1:nrow(x))
    x$othermpg[i] <- mean(x$mpg[-i])

      

Here, the column othermpg

gets the average of mpg

all cars except the one in the current row.

Can anyone help please?

+3


source to share


2 answers


This can be done in data.table

 library(data.table)
 setDT(x)[, N:= 1:.N][, othermpg2:=mean(x[N!= .BY, mpg]), by=N][,N:=NULL]
  x
  #             name cyl  mpg othermpg othermpg2
  #1:     Datsun 710   4 22.8    27.05     27.05
  #2:      Merc 240D   4 24.4    26.89     26.89
  #3:       Merc 230   4 22.8    27.05     27.05
  #4:       Fiat 128   4 32.4    26.09     26.09
  #5:    Honda Civic   4 30.4    26.29     26.29
  #6: Toyota Corolla   4 33.9    25.94     25.94
  #7:  Toyota Corona   4 21.5    27.18     27.18
  #8:      Fiat X1-9   4 27.3    26.60     26.60
  #9:  Porsche 914-2   4 26.0    26.73     26.73
 #10:   Lotus Europa   4 30.4    26.29     26.29
 #11:     Volvo 142E   4 21.4    27.19     27.19

      

  • The idea was to create a row / column index first N:=1:.N

  • Group based on this index by=N]

  • x[N!= .BY, mpg]

    gives strings mpg

    that are not equal to the grouping variable
  • do mean
  • N:=NULL

    Since N is not required, this column is discarded.

Or you could try (Inspired from @ thelatemail's answer)

 setDT(x)[, N:=1:.N]
 setkey(x, N)
 x[,othermpg2 := mean(x[!.BY, mpg]), by=N][,N:=NULL]

      



Or without creating N

(from @Jon Clayden comments)

 setDT(x)[, othermpg2:=mean(x[name!=.BY,mpg]), by=name]

      

Using dplyr

this works

x %>% 
mutate(N=1:n()) %>% 
do( data.frame(.,othermpg2=sapply(.$N, function(i) mean(.$mpg[!.$N %in% i]))))
  #             name cyl  mpg othermpg  N othermpg2
  #1      Datsun 710   4 22.8    27.05  1     27.05
  #2       Merc 240D   4 24.4    26.89  2     26.89
  #3        Merc 230   4 22.8    27.05  3     27.05
  #4        Fiat 128   4 32.4    26.09  4     26.09
  #5     Honda Civic   4 30.4    26.29  5     26.29
  #6  Toyota Corolla   4 33.9    25.94  6     25.94
  #7   Toyota Corona   4 21.5    27.18  7     27.18
  #8       Fiat X1-9   4 27.3    26.60  8     26.60
  #9   Porsche 914-2   4 26.0    26.73  9     26.73
  #10   Lotus Europa   4 30.4    26.29 10     26.29
  #11     Volvo 142E   4 21.4    27.19 11     27.19

      

+2


source


In the sample example, you can use the following code to avoid the for loop. Basically, instead of excluding the "current" line, you simply subtract it from the total:

library(dplyr)

x %>% mutate(othermpg2 = (sum(mpg)-mpg) / (length(mpg) -1 ))

#             name cyl  mpg othermpg othermpg2
#1      Datsun 710   4 22.8    27.05     27.05
#2       Merc 240D   4 24.4    26.89     26.89
#3        Merc 230   4 22.8    27.05     27.05
#4        Fiat 128   4 32.4    26.09     26.09
#5     Honda Civic   4 30.4    26.29     26.29
#6  Toyota Corolla   4 33.9    25.94     25.94
#7   Toyota Corona   4 21.5    27.18     27.18
#8       Fiat X1-9   4 27.3    26.60     26.60
#9   Porsche 914-2   4 26.0    26.73     26.73
#10   Lotus Europa   4 30.4    26.29     26.29
#11     Volvo 142E   4 21.4    27.19     27.19

      



If you feel this cannot be applied to your real data, edit your question with a more representative example.

+1


source







All Articles