How to apply multicore when using sapply?

R 3.1.2
library(RcppRoll)

      

my data.frame

y=
  V1 V2 V3 V4 V5 V6 V7 V8 V9 
1  1  2  3  4  5  6  7  8  9  
2 16 17 18 19 20 21 22 23 24 
3 31 32 33 34 35 36 37 38 NA  
4 46 47 48 49 50 51 52 53 54  

      

my function:

    sapply(y, RcppRoll::roll_mean, n = 3, na.rm = T)

      

This works fine, but it is very slow on my huge data set. I wonder how to speed up sapply, either by using multiple cores or by using a for loop instead?

@Khashaa Yes, I tried it and it is faster, but I have a problem with the output:

output:

> 
      [,1] [,2] [,3] 
[1,]   16   17   18 

      

The missing column names cause problems in the rest of my code, so I want the output to look like this:

       V1 V2 V3
[1,]   16 17 18

      

Any idea about this?


2 answers


For this particular example, you don't need sapply. Just roll_mean(as.matrix(y), 3, na.rm=T) is enough:

y <- runif(1e7) 
dim(y) <- c(1e3, 1e4)
y <- data.frame(y)
system.time(sapply(y, RcppRoll::roll_mean, n = 3, na.rm = T))
#   user  system elapsed 
# 14.120   0.451  18.960 
system.time(RcppRoll::roll_mean(as.matrix(y), 3, na.rm=T))
#   user  system elapsed 
#  0.329   0.000   0.329 
# About 60x faster

      



The only difference from the sapply result is the colnames, which you can set as follows:

res <- RcppRoll::roll_mean(as.matrix(y), 3, na.rm=T)
colnames(res) <- colnames(y)
res
#     V1 V2 V3 V4 V5 V6 V7 V8   V9
#[1,] 16 17 18 19 20 21 22 23 16.5
#[2,] 31 32 33 34 35 36 37 38 39.0

      



This will work:

library(parallel)  # provides mclapply() and detectCores(); forking is not available on Windows
mclapply(y, roll_mean, n=3, na.rm=TRUE, mc.cores=detectCores())
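Note that mclapply returns a list with one element per column, not the matrix that sapply produces. A minimal sketch of turning that list back into a matrix with the V1..V9 column names follows. It assumes the 4x9 data frame from the question, and roll_mean_base is a hypothetical base-R stand-in for RcppRoll::roll_mean, used only so the example runs without extra packages. The core count of 2 is also just an illustrative choice.

```r
library(parallel)

# Data from the question: 4 rows, 9 columns named V1..V9, one NA in V9.
y <- as.data.frame(matrix(c(1:9, 16:24, 31:39, 46:54), nrow = 4, byrow = TRUE))
y[3, 9] <- NA

# Hypothetical base-R stand-in for RcppRoll::roll_mean(x, n, na.rm = TRUE),
# used here only so the sketch needs no extra packages.
roll_mean_base <- function(x, n = 3) {
  rowMeans(stats::embed(x, n), na.rm = TRUE)
}

# mclapply() relies on forking, so fall back to one core on Windows.
cores <- if (.Platform$OS.type == "windows") 1L else 2L

# One list element per column, named after the columns of y.
res_list <- mclapply(y, roll_mean_base, n = 3, mc.cores = cores)

# Bind the list into a matrix; the list names become the column names.
res <- do.call(cbind, res_list)
print(res)
```

With RcppRoll loaded, the same pattern should apply by passing RcppRoll::roll_mean with na.rm = TRUE in place of the stand-in helper. Whether the parallel version actually beats the single matrix call from the first answer depends on the data size, so it is worth timing both.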

      



or

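library(plyr)  # provides laply(); .parallel=TRUE needs a registered foreach backend, e.g. doParallel::registerDoParallel()
laply(y, .fun=roll_mean, n=3, na.rm=TRUE, .parallel=TRUE)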

      
