How to apply multicore when using sapply?
R 3.1.2
library(RcppRoll)
My data.frame y:
  V1 V2 V3 V4 V5 V6 V7 V8 V9
1  1  2  3  4  5  6  7  8  9
2 16 17 18 19 20 21 22 23 24
3 31 32 33 34 35 36 37 38 NA
4 46 47 48 49 50 51 52 53 54
My code:
sapply(y, RcppRoll::roll_mean, n = 3, na.rm = T)
This works fine, but it is very slow on my huge data set. How can I speed up sapply using multiple cores, or should I use a for loop instead?
@Khashaa Yes, I tried it and it is faster, but I have a problem with the output. The result loses its column names:
     [,1] [,2] [,3]
[1,]   16   17   18
This breaks the rest of my code, so I want the output to look like this:
     V1 V2 V3
[1,] 16 17 18
Any idea how to do this?
For this particular example, you don't need sapply. Calling roll_mean(as.matrix(y), 3, na.rm=T) directly on the matrix is enough:
y <- runif(1e7)
dim(y) <- c(1e3, 1e4)
y <- data.frame(y)
system.time(sapply(y, RcppRoll::roll_mean, n = 3, na.rm = T))
# user system elapsed
# 14.120 0.451 18.960
system.time(RcppRoll::roll_mean(as.matrix(y), 3, na.rm=T))
# user system elapsed
# 0.329 0.000 0.329
# About 60x faster (elapsed time)
The only difference from the sapply result is the missing colnames, which you can restore as follows:
res <- RcppRoll::roll_mean(as.matrix(y), 3, na.rm=T)
colnames(res) <- colnames(y)
res
# With the data.frame from the question:
#      V1 V2 V3 V4 V5 V6 V7 V8   V9
# [1,] 16 17 18 19 20 21 22 23 16.5
# [2,] 31 32 33 34 35 36 37 38 39.0
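If you still want to spread the per-column work over several cores, as the question asks, here is a minimal sketch using parallel::mclapply from base R's parallel package. It assumes RcppRoll is installed; the core count of 2 and the small data.frame (rebuilt from the question) are only for illustration. Since roll_mean is cheap per column, the forking overhead may well outweigh the gain, so the single matrix call above is probably still the faster option.

```r
library(parallel)
library(RcppRoll)

# Rebuild the small data.frame from the question (illustrative data only)
vals <- as.numeric(c(1:9, 16:24, 31:38, NA, 46:54))
y <- data.frame(matrix(vals, nrow = 4, byrow = TRUE))
colnames(y) <- paste0("V", 1:9)

# Assumed core count; mclapply relies on forking, which Windows lacks,
# so fall back to a single core there
n_cores <- 2L
if (.Platform$OS.type == "windows") n_cores <- 1L

# A data.frame is a list of columns, so mclapply processes one column per task
res_list <- mclapply(y, RcppRoll::roll_mean, n = 3, na.rm = TRUE,
                     mc.cores = n_cores)

# Bind the per-column vectors into a matrix; the list names become the colnames
res <- do.call(cbind, res_list)
res
```

Because the list returned by mclapply keeps the column names of y, the combined matrix has V1 ... V9 as colnames without any extra step.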