doParallel with self-written functions
I created a sequential simulation in R to repeat the process 10,0000 times. It takes about 70 minutes, so I decided to try the same in parallel with the doParallel package.
My foreach loop calls the inv.predict function, which is not in any existing package. When I run the code, I get an error.
cl <- makeCluster(4)
registerDoParallel(cl)
ptm <- proc.time()
clupeaformis_cr <- foreach(i = 1:100, .packages = c("spider", "investr", "mgcv")) %dopar% {
  clupeaformis_cr <- rep(NA, i)
  clupeaformis_haplo_rand <- haploAccum(clupeaformis_aligned, method = "random", permutations = 1000)
  N <- clupeaformis_haplo_rand$sequences
  H <- clupeaformis_haplo_rand$n.haplotypes
  d <- data.frame(N, H)
  clupeaformis_cr <- gam(H ~ s(N, bs = "cr", k = 20), optimizer = c("outer", "bfgs"), data = d)
  clupeaformis_cr[i] <- inv.predict(clupeaformis_cr, y = 21, x.name = "N", interval = TRUE,
                                    lower = 1, upper = 1000000)
}
proc.time() - ptm
stopCluster(cl)
Error in { :
task 1 failed - "no applicable method for 'inv.predict' applied to an object of class "c('gam', 'glm', 'lm')""
I'm not sure why this fails in parallel but works in a regular loop. This question is related to another that I posted yesterday.
Any help is greatly appreciated.
1 answer
I found a solution for this.
The general solution is this:
library(doParallel) # also attaches foreach and parallel
cl <- makeCluster(4) # set the number of workers (cores) for parallelization
registerDoParallel(cl) # register the cluster as the %dopar% backend
ptm <- proc.time() # start timer
my_results <- foreach(i = 1:1000, .packages = c(...)) %dopar% { # foreach loop
  # Do not create or register a cluster in here: this body already runs on the workers.
  YOUR CODE HERE # the value of the last expression becomes element i of my_results
}
proc.time() - ptm # elapsed time
stopCluster(cl) # shut down the workers
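Since the question involves a function defined in the main session rather than in a package, here is a minimal sketch of one way to make such a function visible on the workers, using foreach's .export argument. The helper scale_to_unit and the wrapper run_scaled_sums are hypothetical stand-ins, not the asker's code, and whether this alone resolves the S3 dispatch error for inv.predict on a gam object is an assumption the thread does not confirm.

```r
library(doParallel)  # also attaches foreach and parallel

# Hypothetical user-defined helper, standing in for a function that exists
# only in the main R session (not in any installed package).
scale_to_unit <- function(x) (x - min(x)) / (max(x) - min(x))

# Hypothetical wrapper: runs n_tasks iterations on n_workers workers.
run_scaled_sums <- function(n_tasks, n_workers = 2) {
  cl <- makeCluster(n_workers)
  on.exit(stopCluster(cl), add = TRUE)  # always shut the workers down
  registerDoParallel(cl)
  # .export names session objects to copy to each worker;
  # .combine = c collapses the per-iteration results into one vector.
  foreach(i = seq_len(n_tasks), .combine = c,
          .export = "scale_to_unit") %dopar% {
    sum(scale_to_unit(c(0, i, 2 * i)))
  }
}

res <- run_scaled_sums(4)
```

Packages that the loop body needs would still go in .packages; .export is only for objects that live in the calling session.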
Be aware that running in parallel can actually slow down your workflow if too many workers (clusters/cores) are requested at once.
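One way to avoid asking for more workers than the machine can usefully run is to cap the request at the detected core count minus one. The sketch below assumes that rule of thumb; choose_workers is a hypothetical helper, not part of parallel or doParallel.

```r
library(parallel)

# Hypothetical helper: cap the requested number of workers at
# (available cores - 1), but never return fewer than one worker.
choose_workers <- function(requested, available = detectCores()) {
  if (is.na(available)) available <- 1L  # detectCores() may return NA
  max(1L, min(as.integer(requested), available - 1L))
}

n_workers <- choose_workers(4)
```

The result can then be passed to makeCluster(n_workers) in place of a hard-coded number.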