Multi-core graph generation
I have a for loop that generates a graph through png() and dev.off() and saves it in the working directory.
The loop I have is similar to the following example:
test.df<-data.frame(id=1:25000, x=rnorm(25000),y=rnorm(25000))
for (i in test.df$id){
plot(test.df$x[test.df$id==i], test.df$y[test.df$id==i], xlab="chi",ylab="psi")
}
The for loop runs and generates thousands of graphs. Is it possible to have it run in parallel on all 8 cores of my system so that I can get the graphs faster?
PS. The code is just an example. My original problem and plots are much more complicated, so please don't focus on the details of the example.
If you are using a recent version of R (2.14.0 or later, which ships the parallel package), this should be easy. The trick is to create a function that can be run on any core in any order. First, we create our data frame:
test.df = data.frame(id=1:250, x=rnorm(250),y=rnorm(250))
Then we create a function that runs on each core:
# I could also pass the row or the entire data frame
myplot = function(id) {
  fname = paste0("/tmp/plot", id, ".png")
  png(fname)
  plot(test.df$x[id], test.df$y[id],
       xlab="chi", ylab="psi")
  dev.off()
  return(fname)
}
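The comment in myplot mentions passing the row or the whole data frame instead of relying on the global test.df. A minimal sketch of that variant, assuming a hypothetical function name plot_row and a hypothetical out_dir argument (neither appears in the answer itself):

```r
# Hypothetical variant of myplot: the data frame is an explicit argument,
# so the function does not depend on a global test.df.
plot_row <- function(i, df, out_dir = tempdir()) {
  fname <- file.path(out_dir, paste0("plot", df$id[i], ".png"))
  png(fname)
  on.exit(dev.off())  # close the device even if plot() fails
  plot(df$x[i], df$y[i], xlab = "chi", ylab = "psi")
  fname
}

# Small sequential check on a three-row data frame
demo.df <- data.frame(id = 1:3, x = rnorm(3), y = rnorm(3))
files <- sapply(seq_len(nrow(demo.df)), plot_row, df = demo.df)
```

Because the data travels as an argument, the same function should work unchanged with sapply, mclapply, or parSapply(cl, idx, plot_row, df = test.df), without a separate clusterExport call.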
Then we load the parallel package (it ships with base R):
library(parallel)
and then use mclapply, or a cluster where forking is not available:
no_of_cores = 8
## Non-Windows only (mclapply relies on forking)
mclapply(1:nrow(test.df), myplot,
         mc.cores = no_of_cores)
## All operating systems
cl = makeCluster(no_of_cores)
clusterExport(cl, "test.df")
parSapply(cl, 1:nrow(test.df), myplot)
stopCluster(cl)
There are two advantages here:
- The parallel package comes with R, so we don't need to install anything extra.
- We can turn off the "parallel" part:
sapply(1:nrow(test.df), myplot)
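The two variants in this answer (mclapply on non-Windows systems, a cluster everywhere) can be wrapped so that the caller does not have to choose. A rough sketch, assuming hypothetical helper names pick_cores and par_apply; detectCores() can return NA, so the sketch falls back to one core in that case:

```r
library(parallel)

# Hypothetical helper: cap the requested core count at what the machine reports.
pick_cores <- function(requested = 8L) {
  available <- detectCores()
  if (is.na(available)) available <- 1L
  max(1L, min(as.integer(requested), available))
}

# Hypothetical helper: forking where available, a socket cluster on Windows.
par_apply <- function(X, FUN, cores = pick_cores(), ...) {
  if (.Platform$OS.type == "windows") {
    cl <- makeCluster(cores)
    on.exit(stopCluster(cl))
    # Note: globals used by FUN are NOT exported here; pass data through `...`
    # or call clusterExport() as in the answer.
    parLapply(cl, X, FUN, ...)
  } else {
    mclapply(X, FUN, ..., mc.cores = cores)
  }
}

# Toy workload standing in for the plotting function
squares <- par_apply(1:10, function(i) i^2)
```

With the plotting function from this answer, the call would look like par_apply(1:nrow(test.df), myplot), keeping in mind the export caveat on the cluster branch.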
With the foreach package you only need minimal changes to the core of your code. You can also pick whichever backend suits your OS or other constraints.
##
## Working dir and data generation
##
setwd("/path/to")
N <- 25000
test.df<-data.frame(id=1:N, x=rnorm(N),y=rnorm(N))
##
## Making a cluster
##
require(doSNOW) # Or any other backend of your choice
NC <- 8 # Number of nodes in cluster, i.e. cores
cl <- makeCluster(rep("localhost", NC), type="SOCK")
registerDoSNOW(cl)
##
## Core loop
##
foreach(i=1:N) %dopar% {
  png(paste("plot",i,".png",sep=""))
  plot(test.df$x[test.df$id==i], test.df$y[test.df$id==i], xlab="chi",ylab="psi")
  dev.off()
}
##
## Stop cluster
##
stopCluster(cl)
It's easy to fall back to a single core: just replace %dopar% with %do%.