Multi-core graph generation

I have a for loop that generates a graph through png() and dev.off() and saves it in the working directory.

The loop I have is similar to the following example:

test.df<-data.frame(id=1:25000, x=rnorm(25000),y=rnorm(25000))

for (i in test.df$id){
  plot(test.df$x[test.df$id==i], test.df$y[test.df$id==i], xlab="chi",ylab="psi")
}

      

The for loop runs and generates thousands of graphs. Is it possible to run it in parallel on all 8 cores of my system so that I get the graphs faster?

PS. The code is only an example. My original problem and plots are much more complicated, so please don't get hung up on the example.

+3




3 answers


If you are using a recent version of R (2.14.0 or later, which bundles the parallel package), this should be easy. The trick is to create a function that can be run on any core in any order. First, we create our data frame:

test.df = data.frame(id=1:250, x=rnorm(250),y=rnorm(250))

      

Then we create a function that runs on each core:

#I could also pass the row or the entire data frame
myplot = function(id) {
  fname = paste0("/tmp/plot", id, ".png")
  png(fname)
  plot(test.df$x[id], test.df$y[id], 
      xlab="chi",ylab="psi")
  dev.off()
  return(fname)
}

      

Then I load the parallel package (it ships with base R):



library(parallel)

      

and then use mclapply (non-Windows systems) or a cluster with parSapply (any OS):

no_of_cores = 8
##Non windows
mclapply(1:nrow(test.df), myplot, 
         mc.cores = no_of_cores)

##All OS's
cl = makeCluster(no_of_cores)
clusterExport(cl, "test.df")
parSapply(cl, 1:nrow(test.df), myplot)
stopCluster(cl)

      

There are two advantages here:

  • The parallel package comes with R, so we don't need to install anything extra
  • We can turn off the "parallel" part:

    sapply(1:nrow(test.df), myplot)
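
The comment in myplot mentions that the row could be passed instead of the id. A minimal sketch of that variant follows. The worker name plot_row, the out_dir directory, and the use of tempdir() instead of /tmp are assumptions made for illustration. Because each worker receives its data as an argument, no clusterExport call is needed here:

```r
library(parallel)

set.seed(1)
test.df <- data.frame(id = 1:20, x = rnorm(20), y = rnorm(20))

# Hypothetical output directory for this sketch
out_dir <- file.path(tempdir(), "plots_by_row")
dir.create(out_dir, showWarnings = FALSE)

# Worker: receives a one-row data frame, so it does not rely on global data
plot_row <- function(row, out_dir) {
  fname <- file.path(out_dir, paste0("plot", row$id, ".png"))
  png(fname)
  plot(row$x, row$y, xlab = "chi", ylab = "psi")
  dev.off()
  fname
}

# One list element (a one-row data frame) per id
rows <- split(test.df, test.df$id)

cl <- makeCluster(2)  # 2 workers keeps the sketch light; raise for more cores
files <- unlist(parLapply(cl, rows, plot_row, out_dir = out_dir))
stopCluster(cl)
```

Returning the file name from each worker makes it easy to check afterwards that every plot was written.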
    
          

+6




With the foreach package, you only need to make minimal changes to the core loop. You can also pick whichever backend suits your OS or other constraints.

##
## Working dir and data generation
##
setwd("/path/to")
N <- 25000
test.df<-data.frame(id=1:N, x=rnorm(N),y=rnorm(N))

##
## Making a cluster
##
require(doSNOW) # Or any other backend of your choice
NC <- 8         # Number of nodes in cluster, i.e. cores
cl <- makeCluster(rep("localhost", NC), type="SOCK")
registerDoSNOW(cl)

## 
## Core loop
##
foreach(i=1:N) %dopar% {
  png(paste("plot",i,".png",sep=""))
  plot(test.df$x[test.df$id==i], test.df$y[test.df$id==i], xlab="chi",ylab="psi")
  dev.off()
}

##
## Stop cluster
##
stopCluster(cl)

      



It's easy to switch back to one core: just replace %dopar% with %do%.
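
Since the answer notes that any backend can be used, here is a hedged sketch of the same loop with the doParallel backend in place of doSNOW. It assumes that foreach and doParallel are installed. The out_dir directory and the smaller N are illustrative choices, not part of the original answer:

```r
library(foreach)
library(doParallel)  # assumption: installed; swapped in for doSNOW

N <- 20
set.seed(1)
test.df <- data.frame(id = 1:N, x = rnorm(N), y = rnorm(N))

# Hypothetical output directory for this sketch
out_dir <- file.path(tempdir(), "plots_foreach")
dir.create(out_dir, showWarnings = FALSE)

cl <- parallel::makeCluster(2)
registerDoParallel(cl)

# Each iteration writes one PNG and returns its path
files <- foreach(i = 1:N, .combine = c) %dopar% {
  fname <- file.path(out_dir, paste0("plot", i, ".png"))
  png(fname)
  plot(test.df$x[test.df$id == i], test.df$y[test.df$id == i],
       xlab = "chi", ylab = "psi")
  dev.off()
  fname
}

parallel::stopCluster(cl)
```

Replacing %dopar% with %do% in this sketch runs the same body sequentially, which is handy for debugging a single plot.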

+4




Since mclapply is not supported on Windows, I suggest a solution for Windows users using the parallel package.

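library(parallel)
cl <- makeCluster(8)
parSapply(cl, 1:20, fun, fun.args)  # fun: your worker function; fun.args: extra arguments passed to it
stopCluster(cl)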
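
To make the placeholders concrete, here is a minimal sketch in which a hypothetical worker plot_one stands in for fun and an output directory out_dir stands in for fun.args. Both names and the random data are assumptions for illustration only:

```r
library(parallel)

# Hypothetical output directory, passed to the worker as an extra argument
out_dir <- file.path(tempdir(), "plots_parsapply")
dir.create(out_dir, showWarnings = FALSE)

# Hypothetical worker standing in for `fun`
plot_one <- function(i, out_dir) {
  fname <- file.path(out_dir, paste0("plot", i, ".png"))
  png(fname)
  plot(rnorm(10), rnorm(10), xlab = "chi", ylab = "psi",
       main = paste("plot", i))
  dev.off()
  fname
}

cl <- makeCluster(2)
files <- tryCatch(
  parSapply(cl, 1:20, plot_one, out_dir = out_dir),
  finally = stopCluster(cl)  # release the workers even if a plot fails
)
```

Wrapping the call in tryCatch with a finally clause is one way to make sure the worker processes are shut down when a plot throws an error.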

      

+3

