Avoid increasing memory in a foreach loop in R

I am trying to create summary statistics that combine two different sets of spatial data: a large raster file and a polygon file. The idea is to get a summary of the raster values in each polygon.

Since the raster is too large to process at once, I am trying to create subtasks and process them in parallel, that is, process each polygon from the SpatialPolygonsDataFrame separately.


The code works fine; however, after about 100 iterations I run into memory issues. Here is my code and what I intend to do:

# session setup

# multicore processing.
# start a cluster of three workers for the current R session
cluster <- makeCluster(3, type = "SOCK", outfile = "")
getDoParWorkers() # check if it worked

# load base data

# bring both data-sets to a common CRS
spodf.malha.2007 <- spTransform(spodf.malha.2007, CRSobj = CRS(projargs = proj4string(r.terra.2008)))
proj4string(r.terra.2008) == proj4string(spodf.malha.2007) # should be TRUE

# create a function to extract areas

# apply it to one subset to see if it is working

## parallel loop
# define package(s) to be used in the parallel loop

# try a parallel loop for the first 6 polygons
                   .packages = l.packages) %dopar% {
                     print(paste("Processing Polygon ",i, ".",sep=""))


The result is a list that looks like this:



     9     10
193159   2567


   7    9   10   12   14   16
  17  256 1084  494   67   15


     3      5      6      7      9     10     11     12
  2199   1327   8840   8579 194437   1061   1073   1834
    14     16
   222   1395


     3      6      7      9     10     12     16
   287    102    728 329057   1004   1057     31


     3      5      6      7      9     12     16
    21      6     20    495 184261   4765     28


   6    7    9   10   12   14
 161  161  386  943  205 1515


So the result is pretty small and shouldn't be the source of a memory allocation problem. However, the next loop, which runs over the entire polygon dataset (more than 32,000 rows), ends up allocating more than 8 GB of memory after about 100 iterations.
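As a rough check that the returned objects really are small, their size can be measured with `object.size()`. The sketch below is only an illustration: `res` is a hypothetical stand-in that rebuilds the first two list elements shown above as named vectors, not the actual return value of the loop.

```r
# Hypothetical stand-in for the first two list elements shown above
# (names are the raster classes, values are the cell counts).
res <- list(
  c("9" = 193159, "10" = 2567),
  c("7" = 17, "9" = 256, "10" = 1084, "12" = 494, "14" = 67, "16" = 15)
)

# Size of the result in bytes, and a human-readable version.
size_bytes <- as.numeric(object.size(res))
print(format(object.size(res), units = "Kb"))
```

If the per-polygon results are this small, the growth in memory is more likely to come from what each worker holds on to during an iteration than from the values it returns.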

# apply the parallel loop on the whole dataset
                   .packages = l.packages) %dopar% {
                     print(paste("Processing Polygon ",i, ".",sep=""))
                     # gc(reset=TRUE) # does not resolve the problem
                     # closeAllConnections()  # does not resolve the problem


What am I doing wrong?

Edit: I tried (as suggested in the comments) deleting the object after each iteration inside the loop, but that didn't solve the problem. I also tried to rule out problems caused by importing the data multiple times by exporting the objects to the cluster environment first:

clusterExport(cl = cluster,
              varlist = c("r.terra.2008","function.landcover.sum","spodf.malha.2007"))


This made no significant difference. My R version is 3.4 on a Linux platform, so the patch linked in the first comment should presumably already be included in this version. I also tried the parallel package, as suggested in the first comment, but it made no difference.
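For reference, a minimal sketch of what a `parallel`-based variant of such a loop can look like. It is not the original code: the toy integer vectors stand in for the raster cells under each polygon, and `summarise_chunk()` is a hypothetical replacement for `function.landcover.sum`. It only shows workers returning small tables to the master process.

```r
library(parallel)

# Hypothetical stand-in for the per-polygon summary:
# tabulate the values found in one chunk.
summarise_chunk <- function(values) table(values)

# Toy data: six small integer vectors instead of the raster cells
# under six polygons (assumption, for illustration only).
set.seed(1)
chunks <- lapply(1:6, function(i) sample(c(3L, 6L, 9L, 12L), 50, replace = TRUE))

# Start two workers, run the summary on each chunk, and always
# shut the cluster down again, even if an error occurs.
cl <- makeCluster(2)
results <- tryCatch(
  parLapply(cl, chunks, summarise_chunk),
  finally = stopCluster(cl)
)

print(length(results))
```

Each element of `results` is a small table, so only the summaries travel back from the workers.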

