Avoiding growing memory usage in a foreach loop in R

I am trying to create summary statistics that combine two different spatial data sets: a large raster file and a polygon file. The idea is to get a summary of the raster values in each polygon.

Since the raster is too large to process at once, I am trying to create subtasks and process them in parallel, i.e. process each polygon of a SpatialPolygonsDataFrame one at a time.

The code works fine, but after about 100 iterations I run into memory problems. Here is my code and what I intend to do:

# session setup
library("raster")
library("rgdal")

# multicore processing
library("foreach")
library("doSNOW")
# start three worker processes for the current R session
cluster <- makeCluster(3, type = "SOCK", outfile = "")
registerDoSNOW(cluster)
getDoParWorkers() # check that the workers were registered

# load base data
r.terra.2008 <- raster("~/terra.tif")
spodf.malha.2007 <- readOGR("~/", "composed")

# bring both data-sets to a common CRS
proj4string(r.terra.2008)
proj4string(spodf.malha.2007)
spodf.malha.2007<-spTransform(spodf.malha.2007,CRSobj = CRS(projargs = proj4string(r.terra.2008)))
proj4string(r.terra.2008)==proj4string(spodf.malha.2007) # should be TRUE

# create a function to extract areas
function.landcover.sum <- function(r.landuse, spodf.pol) {
  return(table(extract(r.landuse, spodf.pol)))
}

# apply it to one subset to check that it works
function.landcover.sum(r.terra.2008,spodf.malha.2007[1,])

## parallel loop
# define package(s) to be use in the parallel loop
l.packages<-c("raster","sp")

# try a parallel loop for the first 6 polygons
l.results<-foreach(i=1:6,
                   .packages = l.packages) %dopar% {
                     print(paste("Processing Polygon ",i, ".",sep=""))
                     return(function.landcover.sum(r.terra.2008,spodf.malha.2007[i,]))
                     }


The result is a list that looks like this:

l.results

[[1]]

9     10 
193159   2567 

[[2]]

7    9   10   12   14   16 
17  256 1084  494   67   15 

[[3]]

3      5      6      7      9     10     11     12 
2199   1327   8840   8579 194437   1061   1073   1834 
14     16 
222   1395 

[[4]]

3      6      7      9     10     12     16 
287    102    728 329057   1004   1057     31 

[[5]]

3      5      6      7      9     12     16 
21      6     20    495 184261   4765     28 

[[6]]

6    7    9   10   12   14 
161  161  386  943  205 1515 


So the individual results are pretty small and should not be the source of a memory allocation problem. However, the next loop over the entire polygon dataset, which has more than 32,000 rows, allocates more than 8 GB of memory after about 100 iterations.

# apply the parallel loop on the whole dataset
l.results<-foreach(i=1:nrow(spodf.malha.2007),
                   .packages = l.packages) %dopar% {
                     print(paste("Processing Polygon ",i, ".",sep=""))
                     return(function.landcover.sum(r.terra.2008,spodf.malha.2007[i,]))
                     # gc(reset=TRUE) # does not resolve the problem
                     # closeAllConnections()  # does not resolve the problem
                   }
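
To make the per-worker growth visible, this is the kind of logging I could add to the loop body (just a sketch using base gc(); because makeCluster was called with outfile = "", the print output of the workers appears in the master console):

l.results<-foreach(i=1:nrow(spodf.malha.2007),
                   .packages = l.packages) %dopar% {
                     res<-function.landcover.sum(r.terra.2008,spodf.malha.2007[i,])
                     # total memory currently used by this worker, in Mb (column 2 of gc())
                     print(paste("Polygon ",i,": worker uses ~",round(sum(gc()[,2]),1)," Mb",sep=""))
                     res
                   }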


What am I doing wrong?

Edit: As suggested in the comments, I tried deleting the objects created in each iteration inside the loop, but that did not solve the problem. I also tried to rule out problems with repeatedly importing the data by exporting the objects to the cluster workers first:

clusterExport(cl = cluster,
              varlist = c("r.terra.2008","function.landcover.sum","spodf.malha.2007"))
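
For reference, the per-iteration cleanup from the first attempt looked roughly like this (a sketch; pol.i is just a placeholder name for the per-iteration subset):

l.results<-foreach(i=1:nrow(spodf.malha.2007),
                   .packages = l.packages) %dopar% {
                     pol.i<-spodf.malha.2007[i,]                    # single-polygon subset
                     res<-function.landcover.sum(r.terra.2008,pol.i)
                     rm(pol.i)                                      # drop the per-iteration copy
                     gc(reset=TRUE)                                 # force garbage collection on the worker
                     res
                   }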


Neither change made a significant difference. My R version is 3.4 on a Linux platform, so the patch linked in the first comment should presumably already be included in this version. I also tried the parallel package, as suggested in the first comment (roughly as in the sketch below), but that made no difference either.
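
A sketch of the kind of parallel-package version I tried (the exact details may have differed slightly):

library("parallel")
cl<-makeCluster(3)
clusterEvalQ(cl, {library("raster"); library("sp")})  # load the packages on every worker
clusterExport(cl, varlist = c("r.terra.2008","spodf.malha.2007","function.landcover.sum"))
l.results<-parLapply(cl, 1:nrow(spodf.malha.2007), function(i){
  function.landcover.sum(r.terra.2008,spodf.malha.2007[i,])
})
stopCluster(cl)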
