Avoid increasing memory in a foreach loop in R
I am trying to create summary statistics that combine two different sets of spatial data: a large raster file and a polygon file. The idea is to get a summary of the raster values in each polygon.
Since the raster is too large to process at once, I am trying to create subtasks and process them in parallel, i.e. process each polygon from a SpatialPolygonsDataFrame one at a time.
The code works fine, but after about 100 iterations I run into memory issues. Here is my code and what I intend to do:
# session setup
library("raster")
library("rgdal")
# multicore processing.
library("foreach")
library("doSNOW")
# create a cluster with three workers for the current R session
cluster<-makeCluster(3, type = "SOCK", outfile = "")
registerDoSNOW(cluster)
getDoParWorkers() # check if it worked
# load base data
r.terra.2008<-raster("~/terra.tif")
spodf.malha.2007<-readOGR("~/","composed")
# bring both data-sets to a common CRS
proj4string(r.terra.2008)
proj4string(spodf.malha.2007)
spodf.malha.2007<-spTransform(spodf.malha.2007,CRSobj = CRS(projargs = proj4string(r.terra.2008)))
proj4string(r.terra.2008)==proj4string(spodf.malha.2007) # should be TRUE
# create a function to extract areas
function.landcover.sum<-function(r.landuse,spodf.pol){
  return(table(extract(r.landuse,spodf.pol)))
}
# apply it on one subset to see if it is working
function.landcover.sum(r.terra.2008,spodf.malha.2007[1,])
## parallel loop
# define package(s) to be use in the parallel loop
l.packages<-c("raster","sp")
# try a parallel loop for the first 6 polygons
l.results<-foreach(i=1:6,
                   .packages = l.packages) %dopar% {
  print(paste("Processing Polygon ",i,".",sep=""))
  return(function.landcover.sum(r.terra.2008,spodf.malha.2007[i,]))
}
The result is a list that looks like this:
l.results
[[1]]
9 10
193159 2567
[[2]]
7 9 10 12 14 16
17 256 1084 494 67 15
[[3]]
3 5 6 7 9 10 11 12
2199 1327 8840 8579 194437 1061 1073 1834
14 16
222 1395
[[4]]
3 6 7 9 10 12 16
287 102 728 329057 1004 1057 31
[[5]]
3 5 6 7 9 12 16
21 6 20 495 184261 4765 28
[[6]]
6 7 9 10 12 14
161 161 386 943 205 1515
So the result is pretty small and shouldn't be the source of a memory allocation problem. However, the next loop over the entire polygon dataset, which has > 32,000 rows, allocates more than 8 GB of memory after about 100 iterations.
# apply the parallel loop on the whole dataset
l.results<-foreach(i=1:nrow(spodf.malha.2007),
                   .packages = l.packages) %dopar% {
  print(paste("Processing Polygon ",i,".",sep=""))
  return(function.landcover.sum(r.terra.2008,spodf.malha.2007[i,]))
  # gc(reset=TRUE) # does not resolve the problem
  # closeAllConnections() # does not resolve the problem
}
What am I doing wrong?
edit: As suggested in the comments, I tried deleting the object after each iteration inside the loop, but that did not solve the problem. I also tried to rule out problems caused by each worker importing the data separately, by exporting the objects to the workers beforehand:
clusterExport(cl = cluster,
varlist = c("r.terra.2008","function.landcover.sum","spodf.malha.2007"))
without significant changes. My R version is 3.4 on a Linux platform, so presumably the patch linked in the first comment is already included in this version. I also tried the parallel package, as suggested in the first comment, but that made no difference either.
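For reference, the per-iteration cleanup attempt looked roughly like this (a minimal sketch with placeholder names vals and res; note that gc() has to run before the value is returned, because statements placed after return() never execute):

l.results<-foreach(i=1:nrow(spodf.malha.2007),
                   .packages = l.packages) %dopar% {
  vals<-extract(r.terra.2008,spodf.malha.2007[i,]) # large intermediate extraction
  res<-table(vals) # the small per-polygon summary
  rm(vals)         # delete the intermediate object on the worker
  gc(reset=TRUE)   # force garbage collection before returning
  res              # return only the small table
}

And the attempt with the parallel package was along these lines (again a sketch, not the exact code I ran):

library("parallel")
cluster<-makeCluster(3) # defaults to a PSOCK cluster
clusterEvalQ(cluster, library("raster")) # load raster on each worker
clusterExport(cl = cluster,
              varlist = c("r.terra.2008","function.landcover.sum","spodf.malha.2007"))
l.results<-parLapply(cluster, 1:nrow(spodf.malha.2007),
                     function(i) function.landcover.sum(r.terra.2008,spodf.malha.2007[i,]))
stopCluster(cluster)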