Avoid growing memory usage in a foreach loop in R

I am trying to create summary statistics that combine two different sets of spatial data: a large raster file and a polygon file. The idea is to get a summary of the raster values in each polygon.

Since the raster is too large to process at once, I am trying to split the work into subtasks and process them in parallel, that is, process each polygon of a SpatialPolygonsDataFrame as a separate task.

The code works fine, but after about 100 iterations I run into memory issues. Here is my code and what I intend to do:

# session setup
library("raster")
library("rgdal")

# multicore processing. 
library("foreach")
library("doSNOW")
# start a cluster of three worker processes for the current R session
cluster = makeCluster(3, type = "SOCK",outfile="")
registerDoSNOW(cluster)
getDoParWorkers() # check that it worked (should return 3)

# load base data
r.terra.2008<-raster("~/terra.tif")
spodf.malha.2007<-readOGR("~/", "composed")

# bring both data-sets to a common CRS
proj4string(r.terra.2008)
proj4string(spodf.malha.2007)
spodf.malha.2007<-spTransform(spodf.malha.2007,CRSobj = CRS(projargs = proj4string(r.terra.2008)))
proj4string(r.terra.2008)==proj4string(spodf.malha.2007) # should be TRUE

# create a function to extract areas
function.landcover.sum<-function(r.landuse,spodf.pol){
  return(table(extract(r.landuse,spodf.pol)))}

# apply it to one subset to see if it works
function.landcover.sum(r.terra.2008,spodf.malha.2007[1,])

## parallel loop
# define the package(s) to be used in the parallel loop
l.packages<-c("raster","sp")

# try a parallel loop for the first 6 polygons
l.results<-foreach(i=1:6,
                   .packages = l.packages) %dopar% {
                     print(paste("Processing Polygon ",i, ".",sep=""))
                     return(function.landcover.sum(r.terra.2008,spodf.malha.2007[i,]))
                     }

      

The result is a list that looks like this:

l.results

[[1]]

     9     10
193159   2567

[[2]]

   7    9   10   12   14   16
  17  256 1084  494   67   15

[[3]]

     3      5      6      7      9     10     11     12
  2199   1327   8840   8579 194437   1061   1073   1834
    14     16
   222   1395

[[4]]

     3      6      7      9     10     12     16
   287    102    728 329057   1004   1057     31

[[5]]

     3      5      6      7      9     12     16
    21      6     20    495 184261   4765     28

[[6]]

   6    7    9   10   12   14
 161  161  386  943  205 1515

      

So the result is small and should not be the cause of a memory allocation problem. However, the following loop over the entire polygon dataset, which has more than 32,000 rows, allocates more than 8 GB of memory after about 100 iterations.

# apply the parallel loop on the whole dataset
l.results<-foreach(i=1:nrow(spodf.malha.2007),
                   .packages = l.packages) %dopar% {
                     print(paste("Processing Polygon ",i, ".",sep=""))
                     return(function.landcover.sum(r.terra.2008,spodf.malha.2007[i,]))
                     # gc(reset=TRUE) # does not resolve the problem
                     # closeAllConnections()  # does not resolve the problem
                   }
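A diagnostic sketch (not part of the original code) that could help locate the growth: run the same work in batches with the base `parallel` package instead of `foreach`/`doSNOW`, and record how much memory each worker's R heap uses after every batch. `run_in_batches`, `summarise_one` and `batch_size` are hypothetical names; `summarise_one(i)` stands in for `function.landcover.sum(r.terra.2008, spodf.malha.2007[i,])`, and the objects it uses would still have to be exported to the workers with `clusterExport` first.

```r
library(parallel)

# Diagnostic sketch (not from the original post): process indices 1..n in
# batches on an existing cluster `cl` and, after each batch, record how many
# Mb each worker's R heap is using. `summarise_one` is a hypothetical
# stand-in for function(i) function.landcover.sum(r.terra.2008, spodf.malha.2007[i, ]).
run_in_batches <- function(cl, n, summarise_one, batch_size = 50) {
  idx_all <- seq_len(n)
  batches <- split(idx_all, ceiling(idx_all / batch_size))
  results <- vector("list", n)
  # one row per batch, one column per worker
  worker_mb <- matrix(NA_real_, nrow = length(batches), ncol = length(cl))
  for (b in seq_along(batches)) {
    idx <- batches[[b]]
    results[idx] <- parLapply(cl, idx, summarise_one)
    # gc() runs on each worker; column 2 holds the Mb currently in use
    # (Ncells + Vcells), so the sum is that worker's R heap usage
    worker_mb[b, ] <- unlist(clusterEvalQ(cl, sum(gc()[, 2])))
  }
  list(results = results, worker_mb = worker_mb)
}
```

If `worker_mb` climbs from batch to batch, the workers are accumulating memory; if it stays flat while the machine still runs out of memory, the growth is more likely on the master process or in memory that `gc()` does not track, such as allocations made by native libraries.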

      

What am I doing wrong?

Edit: As suggested in the comments, I tried deleting the object after each iteration inside the loop, but that did not solve the problem. I also tried to rule out problems caused by repeated data imports by first exporting the objects to the workers' environments:

clusterExport(cl = cluster,
              varlist = c("r.terra.2008","function.landcover.sum","spodf.malha.2007"))

      

without significant changes. My R version is 3.4 on Linux, so the patch linked in the first comment should presumably already be included in this version. I also tried the parallel package, as suggested in the first comment, but saw no difference.
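The `parallel` attempt is not shown in the post; a minimal sketch of what it might look like follows, assuming a PSOCK cluster and `parLapply`. `run_with_parallel`, `summarise_one`, `packages` and `export` are hypothetical names. Unlike the script above, it also stops the cluster when it is done, via `on.exit`.

```r
library(parallel)

# Sketch (assumption): the per-polygon loop written with the base `parallel`
# package. `summarise_one(i)` is a hypothetical stand-in for
# function.landcover.sum(r.terra.2008, spodf.malha.2007[i, ]); in the real
# script `packages` would be c("raster", "sp") and `export` would list
# "r.terra.2008", "spodf.malha.2007" and "function.landcover.sum".
run_with_parallel <- function(n, summarise_one, n_workers = 3,
                              packages = character(0), export = character(0)) {
  cl <- makeCluster(n_workers, type = "PSOCK")
  on.exit(stopCluster(cl), add = TRUE)  # shut the workers down, even on error
  # attach the required packages on every worker
  invisible(clusterCall(cl, lapply, packages, library, character.only = TRUE))
  # copy the named global objects into each worker's global environment
  if (length(export) > 0) clusterExport(cl, export, envir = globalenv())
  parLapply(cl, seq_len(n), summarise_one)
}
```

For the data in the question this would be called as `run_with_parallel(nrow(spodf.malha.2007), function(i) function.landcover.sum(r.terra.2008, spodf.malha.2007[i, ]), packages = c("raster", "sp"), export = c("r.terra.2008", "spodf.malha.2007", "function.landcover.sum"))`. Stopping the cluster via `on.exit` does not by itself prevent growth during the loop; it only makes sure the worker processes do not linger after the call.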

foreach parallel-processing memory r raster

