Efficient memory prediction with RandomForest in R
TL;DR: I want to know memory-efficient ways to do batch prediction with randomForest models built on large datasets (hundreds of features, 10k+ rows).
More details
I am working with a large dataset (over 3 GB in memory) and want to do a simple binary classification using randomForest. Since my data is proprietary I cannot share it, but assume the following code works:
library(randomForest)
library(data.table)
myData <- fread("largeDataset.tsv")
myFeatures <- myData[, !c("response"), with = FALSE]
myResponse <- myData[["response"]]
toBePredicted <- fread("unlabeledData.tsv")
rfObj <- randomForest(x = myFeatures, y = myResponse, ntree = 100L)
predictedLabels <- predict(rfObj, toBePredicted)
However, it takes up several GB of memory.
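One thing that helps independently of how the forest is fit is to predict in chunks, so the full vote matrix for toBePredicted is never materialized at once. A minimal sketch (the helper name predictInChunks and the default chunkSize are made up, not from the question; it assumes the forest was kept, i.e. the default keep.forest = TRUE):

```r
predictInChunks <- function(rf, newdata, chunkSize = 10000L) {
  n <- nrow(newdata)
  starts <- seq.int(1L, n, by = chunkSize)
  preds <- vector("list", length(starts))
  for (i in seq_along(starts)) {
    rows <- starts[i]:min(starts[i] + chunkSize - 1L, n)
    # predict() only materializes the vote matrix for this slice of rows
    preds[[i]] <- as.character(predict(rf, newdata[rows, ]))
    gc()  # release the temporary per-chunk objects
  }
  factor(unlist(preds))
}

predictedLabels <- predictInChunks(rfObj, toBePredicted)
```

Peak memory then scales with chunkSize rather than with nrow(toBePredicted).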
I know that I can save memory by disabling the proximity and importance measures and the keep.* arguments:
rfObjWithPreds <- randomForest(x = myFeatures,
y = myResponse,
proximity = FALSE,
localImp = FALSE,
importance = FALSE,
ntree = 100L,
keep.forest = FALSE,
keep.inbag = FALSE,
xtest = toBePredicted)
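With xtest supplied and keep.forest = FALSE, the test-set predictions are computed during fitting and can be read straight off the returned object (these are documented fields of a randomForest fit, so no separate predict() call, and no stored forest, is needed):

```r
predictedLabels <- rfObjWithPreds[["test"]][["predicted"]]
voteFractions  <- rfObjWithPreds[["test"]][["votes"]]  # per-class vote shares
```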
However, I am now wondering whether this is the most memory-efficient way of getting predictions for toBePredicted. Another option is to grow the forest in parallel and garbage-collect aggressively:
library(doParallel)
registerDoParallel(cores = 5)  # the argument is `cores`, not `ncores`

subForestVotes <- foreach(subForestNumber = iter(seq.int(5)),
                          .combine = cbind,
                          .packages = "randomForest") %dopar% {
    # Note: 5 workers x ntree trees each; use ntree = 20L per worker
    # if the goal is to match the single 100-tree forest above.
    rfObjWithPreds <- randomForest(x = myFeatures,
                                   y = myResponse,
                                   proximity = FALSE,
                                   localImp = FALSE,
                                   importance = FALSE,
                                   ntree = 100L,
                                   keep.forest = FALSE,
                                   keep.inbag = FALSE,
                                   xtest = toBePredicted)
    output <- rfObjWithPreds[["test"]][["votes"]]
    rm(rfObjWithPreds)
    gc()  # return memory before the next sub-forest is grown
    output
}
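Since each worker returns an n x k matrix of vote fractions (one column per class level), .combine = cbind leaves repeated class columns in subForestVotes. One way to reduce them to labels is to average per class and take the winner (a sketch; classLevels and avgVotes are just local names):

```r
classLevels <- unique(colnames(subForestVotes))
avgVotes <- sapply(classLevels, function(cl) {
  # mean vote fraction for class `cl` across the sub-forests
  rowMeans(subForestVotes[, colnames(subForestVotes) == cl, drop = FALSE])
})
predictedLabels <- factor(classLevels[max.col(avgVotes)], levels = classLevels)
```

Averaging vote fractions weights every sub-forest equally, which matches growing one big forest only when each worker grows the same number of trees.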
Does anyone have a smarter way to predict toBePredicted efficiently?