R caretEnsemble warning: indexes not defined in trControl

I have some R / caret code that fits several cross-validated models to a data set, but I am getting a warning message that I am having trouble finding any information about. Is this something I should be worried about?

library(datasets)
library(caret)
library(caretEnsemble)

# load data
data("iris")

# establish cross-validation structure
set.seed(32)
trainControl <- trainControl(method="repeatedcv", number=5, repeats=3, savePredictions=TRUE, search="random")

# fit several (cross-validated) models 
algorithmList <- c('lda',         # Linear Discriminant Analysis 
                   'rpart' ,      # Classification and Regression Trees
                   'svmRadial')   # SVM with RBF Kernel

models <- caretList(Species~., data=iris, trControl=trainControl, methodList=algorithmList)

      

log output:

Warning messages:
1: In trControlCheck(x = trControl, y = target) :
  x$savePredictions == TRUE is depreciated. Setting to 'final' instead.
2: In trControlCheck(x = trControl, y = target) :
  indexes not defined in trControl.  Attempting to set them ourselves, so each model in the ensemble will have the same resampling indexes.

      

... I thought my trainControl object, which defines a cross-validation scheme (5-fold cross-validation repeated 3 times), would generate a set of indices for the CV splits. So I am confused as to why I received this message.

1 answer


By default, trainControl does not generate the resampling indices for you; it only acts as a container that passes the same settings to each model being trained.
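A quick way to see this, as a minimal sketch assuming a reasonably recent version of caret: the control object stores the resampling settings from the question, but its index element stays empty until you supply one yourself. The name ctrl is only illustrative.

```r
library(caret)

# Same resampling settings as in the question, without an explicit index
ctrl <- trainControl(method = "repeatedcv", number = 5, repeats = 3)

# The settings themselves are stored in the control object...
ctrl$method
ctrl$number
ctrl$repeats

# ...but no resampling indexes exist yet; each call to train() would
# have to draw its own folds at fit time
is.null(ctrl$index)
```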

Searching the package's GitHub issues for this warning turns up this specific issue:

You need to make sure that each model is fit with EXACTLY the same resampling folds. caretEnsemble builds the ensemble by merging together the test sets for each cross-validation fold, and you will get incorrect results if each fold has different observations in it.

Before you fit your models, you need to construct a trainControl object and manually set the indexes in that object.

eg. myControl <- trainControl(index=createFolds(y, 10))

...

We are working on an interface to caretEnsemble that handles constructing a resampling strategy for you and then fitting multiple models using those resamples, but it is not finished yet.

To reiterate, that check is there for a reason. You need to set the index argument in trainControl and pass the EXACT SAME indexes to each model you want to ensemble.
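To illustrate why the folds have to be fixed up front, here is a minimal sketch, assuming caret is installed. It draws the resampling indexes twice and shows that independent draws do not agree, whereas a single stored list can be reused for every model. It uses createMultiFolds rather than the createFolds call in the quote, because createMultiFolds returns training rows and supports repeats; the names folds_a, folds_b and shared_control are only illustrative.

```r
library(caret)
data("iris")

# Two independent draws of 5-fold x 3-repeat training indexes
set.seed(1)
folds_a <- createMultiFolds(iris$Species, k = 5, times = 3)
set.seed(2)
folds_b <- createMultiFolds(iris$Species, k = 5, times = 3)

# Independent draws give different partitions, so models resampled this
# way could not be merged fold by fold
identical(folds_a, folds_b)

# Storing one list in the control object means every model that receives
# this object is resampled on the same splits
shared_control <- trainControl(method = "repeatedcv", number = 5, repeats = 3,
                               index = folds_a, savePredictions = "final")
identical(shared_control$index, folds_a)
```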



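So when you specify number = 5 and repeats = 3, the models are not actually handed an index saying which samples belong to each fold; each one generates its own.

Therefore, to make sure the models agree on which samples belong to which folds, you must supply the index argument in the trainControl object. Note that index expects the rows used for training in each resample, while createFolds(iris$Species, 5) returns the held-out rows by default and only a single set of 5 folds. createMultiFolds(iris$Species, k = 5, times = 3) returns the training rows for all 15 resamples, which matches repeatedcv with number = 5 and repeats = 3.

# new trainControl object with index specified
set.seed(32)
trainControl <- trainControl(method = "repeatedcv",
                             number = 5,
                             repeats = 3,
                             # training rows for each of the 5 x 3 resamples
                             index = createMultiFolds(iris$Species, k = 5, times = 3),
                             savePredictions = "all",
                             search = "random")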
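As a sanity check before fitting, here is a minimal sketch, assuming caret and caretEnsemble plus the model packages behind lda, rpart and svmRadial (MASS, rpart, kernlab) are installed. It compares what createFolds and createMultiFolds return, confirms that every row is held out once per repeat, and then passes the shared control object to caretList. The names cv_index, held_out and ctrl are illustrative.

```r
library(caret)
library(caretEnsemble)
data("iris")

set.seed(32)

# createFolds returns the held-out rows by default (small vectors)
holdout_folds <- createFolds(iris$Species, k = 5)
lengths(holdout_folds)

# createMultiFolds returns training rows for every fold of every repeat
cv_index <- createMultiFolds(iris$Species, k = 5, times = 3)
length(cv_index)
range(lengths(cv_index))

# Each row should be left out of training exactly once per repeat
held_out <- lapply(cv_index, function(train_rows) {
  setdiff(seq_len(nrow(iris)), train_rows)
})
table(table(unlist(held_out)))

# One control object, shared by every model in the list
ctrl <- trainControl(method = "repeatedcv", number = 5, repeats = 3,
                     index = cv_index, savePredictions = "final")

models <- caretList(Species ~ ., data = iris, trControl = ctrl,
                    methodList = c("lda", "rpart", "svmRadial"))
names(models)
```

Because index is now set explicitly, the check that produced the second warning has nothing left to fill in, and using savePredictions = "final" should likewise address the first one.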

      
