Converting predicted probabilities after downsampling to actual probabilities in classification (using mlr)
If I use undersampling to train a model on an unbalanced binary target, the predict method returns probabilities that assume a balanced dataset. How can I convert these probabilities into actual probabilities for the unbalanced data? Is such a conversion argument or function implemented in the mlr package or in another package? For example:
library(mlr)
# Unbalanced binary target: roughly 10% "0" and 90% "1"
a <- data.frame(y = factor(sample(0:1, prob = c(0.1, 0.9), replace = TRUE, size = 100)))
a$x <- as.numeric(a$y) + rnorm(n = 100, sd = 1)
task <- makeClassifTask(data = a, target = "y", positive = "0")
learner <- makeLearner("classif.binomial", predict.type = "prob")
# Undersample the majority class "1" at rate 0.1 during training
learner <- makeUndersampleWrapper(learner, usw.rate = 0.1, usw.cl = "1")
model <- train(learner, task, subset = 1:50)
pred <- predict(model, task, subset = 51:100)
head(pred$data)
A very simple but powerful method was proposed by [Dal Pozzolo et al., 2015].
It is specifically designed to solve the calibration problem (i.e. converting your classifier's predicted probabilities into actual probabilities for the unbalanced case) when downsampling has been used.
You just need to correct the predicted probability p_s using the following formula:
p = beta * p_s / ((beta-1) * p_s + 1)
where p_s is the probability of the minority class predicted by the model trained on the undersampled data, and beta is the ratio of the number of majority-class instances after undersampling to the number of majority-class instances in the original training set.
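As a minimal sketch, the correction can be written as a small R function. The name correct_undersampling is hypothetical (it is not part of mlr), and the sketch assumes p_s is the predicted probability of the minority class:

```r
# Correct probabilities predicted by a model trained on undersampled data,
# using the formula above (Dal Pozzolo et al., 2015).
# p_s  : predicted probability of the minority class (numeric vector in [0, 1])
# beta : fraction of majority-class instances kept by undersampling (0 < beta <= 1)
# NOTE: correct_undersampling is a hypothetical helper, not an mlr function.
correct_undersampling <- function(p_s, beta) {
  stopifnot(all(p_s >= 0 & p_s <= 1), beta > 0, beta <= 1)
  beta * p_s / ((beta - 1) * p_s + 1)
}

# Example: with beta = 0.1, a balanced-data probability of 0.5
# shrinks to 0.05 / 0.55, i.e. about 0.09.
p_s <- c(0.1, 0.5, 0.9)
p <- correct_undersampling(p_s, beta = 0.1)
print(p)
```

In the question's setup, usw.rate = 0.1 undersamples class "1" by a factor of 0.1, so beta would be roughly 0.1, and p_s would be the predicted probability of the minority class "0", for instance as returned by getPredictionProbabilities(pred). With beta = 1 (no undersampling) the function leaves the probabilities unchanged.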
Other methods
Other calibration methods have been proposed that are not specifically aimed at correcting the bias introduced by downsampling. Among them, the most popular are the following:
- Platt scaling or sigmoid method (Platt, 1999)
- Isotonic regression method (Zadrozny and Elkan, 2001)
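Both are available in R: isotonic regression is provided by isoreg() in the base stats package, and Platt scaling in its simplest form amounts to fitting a logistic regression with glm().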
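The two general-purpose methods listed above can be sketched with base R alone. The data below are simulated purely for illustration (score and y are hypothetical held-out classifier scores and true labels), and the Platt step is the simplified variant: a plain logistic regression on the score, without the target smoothing used in Platt's original method.

```r
set.seed(1)
# Hypothetical calibration set: raw classifier scores and true 0/1 labels.
n <- 500
score <- runif(n)
y <- rbinom(n, size = 1, prob = score^2)  # scores are deliberately miscalibrated

# Platt-style scaling: logistic regression of the label on the score.
platt_fit <- glm(y ~ score, family = binomial)
p_platt <- predict(platt_fit, newdata = data.frame(score = score),
                   type = "response")

# Isotonic regression: a monotone non-decreasing step function of the score.
iso_fit <- isoreg(score, y)
iso_fun <- as.stepfun(iso_fit)
p_iso <- iso_fun(score)

# Compare the raw scores with the calibrated probabilities.
head(data.frame(score, p_platt, p_iso))
```

In practice the calibration model would be fitted on data that were not used to train the classifier, and then applied to the scores of new observations.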