Converting predicted probabilities after downsampling to actual probabilities in classification (using mlr)

Question


If I use undersampling to train a model on an imbalanced binary target, the predict method returns probabilities that assume the resampled (more balanced) class distribution. How can I convert these probabilities to actual probabilities for the original imbalanced data? Is such a conversion argument or function implemented in the mlr package or in another package? For example:

library(mlr)
# Imbalanced binary target: about 10% "0" (minority) and 90% "1" (majority)
a <- data.frame(y = factor(sample(0:1, prob = c(0.1, 0.9), replace = TRUE, size = 100)))
a$x <- as.numeric(a$y) + rnorm(n = 100, sd = 1)
task <- makeClassifTask(data = a, target = "y", positive = "0")
learner <- makeLearner("classif.binomial", predict.type = "prob")
# Keep only about 10% of the majority class "1" when training
learner <- makeUndersampleWrapper(learner, usw.rate = 0.1, usw.cl = "1")
model <- train(learner, task, subset = 1:50)
pred <- predict(model, task, subset = 51:100)
head(pred$data)


r classification predict mlr

tover 18 jul. 17 at 10:44


1 answer

Pop · Accepted Answer · 2017-07-18T10:54:03+0000

A very simple but powerful method was proposed by [Dal Pozzolo et al., 2015].

It is specifically designed to solve the calibration problem (i.e. converting your classifier's predicted probabilities into probabilities that are valid for the original imbalanced data) in the case of downsampling.

You just need to correct the predicted probability p_s using the following formula:

   p = beta * p_s / ((beta-1) * p_s + 1)

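where p_s is the probability of the minority (positive) class predicted by the model trained on the undersampled data, p is the corrected probability, and beta is the ratio of the number of majority-class instances after undersampling to the number of majority-class instances in the original training set.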
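As a minimal sketch, the correction can be written as a small R function. The name correct_undersampled_prob is hypothetical (it is not part of mlr), and the input values are made up for illustration.

```r
# Hypothetical helper, not part of mlr: applies the correction formula above.
# p_s  : minority-class probabilities from the model trained on undersampled data
# beta : fraction of majority-class instances kept by the undersampling
correct_undersampled_prob <- function(p_s, beta) {
  stopifnot(beta > 0, beta <= 1, all(p_s >= 0 & p_s <= 1))
  beta * p_s / ((beta - 1) * p_s + 1)
}

# Illustrative values only
p_s <- c(0, 0.25, 0.5, 0.75, 1)
beta <- 0.1
p <- correct_undersampled_prob(p_s, beta)
print(round(p, 4))
```

In the question's example, p_s would presumably be the prob.0 column of pred$data (since positive = "0" is the minority class), and beta would be roughly 0.1, the usw.rate applied to class "1". With beta = 1 (no undersampling) the formula leaves the probabilities unchanged.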

Other Methods

Other methods have been proposed that are not specifically targeted at the bias introduced by downsampling. Among them, the most popular are the following:

Platt scaling or sigmoid method (Platt, 1999)
Isotonic regression method (Zadrozny and Elkan, 2001)

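Both can be done in R, for example with glm() (a logistic fit on the classifier scores, which is the core of Platt scaling) and isoreg() from the stats package (isotonic regression).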
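A rough sketch of both calibration methods using only base R follows. The data are simulated, and the variable names (score, y, platt, iso_fun) are assumptions for illustration. In practice the calibration model would be fitted on a held-out set that has the original class distribution, not on the undersampled training data.

```r
set.seed(1)
n <- 2000
score <- runif(n)                           # simulated uncalibrated scores in [0, 1]
true_p <- 0.1 * score / (1 - 0.9 * score)   # simulated true minority-class probability
y <- rbinom(n, 1, true_p)                   # observed labels on the calibration set

# Platt-style scaling: logistic regression of the labels on the scores
platt <- glm(y ~ score, family = binomial())
p_platt <- predict(platt, newdata = data.frame(score = score), type = "response")

# Isotonic regression: monotone step function mapping scores to probabilities
iso <- isoreg(score, y)
iso_fun <- as.stepfun(iso)
p_iso <- iso_fun(score)

# Compare the first few calibrated probabilities from each method
print(head(cbind(score, p_platt, p_iso)))
```

Platt scaling assumes a sigmoid relationship between scores and probabilities, while isotonic regression only assumes monotonicity and usually needs more calibration data to avoid overfitting.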
