Classification - Using Factor Levels
I am currently working on a predictive model for the churn problem.
Whenever I try to run the following model, I get this error: At least one of the class levels is not a valid R variable name. This will cause errors when generating the class probabilities, since the variable names will be converted to X0, X1. Use factor levels that can be used as valid R variable names.
library(caret)

# Combine ROC/Sens/Spec with Accuracy/Kappa in one summary function
fivestats <- function(...) c(twoClassSummary(...), defaultSummary(...))

fitControl.default <- trainControl(
    method = "repeatedcv"
  , number = 10
  , repeats = 1
  , verboseIter = TRUE
  , summaryFunction = fivestats
  , classProbs = TRUE
  , allowParallel = TRUE)

set.seed(1984)
rpartGrid <- expand.grid(cp = seq(from = 0, to = 0.1, by = 0.001))

rparttree.fit.roc <- train(
    churn ~ .
  , data = training.dt
  , method = "rpart"
  , trControl = fitControl.default
  , tuneGrid = rpartGrid
  , metric = 'ROC'
  , maximize = TRUE
)
In the attached image you can see my data. I have already converted some columns from chr to factor.
I don't understand what the problem is. If I converted all of the data to factors, a variable such as total_airtime_out would probably end up with about 9000 levels.
Thanks for any help!
I can't reproduce your error, but the error message tells you everything you need to know:
At least one of the class levels is not a valid R variable name. This will cause errors when generating the class probabilities, since the variable names will be converted to X0, X1. Use factor levels that can be used as valid R variable names.
Emphasis mine. Your response variable has the levels "0" and "1", which are not valid variable names in R (you can't do 0 <- "my value"). Presumably the problem goes away if you rename the levels of the response variable with something like
levels(training.dt$churn) <- c("first_class", "second_class")
as described in this question.
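To check whether a set of levels would trigger the message, one rough test is to compare each level with what make.names() would turn it into. The sketch below uses a small made-up churn vector as a stand-in for training.dt$churn, and the helper is_valid_name and the labels "no" and "yes" are only illustrative choices, not anything taken from caret itself.

```r
# Hypothetical stand-in for the response column; the real data is not shown.
churn <- factor(c("0", "1", "1", "0"))

# Treat a level as a valid R name if make.names() leaves it unchanged.
is_valid_name <- function(x) x == make.names(x)

before <- is_valid_name(levels(churn))   # "0" and "1" both fail the check
print(before)

# Rename the levels in place. The order follows levels(churn): "0", then "1".
levels(churn) <- c("no", "yes")

after <- is_valid_name(levels(churn))
print(after)
print(churn)
```

Because levels<- replaces names by position, it is worth printing levels(churn) first to confirm which level comes first.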
In addition to @einar's correct answer, here's the dplyr syntax to convert factor levels:
training.dt %>%
  mutate(churn = factor(churn,
                        levels = make.names(levels(churn))))
I slightly prefer changing only the labels of the factor levels, because passing the new names through levels changes the underlying data (values that do not match the new levels become NA). For example:
training.dt %>%
  mutate(churn = factor(churn,
                        labels = make.names(levels(churn))))
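The difference between the two arguments is easy to see on a toy vector. This is a minimal base R sketch without dplyr; the churn vector is made up for illustration, and via_levels and via_labels are just local names.

```r
# Hypothetical stand-in for training.dt$churn.
churn <- factor(c("0", "1", "1", "0"))
new_names <- make.names(levels(churn))   # "X0" "X1"

# levels = : the values "0" and "1" are matched against "X0" and "X1".
# Nothing matches, so every entry becomes NA.
via_levels <- factor(churn, levels = new_names)

# labels = : the existing levels are kept for matching and only renamed.
via_labels <- factor(churn, labels = new_names)

print(via_levels)
print(via_labels)
```

Running table(via_labels, churn) afterwards is a quick way to confirm that each old level maps to exactly one new label.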