Classification - Using Factor Levels

I am currently working on a predictive model for a churn problem.
Whenever I try to run the following model, I get this error: "At least one of the class levels is not a valid R variable name. This will cause errors when generating the class probabilities, since the variable names will be converted to X0, X1. Use factor levels that can be used as valid R variable names."

library(caret)

# Combine ROC/Sens/Spec (twoClassSummary) with Accuracy/Kappa (defaultSummary)
fivestats <- function(...) c(twoClassSummary(...), defaultSummary(...))

fitControl.default <- trainControl(
    method  = "repeatedcv"
  , number  = 10
  , repeats = 1
  , verboseIter = TRUE
  , summaryFunction  = fivestats
  , classProbs = TRUE      # class probabilities are needed for the ROC metric
  , allowParallel = TRUE)
set.seed(1984)

# Candidate complexity parameters for rpart
rpartGrid <- expand.grid(cp = seq(from = 0, to = 0.1, by = 0.001))
rparttree.fit.roc <- train(
    churn ~ .
  , data      = training.dt
  , method    = "rpart"
  , trControl = fitControl.default
  , tuneGrid  = rpartGrid
  , metric    = "ROC"
  , maximize  = TRUE
)

      

The attached image shows my data. I have already converted some columns from chr to factor.

[Image: data overview]

I don't understand what my problem is. If I were to convert all the data to factors, then a variable such as total_airtime_out would probably end up with about 9000 levels.

Thanks for any help!

3 answers


I can't reproduce your error without the data, but I think the error message tells you everything you need to know:

At least one of the class levels is not a valid R variable name. This will cause errors when class probabilities are generated, since the variable names will be converted to X0, X1. Please use factor levels that can be used as valid R variable names.

The first sentence is the important part. With classProbs = TRUE, caret uses the class levels as column names for the predicted probabilities. Looking at your response variable, its levels are "0" and "1", which are not valid variable names in R (you can't do 0 <- "my value"). Presumably the problem goes away if you rename the levels of the response variable with something like



levels(training.dt$churn) <- c("first_class", "second_class")

as described in this Q.
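To check whether a factor has this problem before calling train(), one option is to compare its levels with what make.names() would produce. The sketch below uses a small made-up churn vector in place of training.dt$churn, and the labels "no" and "yes" are purely illustrative.

```r
# Hypothetical stand-in for the response column in training.dt
churn <- factor(c(0, 1, 1, 0, 1))

# A level is a valid R name only if make.names() leaves it unchanged
valid_before <- make.names(levels(churn)) == levels(churn)

# Rename the levels in order: "0" becomes "no", "1" becomes "yes"
levels(churn) <- c("no", "yes")
valid_after <- make.names(levels(churn)) == levels(churn)

print(valid_before)
print(valid_after)
```

Assigning to levels() only relabels the existing categories, so the observations themselves are unchanged. Only the response needs this treatment; numeric predictors do not have to be turned into factors.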


How about this base R function:

 make.names(churn) ~ .,

which, according to its documentation, will "make syntactically valid names out of character vectors"?
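For reference, here is a small sketch of what make.names() does to a few example strings. The input values are made up for illustration and are not taken from the question's data.

```r
# Illustrative level names, including some that are not valid R names
raw_levels <- c("0", "1", "yes", "no churn")

# Names starting with a digit get an "X" prefix; invalid characters become "."
clean <- make.names(raw_levels)

# Two different inputs can map to the same name; unique = TRUE disambiguates
deduped <- make.names(c("a b", "a.b"), unique = TRUE)

print(clean)
print(deduped)
```

Because distinct levels can collapse to the same name, it may be worth checking the result for duplicates before using it as a set of factor levels.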


In addition to @einar's correct answer, here's the dplyr syntax to convert factor levels:

library(dplyr)

training.dt %>%
  mutate(churn = factor(churn,
          levels = make.names(levels(churn))))

I slightly prefer to change only the labels of the factor levels, since setting levels changes the underlying data (values that do not match any of the new levels become NA), for example:

training.dt %>%
  mutate(churn = factor(churn,
          labels = make.names(levels(churn))))
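The difference between the two arguments can be seen on a toy factor in base R, without dplyr. This is a minimal sketch with a made-up churn vector standing in for the real column.

```r
# Hypothetical stand-in for training.dt$churn
churn <- factor(c("0", "1", "1", "0"))
new_names <- make.names(levels(churn))

# levels = ...: the existing values "0"/"1" match none of the new levels
by_levels <- factor(churn, levels = new_names)

# labels = ...: the existing levels are kept and only renamed
by_labels <- factor(churn, labels = new_names)

print(by_levels)
print(by_labels)
```

With levels every observation ends up as NA, while with labels the observations are preserved under the new names.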

      
