R ctree strange error

I have a weird problem when using ctree in a loop. If I run the following code inside a loop, R freezes.

data = read.csv("train.csv") #data description https://www.kaggle.com/c/titanic-gettingStarted/data

treet = ctree(Survived ~ ., data = data)
print(plot(treet))

      

Sometimes I get the error "More than 52 levels in the predictor factor, truncated for printing" and my tree is displayed in a very strange way. Sometimes it works fine. Really strange!

My loop code:

functionPlot <- function(traine, i) {
  print(i) # print only once, then RStudio freezes
  tempd <- ctree(Survived ~ ., data = traine)
  print(plot(tempd))
}

for(i in 1:2) {
  smp_size <- floor(0.70 * nrow(data))
  train_ind <- sample(seq_len(nrow(data)), size = smp_size)
  set.seed(100 + i)
  train <- data[train_ind, ]
  test <- data[-train_ind, ]
#
  functionPlot(train,i)
}

      



1 answer


The function ctree() expects that (a) appropriate classes (numeric, factor, etc.) are used for each variable and that (b) only useful predictors are used in the model formula.

As for (b), you supplied variables that are really just character strings (like Name) and not factors. These would have to be either preprocessed suitably or omitted from the analysis.
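To spot such columns before fitting, one option is to tabulate the class and the number of distinct values of every column. The sketch below uses a small made-up data frame (`toy`, not the real Kaggle file) and sets `stringsAsFactors = TRUE` to mimic the `read.csv()` default before R 4.0.0, where text columns became factors. A factor with one level per row, like Name, is a likely source of the "more than 52 levels" message, though that is an assumption about your setup.

```r
# Toy stand-in for the Titanic training data (values are made up for illustration).
toy <- data.frame(
  Survived = c(0, 1, 1, 0),
  Pclass   = c(3, 1, 3, 2),
  Name     = c("A, Mr. X", "B, Mrs. Y", "C, Miss. Z", "D, Mr. W"),
  Sex      = factor(c("male", "female", "female", "male")),
  stringsAsFactors = TRUE  # mimic the read.csv() default before R 4.0.0
)

# Class of every column (first class only, in case a column has several).
col_class <- sapply(toy, function(x) class(x)[1])

# Number of distinct values per column.
n_distinct <- sapply(toy, function(x) length(unique(x)))

overview <- data.frame(class = col_class, n_distinct = n_distinct)
print(overview)

# Factor columns with a separate level for every row behave like identifiers
# and are candidates for removal from the model formula.
id_like <- names(toy)[col_class == "factor" & n_distinct == nrow(toy)]
print(id_like)
```

On the real data the same two `sapply()` calls can be applied to the data frame returned by `read.csv("train.csv")`.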

Even if you omit them, you will not get the best results, because some variables (like Survived and Pclass) are coded numerically but are really categorical variables that should be factors. If you look at the scripts from https://www.kaggle.com/c/titanic/forums/t/13390/introducing-kaggle-scripts , you will also see how the data preparation can be done. Here I use

titanic <- read.csv("train.csv")
titanic$Survived <- factor(titanic$Survived,
  levels = 0:1, labels = c("no", "yes"))
titanic$Pclass <- factor(titanic$Pclass)
titanic$Name <- as.character(titanic$Name)

      

As for (b), I then call ctree() with only those variables that have been suitably preprocessed for a meaningful analysis. (I also use the newer, recommended implementation from the partykit package.)



library("partykit")
ct <- ctree(Survived ~ Pclass + Sex + Age + SibSp + Parch + Fare + Embarked,
  data = titanic)
plot(ct)
print(ct)

      

This gives the following graphical output:

[Figure: ctree for titanic data]

And the following printed output:

Model formula:
Survived ~ Pclass + Sex + Age + SibSp + Parch + Fare + Embarked

Fitted party:
[1] root
|   [2] Sex in female
|   |   [3] Pclass in 1, 2: yes (n = 170, err = 5.3%)
|   |   [4] Pclass in 3
|   |   |   [5] Fare <= 23.25: yes (n = 117, err = 41.0%)
|   |   |   [6] Fare > 23.25: no (n = 27, err = 11.1%)
|   [7] Sex in male
|   |   [8] Pclass in 1
|   |   |   [9] Age <= 52: no (n = 88, err = 43.2%)
|   |   |   [10] Age > 52: no (n = 34, err = 20.6%)
|   |   [11] Pclass in 2, 3
|   |   |   [12] Age <= 9
|   |   |   |   [13] Pclass in 3: no (n = 71, err = 18.3%)
|   |   |   |   [14] Pclass in 2: yes (n = 13, err = 30.8%)
|   |   |   [15] Age > 9: no (n = 371, err = 11.3%)

Number of inner nodes:    7
Number of terminal nodes: 8
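
For the loop from the question, a minimal sketch along the same lines might look as follows. It runs on simulated data (`titanic_sim`, made-up values) so that it does not depend on train.csv, and `split_data()` is a hypothetical helper, not part of partykit. Two details differ from the original loop: the seed is set before `sample()` so that each split is reproducible, and `plot()` is called directly rather than wrapped in `print()`. The tree is only fitted if partykit is installed.

```r
# Simulated stand-in for the preprocessed Titanic data (made-up values).
set.seed(1)
n <- 200
titanic_sim <- data.frame(
  Survived = factor(sample(c("no", "yes"), n, replace = TRUE)),
  Pclass   = factor(sample(1:3, n, replace = TRUE)),
  Sex      = factor(sample(c("female", "male"), n, replace = TRUE)),
  Age      = round(runif(n, 1, 80))
)

# Hypothetical helper: split a data frame into training and test rows.
split_data <- function(d, seed, frac = 0.70) {
  set.seed(seed)  # seed BEFORE sampling so the split is reproducible
  idx <- sample(seq_len(nrow(d)), size = floor(frac * nrow(d)))
  list(train = d[idx, ], test = d[-idx, ])
}

for (i in 1:2) {
  parts <- split_data(titanic_sim, seed = 100 + i)
  if (requireNamespace("partykit", quietly = TRUE)) {
    # Only factor and numeric predictors enter the formula.
    ct_i <- partykit::ctree(Survived ~ Pclass + Sex + Age, data = parts$train)
    plot(ct_i)  # plot() draws directly; wrapping it in print() is not needed
  }
}
```

With the real data, `titanic_sim` would be replaced by the preprocessed `titanic` data frame and the formula by the one used above.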

      
