R: How to calculate the sensitivity and specificity of an rpart tree
library(rpart)
library(rpart.plot)  # provides prp()
train <- data.frame(ClaimID = c(1,2,3,4,5,6,7,8,9,10),
                    RearEnd = c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, FALSE),
                    Whiplash = c(TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE),
                    Activity = factor(c("active", "very active", "very active", "inactive", "very inactive", "inactive", "very inactive", "active", "active", "very active"),
                                      levels = c("very inactive", "inactive", "active", "very active"),
                                      ordered = TRUE),
                    Fraud = c(FALSE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, TRUE))
mytree <- rpart(Fraud ~ RearEnd + Whiplash + Activity, data = train,
                method = "class", minsplit = 2, minbucket = 1, cp = -1)
prp(mytree, type = 4, extra = 101, leaf.round = 0, fallen.leaves = TRUE,
    varlen = 0, tweak = 1.2)
Then, using printcp(), I can see the results of the cross-validation:
> printcp(mytree)
Classification tree:
rpart(formula = Fraud ~ RearEnd + Whiplash + Activity, data = train,
method = "class", minsplit = 2, minbucket = 1, cp = -1)
Variables actually used in tree construction:
[1] Activity RearEnd Whiplash
Root node error: 5/10 = 0.5
n= 10
    CP nsplit rel error xerror xstd
1  0.6      0       1.0    2.0  0.0
2  0.2      1       0.4    0.4  0.3
3 -1.0      3       0.0    0.4  0.3
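A minimal sketch of how these printcp() columns relate, assuming the toy data and tree above: the root node error is the fraction of claims misclassified when every claim is given the majority class (5/10 here, because the two classes are tied), and rel error is expressed relative to it, so multiplying the two gives each subtree's misclassification rate on the training data. The names root_error and train_error are illustrative only.

```r
library(rpart)

# Toy claims data from the question
train <- data.frame(
  ClaimID  = 1:10,
  RearEnd  = c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, FALSE),
  Whiplash = c(TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE),
  Activity = factor(c("active", "very active", "very active", "inactive",
                      "very inactive", "inactive", "very inactive", "active",
                      "active", "very active"),
                    levels = c("very inactive", "inactive", "active", "very active"),
                    ordered = TRUE),
  Fraud    = c(FALSE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, TRUE)
)

# Fully grown tree, as in the question
mytree <- rpart(Fraud ~ RearEnd + Whiplash + Activity, data = train,
                method = "class", minsplit = 2, minbucket = 1, cp = -1)

# Root node error: share of claims misclassified when everyone gets the
# majority class (for a binary response, that is the minority count / n)
root_error <- min(table(train$Fraud)) / nrow(train)

# rel error is relative to the root node error, so their product is the
# training (resubstitution) misclassification rate of each subtree
cpt <- mytree$cptable
train_error <- root_error * cpt[, "rel error"]
print(data.frame(nsplit = cpt[, "nsplit"], train_error = train_error))
```

For the one-split subtree this gives 0.4 × 0.5 = 0.2, i.e. 2 of the 10 claims misclassified, and 0 for the full tree. Note that this is an overall error rate only; it does not say how the errors split between fraudulent and legitimate claims.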
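So the root node error is 0.5, and I understand that it is a misclassification error. But I'm having trouble calculating the sensitivity (the proportion of actual positives that are classified correctly, i.e. the true positive rate) and the specificity (the proportion of actual negatives that are classified correctly, i.e. the true negative rate). How can I calculate them from the rpart output?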
(The example above is taken from http://gormanalysis.com/decision-trees-in-r-using-rpart/)
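For reference, taking Fraud = TRUE as the positive class and writing TP, FN, TN and FP for the numbers of true positives, false negatives, true negatives and false positives, the two rates are:

```latex
\text{sensitivity} = \frac{TP}{TP + FN}, \qquad
\text{specificity} = \frac{TN}{TN + FP}
```

Neither quantity appears in the printcp() output, which only reports overall (relative) misclassification, so both have to be computed by comparing predicted classes with observed classes.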
You can use the caret package for this:
Data:
library(rpart)
train <- data.frame(ClaimID = c(1,2,3,4,5,6,7,8,9,10),
                    RearEnd = c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, FALSE),
                    Whiplash = c(TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE),
                    Activity = factor(c("active", "very active", "very active", "inactive", "very inactive", "inactive", "very inactive", "active", "active", "very active"),
                                      levels = c("very inactive", "inactive", "active", "very active"),
                                      ordered = TRUE),
                    Fraud = c(FALSE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, TRUE))
mytree <- rpart(Fraud ~ RearEnd + Whiplash + Activity, data = train, method = "class", minsplit = 2, minbucket = 1, cp=-1)
Solution:
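library(caret)
# predicted classes: a factor with levels "FALSE" and "TRUE"
preds <- predict(mytree, train, type = "class")
# observed classes as a factor with the same levels
obs <- factor(train$Fraud, levels = levels(preds))
# calculate sensitivity (Fraud = TRUE is the positive class)
> sensitivity(preds, obs, positive = "TRUE")
[1] 1
# calculate specificity (Fraud = FALSE is the negative class)
> specificity(preds, obs, negative = "FALSE")
[1] 1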
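If you would rather not depend on caret, a minimal base-R sketch reads the same two numbers off a confusion matrix built with table(). It assumes Fraud = TRUE is the positive class and uses predict(..., type = "class") to get class labels rather than probabilities; the variable names are illustrative.

```r
library(rpart)

# Toy claims data and fully grown tree from the question
train <- data.frame(
  ClaimID  = 1:10,
  RearEnd  = c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, FALSE),
  Whiplash = c(TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE),
  Activity = factor(c("active", "very active", "very active", "inactive",
                      "very inactive", "inactive", "very inactive", "active",
                      "active", "very active"),
                    levels = c("very inactive", "inactive", "active", "very active"),
                    ordered = TRUE),
  Fraud    = c(FALSE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, TRUE)
)
mytree <- rpart(Fraud ~ RearEnd + Whiplash + Activity, data = train,
                method = "class", minsplit = 2, minbucket = 1, cp = -1)

# Predicted and observed classes as factors with identical levels
pred <- predict(mytree, train, type = "class")
obs  <- factor(train$Fraud, levels = levels(pred))

# Confusion matrix: rows = predicted, columns = observed
cm <- table(Predicted = pred, Observed = obs)

# Fraud (TRUE) is treated as the positive class
tp <- cm["TRUE", "TRUE"];   fn <- cm["FALSE", "TRUE"]
tn <- cm["FALSE", "FALSE"]; fp <- cm["TRUE", "FALSE"]

sens <- tp / (tp + fn)   # true positive rate
spec <- tn / (tn + fp)   # true negative rate
print(cm)
cat("sensitivity:", sens, " specificity:", spec, "\n")
```

For the fully grown tree both values come out as 1, in agreement with the caret output.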
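Both sensitivity() and specificity() take the predictions as the first argument and the observed values (the response variable, i.e. train$Fraud) as the second argument.
According to the documentation, the predictions and the observed values must both be passed as factors with the same levels. By default, sensitivity() treats the first level of the reference factor as the positive class and specificity() treats the other level as the negative class, so set the positive and negative arguments explicitly when the class of interest is not the first level.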
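A small base-R illustration of the same-levels requirement, using hypothetical 0/1 predictions in which the positive class never occurs: factor() builds its levels only from the values present, so the prediction factor ends up with fewer levels than the observed one. Declaring the levels explicitly fixes this and also controls which level comes first, which is the level caret uses as the default positive class.

```r
# Hypothetical 0/1 predictions in which the positive class never occurs
pred_raw <- c(0, 0, 0, 0, 0)
obs_raw  <- c(0, 1, 0, 1, 1)

# factor() only creates levels for values that are actually present
bad_pred <- factor(pred_raw)
bad_obs  <- factor(obs_raw)
print(levels(bad_pred))   # only "0"
print(levels(bad_obs))    # "0" and "1"

# Declaring the levels explicitly gives both factors the same level set;
# listing "1" first makes it levels(reference)[1], caret's default positive class
lv <- c("1", "0")
good_pred <- factor(pred_raw, levels = lv)
good_obs  <- factor(obs_raw,  levels = lv)
print(table(Predicted = good_pred, Observed = good_obs))
```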
Both the sensitivity and the specificity are 1 in this case because the fully grown tree classifies every training observation correctly (its rel error in the last printcp() row is 0).
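As a sketch of how to get sensitivity and specificity for one of the smaller subtrees listed by printcp(), the code below prunes back to the one-split tree (row 2 of the CP table) and recomputes both rates from a base-R confusion matrix on the training data. The value cp = 0.3 is an assumption chosen for illustration: prune() removes every split whose complexity is at or below cp, so any value strictly between the CP values of rows 2 and 1 (0.2 and 0.6) should keep exactly one split.

```r
library(rpart)

# Toy claims data and fully grown tree from the question
train <- data.frame(
  ClaimID  = 1:10,
  RearEnd  = c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, FALSE),
  Whiplash = c(TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE),
  Activity = factor(c("active", "very active", "very active", "inactive",
                      "very inactive", "inactive", "very inactive", "active",
                      "active", "very active"),
                    levels = c("very inactive", "inactive", "active", "very active"),
                    ordered = TRUE),
  Fraud    = c(FALSE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, TRUE)
)
mytree <- rpart(Fraud ~ RearEnd + Whiplash + Activity, data = train,
                method = "class", minsplit = 2, minbucket = 1, cp = -1)

# Keep only the first split (cp = 0.3 is an illustrative value between 0.2 and 0.6)
pruned <- prune(mytree, cp = 0.3)

pred <- predict(pruned, train, type = "class")
obs  <- factor(train$Fraud, levels = levels(pred))
cm   <- table(Predicted = pred, Observed = obs)

sens <- cm["TRUE", "TRUE"] / sum(cm[, "TRUE"])     # TP / (TP + FN)
spec <- cm["FALSE", "FALSE"] / sum(cm[, "FALSE"])  # TN / (TN + FP)
print(cm)
cat("sensitivity:", sens, " specificity:", spec, "\n")
```

On the training data this should give a sensitivity of 0.6 (3 of the 5 fraudulent claims are caught by the single Activity split) and a specificity of 1, consistent with the rel error of 0.4 in row 2 (0.4 × 0.5 = 2 misclassified claims out of 10). As with the full tree, these are resubstitution figures, so they are optimistic compared with performance on new claims.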