Set a threshold for the probability result from a decision tree


I am trying to compute a confusion matrix after fitting a decision tree model:

library(rpart)
library(caret)

# tree model
tree <- rpart(LoanStatus_B ~ ., data = train, method = 'class')
# confusion matrix
pdata <- predict(tree, newdata = test, type = "class")
confusionMatrix(data = pdata, reference = test$LoanStatus_B, positive = "1")

How can I set a custom threshold for this confusion matrix? For example, I want to classify an observation as a default (the positive binary outcome, 1) whenever its predicted probability is above 0.2.

Best answer:

Several things to note here. First, make sure you are getting class probabilities when you predict. With type = "class" you only get discrete class labels, so there is no probability to threshold. Use type = "prob" instead, as below.

library(rpart)
data(iris)

# binary outcome: 1 for setosa, 0 otherwise
iris$Y <- ifelse(iris$Species == "setosa", 1, 0)

# tree model
tree <- rpart(Y ~ Sepal.Width, data = iris, method = 'class')

# class probabilities, one column per class ("0" and "1")
pdata <- as.data.frame(predict(tree, newdata = iris, type = "prob"))
head(pdata)

# confusion matrix at a 0.5 cutoff on the probability of class 1
table(iris$Y, pdata$`1` > 0.5)

Note that 0.5 here is just an arbitrary cutoff; you can change it to whatever you want.
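
For instance, here is a minimal sketch reusing pdata from above, applying the 0.2 cutoff from the question and then comparing a few cutoffs side by side:

# same confusion table at the 0.2 cutoff from the question
table(iris$Y, pdata$`1` > 0.2)

# or compare several cutoffs
for (cut in c(0.2, 0.5, 0.8)) {
  cat("cutoff:", cut, "\n")
  print(table(iris$Y, pdata$`1` > cut))
}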

I don't see a reason to use the confusionMatrix function when a confusion matrix can be created this simply, in a way that lets you achieve your goal of easily changing the cutoff.

Having said that, if you do want to use the confusionMatrix function, just create a discrete class prediction first, based on your custom cutoff, like this:

pdata$my_custom_predicted_class <- ifelse(pdata$`1` > 0.5, 1, 0)

Again, 0.5 is your custom cutoff and can be anything you want it to be.

# confusionMatrix expects factors with matching levels
caret::confusionMatrix(data = factor(pdata$my_custom_predicted_class, levels = c(0, 1)),
                       reference = factor(iris$Y, levels = c(0, 1)),
                       positive = "1")
Confusion Matrix and Statistics

          Reference
Prediction  0  1
         0 94 19
         1  6 31

               Accuracy : 0.8333          
                 95% CI : (0.7639, 0.8891)
    No Information Rate : 0.6667          
    P-Value [Acc > NIR] : 3.661e-06       

                  Kappa : 0.5989          
 Mcnemar's Test P-Value : 0.0164          

            Sensitivity : 0.6200          
            Specificity : 0.9400          
         Pos Pred Value : 0.8378          
         Neg Pred Value : 0.8319          
             Prevalence : 0.3333          
         Detection Rate : 0.2067          
   Detection Prevalence : 0.2467          
      Balanced Accuracy : 0.7800          

       'Positive' Class : 1
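
Putting it all together for the original loan model, here is a sketch that assumes the train/test split and the LoanStatus_B column from the question, with the 0.2 cutoff you mentioned:

library(rpart)
library(caret)

# fit the tree as before
tree <- rpart(LoanStatus_B ~ ., data = train, method = 'class')

# get class probabilities rather than discrete classes
pdata <- as.data.frame(predict(tree, newdata = test, type = "prob"))

# flag a default whenever P(class 1) exceeds 0.2
pred_class <- ifelse(pdata$`1` > 0.2, 1, 0)

confusionMatrix(data = factor(pred_class, levels = c(0, 1)),
                reference = factor(test$LoanStatus_B, levels = c(0, 1)),
                positive = "1")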