Optimize threshold to always be a particular value of the sensitivity/true positive rate

How can I write R code so that the threshold of a predictive model is automatically set to the value at which the sensitivity equals a particular proportion, for every run of the model?

For example, given the following scenarios:

  1. At a threshold of 0.2: true positives = 20, false negatives = 60, i.e. a sensitivity of 0.25
  2. At a threshold of 0.35: true positives = 60, false negatives = 20, i.e. a sensitivity of 0.75

How do I write R code that always picks the threshold giving the target sensitivity, i.e. scenario 2 above, automatically? For context, I'm using the caret modelling framework.

These links on threshold optimization did not help much:

http://topepo.github.io/caret/using-your-own-model-in-train.html#Illustration5

Obtaining threshold values from a ROC curve

There is 1 best solution below

(1)

Say you have a data frame of values and their true labels. Here there are 5 false (0) and 5 true (1) cases:

df <- data.frame(value = c(1,2,3,5,8,4,6,7,9,10),
             truth = c(rep(0,5), rep(1,5)))

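Treat any value at or above the threshold as a predicted positive. At threshold 9, the values 9 and 10 are detected as true positives, so the sensitivity is 2/5 = 40%. At threshold 6 (or anything above 4 and up to 6), the values 6, 7, 9 and 10 are detected, so the sensitivity is 4/5 = 80%.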
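The arithmetic above can be checked with a few lines of base R. This is only a sketch: `sens_at` is a hypothetical helper name, and it assumes that a value at or above the threshold counts as a predicted positive and that truth is coded 0/1.

```r
df <- data.frame(value = c(1, 2, 3, 5, 8, 4, 6, 7, 9, 10),
                 truth = c(rep(0, 5), rep(1, 5)))

# Hypothetical helper: fraction of true cases whose value is >= threshold
sens_at <- function(value, truth, threshold) {
  mean(value[truth == 1] >= threshold)
}

sens_at(df$value, df$truth, 9)  # 2 of the 5 positives are detected
sens_at(df$value, df$truth, 6)  # 4 of the 5 positives are detected
```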

To see the ROC curve, you can use the pROC package

library(pROC)
roc.demo <- roc(truth ~ value, data = df)
par(pty = "s") # make it square
plot(roc.demo) # plot ROC curve

[Figure: ROC curve demo]

If you want the sensitivities expressed as percentages, create the ROC object with percent = TRUE:

roc.demo <- roc(truth ~ value, data = df, percent = TRUE)

and replace 0.8 with 80 in the code below.

You can get the thresholds from the roc object

roc.demo$thresholds[roc.demo$sensitivities == 0.8]

For this data it returns 4.5 and 5.5, the two candidate thresholds at which the sensitivity is 0.8.

Because == compares floating-point numbers exactly, a tolerance-based condition is safer: roc.demo$sensitivities > 0.79 & roc.demo$sensitivities < 0.81
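If no threshold gives exactly the target sensitivity, an exact or narrow-band match returns nothing. One way around this is to take the highest threshold whose sensitivity is at least the target. The base R sketch below does this without pROC; `threshold_for_sens` is a hypothetical name, the candidate cutoffs are midpoints between observed values (like the 4.5 and 5.5 above), and it assumes higher values indicate the positive class with truth coded 0/1.

```r
# Hypothetical helper: highest cutoff whose sensitivity is at least `target`,
# where an observation is called positive when value > cutoff.
threshold_for_sens <- function(value, truth, target = 0.8) {
  v <- sort(unique(value))
  # Candidate cutoffs: midpoints between consecutive observed values
  cuts <- (head(v, -1) + tail(v, -1)) / 2
  sens <- vapply(cuts, function(t) mean(value[truth == 1] > t), numeric(1))
  ok <- sens >= target - 1e-8  # small tolerance for floating-point error
  # If no midpoint reaches the target, -Inf labels everything positive
  if (!any(ok)) return(-Inf)
  max(cuts[ok])
}

df <- data.frame(value = c(1, 2, 3, 5, 8, 4, 6, 7, 9, 10),
                 truth = c(rep(0, 5), rep(1, 5)))
threshold_for_sens(df$value, df$truth, target = 0.8)  # 5.5 for this data
```

Taking the highest qualifying cutoff keeps the number of false positives as low as the target allows.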

(2)

Alternatively, if you just want a threshold and don't care about the specificity, you may try the quantile function

quantile(df$value[df$truth == 1],
         probs = c(0.00, 0.10, 0.20, 0.30),
         type = 1) # type 1 returns an observed value, with no interpolation

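This prints

 0% 10% 20% 30% 
  4   4   4   6 

probs = 0.20 corresponds to 80% sensitivity: 20% of the positive values (here just the value 4) lie at or below the 20% quantile, so the remaining 80% lie above it. Any threshold between 4 and 6 is what you are looking for. Change probs as needed: for a target sensitivity s, use probs = 1 - s.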
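To make approach (2) repeatable across model runs, the same idea can be wrapped in a function. The sketch below is a variant that picks an order statistic of the positive-class values directly instead of calling quantile(), so that the rule value >= cutoff reaches at least the target sensitivity on the data used to choose it. The name `cutoff_for_sens` is hypothetical. With caret, `value` would presumably be the predicted probability of the positive class on held-out data, and the sensitivity on new data will only approximate the target.

```r
# Hypothetical helper: largest cutoff for which the rule `value >= cutoff`
# detects at least a fraction `target` of the positives (truth coded 0/1).
cutoff_for_sens <- function(value, truth, target = 0.8) {
  pos <- sort(value[truth == 1], decreasing = TRUE)
  # Number of positives that must be detected; the small offset guards
  # against floating-point error in length(pos) * target
  keep <- max(1, ceiling(length(pos) * target - 1e-8))
  pos[keep]
}

df <- data.frame(value = c(1, 2, 3, 5, 8, 4, 6, 7, 9, 10),
                 truth = c(rep(0, 5), rep(1, 5)))
cutoff <- cutoff_for_sens(df$value, df$truth, target = 0.8)
cutoff                                   # 6 for this data
mean(df$value[df$truth == 1] >= cutoff)  # achieved sensitivity
```

When positive values are tied at the cutoff, the achieved sensitivity can exceed the target, which is why the function promises "at least" rather than "exactly".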

Hope this helps.