Threshold moving to find the best cost for imbalanced dataset classification


I want to confirm my understanding of the threshold-moving process for finding the lowest misclassification cost in binary classification on an imbalanced dataset.

  1. Split the data into train and test sets.
  2. Fit the model on the train set.
  3. Obtain the predicted probabilities for the train data.
  4. Perform threshold moving on the train probabilities to find the threshold that gives the least misclassification cost, and compute the confusion matrix.
  5. With the selected threshold, predict classes from the test data probabilities and compute the test cost.
  6. Repeat steps 1 to 5 for 'n' folds and compute the average test cost.
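
Steps 4 and 5 can be sketched as below. This is only a minimal illustration in plain Python, not a definitive implementation: the function names, the grid of 101 evenly spaced thresholds, and the costs (1 for a false positive, 5 for a false negative) are all assumptions made for the example, and the probabilities are assumed to come from whatever model was fitted in step 2.

```python
def misclassification_cost(y_true, probs, threshold, cost_fp=1.0, cost_fn=5.0):
    """Total cost when class 1 is predicted for probabilities >= threshold.

    cost_fp and cost_fn are hypothetical costs chosen for illustration.
    """
    cost = 0.0
    for y, p in zip(y_true, probs):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 0:
            cost += cost_fp   # false positive
        elif pred == 0 and y == 1:
            cost += cost_fn   # false negative
    return cost


def best_threshold(y_true, probs, cost_fp=1.0, cost_fn=5.0, steps=101):
    """Scan evenly spaced thresholds in [0, 1] and return (threshold, cost)
    for the lowest total cost. Ties go to the smallest threshold."""
    candidates = [i / (steps - 1) for i in range(steps)]
    return min(
        ((t, misclassification_cost(y_true, probs, t, cost_fp, cost_fn))
         for t in candidates),
        key=lambda pair: pair[1],
    )


# Made-up toy values, for illustration only (not real data).
train_y = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
train_p = [0.05, 0.10, 0.15, 0.20, 0.30, 0.35, 0.40, 0.55, 0.60, 0.90]
test_y = [0, 0, 1, 0, 1]
test_p = [0.10, 0.50, 0.38, 0.20, 0.70]

# Step 4: choose the threshold on the train probabilities.
threshold, train_cost = best_threshold(train_y, train_p)

# Step 5: apply that fixed threshold to the test probabilities.
test_cost = misclassification_cost(test_y, test_p, threshold)
```

For step 6, the same two calls would be repeated inside each fold, collecting one `threshold` and one `test_cost` per fold before averaging the costs.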

Can somebody please confirm whether this is the right way to do threshold moving?

Thanks!

Edit: When I cross-validated with 5 folds, I noticed that the threshold giving the least cost is not the same for all folds. How should I proceed? I am computing the average cost across the 5 folds, but how do I interpret the different thresholds?
