I am developing a scikit-learn model for binary classification on an imbalanced dataset. Given the confusion matrix and the F1 score, I expected a lower average precision score, but I get a nearly perfect one and I can't figure out why. This is the output I am getting:
Confusion matrix on the test set:
[[6792  199]
 [   0  173]]
F1 score: 0.63
Test AVG precision score: 0.99
I am passing predicted probabilities to scikit-learn's average_precision_score function, which is what the documentation says to use. I was wondering where the problem could be.
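Roughly what I am doing, with a synthetic imbalanced dataset and LogisticRegression standing in for my actual data and model:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, confusion_matrix, f1_score
from sklearn.model_selection import train_test_split

# Synthetic imbalanced binary problem (roughly 3% positives).
X, y = make_classification(n_samples=30000, weights=[0.97], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

y_pred = clf.predict(X_test)               # hard 0/1 labels
y_score = clf.predict_proba(X_test)[:, 1]  # positive-class probabilities

print(confusion_matrix(y_test, y_pred))
print("F1 score:", f1_score(y_test, y_pred))
print("Test AVG precision score:", average_precision_score(y_test, y_score))
```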
The confusion matrix and F1 score are based on a hard prediction, which scikit-learn produces for binary classification by cutting the predicted probability at a threshold of 0.5 (assuming the classifier is really probabilistic to begin with, so not an SVM, for example). Average precision, in contrast, is computed over all possible probability thresholds; it can be read as the area under the precision-recall curve.
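A small toy demonstration of the difference (made-up labels and scores, not your data): f1_score judges only the 0.5 cutoff, while average_precision_score effectively sweeps every cutoff and only cares about the ranking of the scores.

```python
import numpy as np
from sklearn.metrics import average_precision_score, f1_score

y_true  = np.array([0, 0, 0, 0, 1, 1])
y_proba = np.array([0.10, 0.20, 0.30, 0.45, 0.48, 0.90])

# Hard labels at the default 0.5 cutoff: the positive scored 0.48 is missed.
y_pred = (y_proba >= 0.5).astype(int)
print(f1_score(y_true, y_pred))                  # 0.667

# Average precision uses all cutoffs at once: every positive outranks
# every negative here, so the score is perfect despite the mediocre F1.
print(average_precision_score(y_true, y_proba))  # 1.0
```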
So a high average_precision_score and a low f1_score suggest that your model does extremely well at some threshold that is not 0.5.
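If you want to find that threshold, precision_recall_curve exposes every cutoff; one reasonable choice (among several) is to pick the cutoff that maximizes F1. A sketch, using toy scores where your test labels and predict_proba output would go:

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

def best_f1_threshold(y_true, y_score):
    """Return the probability cutoff that maximizes F1 on these scores."""
    precision, recall, thresholds = precision_recall_curve(y_true, y_score)
    f1 = 2 * precision * recall / (precision + recall + 1e-12)  # guard 0/0
    best = np.argmax(f1[:-1])  # the last PR point carries no threshold
    return thresholds[best], f1[best]

# Demo on toy scores; substitute your test labels and probabilities.
y_true  = np.array([0, 0, 0, 0, 1, 1])
y_score = np.array([0.10, 0.20, 0.30, 0.45, 0.48, 0.90])
print(best_f1_threshold(y_true, y_score))  # a cutoff below 0.5 recovers both positives
```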