Average precision score too high looking at the confusion matrix


I am developing a machine learning model with scikit-learn on an imbalanced dataset (binary classification). Looking at the confusion matrix and the F1 score, I would expect a lower average precision score, but I get an almost perfect one and I can't figure out why. This is the output I am getting:

Confusion matrix on the test set:

[[6792  199]
 [   0  173]]

F1 score: 0.63

Test AVG precision score: 0.99

I am passing predicted probabilities to scikit-learn's average precision score function, which is what the package documentation says to use. I was wondering where the problem could be.
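
Roughly, this is how I compute the three numbers (clf, X_test, and y_test here are placeholder names for my fitted model and held-out data):

from sklearn.metrics import confusion_matrix, f1_score, average_precision_score

# clf is an already-fitted probabilistic classifier; X_test/y_test are the
# held-out features and binary labels (placeholder names).
y_pred = clf.predict(X_test)               # hard 0/1 predictions (0.5 cutoff)
y_score = clf.predict_proba(X_test)[:, 1]  # probability of the positive class

print(confusion_matrix(y_test, y_pred))
print("F1 score:", f1_score(y_test, y_pred))
print("Test AVG precision score:", average_precision_score(y_test, y_score))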

1 Answer:

The confusion matrix and F1 score are based on hard predictions, which sklearn produces by cutting the predicted probabilities at a threshold of 0.5 (for binary classification, and assuming the classifier is genuinely probabilistic to begin with, so not an SVM, for example). The average precision, in contrast, is computed across all possible probability thresholds; it can be read as the area under the precision-recall curve.
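
As a quick sketch (y_test and y_score below are placeholders for your true labels and predicted positive-class probabilities), you can compute the curve that average precision summarizes:

from sklearn.metrics import precision_recall_curve, average_precision_score

# y_score would be something like clf.predict_proba(X_test)[:, 1].
precision, recall, thresholds = precision_recall_curve(y_test, y_score)

# average_precision_score condenses the whole curve into one number:
# a weighted mean of the precision at each threshold, with the
# increase in recall from the previous threshold as the weight.
print("AP:", average_precision_score(y_test, y_score))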

So a high average_precision_score together with a low f1_score suggests that your model does extremely well at some threshold other than 0.5. That is consistent with your matrix: at the 0.5 cutoff the model already recovers all 173 positives (recall 1.0) but picks up 199 false positives (precision ≈ 0.47, hence F1 ≈ 0.63); raising the threshold can trade a sliver of recall for much higher precision, which is what the near-perfect average precision reflects.
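
One way to check this (same placeholder names as above) is to scan the thresholds returned by precision_recall_curve for the one that maximizes F1:

import numpy as np
from sklearn.metrics import precision_recall_curve

precision, recall, thresholds = precision_recall_curve(y_test, y_score)

# F1 at every candidate threshold; the small epsilon avoids 0/0 where
# precision and recall are both zero.
f1 = 2 * precision * recall / (precision + recall + 1e-12)

# The last precision/recall pair has no associated threshold, so drop it.
best = np.argmax(f1[:-1])
print(f"best threshold: {thresholds[best]:.3f}, F1 there: {f1[best]:.3f}")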