What is the difference between needs_proba and needs_threshold in scikit-learn make_scorer function?


The difference between them is not well explained in the make_scorer documentation. I observed that if needs_proba or needs_threshold is set to True, the scoring function receives the output of predict_proba instead of y_pred. However, it is not possible to set both to True; doing so gives the error

ValueError: Set either needs_proba or needs_threshold to True, but not both
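
For reference, here is a minimal snippet that reproduces the error (on a scikit-learn version that still accepts these flags):

    from sklearn.metrics import make_scorer, roc_auc_score

    # Raises: ValueError: Set either needs_proba or needs_threshold
    # to True, but not both.
    make_scorer(roc_auc_score, needs_proba=True, needs_threshold=True)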

The documentation for needs_threshold says:

For example average_precision or the area under the roc curve can not be computed using discrete predictions alone.

which I understood to mean that needs_threshold should be set to True if the scoring function is average_precision_score or roc_auc_score. However, it works the same whether needs_threshold is True or False.

Can you help me understand the difference between them and the usage of needs_threshold?

Per the note further down the docs page, needs_threshold first tries decision_function and only falls back to predict_proba, whereas needs_proba always uses predict_proba. For rank-ordering metrics like roc_auc_score and average_precision_score there won't be a difference, since those metrics only depend on how the continuous scores rank the samples.
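
As a quick sanity check, here is a sketch using LogisticRegression (which exposes both predict_proba and decision_function); the dataset and model are just placeholders:

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import make_scorer, roc_auc_score

    X, y = make_classification(random_state=0)
    clf = LogisticRegression().fit(X, y)

    # needs_proba uses predict_proba; needs_threshold prefers
    # decision_function. The sigmoid is monotone, so the ranking
    # (and hence the ROC AUC) is identical.
    proba_scorer = make_scorer(roc_auc_score, needs_proba=True)
    thresh_scorer = make_scorer(roc_auc_score, needs_threshold=True)

    print(proba_scorer(clf, X, y))   # the two printed scores match
    print(thresh_scorer(clf, X, y))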

I suppose you could want a metric that takes either the raw decision-function output or the (calibrated?) probability output. For example, in an SVC the decision function is the signed distance from the separating hyperplane, and you might want to average that distance over the misclassified examples; alternatively, you might want a metric that uses the resulting class probabilities (obtained via Platt calibration, which happens internally when the SVC's probability=True).
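
A hedged sketch of that idea, feeding the raw decision_function output to a custom metric via needs_threshold=True; the metric name mean_misclassified_margin and its sign convention are my own invention for illustration, and it assumes binary labels in {0, 1}:

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.metrics import make_scorer
    from sklearn.svm import SVC

    def mean_misclassified_margin(y_true, decision_values):
        # A point is misclassified when the sign of its decision
        # value disagrees with its label (positive values mean class 1).
        signs = np.where(y_true == 1, 1, -1)
        misclassified = signs * decision_values < 0
        if not misclassified.any():
            return 0.0
        # Negate the mean distance so that "greater is better".
        return -np.abs(decision_values[misclassified]).mean()

    X, y = make_classification(random_state=0)
    clf = SVC().fit(X, y)  # probability=False: only decision_function exists

    margin_scorer = make_scorer(mean_misclassified_margin, needs_threshold=True)
    print(margin_scorer(clf, X, y))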