How to use "sagemaker.workflow.quality_check_step.ModelQualityCheckConfig" for a Logistic Regression problem


I would like to know how to use "sagemaker.workflow.quality_check_step.ModelQualityCheckConfig" for a logistic regression problem. The notebook sagemaker-pipeline-model-monitor-clarify-steps.ipynb has an example that shows how to use this class for a regression problem.

[Image: regression example]

But my case is different: I'm using an XGBoost classifier with binary:logistic as the objective function, so the model gives me the probability of the positive class.

The output of the batch transform step in my use case looks like this:

[Image: output of the batch transform step for my use case]

The first column holds the label values and the second holds the model's inferences (the predicted probabilities).

So I don't know how to use this class (ModelQualityCheckConfig) for a binary classification problem like mine. Is there a way to specify a threshold for the predictions, or something like that?

I tried something like this:

[Image: first try]

And of course the processing job failed: it says the inference values have too many classes, because those values are probabilities rather than just 0 or 1.
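To illustrate the failure described above: a probability column contains many distinct values, while a thresholded column has at most two. The stdlib-only sketch below applies a cutoff by hand and counts the binary confusion matrix. The helper names are hypothetical, the sample values are made up, and treating a score exactly equal to the threshold as positive is an assumption, not necessarily what SageMaker's model quality container does.

```python
def threshold_predictions(probabilities, threshold=0.5):
    """Map P(positive class) scores to hard 0/1 labels.

    Assumption: a score exactly equal to the threshold counts as positive.
    """
    return [1 if p >= threshold else 0 for p in probabilities]


def confusion_counts(labels, predictions):
    """Return (tp, fp, tn, fn) for 0/1 labels and 0/1 predictions."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    return tp, fp, tn, fn


if __name__ == "__main__":
    labels = [1, 0, 1, 0]
    probs = [0.91, 0.12, 0.47, 0.66]  # illustrative values, not real model output
    preds = threshold_predictions(probs)
    # Many distinct probabilities collapse to at most two classes after thresholding.
    print(len(set(probs)), "distinct probabilities ->", len(set(preds)), "classes")
    print(confusion_counts(labels, preds))
```

This only shows what a threshold means for the metrics; it is not a replacement for the check's own configuration.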

I would like to know if there is a way to specify a threshold for the predictions.


You can specify the threshold for binary classification with the probability_threshold_attribute parameter, e.g. probability_threshold_attribute="0.5". Remember that for binary classification the labels in your training dataset should be 1/0, not 1.0/0.0, so cast the label column to int. This is mentioned in this post: https://repost.aws/questions/QU8EShrCQdT7eJQq8PUGyGmQ/sagemaker-model-quality-check-step-for-binary-classification-is-failing-with-error-message-more-than-two-classes-are-not-supported-in-binary-classification
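A minimal sketch of both points, under stated assumptions: the baseline CSV is headerless with the label in the first column and the probability in the second (as in the question), and the check refers to those columns as _c0 and _c1. The helper names cast_labels_to_int and build_model_quality_config are hypothetical; ModelQualityCheckConfig and DatasetFormat come from the SageMaker Python SDK, and the S3 URIs are placeholders you supply.

```python
import csv
import io


def cast_labels_to_int(csv_text: str, label_col: int = 0) -> str:
    """Rewrite a headerless CSV so the label column holds 1/0 instead of 1.0/0.0.

    Only the label column is changed; the probability column is left as-is.
    """
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        if not row:
            continue  # skip blank lines
        row[label_col] = str(int(float(row[label_col])))  # "1.0" -> "1"
        writer.writerow(row)
    return out.getvalue()


def build_model_quality_config(baseline_uri: str, output_uri: str):
    """Hypothetical helper: ModelQualityCheckConfig for a binary:logistic model.

    Assumes column _c0 is the label and _c1 the predicted probability.
    """
    # Imported lazily so the data-prep helper above works without the SDK installed.
    from sagemaker.model_monitor.dataset_format import DatasetFormat
    from sagemaker.workflow.quality_check_step import ModelQualityCheckConfig

    return ModelQualityCheckConfig(
        baseline_dataset=baseline_uri,
        dataset_format=DatasetFormat.csv(header=False),
        output_s3_uri=output_uri,
        problem_type="BinaryClassification",
        ground_truth_attribute="_c0",  # first column: label (assumed name)
        probability_attribute="_c1",  # second column: P(positive class) (assumed name)
        probability_threshold_attribute="0.5",  # value from the answer above
    )
```

The label cast has to happen before the data reaches the check, presumably in the preprocessing that produces the batch transform input if the labels are passed through from there.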