I would like to know how to use "sagemaker.workflow.quality_check_step.ModelQualityCheckConfig" for a binary classification (logistic regression) problem. In this link: sagemaker-pipeline-model-monitor-clarify-steps.ipynb, there is an example that shows how to use this class in a regression problem.
But my case is different because I'm using an XGBoost classifier with binary:logistic as the objective function, so the model gives me the probability of the positive class.
The results of the batch transform step of my use case look like the following:
[Image: output of the batch transform step of my use case]
The first column contains the label values and the second contains the model's inferences.
So, I don't know how to use this class (ModelQualityCheckConfig) for a binary classification problem like my case. Is there a way in which I can specify a threshold for the predictions or something like that?
I tried something like this:
And of course the processing job failed. The error says that the inference values have too many classes, which happens because those values are probabilities and not just 0 or 1.
I would like to know if there is a way in which I can specify a threshold for the predictions.
You can specify the threshold for binary classification using the following attribute of ModelQualityCheckConfig:
probability_threshold_attribute="0.5",
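As a rough sketch of how this could fit the output described in the question (label in the first column, probability in the second), the arguments might be collected like this. The column names _c0 and _c1 are an assumption based on a headerless CSV, as in the linked notebook, and probability_attribute is used here instead of inference_attribute because the second column holds probabilities rather than class labels. The S3 paths are placeholders you would supply yourself.

```python
# Keyword arguments for ModelQualityCheckConfig in the binary classification case.
# Assumption: the batch transform output is a headerless CSV, so the columns
# are addressed as _c0 (label) and _c1 (predicted probability).
model_quality_kwargs = {
    "problem_type": "BinaryClassification",
    "ground_truth_attribute": "_c0",
    "probability_attribute": "_c1",
    "probability_threshold_attribute": "0.5",
}

# Intended use (needs the sagemaker SDK and your own pipeline objects):
#
# from sagemaker.workflow.quality_check_step import ModelQualityCheckConfig
# from sagemaker.model_monitor.dataset_format import DatasetFormat
#
# config = ModelQualityCheckConfig(
#     baseline_dataset=...,   # S3 path of the batch transform output
#     dataset_format=DatasetFormat.csv(header=False),
#     output_s3_uri=...,      # S3 path for the check results
#     **model_quality_kwargs,
# )

print(model_quality_kwargs)
```

With a threshold of 0.5, probabilities above the threshold should be treated as the positive class when the metrics are computed.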
Remember, for binary classification the label in your training dataset should be 1/0, not 1.0/0.0, so typecast the label to int. This is mentioned in the following post: https://repost.aws/questions/QU8EShrCQdT7eJQq8PUGyGmQ/sagemaker-model-quality-check-step-for-binary-classification-is-failing-with-error-message-more-than-two-classes-are-not-supported-in-binary-classification
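If the labels are already written out as floats, one possible way to cast them is shown below. This is only a sketch using the standard library; the helper name cast_labels_to_int is hypothetical, and it assumes a headerless CSV with the label in the first column.

```python
import csv
import io


def cast_labels_to_int(csv_text, label_index=0):
    """Rewrite float-formatted labels (1.0/0.0) as integers (1/0)."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # skip blank lines
        # "1.0" -> 1.0 -> 1 -> "1"; other columns are left untouched
        row[label_index] = str(int(float(row[label_index])))
        writer.writerow(row)
    return out.getvalue()


sample = "1.0,0.83\n0.0,0.12\n"
print(cast_labels_to_int(sample))
```

The same cast can be applied earlier, when the dataset is prepared, so that the labels never reach the quality check as floats.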