I am running into an error with the Python SHAP library. Creating force plots based on log odds works fine, but I cannot create force plots based on probabilities. The goal is to have base_values and shap_values that sum to the predicted probability.
This works:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import xgboost as xgb
import sklearn.model_selection
import shap
X, y = shap.datasets.iris()
X_display, y_display = shap.datasets.iris(display=True)
X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(X, y, test_size = 0.2, random_state = 42)
# fit xgboost model
params = {
    'objective': "multi:softprob",
    'eval_metric': "mlogloss",
    'num_class': 3,
}
xgb_fit = xgb.train(
    params=params,
    dtrain=xgb.DMatrix(data=X_train, label=y_train),
)
#create shap values and perform tests
explainer = shap.TreeExplainer(xgb_fit)
shap_values = explainer.shap_values(X_train)
And this does not work:
explainer = shap.TreeExplainer(
    model=xgb_fit,
    data=X_train,
    feature_perturbation='interventional',
    model_output='probability',
)
Used packages:
matplotlib 3.4.1
numpy 1.20.2
pandas 1.2.4
scikit-learn 0.24.1
shap 0.39.0
xgboost 1.4.1
To see how your raw scores for multiclass classification add up in probability space, try KernelExplainer:
Sanity check:
(or, if you wish, np.unique(y_train, return_counts=True)[1]/len(y_train))