I'm trying to create a force_plot for my Random Forest model that has two classes (1 and 2), but I am a bit confused about the parameters for the force_plot.
There are two different sets of parameters I can pass to force_plot:
```python
shap.force_plot(explainer.expected_value[0], shap_values[0], choosen_instance, show=True, matplotlib=True)
shap.force_plot(explainer.expected_value[1], shap_values[1], choosen_instance, show=True, matplotlib=True)
```
So my questions are:
When creating the force_plot, I must supply expected_value. For my model I have two expected values, [0.20826239 0.79173761]; how do I know which one to use? My understanding is that the expected value is the average prediction of my model on the training data. Are there two values because I have both class_1 and class_2, so that the average prediction is 0.20826239 for class_1 and 0.79173761 for class_2?
The next parameter is shap_values. This is my chosen instance:
| index  | B   | G   | R   | Prediction |
|--------|-----|-----|-----|------------|
| 113833 | 107 | 119 | 237 | 2          |
For it, I get the following SHAP values:

```python
[array([[ 0.01705462, -0.01812987,  0.23416978]]),
 array([[-0.01705462,  0.01812987, -0.23416978]])]
```
I don't quite understand why I get two sets of SHAP values. Is one for class_1 and one for class_2? I have been trying to compare the images I attached using both sets of SHAP values and expected values, but I can't really explain what is going on in terms of the prediction.
Let's try a reproducible example:
Then, your SHAP expected values are:
This is what your model would predict on average, given the background dataset (the one fed to the explainer above):
Then, if you have a data point of interest:
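For example, under the same assumed setup, take the first test row as the data point of interest (the choice of row is arbitrary). `iloc[[0]]` keeps it as a one-row DataFrame, which is the 2-D shape `predict_proba` expects.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Assumed stand-in setup: breast cancer data and an illustrative random forest.
X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
model = RandomForestClassifier(max_depth=5, n_estimators=100, random_state=42).fit(X_train, y_train)

# Double brackets keep a one-row DataFrame rather than a Series.
data_to_explain = X_test.iloc[[0]]
proba = model.predict_proba(data_to_explain)  # shape (1, n_classes)
print(np.round(proba, 4))
```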
You can achieve exactly the same with SHAP values:
Note, they are symmetrical, because whatever increases the chances towards class 1 decreases the chances for class 0 by the same amount.

With base values and SHAP values, the probabilities (or chances for a data point to end up in class 0 or 1) are obtained by adding each class's summed SHAP values to that class's base value. Note, this is the same as the model predictions.
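Applying the same arithmetic to the numbers posted in the question is a useful sanity check. This sketch uses only those values and assumes `model.classes_` is `[1, 2]` (scikit-learn stores class labels sorted), so index 0 refers to class 1 and index 1 to class 2.

```python
import numpy as np

# Values copied from the question; index 0 -> class 1, index 1 -> class 2 (assumed order).
expected_value = np.array([0.20826239, 0.79173761])
shap_values = [
    np.array([[0.01705462, -0.01812987, 0.23416978]]),   # B, G, R contributions towards class 1
    np.array([[-0.01705462, 0.01812987, -0.23416978]]),  # B, G, R contributions towards class 2
]

# Symmetry: what pushes towards one class pushes away from the other by the same amount.
print(np.allclose(shap_values[0], -shap_values[1]))

# Base value plus the summed feature contributions gives each class probability.
proba = np.array([expected_value[k] + shap_values[k].sum() for k in range(2)])
print(proba)
```

Class 2 comes out at about 0.559, above 0.5, which agrees with the prediction of 2 for that instance. Likewise, `shap.force_plot(explainer.expected_value[1], shap_values[1], choosen_instance, ...)` explains the probability of class 2, while the `[0]` version explains the probability of class 1; the two plots carry the same information with the signs of the contributions flipped.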