SHAP's base value and predicted value are very large when using an isolation forest


I used an isolation forest model for outlier detection, and I also tried to build a SHAP force plot to see which features drive the predictions.

The isolation forest model I built is:

from sklearn.ensemble import IsolationForest

model = IsolationForest(n_estimators=50, max_samples='auto', contamination=0.2, max_features=1.0, random_state=0)
model.fit(df)
pred = model.predict(df)  # hard labels: 1 = inlier, -1 = outlier
df['anomaly_label'] = pred

And I tried to get the SHAP values:

import shap

def shap_plot(j):
    # Drop the label column so the explainer sees the same features the model was fit on
    features = df.drop(columns=['anomaly_label'])
    explainerModel = shap.TreeExplainer(model)
    shap_values_Model = explainerModel.shap_values(features, check_additivity=False)
    p = shap.force_plot(explainerModel.expected_value, shap_values_Model[j], features.iloc[[j]])
    return p
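Calling it in a notebook (assuming SHAP's JS helper has been initialised) renders the plot for a single row, e.g.:

shap.initjs()  # enable interactive force plots in a Jupyter notebook
shap_plot(0)   # force plot for the first row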

Some of the force plots I got look like this:

[screenshots of the SHAP force plots]

The base value and the predicted values are very large and exceed the plotted range. Why does this happen? Is there any way to solve this problem?


There is 1 best solution below


SHAP is explaining the anomaly score of the samples rather than the hard predictions. (This is quite common; with probabilistic classifiers, for example, explanations are often produced in log-odds space by default rather than in probability space or for the hard classifications.) The score_samples and decision_function methods of IsolationForest are therefore more relevant here than predict. That said, it looks like the explanation is on yet another transformation of those scores; see this comment in the PR that added isolation forest support to TreeExplainer.
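As a rough check (a minimal sketch on made-up toy data, not your real df), you can compare what the force plot sums to, i.e. expected_value plus the per-row SHAP values, with the model's own scores; they are related but not identical, which is consistent with SHAP explaining a further transformation of the anomaly score:

import numpy as np
import pandas as pd
import shap
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(200, 4)), columns=list('abcd'))  # toy stand-in for the real data

model = IsolationForest(n_estimators=50, contamination=0.2, random_state=0).fit(df)

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(df, check_additivity=False)

# What the force plot displays: base value + sum of the row's SHAP values
shap_output = explainer.expected_value + shap_values.sum(axis=1)

print(shap_output[:5])                   # the quantity SHAP is explaining
print(model.decision_function(df)[:5])   # signed anomaly score (negative = more anomalous)
print(model.score_samples(df)[:5])       # raw scores, roughly in [-1, 0]
print(model.predict(df)[:5])             # hard labels: 1 = inlier, -1 = outlier

So the base value and f(x) in the force plot are not meant to line up with predict's -1/1 labels, which is why they look "out of range".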