I used an isolation forest model to do outlier detection, and I also tried to build a SHAP force plot to see the features.
The isolation forest model I built is:
from sklearn.ensemble import IsolationForest

model = IsolationForest(n_estimators=50, max_samples='auto', contamination=0.2,
                        max_features=1.0, random_state=0)
model.fit(df)
pred = model.predict(df)
df['anomaly_label'] = pred
And I tried to get the SHAP values:
import shap

def shap_plot(j):
    explainerModel = shap.TreeExplainer(model)
    shap_values_Model = explainerModel.shap_values(df, check_additivity=False)
    # S is the feature DataFrame (defined elsewhere) whose row j is displayed in the plot
    p = shap.force_plot(explainerModel.expected_value, shap_values_Model[j], S.iloc[[j]])
    return p
In the force plots I got, the base value and the predicted value are very large and exceed the range of the plot. Why does this happen, and is there any way to solve this problem?
SHAP is explaining the anomaly score of the samples rather than the hard predictions. (This is quite common; e.g., in probabilistic classifiers the explanations often take place in log-odds space by default, rather than in probability space or on hard classifications.) The score_samples and decision_function methods of IsolationForest are more relevant here than predict. That said, it looks like the explanation is on yet another transformation of those scores; see this comment in the PR that added isolation forests to the TreeExplainer.
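To see this concretely, here is a minimal self-contained sketch (not from the original post; the data and column names are made up) that compares what TreeExplainer reconstructs per sample, i.e. expected_value plus the sum of the SHAP values, against the different IsolationForest outputs:

import numpy as np
import pandas as pd
import shap
from sklearn.ensemble import IsolationForest

# Toy data in place of the original df
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(200, 3)), columns=["f1", "f2", "f3"])

model = IsolationForest(n_estimators=50, contamination=0.2, random_state=0)
model.fit(X)

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X, check_additivity=False)

# The quantity SHAP explains per sample
reconstructed = explainer.expected_value + shap_values.sum(axis=1)

print(model.predict(X)[:5])            # hard labels in {-1, +1}
print(model.decision_function(X)[:5])  # shifted anomaly score
print(model.score_samples(X)[:5])      # raw anomaly score (negative)
print(reconstructed[:5])               # SHAP's model output; per the linked PR this is
                                       # based on a transformation of the tree path
                                       # lengths, so it need not match the scores above

The base value and the reconstructed values track the path-length-based quantity rather than predict's -1/+1 labels, which is why they can sit far outside the range you expected from the hard predictions.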