LIME ML Interpreter mode Classification or Regression for Isolation Forest (Anomaly Detection)


I am trying to find anomalies in my dataset of 1000+ documents. I'm using the LIME ML interpreter to explain the predictions of my model (Isolation Forest). LIME's "mode" parameter lets me choose between Classification and Regression. I do not have a set of documents with known anomalies. Since Isolation Forest is an unsupervised learning method, and classification is a type of supervised learning used to classify observations into two or more classes, I ended up using regression. On the other hand, my outcome is binary: anomaly or no anomaly.

What is right to use here?

Best Regards, Elle


There are 3 answers below.

Answer 1

Another option I see is to hold out 10-20% of the data set during IsoForest tree building. Score the model on this holdout to get the anomaly score (or average tree depth), and build the explainer on it. Then, when scoring new data, LIME will treat it as a regression problem. I am not sure how well this will work, though.
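A minimal sketch of the holdout idea, assuming scikit-learn and a synthetic feature matrix standing in for the documents (the data, the 20% split, and the helper name `anomaly_score` are hypothetical). The forest is fitted without the holdout rows, and the holdout is scored with a continuous anomaly score derived from the average path length, which is the kind of 1-D target LIME's regression mode works with.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Hypothetical stand-in for the document feature matrix (assumption)
X = rng.normal(size=(1000, 5))
X[:10] += 6  # a few obvious outliers

# Hold out 20% of the rows; the trees are built without them
X_fit, X_hold = train_test_split(X, test_size=0.2, random_state=0)
iso = IsolationForest(n_estimators=200, random_state=0).fit(X_fit)

def anomaly_score(rows):
    """Hypothetical helper: continuous target for LIME's regression mode.

    score_samples returns the negated anomaly score from the average path
    length (lower = more abnormal), so flip the sign: the result lies in
    (0, 1] and higher means more anomalous.
    """
    return -iso.score_samples(np.atleast_2d(rows))

# Scores on the holdout the explainer would be built on
hold_scores = anomaly_score(X_hold)
```

With `lime` installed, the explainer would then be built on the holdout rows, e.g. `LimeTabularExplainer(X_hold, mode="regression")`, and `explain_instance(row, anomaly_score)` called for a new document. In regression mode LIME expects `predict_fn` to return a 1-D array, which `anomaly_score` does.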

Answer 2

Here is what we have done:

  1. Use Isolation Forest to get anomalies.
  2. Treat the 1 and -1 values returned by Isolation Forest as class labels and train a Random Forest classifier on them.
  3. Pass this Random Forest classifier to LIME to get explanations of the anomalous points.
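The three steps above can be sketched as follows, assuming scikit-learn and a synthetic stand-in for the document features (the data, `contamination=0.02`, and the forest sizes are illustrative choices, not the poster's). The sketch also measures how often the surrogate agrees with the Isolation Forest labels, because LIME will be explaining the Random Forest, not the Isolation Forest itself.

```python
import numpy as np
from sklearn.ensemble import IsolationForest, RandomForestClassifier

rng = np.random.default_rng(1)
# Hypothetical stand-in for the document features (assumption)
X = rng.normal(size=(1000, 4))
X[:15] += 5

# Step 1: Isolation Forest labels each row 1 (inlier) or -1 (outlier)
iso = IsolationForest(contamination=0.02, random_state=0).fit(X)
labels = iso.predict(X)

# Step 2: train a supervised surrogate on those pseudo-labels
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, labels)

# Step 3: predict_proba is what LIME's classification mode would call.
# rf.classes_ is sorted, so column 0 is -1 (anomaly), column 1 is 1 (normal).
proba = rf.predict_proba(X)

# Fidelity: share of rows where the surrogate reproduces the forest's label
fidelity = (rf.predict(X) == labels).mean()
```

With `lime` installed, `rf.predict_proba` is the function to pass in classification mode, e.g. `LimeTabularExplainer(X, mode="classification", class_names=["anomaly", "normal"])` followed by `explain_instance(X[0], rf.predict_proba, labels=(0,))`; `labels=(0,)` asks for the explanation of column 0, the anomaly class. Fidelity is measured here on the training rows, so it is optimistic; a held-out check would be stricter.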

We are still looking for a better option than building a second-level Random Forest classifier.

Answer 3

Not directly about LIME, but Shapley values can be used to create similar explanations for IsolationForest. See this answer.
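To make the Shapley idea concrete, here is a minimal brute-force sketch of exact Shapley values for an Isolation Forest score. It assumes scikit-learn, synthetic data, and a single reference point (the feature means) standing in for "absent" features; `f`, `shapley_values`, and the data are hypothetical. It enumerates every feature subset, so it only suits a handful of features. For real data, the `shap` library's `TreeExplainer`, which as far as I know accepts scikit-learn's `IsolationForest`, is the practical route.

```python
from itertools import combinations
from math import factorial

import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(2)
# Hypothetical feature matrix (assumption)
X = rng.normal(size=(500, 4))
iso = IsolationForest(random_state=0).fit(X)

def f(rows):
    """Model output to explain: decision_function (negative = anomalous)."""
    return iso.decision_function(np.atleast_2d(rows))

def shapley_values(x, baseline):
    """Exact Shapley values; features outside a coalition take baseline values."""
    n = len(x)
    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            for subset in combinations(others, k):
                # Standard Shapley weight |S|! (n - |S| - 1)! / n!
                w = factorial(k) * factorial(n - k - 1) / factorial(n)
                z = baseline.copy()
                z[list(subset)] = x[list(subset)]
                without_i = f(z)[0]
                z[i] = x[i]
                with_i = f(z)[0]
                phi[i] += w * (with_i - without_i)
    return phi

x = np.array([4.0, 0.0, 0.0, 0.0])  # unusual only in feature 0
baseline = X.mean(axis=0)
phi = shapley_values(x, baseline)
```

The values satisfy the Shapley efficiency property: `phi.sum()` equals `f(x) - f(baseline)`. Feature 0 should receive the most negative value, since it is the only feature pushed far from the data.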