i am trying to find anomalies in my dataset of 1000+ documents. I'm using LIME ML Interpreter to be able to explain the model (Isolation Forest) predictions. In one parameter "mode" i am able to choose between Classification and Regression. I do not have a set of documents with a known anomaly. Since Isolation Forest is a unsupervised learning method and classifcation is a type of supervised learning which is used to clasify observations into two or more classses i ended up using regression. On the other side i have the outcome anomaly or no anomaly.
What is right to use here?
Best Regards, Elle
The other option I see to this is to hold out 10-20% of the data set during IsoForest tree building. On this holdout to score the model and get the anomaly score (or avg tree depth) and build the explainer on this. Then in scoring new data, LIME will treat it as a regression problem...I am not sure how well this will work though...