I'm currently developing an unsupervised anomaly detection system using One-Class SVM, where my dataset includes the following features: date, location ID, daily numbers, day of the week, and a binary indicator for holidays. I've experimented with two approaches:
- Model 1: I train separate One-Class SVM models for each location ID.
- Model 2: I combine all location IDs into one model.
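For reference, the two setups can be sketched with scikit-learn's `OneClassSVM` roughly as follows. This is a minimal illustration that assumes a pandas DataFrame with `locationID` and `daily_numbers` columns. The synthetic data, the `nu=0.05` setting, the helper name `make_model` and the restriction to a single feature are hypothetical choices made for brevity:

```python
import numpy as np
import pandas as pd
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

# Hypothetical synthetic data: a low-volume and a high-volume location.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "locationID": np.repeat([8, 11], 100),
    "daily_numbers": np.concatenate([rng.normal(20, 3, 100),
                                     rng.normal(400, 40, 100)]),
})
X_cols = ["daily_numbers"]

def make_model():
    # Hypothetical helper: scale, then fit an RBF One-Class SVM.
    # nu upper-bounds the fraction of training points treated as outliers.
    return make_pipeline(StandardScaler(), OneClassSVM(nu=0.05, gamma="scale"))

# Model 1: one model per location, each scoring only its own rows
# (-1 = anomaly, 1 = normal).
pred_model1 = pd.Series(0, index=df.index)
for loc, group in df.groupby("locationID"):
    model = make_model().fit(group[X_cols])
    pred_model1.loc[group.index] = model.predict(group[X_cols])

# Model 2: a single model fitted on all locations pooled together.
pred_model2 = pd.Series(make_model().fit(df[X_cols]).predict(df[X_cols]),
                        index=df.index)
```

In Model 1 each pipeline's `StandardScaler` is fitted on a single location, so deviations are measured against that location's own level; in Model 2 one scaler sees every location at once.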
However, I've observed that Model 2 fails to detect certain obvious anomalies that Model 1 captures. I'd like advice on which approach is better, and on further steps to analyse why Model 2 misses these anomalies.
When combining all location IDs into one model, I trained on the following features. I also tried excluding `locationID`, but the same issue occurred:
```python
numerical_features = ['daily_numbers']
categorical_features = ['locationID', 'dayofweek', 'nathol']
```
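A minimal sketch of how these two feature lists could feed a single pooled One-Class SVM, assuming scikit-learn's `ColumnTransformer` with `StandardScaler` for the numerical column and `OneHotEncoder` for the categorical ones. The toy rows and the `nu`/`gamma` values are hypothetical placeholders, not the original setup:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.svm import OneClassSVM

numerical_features = ['daily_numbers']
categorical_features = ['locationID', 'dayofweek', 'nathol']

# Hypothetical toy data; replace with the real DataFrame.
df = pd.DataFrame({
    'locationID':    [8, 8, 8, 11, 11, 11] * 10,
    'dayofweek':     [0, 3, 5, 0, 3, 5] * 10,
    'nathol':        [0, 0, 1, 0, 0, 1] * 10,
    'daily_numbers': [18, 21, 25, 390, 410, 450] * 10,
})

preprocess = ColumnTransformer([
    # Standardise the count with one mean/std computed over ALL locations combined.
    ('num', StandardScaler(), numerical_features),
    # One-hot encode categories; unseen categories at predict time become all zeros.
    ('cat', OneHotEncoder(handle_unknown='ignore'), categorical_features),
])

model = Pipeline([
    ('preprocess', preprocess),
    ('ocsvm', OneClassSVM(nu=0.05, kernel='rbf', gamma='scale')),
])

X = df[numerical_features + categorical_features]
model.fit(X)

# Signed distance to the boundary: positive for inliers, negative for outliers.
scores = model.decision_function(X)
preds = model.predict(X)  # 1 = normal, -1 = anomaly
```

Because the scaler is shared, a given absolute change in `daily_numbers` maps to the same scaled distance at every location, regardless of how busy that location normally is.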
For location ID 8, the model doesn't detect the obvious anomaly on Saturday.
For location ID 11, the model doesn't detect the obvious anomaly on Wednesday.
For context, I also noticed that the daily numbers for location ID 8 are low compared with other locations. Could this be why the model fails to detect the obvious anomaly there?
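One way to probe this hypothesis, sketched under the assumption of a pandas DataFrame with `locationID` and `daily_numbers` columns, is to compare a global z-score (what a single pooled scaler effectively computes) with a z-score computed within each location. The toy numbers and the `global_z`/`daily_numbers_z` column names are hypothetical:

```python
import pandas as pd

# Hypothetical toy data: location 8 has low volumes (5 is the "anomaly"),
# location 11 has high volumes.
df = pd.DataFrame({
    "locationID":    [8, 8, 8, 8, 11, 11, 11, 11],
    "daily_numbers": [20, 22, 18, 5, 400, 410, 390, 400],
})

# Global z-score: every location shares one mean/std.
overall = df["daily_numbers"]
df["global_z"] = (overall - overall.mean()) / overall.std()

# Per-location z-score: each row is compared with its own location's mean/std.
grouped = df.groupby("locationID")["daily_numbers"]
df["daily_numbers_z"] = (
    (df["daily_numbers"] - grouped.transform("mean")) / grouped.transform("std")
)

print(df.round(2))
```

In this toy data the drop from 20 to 5 shifts `global_z` by less than 0.1 but shifts `daily_numbers_z` by almost 2, which illustrates how a low-volume location's deviations can shrink under pooled scaling. Feeding a per-location standardised count into the combined model is one possible way to keep a single model; whether it recovers the anomalies that Model 1 finds would need to be checked on the real data.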
Average daily numbers of each location ID
Should I continue with separate models for each location ID, or is there a better way to combine them? Furthermore, what techniques can I use to understand the reasons behind the model's anomaly detection performance? Any insights or suggestions would be greatly appreciated. Thank you!
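As one possible diagnostic, sketched under the assumption that each model's `decision_function` output has been stored per row, the two models can be compared location by location: how often each one flags rows, and where the known anomalies rank among their own location's scores. The column names `score_model1`, `score_model2` and `known_anomaly`, and all values below, are hypothetical:

```python
import pandas as pd

# Hypothetical scores: decision_function output (negative = outlier) from each model.
df = pd.DataFrame({
    "locationID":    [8, 8, 8, 8, 11, 11, 11, 11],
    "score_model1":  [0.20, 0.15, 0.18, -0.40, 0.10, 0.12, -0.30, 0.11],
    "score_model2":  [0.05, 0.04, 0.05, 0.03, 0.30, 0.31, -0.10, 0.29],
    "known_anomaly": [False, False, False, True, False, False, True, False],
})

# 1) Share of rows flagged (score < 0) per location for each model.
score_cols = ["score_model1", "score_model2"]
flag_rate = (df[score_cols] < 0).groupby(df["locationID"]).mean()

# 2) Rank of each score within its location (0 = most anomalous), to check whether
#    a known anomaly is the most unusual point locally even when it is not flagged.
for col in score_cols:
    df[col + "_rank"] = (
        df.groupby("locationID")[col].rank(method="first").sub(1).astype(int)
    )

print(flag_rate)
print(df.loc[df["known_anomaly"]])
```

If a known anomaly ranks as the most unusual point within its location but still has a positive pooled score, that would hint that the single global decision boundary, rather than the features themselves, is what hides it; this is a hypothesis to verify, not a conclusion.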