Let's say I fit the `IsolationForest()` algorithm from scikit-learn on a time-series dataset, Dataset1 (dataframe `df1`), and save the model using the methods mentioned here & here. Now I want to update my model for a new dataset, Dataset2 (`df2`).
My findings:
- This workaround about incremental learning from sklearn:
...learn incrementally from a mini-batch of instances (sometimes called “online learning”) is key to out-of-core learning as it guarantees that at any given time, there will be only a small amount of instances in the main memory. Choosing a good size for the mini-batch that balances relevancy and memory footprint could involve tuning.
Sadly, however, the IF algorithm doesn't support `estimator.partial_fit(newdf)`.
How can I update the IF model that was trained on Dataset1 and saved, using the new Dataset2?
You can simply reuse the `.fit()` call available to the estimator on the new data. With the default settings, this retrains the model from scratch on that data rather than updating it incrementally. This would be preferred, especially in a time series, as the signal changes and you do not want older, non-representative data to be understood as potentially normal (or anomalous).
If old data is important, you can simply join the older training data and newer input signal data together, and then call `.fit()` again. As a side note, according to the sklearn documentation, it is better to use `joblib` than `pickle`.
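As a rough sketch of the combined-data variant with `joblib` persistence (the dataframes, the column name `value`, and the file name `isolation_forest.joblib` are assumptions made for illustration, not part of the question):

```python
import os
import tempfile

import joblib
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the old and new data.
df1 = pd.DataFrame({"value": rng.normal(0.0, 1.0, 500)})
df2 = pd.DataFrame({"value": rng.normal(5.0, 1.0, 500)})

# Join old and new observations, then train one model on both.
combined = pd.concat([df1, df2], ignore_index=True)
model = IsolationForest(random_state=0).fit(combined)

# Persist the fitted model with joblib and load it back.
path = os.path.join(tempfile.mkdtemp(), "isolation_forest.joblib")
joblib.dump(model, path)
restored = joblib.load(path)
```

The restored estimator should give the same predictions as the one that was saved, provided it is loaded with the same scikit-learn version.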
An MRE with resources below: