I'm working on very large pandas dataframes that hold time series with significant feature drift. The drift is often sudden (e.g., the features become 1.5-2.0x larger than they were a few periods earlier).
I found several solutions for detecting 'concept drift'. One convenient option is river. However, that solution is not vectorized.
Clearly, vectorized approaches are much faster. The easiest, for example, is to use the pandas built-ins to take moving averages and check whether those change or jump, along the lines of df.groupby().mean().rolling().
What are vectorized ways to handle the above task?
One vectorized way to detect differences between successive rows is df[col].diff(). See: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.diff.html
If you need to look at this inside known windows, you could perhaps combine this with a rolling average and threshold:
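A minimal sketch of that combination, under assumptions of mine: the helper name flag_jumps, the column names "id" and "value", the window of 5 rows and the 0.5 relative threshold are all made up for illustration and would need tuning on real data. It flags a row when the step from the previous row is large relative to the rolling mean of the rows before it.

```python
import pandas as pd


def flag_jumps(s: pd.Series, window: int = 5, rel_threshold: float = 0.5) -> pd.Series:
    """Flag rows whose step from the previous row is large relative to a
    trailing rolling mean. Hypothetical helper; parameters are assumptions."""
    step = s.diff()  # change versus the previous row
    # Mean of the `window` rows *before* the current one (shift avoids
    # including the current row in its own baseline).
    baseline = s.rolling(window).mean().shift(1)
    # Comparisons against NaN evaluate to False, so the warm-up rows are not flagged.
    return step.abs() > rel_threshold * baseline.abs()


# Single series: flat at 10, then a sudden 1.8x jump at position 10.
s = pd.Series([10.0] * 10 + [18.0] * 10)
flags = flag_jumps(s)

# Several series stacked in one frame: apply the same check per group so that
# the boundary between two series is not mistaken for a jump.
df = pd.DataFrame({
    "id": ["a"] * 20 + ["b"] * 20,
    "value": [10.0] * 10 + [18.0] * 10 + [5.0] * 20,
})
df["jump"] = df.groupby("id")["value"].transform(flag_jumps)
```

Everything inside flag_jumps is a pandas built-in operating on whole columns, so there is no Python-level loop over rows. The groupby call still invokes the function once per group, which is usually fine when there are many rows per group; with a very large number of tiny groups it may be worth computing the diff and the rolling mean through the groupby object directly instead.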