What is a vectorized way to detect feature drift in python/pandas columns?

48 Views Asked by KingOtto At 11 March 2024 at 16:14

I'm working with very large pandas DataFrames that hold time series with significant feature drift. The drift is often sudden (e.g., feature values become 1.5-2.0x larger than they were a few periods earlier).

I found several solutions to detect 'concept drift'. One convenient option is river. However, it is not vectorized.

Clearly, vectorized approaches are much, much faster. The easiest example is to use the pandas built-ins to take moving averages and check whether they change or jump, e.g. df.groupby().mean().rolling().

What are vectorized ways to handle the above task?


There is 1 best solution below

mudskipper On 11 March 2024 at 17:36

One vectorized way to detect differences between successive rows is df[col].diff(). See: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.diff.html

If you need to look at this inside known windows, you could perhaps combine this with a rolling average and threshold:

df[col].diff().rolling(window=5).mean() > threshold
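To make this concrete, here is a minimal sketch of the expression above on synthetic data. The column name "feature", the window length, both thresholds, and the data itself are assumptions made up for illustration. The second flag is a relative variant (ratio of the trailing window mean to the mean of the window before it), added here only because the question describes drift as a 1.5-2.0x jump; it is not part of the original answer.

```python
import numpy as np
import pandas as pd

# Synthetic series (illustrative only): level ~10 for 50 rows,
# then a sudden jump to ~18 (about 1.8x).
rng = np.random.default_rng(0)
values = np.concatenate([
    10 + rng.normal(0, 0.2, 50),
    18 + rng.normal(0, 0.2, 50),
])
df = pd.DataFrame({"feature": values})

window = 5             # hypothetical window length
abs_threshold = 1.0    # hypothetical threshold on the mean row-to-row change
ratio_threshold = 1.5  # hypothetical threshold on the relative level change

# Expression from the answer: mean of successive differences over the window.
mean_step = df["feature"].diff().rolling(window=window).mean()
df["drift_abs"] = mean_step > abs_threshold

# Relative variant (an assumption, not from the answer): trailing window mean
# divided by the mean of the window immediately before it.
recent = df["feature"].rolling(window=window).mean()
previous = recent.shift(window)
df["drift_rel"] = (recent / previous) > ratio_threshold

# Rows where either flag fires; NaN comparisons at the start evaluate to False.
print(df.index[df["drift_abs"]].tolist())
print(df.index[df["drift_rel"]].tolist())
```

Both flags are computed with column-wise pandas operations only, so there is no Python-level loop over rows. In this toy series they fire in the few rows around the jump at row 50; on real data the window and thresholds would need tuning, and for several series in one frame the same expressions could be applied per group with groupby.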
