The source code in question is:

```python
import numpy as np

dd = lambda x: np.nanmax(1.0 - x / np.fmax.accumulate(x))
df.rolling(window=period, min_periods=1).apply(dd)
```
Executing these two lines takes an extremely long time with the latest pandas version (1.4.0). The DataFrame has only 3000 rows and 2000 columns, and `period` is an int variable with value 250. The same code runs much faster with a previous pandas version (0.23.x).

I've tried the suggestions from other questions, like Slow performance of pandas groupby/apply, but they were not of much help.
These are not a solution, at most workarounds for simple cases like the example function, but they confirm the suspicion that the processing speed of `df.rolling.apply` is anything but optimal. Using a much smaller dataset, for obvious reasons.
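The answer's original setup code was not preserved here; below is a minimal sketch of such a benchmark, where the shape (500 x 100), the seed, and the strictly positive values are my assumptions:

```python
import numpy as np
import pandas as pd

# Hypothetical benchmark data: shape and seed are assumptions, chosen so the
# slow apply-based path finishes in reasonable time.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((500, 100)) + 0.01)  # strictly positive "prices"
period = 250

# The question's rolling drawdown via apply (the slow path being benchmarked).
dd = lambda x: np.nanmax(1.0 - x / np.fmax.accumulate(x))
baseline = df.rolling(window=period, min_periods=1).apply(dd)
```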
Running time with pandas v1.3.5: ~8.72 s.

Against a numpy implementation:
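The answer's actual numpy code was stripped in extraction; the following is my reconstruction of one possible vectorised equivalent, not guaranteed to match the original. The first `window - 1` rows see an expanding window and can be handled with a prefix cumulative max; the remaining rows see full windows, materialised via `sliding_window_view` (numpy >= 1.20):

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

def rolling_dd_np(a, window):
    # Rolling max drawdown per column, mirroring
    # df.rolling(window, min_periods=1).apply(dd). Assumes len(a) >= window.
    a = np.asarray(a, dtype=float)
    out = np.empty_like(a)
    # Rows 0 .. window-2 see an expanding window: the result there is the
    # running max of each row's drawdown against the prefix cumulative max.
    d = 1.0 - a[:window - 1] / np.fmax.accumulate(a[:window - 1], axis=0)
    out[:window - 1] = np.fmax.accumulate(d, axis=0)
    # Rows window-1 .. n-1 see a full window: materialise all windows
    # and reduce along the window axis.
    w = sliding_window_view(a, window, axis=0)  # (n - window + 1, cols, window)
    out[window - 1:] = np.nanmax(1.0 - w / np.fmax.accumulate(w, axis=-1), axis=-1)
    return out

res_np = pd.DataFrame(rolling_dd_np(df.to_numpy(), period),
                      index=df.index, columns=df.columns)
```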
Running time: ~3.39 ms, i.e. 8.72 * 1000 / 3.39 = 2572.27x speedup.

Processing columns in chunks:
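Again a sketch rather than the answer's original code; the chunk size of 100 is my assumption. The point of chunking is memory: on the full 3000 x 2000 frame, `np.fmax.accumulate` over all sliding windows would materialise an array of roughly (2751, 2000, 250) float64 values, so processing column blocks keeps the intermediates manageable.

```python
def rolling_dd_chunked(a, window, chunk=100):
    # Same computation as rolling_dd_np, applied to column blocks so the
    # (n, chunk, window) sliding-window intermediates stay small.
    a = np.asarray(a, dtype=float)
    out = np.empty_like(a)
    for start in range(0, a.shape[1], chunk):
        out[:, start:start + chunk] = rolling_dd_np(
            a[:, start:start + chunk], window)
    return out
```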
Using the pandas numba engine:

We can get even faster with pandas' support for numba-jitted functions. Unfortunately, numba v0.55.1 can't compile ufunc.accumulate, so we have to write our own implementation of np.fmax.accumulate (no guarantees on my implementation). Please note that the first call is slower because the function needs to be compiled. A sketch follows below.
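This is my reconstruction of such a jitted implementation, not the answer's original code; one known caveat is that the hand-rolled accumulate returns -inf for a leading run of NaNs, where np.fmax.accumulate would return NaN:

```python
import numba as nb
import numpy as np

@nb.njit
def fmax_accumulate(x):
    # Hand-rolled stand-in for np.fmax.accumulate, which numba v0.55.1
    # cannot compile. NaNs are skipped, like np.fmax does.
    out = np.empty_like(x)
    running = -np.inf
    for i in range(x.shape[0]):
        v = x[i]
        if v == v and v > running:  # v == v is False only for NaN
            running = v
        out[i] = running
    return out

def dd_numba(x):
    return np.nanmax(1.0 - x / fmax_accumulate(x))

# engine='numba' requires raw=True; pandas jit-compiles dd_numba on first
# use, which is why the first call pays the compilation cost.
res_nb = df.rolling(window=period, min_periods=1).apply(
    dd_numba, raw=True, engine='numba')
```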
We can use the familiar pandas interface, and it's ~1.16x faster than my chunked numpy approach for df.shape (2000, 2000).