In Python I wrote a generator that returns what I call a 'value-centred' sliding window over the data. For example, given the data:
v = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
calling it like this:
for index, window in sliding_window_iter(v, 3):
    print("value={} window={}".format(v[index], window))
produces:
value=1 window=[1, 2, 3]
value=2 window=[1, 2, 3]
value=3 window=[2, 3, 4]
value=4 window=[3, 4, 5]
value=5 window=[4, 5, 6]
value=6 window=[5, 6, 7]
value=7 window=[6, 7, 8]
value=8 window=[7, 8, 9]
value=9 window=[8, 9, 10]
value=10 window=[8, 9, 10]
As you can see, it yields a tuple for each element: (centre_index, window), where v[centre_index] is the centre value.
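The generator itself is not shown here. A minimal sketch that reproduces the output above could look like the following, assuming the window is simply clamped to the data at both ends and the window size is odd (both are assumptions inferred from the output, not the original implementation):

```python
from typing import Iterator, List, Sequence, Tuple


def sliding_window_iter(data: Sequence[int], size: int) -> Iterator[Tuple[int, List[int]]]:
    """Yield (centre_index, window) for every element of `data`.

    Assumption: `size` is odd and not larger than len(data); near the
    edges the window is clamped so it always holds `size` elements.
    """
    n = len(data)
    if size < 1 or size % 2 == 0 or size > n:
        raise ValueError("size must be odd and between 1 and len(data)")
    k = size // 2  # number of neighbours on each side of the centre
    for i in range(n):
        # Clamp the window start so the slice never runs off either end.
        start = min(max(i - k, 0), n - size)
        yield i, list(data[start:start + size])
```

Note that this sketch needs len(data), so it is not itself stream oriented.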
How might I re-implement this in polars?
The closest thing to it would be df.group_by_dynamic with include_boundaries=True, but obviously it's not the same.
I'd prefer an implementation that is stream oriented (i.e. one that does not require reading all the data into memory).
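TLDR. For a window of size 2*k + 1, compute a rolling window with a period of 2*k + 1 rows, shift the result by -k, and fill the first and last k rows with a backward/forward fill.

Explanation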
It sounds like the value-centered sliding window is defined only for odd window sizes (such that there is a unique center). In the following, we therefore consider window sizes of the form window_size = 2*k + 1 for some positive integer k.

Example data.
Indeed, shifting the result of polars.DataFrame.rolling with a period of 2*k + 1 rows by -k mostly does the correct thing here.

Note. I visualise the result by aggregating the window elements into a list, but really any aggregation could be used.
This is correct except for the first k rows, where the window does not contain enough elements, and the last k rows, which are missing after the shift.

Now, the idea is to use a simple pl.when().then() construct to also overwrite the first k rows with None. This way, the first and last k rows are missing.

Finally, we can use a forward/backward fill to fill the missing rows with the desired values.
Note. The initial shift from before was moved inside the pl.when().then() construct. It would also have been possible to set the offset parameter of pl.DataFrame.rolling to -k-1. However, we would then need to set the first and last k rows of the column to None explicitly.