How does the 'on' parameter in pandas.Rolling work?

62 Views Asked by guillaume V At 19 February 2024 at 10:39

I'm trying to apply an aggregation function to a DataFrame based on a periodic column, for example a date :

df = pd.DataFrame({"values": [3,2,1,1000,1,3,1],"day":[1,2,3,1,2,3,1]})
df
   values  day
0     3    1
1     2    2
2     1    3
3  1000    1
4     1    2
5     3    3
6     1    1

I'd like to find the max of 2 consecutive "day 1", "day 2" and so on. I got the result I wanted using this line, which first sorts the values according to the day column:

df.sort_values("day").rolling(2).sum().sort_index()

I thought this was the behavior of the "on" parameter in te DataFrame.rolling method, since according to the pandas documentation:

on : str, optional

For a DataFrame, a column label or Index level on which to calculate the rolling window, rather than the DataFrame’s index.

df.rolling(2,on="day").max()

However, when I try that, the only effect it has is to ignore the column in the calculation, but there is no re-sorting.

   values  day
0     NaN    1
1     3.0    2
2     2.0    3
3  1000.0    1
4  1000.0    2
5     3.0    3
6     3.0    1

Why does sorting the DataFrame work ? When calculating the rolling window, I thought it would only take into account the value of the index (unchanged after sorting) and not its location in the dataframe.

Original Q&A

There are 1 best solutions below

Oliver Longhurst On 19 February 2024 at 17:35 BEST ANSWER

To answer your implicit first question: Why was the effect it had to ignore the column in the calculation?

Pandas docs:

on : str, optional ... Provided integer column is ignored and excluded from result since an integer index is not used to calculate the rolling window.

I suspect day is an integer column, and thus was ignored.

For your second question: Why does sorting work, is that not how the on parameter is supposed to behave?

the on parameter allows you to specify a column, and Pandas assumes it is already sorted.

What is the on parameter meant for?

Pandas uses the index by default, but if, for example, you have a DataFrame with a datetime column different from the index and you want to calculate a 7-day rolling average based on dates in that column, you can specify this column with the on parameter.

That is the gist

How does the 'on' parameter in pandas.Rolling work?

There are 1 best solutions below

Related Questions in PYTHON

Related Questions in PANDAS

Related Questions in DATAFRAME

Related Questions in ROLLING-COMPUTATION

Trending Questions

Popular # Hahtags

Popular Questions