I have an issue that is easily solved with dplyr in R, yet can't seem to find an easy way in Python. I have a df with id(=customerid), s(=store), m(=month) and ttl(=total purchase) as columns. I would like to calculate multiple new columns on id+s - for example last 3 months purchase and minimum purchase.
Example (last two are new columns):
id s m ttl ttl_3 min_id_s
1 A 1/1/2020 7 nan 3
1 A 2/1/2020 3 nan 3
1 A 3/1/2020 7 17 3
1 A 4/1/2020 6 16 3
1 A 5/1/2020 7 20 3
1 A 6/1/2020 7 20 3
1 B 1/1/2020 6 nan 6
1 B 2/1/2020 10 nan 6
1 B 3/1/2020 8 24 6
1 B 4/1/2020 8 26 6
1 B 5/1/2020 10 26 6
1 B 6/1/2020 8 26 6
2 A 1/1/2020 4 nan 1
2 A 2/1/2020 3 15 1
2 A 3/1/2020 10 17 1
2 A 4/1/2020 6 19 1
2 A 5/1/2020 4 20 1
2 A 6/1/2020 1 11 1
I've tried the following:
grp = df.groupby(['id','s'])
df = df.assign(ttl_3 = grp['ttl'].apply(lambda x: x.rolling(window=3)).sum(), min_id_s = grp['ttl'].min())
I get the following error:
Cannot access callable attribute 'assign' of 'DataFrameGroupBy' objects, try using the 'apply' method
I know that it can be solved without assign, but then I'll have to have a line per each new column, and as I need plenty of these, I'm looking for a workaround.
I've also looked into add_columns with pyjanitor, but it doesn't seem to work with groupby.
In R, the following code solves the issue and in the mutate I can proceed adding columns:
df = df %>% group_by(id, s) %>% mutate(ttl_3 = runner(ttl, k=3, f=sum), min_id_s = min(ttl))
With python syntax and pandas the logic is nearly identical
Trying to come as close to dplyr as I could.
Group by
idandsand compute the new columns. Multiplerollingcolumns could be computed with theaggmethod.You can use
assign, but you also have to write one line per aggregation.Out:
Aggregate multiple columns for
rollingwithaggOut:
Aggregating multiple columns from multiple columns for
rollingwithagg. The columns have to be numerical.transformdoesn't support multiple aggregations as far as I know.Generate a random numerical column for a rolling aggregation
groupbywithout selected columns to aggregate from multiple columns. For example with a custom functionOut: