Efficient pandas rolling count over date range by group - Python 2.7.13 Windows - Pandas 0.20.2

1.6k Views Asked by At

I'm working to perform a rolling time-aware count within a grouped subset of a data frame, which would dynamically create new columns for each ActivityType an and ContactId as of a certain date.

I believe I'm using the correct syntax when calling the groupby and rolling methods, but I'm still receiving an TypeError: incompatible index of inserted column with frame index - which I'm not sure why.

Your time and help is greatly appreciated.

Expected output:

ContactId ActivityType       Date  Call_90D  Meeting_90D  Conference_90D  Webinar_90D
David     Call          01/01/2015      1         0            0               0
David     Meeting       01/17/2015      1         1            0               0
David     Call          02/12/2015      2         1            0               0
Ryan      Conference    01/01/2015      0         0            1               0
Ryan      Meeting       01/17/2015      0         1            1               0
Ryan      Webinar       02/12/2015      0         1            1               1
Louis     Call          03/29/2015      1         0            0               0
Louis     Call          03/30/2015      2         0            0               0
Louis     Webinar       09/01/2015      0         0            0               1
Bryan     Call          01/12/2015      1         0            0               0
Bryan     Call          01/17/2015      2         0            0               0
Bryan     Call          02/01/2015      3         0            0               0  

code

import pandas as pd

data = [['David', 'Call', '1/1/2015'], ['David', 'Meeting', '1/17/2015'], ['David', 'Call', '2/12/2015'],
    ['Ryan', 'Conference', '1/1/2015'], ['Ryan', 'Meeting', '1/17/2015'], ['Ryan', 'Webinar', '2/12/2015'],
    ['Louis', 'Call', '3/29/2015'], ['Louis', 'Call', '3/30/2015'], ['Louis', 'Webinar', '9/1/2015'],
    ['Bryan', 'Call', '1/12/2015'], ['Bryan', 'Call', '1/17/2015'], ['Bryan', 'Call', '2/1/2015']]

df = pd.DataFrame(data, columns=['ContactId', 'ActivityType', 'Date'])
df.Date = pd.to_datetime(df.Date)
activities = df['ActivityType'].values.tolist()
for a in activities:
    df[a] = df['ActivityType'].apply(lambda x: 1 if x == a else 0)
    new_col_name = '%s_90D' % a
    df[new_col_name] = df.groupby(by='ContactId', as_index=False, group_keys=False).rolling(window='90D', on='Date')[a].sum()
    del df[a]
print(df)

Error:

line 20, in <module>
df[new_col_name] = df.groupby(by='ContactId', as_index=False, group_keys=False).rolling(window='90D', on='Date')[a].sum()
File "C:\Python27\lib\site-packages\pandas\core\frame.py", line 2331, in __setitem__
self._set_item(key, value)
File "C:\Python27\lib\site-packages\pandas\core\frame.py", line 2397, in _set_item
value = self._sanitize_column(key, value)
File "C:\Python27\lib\site-packages\pandas\core\frame.py", line 2547, in _sanitize_column
value = reindexer(value)
File "C:\Python27\lib\site-packages\pandas\core\frame.py", line 2542, in reindexer
raise TypeError('incompatible index of inserted column '
TypeError: incompatible index of inserted column with frame index
0

There are 0 best solutions below