how to take average of the values per three days in a timeseries data

211 Views Asked by At

I have a time-series data that look like this:

datetime    | value
2016-01-01  | 1
2016-01-02  | 14
2016-01-03  | 3
2016-01-04  | 15
2016-01-05  | 5
2016-01-06  | 4
2016-01-07  | 7
2016-01-08  | 15

What I want to obtain is the average per three days (as well as per a week) and keep the last day as the index like this:

datetime    | value
2016-01-03  | 6
2016-01-06  | 8
2016-01-08  | 11

You may notice that the remaining 2 records need to be just averaged. Is there an handy way of doing this either in pandas dataframe or graphlab SFrame? If someone can share a relevant resource, that would be great!

I appreciate any help!

2

There are 2 best solutions below

0
On BEST ANSWER

You can use agg by numpy array created by floor division:

print (np.arange(len(df.index)) // 3)
[0 0 0 1 1 1 2 2]

df = df.groupby(np.arange(len(df.index)) // 3).agg({'datetime': 'last', 'value': 'mean'})
print (df)
     datetime  value
0  2016-01-03      6
1  2016-01-06      8
2  2016-01-08     11
0
On

Numpy array supports even-space selection:

>>> import numpy as np
>>> l = np.arange(10)
>>> l
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> l[0::3]
array([0, 3, 6, 9])

Pandas Series most of the time support Numpy features. In your case, instead of from index 0, iterate from 0 to 2 in for loop should do the work.