Datetime upsampling


I have a dataframe like this:

import pandas as pd

rows = [['bob', '01/2017', 12],
        ['bob', '02/2017', 14],
        ['bob', '03/2017', 16],
        ['julia', '01/2017', 18],
        ['julia', '02/2017', 16],
        ['julia', '03/2017', 24]]

df = pd.DataFrame(rows, columns = ['name','date','val'])

Assuming that each month has four weeks (I will use a lookup to match each month to its number of weeks, but for simplicity assume 4), I want to create a row for each person for each week of the month, where the value is the month's value divided by 4 (or n_weeks).

I tried using .resample() and .asfreq(), but both complained that I needed a unique index.

How can I do this in pandas?

EDIT

OK, so I got this far:

df['date'] = pd.to_datetime(df['date'], format='%m/%Y')
weekly = df.groupby('name').apply(lambda g: g.set_index('date').resample('W').ffill().reset_index()).reset_index(drop=True)

weekly['val'] = weekly.val / 4

    date        name   val
0   2017-01-01  bob    3
1   2017-01-08  bob    3
2   2017-01-15  bob    3
3   2017-01-22  bob    3
4   2017-01-29  bob    3
5   2017-02-05  bob    3.5
6   2017-02-12  bob    3.5
7   2017-02-19  bob    3.5
8   2017-02-26  bob    3.5
9   2017-03-05  bob    4
10  2017-01-01  julia  4.5
11  2017-01-08  julia  4.5
12  2017-01-15  julia  4.5
13  2017-01-22  julia  4.5
14  2017-01-29  julia  4.5
15  2017-02-05  julia  4
16  2017-02-12  julia  4
17  2017-02-19  julia  4
18  2017-02-26  julia  4
19  2017-03-05  julia  6

My remaining problem is that the last month of each group is not distributed: March only gets a single row (2017-03-05) instead of one row per week.

Best answer

Someone posted a partial answer but deleted it before I could copy it. I think I worked out what they were going for:

Starting from the dataframe created in the question:

   name   date     val
0  bob    01/2017  12
1  bob    02/2017  14
2  bob    03/2017  16
3  julia  01/2017  18
4  julia  02/2017  16
5  julia  03/2017  24

I can do:

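from pandas.tseries.offsets import MonthBegin, MonthEnd

# Parse the 'MM/YYYY' strings as the first day of each month
df['date'] = pd.to_datetime(df['date'], format='%m/%Y')

# Weekly (Sunday) dates spanning the first through the last month
min_date = df.date.min() + MonthBegin(0)
max_date = df.date.max() + MonthEnd(0)
dr = pd.date_range(min_date, max_date, freq='W')

# Forward-fill each person's monthly value onto the weekly dates
weekly = df.groupby('name').apply(lambda g: g.set_index('date')
         .reindex(dr, method='pad').reset_index()).reset_index(drop=True)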

and get

    index       name   val
0   2017-01-01  bob    12
1   2017-01-08  bob    12
2   2017-01-15  bob    12
3   2017-01-22  bob    12
4   2017-01-29  bob    12
5   2017-02-05  bob    14
6   2017-02-12  bob    14
7   2017-02-19  bob    14
8   2017-02-26  bob    14
9   2017-03-05  bob    16
10  2017-03-12  bob    16
11  2017-03-19  bob    16
12  2017-03-26  bob    16
13  2017-01-01  julia  18
14  2017-01-08  julia  18
15  2017-01-15  julia  18
16  2017-01-22  julia  18
17  2017-01-29  julia  18
18  2017-02-05  julia  16
19  2017-02-12  julia  16
20  2017-02-19  julia  16
21  2017-02-26  julia  16
22  2017-03-05  julia  24
23  2017-03-12  julia  24
24  2017-03-19  julia  24
25  2017-03-26  julia  24
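
Note that this frame still carries the monthly totals, not the weekly share asked for in the question. Here is a rough sketch of the last step, under one assumption of mine: the number of weeks in a month is taken to be the number of weekly rows that land in that month for each person. The names `pieces`, `month` and `n_weeks` are made up for illustration, and a month-to-weeks lookup could replace the count.

```python
import pandas as pd
from pandas.tseries.offsets import MonthBegin, MonthEnd

rows = [['bob', '01/2017', 12], ['bob', '02/2017', 14], ['bob', '03/2017', 16],
        ['julia', '01/2017', 18], ['julia', '02/2017', 16], ['julia', '03/2017', 24]]
df = pd.DataFrame(rows, columns=['name', 'date', 'val'])
df['date'] = pd.to_datetime(df['date'], format='%m/%Y')

# Weekly (Sunday) dates covering the first through the last month in the data.
dr = pd.date_range(df['date'].min() + MonthBegin(0),
                   df['date'].max() + MonthEnd(0), freq='W')

# Forward-fill each person's monthly value onto the weekly dates.
# A plain loop is used here instead of groupby.apply so the result does not
# depend on how apply handles the grouping column.
pieces = []
for name, g in df.groupby('name'):
    w = g.set_index('date')[['val']].reindex(dr, method='ffill')
    w = w.rename_axis('date').reset_index()
    w.insert(0, 'name', name)
    pieces.append(w)
weekly = pd.concat(pieces, ignore_index=True)

# Assumption: n_weeks = number of weekly rows per person per calendar month.
month = weekly['date'].dt.to_period('M')
n_weeks = weekly.groupby(['name', month])['val'].transform('count')
weekly['val'] = weekly['val'] / n_weeks
```

Under this counting rule January 2017 contains five Sundays, so bob's 12 is split five ways, while February and March are split four ways. Each person's weekly values then add back up to the monthly totals, which a fixed division by 4 would not do for January.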