Using Python pandas (but open to any other solution), I would like to down-sample a DataFrame to monthly frequency while keeping, for each month, the last date that actually appears in my input file. By default, DataFrame.resample
labels each monthly bin with the last calendar day of the month. Here is my example:
>>> import pandas as pd
>>> import numpy as np
>>> begin = pd.Timestamp(2013, 1, 1)
>>> end = pd.Timestamp(2013, 2, 20)
>>> dtrange = pd.date_range(begin, end, freq='5D')
>>> values = np.random.rand(len(dtrange))*10
>>> df = pd.DataFrame({'values': values}, index=dtrange)
>>> df
values
2013-01-01 7.763089
2013-01-06 6.032173
2013-01-11 9.747979
2013-01-16 0.856741
2013-01-21 7.111047
2013-01-26 2.654279
2013-01-31 5.222770
2013-02-05 9.578846
2013-02-10 5.088311
2013-02-15 4.193273
2013-02-20 3.345553
>>> df.resample('M').last()
values
2013-01-31 5.222770
2013-02-28 3.345553
The output that I expect is:
values
2013-01-31 5.222770
2013-02-20 3.345553
Please note the date 2013-02-20. This is the true date from my input data, not a date created by resample.
Perhaps not the fanciest way, but you can always groupby your time frequency and apply a custom function that returns what you want.
A function to return the last value from the DataFrame:
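A minimal sketch of such a function, assuming each group arrives sorted by date. The name `last_row` and the demo values are hypothetical, chosen only for illustration. Returning `tail(1)` rather than a scalar keeps the row's own index label, which is what preserves the real date.

```python
import pandas as pd


def last_row(group: pd.DataFrame) -> pd.DataFrame:
    """Return the final row of a group as a one-row DataFrame.

    tail(1) keeps the row's own index label, so the real timestamp
    from the input survives (assumes the group is sorted by date).
    """
    return group.tail(1)


# Tiny demonstration on a three-row frame (made-up values).
demo = pd.DataFrame(
    {"values": [1.0, 2.0, 3.0]},
    index=pd.to_datetime(["2013-02-10", "2013-02-15", "2013-02-20"]),
)
last = last_row(demo)
print(last)
```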
Then groupby the month frequency and apply the function:
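One way this step might look, as a sketch rather than a definitive implementation. It rebuilds the question's frame with the fixed values shown there instead of random ones. The month key is built with `DatetimeIndex.to_period("M")`, which is my assumption for how to express "month frequency"; a frequency-based grouper would be another option. `last_row` is the same hypothetical helper name, redefined so the snippet runs on its own.

```python
import pandas as pd

# Sample data copied from the question (fixed values instead of random ones).
dtrange = pd.date_range("2013-01-01", "2013-02-20", freq="5D")
values = [7.763089, 6.032173, 9.747979, 0.856741, 7.111047, 2.654279,
          5.222770, 9.578846, 5.088311, 4.193273, 3.345553]
df = pd.DataFrame({"values": values}, index=dtrange)


def last_row(group):
    # Hypothetical helper: last row of the group, original index label kept.
    return group.tail(1)


# One key per row: the calendar month that the row falls in.
month_key = df.index.to_period("M")

# group_keys=False keeps the original dates as the index instead of
# prepending the month key as an extra index level.
result = df.groupby(month_key, group_keys=False).apply(last_row)
print(result)
```

Because the helper returns actual rows, the index of `result` consists of dates taken from the input, not generated month-end labels.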
Which results in (for the sample values shown in the question):
values
2013-01-31 5.222770
2013-02-20 3.345553