I have
- a list of dates (size=7713):
[date(1994,5,25), ..., date(2023,12,19)]
- a list of times (basically 5 minute interval of a day, size=288):
[time(0, 0), ..., time(23, 55)]
I want to build a DatetimeIndex from their full Cartesian "product", i.e., combine every time with each date. In the end, the size of the index is 7713 * 288.
My current solution is
datetimes = [pd.Timestamp.combine(d, t) for d in dates for t in times]
index = pd.DatetimeIndex(datetimes)
However, it is super slow:
- combining takes around 9 seconds
- building the index takes another 9 seconds
I also tried passing datetimes directly to pd.DataFrame(index=datetimes, ....); that takes 6 seconds (better than the 9 seconds when I build the index beforehand).
Is there a much faster way to do this?
Your code is slow because it runs pd.Timestamp.combine in a double for loop. Instead, use pd.merge with the cross type to build the "product" first. Then apply either datetime.combine or a string concatenation as a vectorized operation to create the combined datetimes. (Is either one of these two faster for you?) This gives you the desired result:
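A sketch of that approach, using small stand-in lists for the 7713 dates and 288 times (the actual inputs are assumed); the how="cross" merge requires pandas >= 1.2, and the string concatenation replaces the per-element combine with one vectorized pd.to_datetime call:

```python
from datetime import date, time

import pandas as pd

# Stand-ins for the full lists of dates (size 7713) and times (size 288)
dates = [date(1994, 5, 25), date(1994, 5, 26)]
times = [time(0, 0), time(23, 55)]

# Build the Cartesian product of dates and times in one cross merge
df = pd.merge(
    pd.Series(dates, name="date"),
    pd.Series(times, name="time"),
    how="cross",
)

# Vectorized "string summation": concatenate date and time strings,
# then parse the whole column at once
index = pd.DatetimeIndex(
    pd.to_datetime(df["date"].astype(str) + " " + df["time"].astype(str))
)
```

With the real inputs, len(index) would be 7713 * 288; the parsing cost is paid once over the whole column instead of per element.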
Or, keep the date and time separate as a MultiIndex (depending on what analysis is coming down the road, maybe pivoting directly on time of day is good enough). This should spare you the effort/time needed to combine:
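A minimal sketch of the MultiIndex alternative, again with small stand-in lists; MultiIndex.from_product builds the Cartesian product directly, so no combining happens at all:

```python
from datetime import date, time

import pandas as pd

# Stand-ins for the full lists of dates and times
dates = [date(1994, 5, 25), date(1994, 5, 26)]
times = [time(0, 0), time(23, 55)]

# Cartesian product as a two-level index: level 0 is the date, level 1 the time
index = pd.MultiIndex.from_product([dates, times], names=["date", "time"])
df = pd.DataFrame(index=index)
```

Each row is then addressed by a (date, time) tuple, and df.unstack("time") pivots time of day into columns if that is the shape the later analysis needs.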