I've been searching for this for a while and haven't found a proper solution. I have a time series of a couple of million rows with a rather odd structure:
    VisitorID  Time              VisitDuration
    1          01.01.2014 00:01  80 seconds
    2          01.01.2014 00:03  37 seconds
I want to know how many people are on the website at any given moment. For this, I would have to expand the data into something much bigger:
    Time              VisitorsPresent
    01.01.2014 00:01  1
    01.01.2014 00:02  1
    01.01.2014 00:03  2
    ...
But doing something like this seems highly inefficient. My code would be:
    import pandas as pd

    dates = {}
    for index, row in data.iterrows():  # assumes data is indexed by join time
        for i in range(int(row["duration"])):
            t = index + pd.Timedelta(seconds=i)
            # default to 0 so the first visitor at t counts as 1, not 2
            dates[t] = dates.get(t, 0) + 1
Then I could convert this into a Series and resample it:
    result = pd.Series(dates)
    # current pandas spells this .resample(...).mean(); the how= keyword is gone
    result.resample("5min").mean().plot()
Could you point me in the right direction?
EDIT---
Hi HYRY, here is the output of head():
       uid        join_time_UTC  duration
    0    1  2014-03-07 16:58:01      2953
    1    2  2014-03-07 17:13:14      1954
    2    3  2014-03-07 17:47:38       223
Create some dummy data first:
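The original code for this step is not preserved here. As a rough sketch only, dummy data with the same columns as the head() above could be generated like this. The visit count, the date, the duration range, the random seed and the extra `leave_time_UTC` column are all assumptions made for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical dummy data: 1,000 visits starting at random seconds of one day.
rng = np.random.default_rng(0)
n_visits = 1000
day_start = pd.Timestamp("2014-03-07")

offsets = rng.integers(0, 24 * 60 * 60, size=n_visits)  # seconds since midnight
durations = rng.integers(30, 3000, size=n_visits)       # visit length in seconds

data = pd.DataFrame({
    "uid": np.arange(1, n_visits + 1),
    "join_time_UTC": day_start + pd.to_timedelta(offsets, unit="s"),
    "duration": durations,
})

# Assumed helper column: the moment each visitor leaves.
data["leave_time_UTC"] = data["join_time_UTC"] + pd.to_timedelta(data["duration"], unit="s")
```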
Here is what the data looks like:
Then do the value count:
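The code for this step is missing as well, so what follows is one possible reading of "value count", not necessarily the exact original. The idea is to count arrivals and departures per timestamp with value_counts(), subtract them, and take a running sum, instead of expanding every visit into one row per second. The three rows are the ones from the head() in the question; the names `leave`, `arrivals`, `departures`, `change` and `present` are made up for this sketch:

```python
import pandas as pd

# The three rows from the head() shown in the question.
data = pd.DataFrame({
    "uid": [1, 2, 3],
    "join_time_UTC": pd.to_datetime([
        "2014-03-07 16:58:01",
        "2014-03-07 17:13:14",
        "2014-03-07 17:47:38",
    ]),
    "duration": [2953, 1954, 223],
})

# Moment each visitor leaves: join time plus duration in seconds.
leave = data["join_time_UTC"] + pd.to_timedelta(data["duration"], unit="s")

# How many visitors arrive / leave at each distinct timestamp.
arrivals = data["join_time_UTC"].value_counts()
departures = leave.value_counts()

# Net change at each event time (+1 per arrival, -1 per departure), in time order.
change = arrivals.sub(departures, fill_value=0).sort_index()

# Running total = number of visitors present from that moment on.
present = change.cumsum().astype(int)
print(present)
```

This keeps the work proportional to the number of visits rather than to the total number of visitor-seconds, which is what makes the nested loop in the question slow.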
Here are the counts:
Finally, resample and plot:
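The original snippet for this step is not available either. A minimal sketch, assuming the head count is stored as a Series that changes value only at arrival and departure times (the hypothetical `present` below, built from the three sample visits in the question): forward-fill it onto a one-second grid, then average over 5-minute bins.

```python
import pandas as pd

# Hypothetical running head count, indexed by the moments at which it changes.
present = pd.Series(
    [1, 2, 1, 0, 1, 0],
    index=pd.to_datetime([
        "2014-03-07 16:58:01",
        "2014-03-07 17:13:14",
        "2014-03-07 17:45:48",
        "2014-03-07 17:47:14",
        "2014-03-07 17:47:38",
        "2014-03-07 17:51:21",
    ]),
)

# The count is a step function: carry the last known value forward to every second.
per_second = present.resample("1s").last().ffill()

# Average number of visitors present in each 5-minute window.
result = per_second.resample("5min").mean()
print(result)

# result.plot()  # uncomment to draw the curve; needs matplotlib installed
```

For data spanning many months, the one-second grid can get large, so a coarser intermediate grid may be the better trade-off there.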
The output is: