I was hoping you guys could help me with a Python coding question. I have a CSV file with over 5 million data points taken over the course of 24 hours. The file contains two columns: UTC time and intensity values. I would like to average the intensity data into 3-minute ranges spaced every 6 minutes, i.e. mean(x - 1.5 min : x + 1.5 min), where x runs over 0, 6, 12, 18, etc. I have attempted multiple avenues, but nothing seems to be working.
I am loading the data using:

import pandas as pd

df = pd.read_csv("file.csv", names=['Date', 'Intensity'], na_values='nan')
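(Presumably the Date column then needs to be parsed into real datetimes for any time-based grouping; a variant with parse_dates, assuming pandas' default parser copes with these nanosecond timestamps:)

df = pd.read_csv("file.csv", names=['Date', 'Intensity'], na_values='nan', parse_dates=['Date'])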
Data example:
df=
'Date' 'Intensity'
2016-06-17 00:00:00.054951500 15299
2016-06-17 00:00:00.056950500 13248
2016-06-17 00:00:00.058948500 12838
2016-06-17 00:00:00.060942500 8920
2016-06-17 00:00:00.062947500 12631
2016-06-17 00:00:00.064942500 13455
... (continues for 24 hours) ...
The idea is to take a 3-minute average every 6 minutes (the first and last averages are truncated to half-windows, since the data does not extend before 0 or past 24 hours). For example, the resulting data should look like this:
df=
'Date' 'Intensity'
average(0 : 1.5 min) = 2016-06-17 00:00:00 14554
average(4.5 : 7.5 min) = 2016-06-17 00:06:00 18243
average(10.5 : 13.5 min) = 2016-06-17 00:12:00 20244
average(16.5 : 19.5 min) = 2016-06-17 00:18:00 10722
...
So far I have tried using the pd.Timedelta command (which I can't seem to get right), as well as first converting all times to seconds and then calculating from there (but that seems like far too much work). Does anyone have an idea of what I should do?
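(For reference, by converting to seconds I mean roughly this kind of sketch, assuming Date is already parsed as datetimes:)

# seconds elapsed since the first sample
secs = (df['Date'] - df['Date'].iloc[0]).dt.total_seconds()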
Thank you!
I don't know how well you know pandas, but assuming you can load the data from a CSV into a DataFrame, what you can do is:
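A minimal sketch of one approach, using resample with its offset argument (available in pandas >= 1.1; older versions used base instead). The file and column names follow your question. The trick is to shift the 3-minute bin edges back by 90 s so every bin is centred on a 3-minute mark, and then keep only the bins centred on the 6-minute marks:

import pandas as pd

df = pd.read_csv("file.csv", names=['Date', 'Intensity'],
                 na_values='nan', parse_dates=['Date'])
df = df.set_index('Date').sort_index()

# 3-minute bins whose edges sit at x - 1.5 min, so each bin covers
# the window (x - 1.5 min, x + 1.5 min) around a 3-minute mark x
binned = df['Intensity'].resample('3min', offset='-90s').mean()

# resample labels each bin with its left edge; shift by +90 s so every
# row is labelled with the centre of its window (00:00, 00:03, 00:06, ...)
binned.index = binned.index + pd.Timedelta('90s')

# keep only the windows centred on the 6-minute marks (00:00, 00:06, ...)
result = binned[binned.index.minute % 6 == 0]
print(result.head())

Because the data only starts at 00:00:00, the first bin effectively averages (0 : 1.5 min) and the last one the closing half-window, which matches the truncated first and last averages in your expected output. On 5 million rows this is a single vectorised grouping pass, so it should be reasonably fast.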