Averaging a specific time range for every 6 minutes in a CSV file with Python

104 Views Asked by At

I was hoping you guys could help me with a Python coding question. I have a CSV file with over 5 million data points taken over the course of 24 hours. The file contains two columns - UTC time and intensity values. I would like to average all intensity data into 3 minute-ranges every 6 minutes (e.g. mean(x-1.5 min : x+1.5 min), where x is an array of 0, 6, 12, 18, etc.). I have attempted multiple avenues, but nothing seems to be working.

I am calling the data using:

df=pd.read_csv("file.csv",names=['Date','Intensity'], na_values='nan') 

Data example:

df=
'Date'                                   'Intensity'
2016-06-17 00:00:00.054951500            15299
2016-06-17 00:00:00.056950500            13248
2016-06-17 00:00:00.058948500            12838
2016-06-17 00:00:00.060942500            8920
2016-06-17 00:00:00.062947500            12631
2016-06-17 00:00:00.064942500            13455 
... (continues on for 24 hours)....

The idea is to make a 3 minute average every 6 minutes (excluding the first and last average). For example, the resulting data should be calculated and look like:

df=
                         'Date'                 'Intensity'
average(0 : 1.5 min)=        2016-06-17 00:00:00     14554              
average(4.5 : 7.5 min)=      2016-06-17 00:06:00     18243              
average (10.5 : 13.5 min)=   2016-06-17 00:12:00     20244              
average (16.5 : 19.5 min)=   2016-06-17 00:18:00     10722  
... 

As of yet I have tried using : pd.Timedelta command (which I can't seem to get right), as well as first converting all times to seconds and then calculating from there (but that seems to be far too much work). Does anyone have idea as to what I should do?

Thank you!

1

There are 1 best solutions below

3
On BEST ANSWER

I don't know how well do you know pandas, but assuming that you can load a data from CSV to DataFrame, what you can do is:

import pandas as pd

# load data from csv into intesity_df
...

# group by time
intensity_df = intensity_df.groupby(pd.Grouper(key='date', freq='3min'))