I have a excel data file with the date time delimited. I want to check whether the data is continuous or not. i have tried to check it by counting the total number of columns and then matching with the supposed number of datapoints(number of rows). but this method has limitation when there is actually gaps. I'd need to identify those.
https://docs.google.com/spreadsheets/d/1T_xQIi-6nT3XLjIZthUuGuBfG2rDlXqlv95brYn5HyA/edit?usp=sharing
Data:
date month year hour min sec normalised parameter
0 30 10 10 15 50 0 0.03256
1 30 10 10 16 0 0 0.02802
2 30 10 10 16 10 0 0.06172
3 30 10 10 16 20 0 0.05575
4 30 10 10 16 30 0 0.01257
5 30 10 10 16 40 0 0.04290
6 30 10 10 16 50 0 0.01149
7 30 10 10 17 0 0 0.02578
8 30 10 10 17 10 0 0.14456
9 30 10 10 17 20 0 0.15088
10 30 10 10 17 30 0 0.16388
11 30 10 10 17 40 0 0.13482
12 30 10 10 17 50 0 0.14490
13 30 10 10 18 0 0 0.15439
14 30 10 10 18 10 0 0.13844
15 30 10 10 18 20 0 0.15166
16 30 10 10 18 30 0 0.15164
17 30 10 10 18 40 0 0.13921
18 30 10 10 18 50 0 0.13138
19 30 10 10 19 0 0 0.13686
20 30 10 10 19 10 0 0.12743
21 30 10 10 19 20 0 0.12653
22 30 10 10 19 30 0 0.13013
23 30 10 10 19 40 0 0.14165
24 30 10 10 19 50 0 0.14974
25 30 10 10 20 0 0 0.15876
26 30 10 10 20 10 0 0.12383
27 30 10 10 20 20 0 0.09934
28 30 10 10 20 30 0 0.08373
29 30 10 10 20 40 0 0.11064
30 30 10 10 20 50 0 0.09099
31 30 10 10 21 0 0 0.09023
32 30 10 10 21 10 0 0.09880
33 30 10 10 21 20 0 0.08617
please help me out.
thanks,
sincerely
Ranjan
Edit, This is the code that I tried.
import csv
with open('/home/sci_lab/Desktop/shankar_test/RCM_data/l1-492.csv') as csv_file:
csv_reader = csv.reader(csv_file, delimiter = ',')
line_count = 0
print(f' jda dt/mn/yr/hr/min')
for row in csv_reader:
if line_count == 0:
line_count += 1
else:
print(f'\t{row[0]} {row[1]} {row[2]} {row[4]} {row[5]}')
line_count += 1
print(f' Processed {line_count} lines')
If the number of processed lines equals to the exact number of supposed rows, if it was continuous.
But the limitation is the fact that the gaps won;t be known, if there's some of those.
Use
pandas
library to do agroupby
and thencount
the number of records minute columnOutput
If the value is not 6, that hour does not have a value for all 10 minutes.
EDIT: You can filter for values that have 6