Checking continuity of a excel datafile

192 Views Asked by At

I have a excel data file with the date time delimited. I want to check whether the data is continuous or not. i have tried to check it by counting the total number of columns and then matching with the supposed number of datapoints(number of rows). but this method has limitation when there is actually gaps. I'd need to identify those.

https://docs.google.com/spreadsheets/d/1T_xQIi-6nT3XLjIZthUuGuBfG2rDlXqlv95brYn5HyA/edit?usp=sharing

Data:

    date  month  year  hour  min  sec  normalised parameter
0     30     10    10    15   50    0               0.03256
1     30     10    10    16    0    0               0.02802
2     30     10    10    16   10    0               0.06172
3     30     10    10    16   20    0               0.05575
4     30     10    10    16   30    0               0.01257
5     30     10    10    16   40    0               0.04290
6     30     10    10    16   50    0               0.01149
7     30     10    10    17    0    0               0.02578
8     30     10    10    17   10    0               0.14456
9     30     10    10    17   20    0               0.15088
10    30     10    10    17   30    0               0.16388
11    30     10    10    17   40    0               0.13482
12    30     10    10    17   50    0               0.14490
13    30     10    10    18    0    0               0.15439
14    30     10    10    18   10    0               0.13844
15    30     10    10    18   20    0               0.15166
16    30     10    10    18   30    0               0.15164
17    30     10    10    18   40    0               0.13921
18    30     10    10    18   50    0               0.13138
19    30     10    10    19    0    0               0.13686
20    30     10    10    19   10    0               0.12743
21    30     10    10    19   20    0               0.12653
22    30     10    10    19   30    0               0.13013
23    30     10    10    19   40    0               0.14165
24    30     10    10    19   50    0               0.14974
25    30     10    10    20    0    0               0.15876
26    30     10    10    20   10    0               0.12383
27    30     10    10    20   20    0               0.09934
28    30     10    10    20   30    0               0.08373
29    30     10    10    20   40    0               0.11064
30    30     10    10    20   50    0               0.09099
31    30     10    10    21    0    0               0.09023
32    30     10    10    21   10    0               0.09880
33    30     10    10    21   20    0               0.08617

please help me out.

thanks,

sincerely

Ranjan

Edit, This is the code that I tried.

import csv

with open('/home/sci_lab/Desktop/shankar_test/RCM_data/l1-492.csv') as csv_file:

    csv_reader = csv.reader(csv_file, delimiter = ',')

    line_count = 0

    print(f'      jda           dt/mn/yr/hr/min')

    for row in csv_reader:

        if line_count == 0:

            line_count += 1

        else:

            print(f'\t{row[0]} {row[1]} {row[2]} {row[4]} {row[5]}')

            line_count += 1

    print(f' Processed {line_count} lines')

If the number of processed lines equals to the exact number of supposed rows, if it was continuous.

But the limitation is the fact that the gaps won;t be known, if there's some of those.

1

There are 1 best solutions below

2
On BEST ANSWER

Use pandas library to do a groupby and then count the number of records minute column

import pandas as pd

csv_path = '/home/sci_lab/Desktop/shankar_test/RCM_data/l1-492.csv'
df =  pd.read_csv(csv_path)
gdf = df.groupby(['year', 'month', 'date', 'hour'])['min'].count()

Output

year  month  date  hour
10    10     30    15      1
                   16      6
                   17      6
                   18      6
                   19      6
                   20      6
                   21      3

If the value is not 6, that hour does not have a value for all 10 minutes.

EDIT: You can filter for values that have 6

gdf.loc[gdf.eq(6)]