Finding the duration of a wave from the 1st major trough to the next for temperature data


Hi, I have an algorithm to detect troughs and peaks in temperature series data, but it needs some polishing.

This is the graph of the complete data: [figure: complete graph]

This is the zoomed-in version: [figure: zoomed]

As shown, the algorithm returns even slight troughs, so calculating the time duration from one trough to the next returns noisy data.

So I need to polish the algorithm to consider only major troughs, for example: [figure: example of desired major troughs]

This is the algorithm I am using:

import pandas as pd
import numpy as np
from datetime import datetime
import matplotlib.pyplot as plt

# Read the data from the CSV file
df = pd.read_csv('Nusrat.csv')

# Convert the first column to datetime format
df['Column1'] = pd.to_datetime(df['Column1'])

# Convert the second column to numeric type (float: the values have decimals, so int would truncate)
df['Column2'] = df['Column2'].astype(float)

time_data = df['Time']

period = 3

# A trough: strictly below the `period` points before it and no higher than the `period` points after it
dn = [i for i in range(period, len(df) - period - 1) if
      (df.loc[i, 'Column2'] < df.loc[i - period:i - 1, 'Column2']).all()
      and (df.loc[i, 'Column2'] <= df.loc[i + 1:i + period, 'Column2']).all()]

# A peak: strictly above the `period` points before it and no lower than the `period` points after it
up = [i for i in range(period, len(df) - period - 1) if
      (df.loc[i, 'Column2'] > df.loc[i - period:i - 1, 'Column2']).all()
      and (df.loc[i, 'Column2'] >= df.loc[i + 1:i + period, 'Column2']).all()]


fig, ax = plt.subplots()
ax.plot(df['Column1'], df['Column2'])
ax.plot(df.loc[dn, 'Column1'], df.loc[dn, 'Column2'], 'o', color='green', markersize=5)
ax.plot(df.loc[up, 'Column1'], df.loc[up, 'Column2'], 'o', color='red', markersize=5)
fig.autofmt_xdate()
plt.show()

def time_difference(start_index, end_index):
    # Parses time-of-day only, so intervals that cross midnight come out negative
    # (hence the "-1 day" rows in the output below)
    start_time = datetime.strptime(time_data[start_index], '%H:%M:%S.%f')
    end_time = datetime.strptime(time_data[end_index], '%H:%M:%S.%f')
    return end_time - start_time

for i in range(len(dn) - 1):
    print(df['Column1'][dn[i]], df['Column1'][dn[i + 1]])
    print(time_difference(dn[i], dn[i + 1]))
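
One idea I have considered for keeping only the major troughs is to post-filter dn by a minimum depth instead of touching the detector itself; a rough sketch (min_depth and window are guesses I would still need to tune, not values taken from the data):

# Keep a trough only if the signal rises at least `min_depth` above it
# within `window` samples on both sides; shallow dips are dropped
# (min_depth and window are assumed example values, tune as needed)
min_depth = 20.0
window = 5
major_dn = [i for i in dn
            if df.loc[max(i - window, 0):i, 'Column2'].max() - df.loc[i, 'Column2'] >= min_depth
            and df.loc[i:i + window, 'Column2'].max() - df.loc[i, 'Column2'] >= min_depth]

The duration loop above could then iterate over major_dn instead of dn.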

This is the dataset in text form, as I can't provide the file here; it's not the complete data:

Column1,Column2,Time
2023-03-14 14:00:59.0,195.80,14:00:59.0
2023-03-14 14:02:06.0,174.20,14:02:06.0
2023-03-14 14:03:14.0,156.76,14:03:14.0
2023-03-14 14:04:21.0,142.36,14:04:21.0
2023-03-14 14:05:29.0,131.00,14:05:29.0
2023-03-14 14:06:37.0,122.00,14:06:37.0
2023-03-14 14:07:44.0,114.91,14:07:44.0
2023-03-14 14:08:52.0,109.18,14:08:52.0
2023-03-14 14:10:00.0,104.56,14:10:00.0
2023-03-14 14:11:07.0,100.74,14:11:07.0
2023-03-14 14:12:15.0,97.93,14:12:15.0
2023-03-14 14:13:22.0,95.45,14:13:22.0
2023-03-14 14:14:30.0,93.43,14:14:30.0
2023-03-14 14:15:37.0,91.85,14:15:37.0
2023-03-14 14:16:45.0,90.73,14:16:45.0
2023-03-14 14:17:53.0,89.49,14:17:53.0
2023-03-14 14:19:00.0,88.59,14:19:00.0
2023-03-14 14:20:08.0,87.91,14:20:08.0
2023-03-14 14:21:15.0,87.13,14:21:15.0
2023-03-14 14:22:23.0,86.68,14:22:23.0
2023-03-14 14:23:30.0,86.23,14:23:30.0
2023-03-14 14:24:38.0,86.23,14:24:38.0
2023-03-14 14:25:45.0,108.61,14:25:45.0
2023-03-14 14:26:53.0,142.70,14:26:53.0
2023-03-14 14:28:01.0,175.89,14:28:01.0
2023-03-14 14:29:08.0,203.79,14:29:08.0
2023-03-14 14:30:16.0,225.84,14:30:16.0
2023-03-14 14:31:23.0,241.25,14:31:23.0
2023-03-14 14:32:31.0,253.29,14:32:31.0
2023-03-14 14:33:39.0,262.18,14:33:39.0
2023-03-14 14:34:46.0,262.29,14:34:46.0
2023-03-14 14:35:54.0,262.29,14:35:54.0
2023-03-14 14:37:01.0,262.29,14:37:01.0
2023-03-14 14:38:09.0,260.83,14:38:09.0
2023-03-14 14:39:16.0,235.51,14:39:16.0
2023-03-14 14:40:24.0,208.85,14:40:24.0
2023-03-14 14:41:31.0,185.45,14:41:31.0
2023-03-14 14:42:39.0,166.33,14:42:39.0
2023-03-14 14:43:46.0,150.35,14:43:46.0
2023-03-14 14:44:54.0,137.41,14:44:54.0
2023-03-14 14:46:01.0,127.06,14:46:01.0
2023-03-14 14:47:09.0,118.96,14:47:09.0
2023-03-14 14:48:17.0,112.55,14:48:17.0
2023-03-14 14:49:24.0,107.15,14:49:24.0
2023-03-14 14:50:32.0,103.10,14:50:32.0
2023-03-14 14:51:39.0,99.61,14:51:39.0
2023-03-14 14:52:47.0,96.80,14:52:47.0
2023-03-14 14:53:54.0,94.55,14:53:54.0
2023-03-14 14:55:02.0,92.75,14:55:02.0
2023-03-14 14:56:09.0,91.18,14:56:09.0
2023-03-14 14:57:17.0,97.70,14:57:17.0
2023-03-14 14:58:24.0,127.06,14:58:24.0
2023-03-14 14:59:32.0,161.04,14:59:32.0
2023-03-14 15:00:39.0,190.85,15:00:39.0
2023-03-14 15:01:47.0,214.81,15:01:47.0
2023-03-14 15:02:55.0,233.38,15:02:55.0
2023-03-14 15:04:02.0,247.21,15:04:02.0
2023-03-14 15:05:10.0,256.66,15:05:10.0
2023-03-14 15:06:17.0,262.29,15:06:17.0
2023-03-14 15:07:25.0,262.29,15:07:25.0
2023-03-14 15:08:32.0,262.29,15:08:32.0
2023-03-14 15:09:40.0,262.29,15:09:40.0
2023-03-14 15:10:47.0,262.29,15:10:47.0
2023-03-14 15:11:55.0,246.31,15:11:55.0
2023-03-14 15:13:02.0,219.65,15:13:02.0
2023-03-14 15:14:10.0,194.56,15:14:10.0
2023-03-14 15:15:17.0,173.53,15:15:17.0
2023-03-14 15:16:25.0,156.43,15:16:25.0
2023-03-14 15:17:33.0,142.03,15:17:33.0
2023-03-14 15:18:40.0,130.78,15:18:40.0
2023-03-14 15:19:48.0,121.89,15:19:48.0
2023-03-14 15:20:55.0,114.80,15:20:55.0
2023-03-14 15:22:03.0,109.18,15:22:03.0
2023-03-14 15:23:10.0,104.68,15:23:10.0
2023-03-14 15:24:18.0,101.19,15:24:18.0
2023-03-14 15:25:25.0,98.26,15:25:25.0
2023-03-14 15:26:33.0,95.90,15:26:33.0
2023-03-14 15:27:41.0,93.88,15:27:41.0
2023-03-14 15:28:48.0,92.41,15:28:48.0
2023-03-14 15:29:56.0,91.06,15:29:56.0
2023-03-14 15:31:03.0,89.94,15:31:03.0
2023-03-14 15:32:11.0,89.04,15:32:11.0
2023-03-14 15:33:18.0,88.03,15:33:18.0
2023-03-14 15:34:26.0,87.35,15:34:26.0
2023-03-14 15:35:33.0,86.79,15:35:33.0
2023-03-14 15:36:41.0,86.34,15:36:41.0
2023-03-14 15:37:49.0,86.34,15:37:49.0
2023-03-14 15:38:56.0,108.39,15:38:56.0
2023-03-14 15:40:04.0,142.59,15:40:04.0
2023-03-14 15:41:11.0,175.33,15:41:11.0
2023-03-14 15:42:19.0,203.00,15:42:19.0
2023-03-14 15:43:26.0,224.94,15:43:26.0
2023-03-14 15:44:34.0,240.91,15:44:34.0
2023-03-14 15:45:41.0,252.39,15:45:41.0
2023-03-14 15:46:49.0,260.71,15:46:49.0
2023-03-14 15:47:56.0,262.29,15:47:56.0
2023-03-14 15:49:04.0,262.29,15:49:04.0
2023-03-14 15:50:11.0,262.29,15:50:11.0
2023-03-14 15:51:19.0,259.14,15:51:19.0
2023-03-14 15:52:26.0,233.60,15:52:26.0
2023-03-14 15:53:34.0,207.39,15:53:34.0
2023-03-14 15:54:41.0,183.99,15:54:41.0
2023-03-14 15:55:49.0,164.98,15:55:49.0
2023-03-14 15:56:57.0,149.00,15:56:57.0
2023-03-14 15:58:04.0,136.06,15:58:04.0
2023-03-14 15:59:12.0,125.94,15:59:12.0
2023-03-14 16:00:19.0,117.84,16:00:19.0
2023-03-14 16:01:27.0,111.43,16:01:27.0
2023-03-14 16:02:35.0,106.25,16:02:35.0
2023-03-14 16:03:42.0,102.31,16:03:42.0
2023-03-14 16:04:50.0,98.94,16:04:50.0
2023-03-14 16:05:57.0,96.35,16:05:57.0
2023-03-14 16:07:05.0,95.34,16:07:05.0
2023-03-14 16:08:12.0,117.84,16:08:12.0
2023-03-14 16:09:20.0,150.91,16:09:20.0
2023-03-14 16:10:27.0,183.09,16:10:27.0
2023-03-14 16:11:35.0,209.30,16:11:35.0
2023-03-14 16:12:42.0,229.33,16:12:42.0
2023-03-14 16:13:50.0,244.29,16:13:50.0
2023-03-14 16:14:57.0,255.65,16:14:57.0
2023-03-14 16:16:05.0,262.29,16:16:05.0
2023-03-14 16:17:13.0,262.29,16:17:13.0
2023-03-14 16:18:20.0,262.29,16:18:20.0
2023-03-14 16:19:28.0,262.29,16:19:28.0
2023-03-14 16:20:35.0,255.43,16:20:35.0
2023-03-14 16:21:43.0,229.44,16:21:43.0
2023-03-14 16:22:51.0,203.56,16:22:51.0
2023-03-14 16:23:58.0,181.06,16:23:58.0
2023-03-14 16:25:06.0,162.84,16:25:06.0
2023-03-14 16:26:13.0,147.54,16:26:13.0
2023-03-14 16:27:21.0,135.28,16:27:21.0
2023-03-14 16:28:28.0,125.60,16:28:28.0
2023-03-14 16:29:36.0,118.06,16:29:36.0
2023-03-14 16:30:43.0,111.76,16:30:43.0
2023-03-14 16:31:51.0,106.81,16:31:51.0
2023-03-14 16:32:59.0,102.88,16:32:59.0
2023-03-14 16:34:06.0,99.73,16:34:06.0
2023-03-14 16:35:14.0,97.25,16:35:14.0
2023-03-14 16:36:22.0,95.23,16:36:22.0
2023-03-14 16:37:29.0,93.54,16:37:29.0
2023-03-14 16:38:37.0,92.08,16:38:37.0
2023-03-14 16:39:44.0,90.84,16:39:44.0
2023-03-14 16:40:52.0,89.94,16:40:52.0
2023-03-14 16:42:00.0,89.04,16:42:00.0
2023-03-14 16:43:07.0,88.36,16:43:07.0
2023-03-14 16:44:15.0,87.80,16:44:15.0
2023-03-14 16:45:22.0,87.13,16:45:22.0
2023-03-14 16:46:30.0,86.68,16:46:30.0
2023-03-14 16:47:37.0,99.95,16:47:37.0
2023-03-14 16:48:45.0,132.58,16:48:45.0
2023-03-14 16:49:52.0,166.55,16:49:52.0
2023-03-14 16:51:00.0,195.58,16:51:00.0
2023-03-14 16:52:07.0,219.31,16:52:07.0
2023-03-14 16:53:15.0,236.86,16:53:15.0
2023-03-14 16:54:23.0,249.80,16:54:23.0
2023-03-14 16:55:30.0,259.93,16:55:30.0
2023-03-14 16:56:38.0,262.29,16:56:38.0
2023-03-14 16:57:45.0,262.29,16:57:45.0
2023-03-14 16:58:53.0,262.29,16:58:53.0
2023-03-14 17:00:00.0,262.29,17:00:00.0
2023-03-14 17:01:08.0,262.29,17:01:08.0
2023-03-14 17:02:15.0,262.29,17:02:15.0
2023-03-14 17:03:23.0,262.29,17:03:23.0
2023-03-14 17:04:31.0,262.29,17:04:31.0
2023-03-14 17:05:38.0,256.66,17:05:38.0
2023-03-14 17:06:46.0,229.10,17:06:46.0
2023-03-14 17:07:53.0,202.89,17:07:53.0
2023-03-14 17:09:01.0,180.28,17:09:01.0
2023-03-14 17:10:08.0,161.94,17:10:08.0
2023-03-14 17:11:16.0,147.09,17:11:16.0
2023-03-14 17:12:24.0,134.94,17:12:24.0
2023-03-14 17:13:31.0,125.38,17:13:31.0
2023-03-14 17:14:39.0,117.84,17:14:39.0
2023-03-14 17:15:46.0,111.88,17:15:46.0
2023-03-14 17:16:54.0,107.26,17:16:54.0
2023-03-14 17:18:02.0,103.33,17:18:02.0
2023-03-14 17:19:09.0,100.18,17:19:09.0
2023-03-14 17:20:17.0,97.70,17:20:17.0
2023-03-14 17:21:24.0,95.79,17:21:24.0
2023-03-14 17:22:32.0,94.10,17:22:32.0
2023-03-14 17:23:40.0,92.75,17:23:40.0
2023-03-14 17:24:47.0,91.74,17:24:47.0
2023-03-14 17:25:55.0,90.61,17:25:55.0
2023-03-14 17:27:02.0,89.83,17:27:02.0
2023-03-14 17:28:10.0,89.04,17:28:10.0
2023-03-14 17:29:17.0,88.59,17:29:17.0
2023-03-14 17:30:25.0,88.03,17:30:25.0
2023-03-14 17:31:32.0,87.69,17:31:32.0
2023-03-14 17:32:40.0,87.24,17:32:40.0
2023-03-14 17:33:47.0,86.90,17:33:47.0
2023-03-14 17:34:55.0,86.56,17:34:55.0
2023-03-14 17:36:03.0,86.23,17:36:03.0
2023-03-14 17:37:10.0,85.89,17:37:10.0
2023-03-14 17:38:18.0,85.66,17:38:18.0
2023-03-14 17:39:25.0,85.44,17:39:25.0
2023-03-14 17:40:33.0,85.21,17:40:33.0
2023-03-14 17:41:40.0,85.10,17:41:40.0
2023-03-14 17:42:48.0,92.30,17:42:48.0
2023-03-14 17:43:55.0,121.89,17:43:55.0
2023-03-14 17:45:03.0,156.65,17:45:03.0
2023-03-14 17:46:11.0,187.48,17:46:11.0
2023-03-14 17:47:18.0,212.34,17:47:18.0
2023-03-14 17:48:26.0,231.24,17:48:26.0
2023-03-14 17:49:33.0,245.41,17:49:33.0

The result I want is an array that gives the start time and end time of each wave along with its duration; the code above already produces this format, just with noisy values:

The result:

2023-04-03 23:20:09 2023-04-03 23:29:09
0:09:00
2023-04-03 23:29:09 2023-04-03 23:48:17
0:19:08
2023-04-03 23:48:17 2023-04-04 00:06:19
-1 day, 0:18:02
2023-04-04 00:06:19 2023-04-04 00:58:07
0:51:48
2023-04-04 00:58:07 2023-04-04 01:16:08
0:18:01
2023-04-04 01:16:08 2023-04-04 01:30:47
0:14:39
2023-04-04 01:30:47 2023-04-04 01:42:07
0:11:20
2023-04-04 01:42:07 2023-04-04 01:59:01
0:16:54
2023-04-04 01:59:01 2023-04-04 02:21:32
0:22:31
2023-04-04 02:21:32 2023-04-04 02:30:33
0:09:01
2023-04-04 02:30:33 2023-04-04 02:40:42
0:10:09
2023-04-04 02:40:42 2023-04-04 03:02:07
0:21:25
2023-04-04 03:02:07 2023-04-04 03:34:47
0:32:40
2023-04-04 03:34:47 2023-04-04 03:44:54
0:10:07
2023-04-04 03:44:54 2023-04-04 04:01:48
0:16:54
2023-04-04 04:01:48 2023-04-04 04:15:18
0:13:30
2023-04-04 04:15:18 2023-04-04 04:41:12
0:25:54
2023-04-04 04:41:12 2023-04-04 04:59:14
0:18:02
2023-04-04 04:59:14 2023-04-04 05:19:30
0:20:16
2023-04-04 05:19:30 2023-04-04 05:36:24
0:16:54
2023-04-04 05:36:24 2023-04-04 06:15:49
0:39:25
2023-04-04 06:15:49 2023-04-04 06:33:49
0:18:00
2023-04-04 06:33:49 2023-04-04 06:46:13
0:12:24
2023-04-04 06:46:13 2023-04-04 07:09:52
0:23:39
2023-04-04 07:09:52 2023-04-04 07:27:53
0:18:01
2023-04-04 07:27:53 2023-04-04 07:47:02
0:19:09
2023-04-04 07:47:02 2023-04-04 08:03:56
0:16:54
2023-04-04 08:03:56 2023-04-04 08:14:03
0:10:07
2023-04-04 08:14:03 2023-04-04 08:27:34
0:13:31
2023-04-04 08:27:34 2023-04-04 08:48:58
0:21:24
2023-04-04 08:48:58 2023-04-04 09:07:00
0:18:02
2023-04-04 09:07:00 2023-04-04 09:43:15
0:36:15
2023-04-04 09:43:15 2023-04-04 10:04:38
0:21:23
2023-04-04 10:04:38 2023-04-04 10:24:59
0:20:21
2023-04-04 10:24:59 2023-04-04 10:40:45
0:15:46

There are 2 best solutions below


I believe this should do most of what you want:

import pandas as pd
import numpy as np
from datetime import datetime
import matplotlib.pyplot as plt

# Read the data from the CSV file
df = pd.read_csv('data.csv')

# Convert the first column to datetime format
df['Column1'] = pd.to_datetime(df['Column1'])

# Convert the second column to numeric type (float, since the values have decimals)
df['Column2'] = df['Column2'].astype(float)

def normalize(arr):
    # Center on the mean and scale by the maximum, so peaks sit above 0 and valleys below 0
    copy = np.array(arr, dtype=np.float64)
    copy -= copy.mean()
    copy /= copy.max()
    return copy

col1 = np.array(df['Column1']).astype(np.int64)  # timestamps as integer nanoseconds
col2 = normalize(df['Column2'])
first_derivative = normalize(np.diff(col2)/np.diff(col1))

# Flat spots (|slope| < sigma) above the mean are peaks, below the mean are valleys
sigma = 7e-2
peaks = np.where((col2[:-1] > 0) & (np.abs(first_derivative) < sigma))
valleys = np.where((col2[:-1] < 0) & (np.abs(first_derivative) < sigma))

plt.figure(figsize=(16,9))
plt.plot(col1, col2, 'k')
plt.plot(col1[:-1], first_derivative)
plt.plot(col1[peaks], col2[peaks], 'r*')
plt.plot(col1[valleys], col2[valleys], 'g*')
plt.show()

There is one step left, which I will add later as I don't have time to do it now: you need to group all the peaks and valleys so that each section contains only one peak/valley. Then you can calculate metrics such as duration, frequency, etc. from that data (a rough grouping sketch follows after the figure below).

Result of running this code on your provided CSV: [figure]
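
In the meantime, a minimal way to do that grouping could look like the sketch below (my own rough take, using the peaks, valleys, and col2 arrays from the code above; the Edit that follows takes a gap-based approach instead): split the detected indices into runs wherever consecutive indices are more than gap apart, then keep the most extreme point of each run.

def collapse(indices, values, keep_min, gap=10):
    # Split the index array into runs separated by more than `gap` samples,
    # then keep the most extreme point of each run
    groups = np.split(indices, np.where(np.diff(indices) > gap)[0] + 1)
    pick = np.argmin if keep_min else np.argmax
    return np.array([g[pick(values[g])] for g in groups])

major_valleys = collapse(valleys[0], col2, keep_min=True)
major_peaks = collapse(peaks[0], col2, keep_min=False)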

Edit: Here is the second version of this code:

col1 = np.array(df['Column1']).astype(np.int64)
col2 = normalize(df['Column2'])
# We can use the second derivative to find peaks: a peak in the second derivative
# corresponds to a valley in the original data, and a valley in the second derivative
# corresponds to a peak in the original data. Since the second derivative should be
# fairly clean, we can use percentiles to find its peaks and valleys. This approach
# is not ideal but should still be fairly good for normalized data.
first_derivative = normalize(np.diff(col2)/np.diff(col1))
second_derivative = normalize(np.diff(first_derivative)/np.diff(col1[:-1]))

# I found 3 by trial and error, so it's not a perfect value
percentile_threshold = 3
peak_percentile = np.percentile(second_derivative,percentile_threshold)
valley_percentile = np.percentile(second_derivative,100-percentile_threshold)
peaks = np.where(second_derivative<peak_percentile)
valleys = np.where(second_derivative>valley_percentile)

# Since there are still multiple peaks and valleys, we can choose to only keep the first
# or last detected peak/valley (here I keep the last one). "gap" is the minimum gap
# required between two peaks/valleys. Here I use a gap of 10 indices, but you can enforce this
# via time or other metrics. The hstack is a quick and dirty hack to make sure the last
# peak/valley is not missed.

gap = 10
peaks = np.hstack((peaks[0],peaks[0][-1]+gap+1))
peaks = np.array([peaks[i] for i in range(len(peaks)-1) if peaks[i+1]-peaks[i]>gap])
valleys = np.hstack((valleys[0],valleys[0][-1]+gap+1))
valleys = np.array([valleys[i] for i in range(len(valleys)-1) if valleys[i+1]-valleys[i]>gap])


plt.figure(figsize=(16,9))
plt.plot(col1,col2,'k')
plt.plot(col1[:-2], second_derivative)
plt.plot(col1[peaks], col2[peaks], 'r*')
plt.plot(col1[valleys], col2[valleys], 'g*')
plt.plot(col1, [peak_percentile] * len(col1), 'r')    # threshold lines
plt.plot(col1, [valley_percentile] * len(col1), 'g')
plt.show()

Final result: [figure]

This solution works even if there is an upward or downward trend (but not both) in your data, i.e. points drifting up or down. To prove this, I can add a linear trend: [figure]

or even a quadratic trend: [figure]

As you can see, the positions of the detected peaks and valleys remain the same.
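
Once the deduplicated valleys array from the Edit above is available, the start/end/duration list the question asks for falls out directly; a minimal sketch, assuming df['Column1'] was parsed to datetimes as in the first listing:

valley_times = df['Column1'].iloc[valleys].reset_index(drop=True)
for start, end in zip(valley_times[:-1], valley_times[1:]):
    print(start, end, end - start)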

All of that said, this was a hacky solution I came up with in under an hour (not to mention that I'm not that good at math). I wouldn't trust it or use it in production code, but if you're just playing around, it should do the trick.


I suggest you have a look at the SciPy library, which is great for signal processing, and especially its signal module.

First, it will help you minimize and simplify your code a lot, for example by using the find_peaks function (https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.find_peaks.html) to find local minima and maxima.

Here is what you can get from the data sample provided in your question:

import pandas as pd
from matplotlib import pyplot as plt
from scipy.signal import find_peaks

fig, ax = plt.subplots()
data = pd.read_csv("test.csv")  # renamed from `input`, which shadows the Python builtin

# List indices of signal peaks; taking 1/x turns the troughs of the
# (all-positive) signal into peaks, so the same function finds both
peaks, _ = find_peaks(data["Column2"].values, height=0)
neg_peaks, _ = find_peaks(1 / data["Column2"].values, height=0)

# Create columns in the pandas DataFrame marking which points are peaks/troughs
data["peaks"] = data.apply(lambda row: row.name in peaks, axis=1)
data["neg_peaks"] = data.apply(lambda row: row.name in neg_peaks, axis=1)

# Plotting
data.plot(ax=ax)
data[data["peaks"]].plot(ax=ax, linestyle="", marker="o")
data[data["neg_peaks"]].plot(ax=ax, linestyle="", marker="o")
plt.show()

Second, you can make use of find_peaks parameters to adjust the detection sensitivity, or you can first apply a smoothing filter to your signal, such as savgol_filter (from scipy.signal) or gaussian_filter (from scipy.ndimage).
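
For example, prominence and distance map well onto "only major troughs", and a Savitzky-Golay pass removes small wiggles before detection. A sketch along those lines (the window_length, polyorder, prominence and distance values are assumptions to tune, not values validated against your data):

import pandas as pd
from scipy.signal import find_peaks, savgol_filter

data = pd.read_csv("test.csv")
data["Column1"] = pd.to_datetime(data["Column1"])
values = data["Column2"].values

# Smooth first so measurement noise does not create spurious extrema
smoothed = savgol_filter(values, window_length=5, polyorder=2)

# Negate the signal so troughs become peaks; `prominence` keeps only extrema
# that stand well clear of their surroundings, `distance` enforces a minimum
# index gap between detected troughs
troughs, _ = find_peaks(-smoothed, prominence=50, distance=5)

# Trough-to-trough start/end times and durations
times = data["Column1"].iloc[troughs].reset_index(drop=True)
for start, end in zip(times[:-1], times[1:]):
    print(start, end, end - start)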