Combine chain of dates where dates are less than few seconds apart

45 Views Asked by At

I have a list of timestamps for data acquisition. timestamps close together belong to a cycle. I want to enumerate these cycles. So whenever between two timestamps is more then 100 seconds then it will create the next cycle.

CREATE TABLE [Cycles](
    [Cycle] [int] NOT NULL,
    [CycleStart] [datetime] NOT NULL,
    [CycleEnd] [datetime] NOT NULL,
 CONSTRAINT [PK_Cycles] PRIMARY KEY CLUSTERED 
(
    [Cycle] DESC
))
INSERT INTO [Cycles] VALUES
(10,'2023-12-04T9:00:00','2023-12-04T10:00:00'),
(11,'2023-12-04T21:00:00','2023-12-04T22:00:00'),
(12,'2023-12-04T23:00:00','2023-12-05T00:00:00')
CREATE TABLE [Data](
    [datatimestamp] [datetime] NOT NULL,
 CONSTRAINT [PK_Data] PRIMARY KEY NONCLUSTERED 
(
    [datatimestamp] ASC
))
INSERT INTO [Data] VALUES
('2023-12-05T00:05:20'),
('2023-12-05T00:05:21'),
('2023-12-05T00:05:22'),
('2023-12-05T00:10:01'),
('2023-12-05T00:10:02'),
('2023-12-05T00:10:03')

So I would need to add to Cycles 13 and 14

Here is what I could do as a Select:

DECLARE @lastCycle int = (SELECT TOP 1 Cycle FROM Cycles ORDER BY Cycle DESC);
DECLARE @lastCycleEnd datetime = (SELECT TOP 1 CycleEnd FROM Cycles ORDER BY Cycle DESC);
WITH marks AS (
    SELECT datatimestamp, 
    CASE 
        WHEN DATEDIFF(Second, LAG(datatimestamp, 1, DATEADD(Second, -101, datatimestamp)) OVER (ORDER BY datatimestamp), datatimestamp) > 100 
        THEN 1 ELSE 0 
    END AS NextC
    FROM [Data] 
    WHERE datatimestamp > @lastCycleEnd 
)
SELECT @lastCycle + ROW_NUMBER() OVER (ORDER BY d.datatimestamp) AS Cycle, d.datatimestamp AS CycleBegin 
FROM [Data] d
INNER JOIN marks m On m.datatimestamp = d.datatimestamp
WHERE m.NextC = 1

Which will return the new Cycles and their CycleStarts, as the result for the example data would look like this:

Cycle CycleStart
13 2023-12-05 00:05:20
14 2023-12-05 00:10:01

How can I get the CycleEnd aswell as the third column?

2

There are 2 best solutions below

3
GarethD On BEST ANSWER

Instead of filtering on NextC as the final step, once you have your data from the marks CTE. i.e.

datatimestamp NextC
2023-12-05T00:05:20 1
2023-12-05T00:05:21 0
2023-12-05T00:05:22 0
2023-12-05T00:10:01 1
2023-12-05T00:10:02 0
2023-12-05T00:10:03 0

You need to then do SUM(NextC) OVER(ORDER BY datatimestamp), which will give you a value for each group by timestamps, i.e.

datatimestamp Cycle
2023-12-05T00:05:20 1
2023-12-05T00:05:21 1
2023-12-05T00:05:22 1
2023-12-05T00:10:01 2
2023-12-05T00:10:02 2
2023-12-05T00:10:03 2

You can then group on this column and get the min and max datetimes to get your start/end. So your final query would be:

DECLARE @lastCycle int = (SELECT TOP 1 Cycle FROM Cycles ORDER BY Cycle DESC);
DECLARE @lastCycleEnd datetime = (SELECT TOP 1 CycleEnd FROM Cycles ORDER BY Cycle DESC);
WITH marks AS (
    SELECT datatimestamp, 
   CASE 
        WHEN DATEDIFF(Second, LAG(datatimestamp, 1, DATEADD(Second, -101, datatimestamp)) OVER (ORDER BY datatimestamp), datatimestamp) > 100 
        THEN 1 ELSE 0 
    END AS NextC
    FROM [Data] 
    WHERE datatimestamp > @lastCycleEnd 
), marks2 AS (
  SELECT m.DataTimeStamp, SUM(m.NextC) OVER (ORDER BY m.DataTimeStamp) AS Cycle
  FROM marks AS m)
SELECT @lastCycle + ROW_NUMBER() OVER (ORDER BY m.Cycle) AS Cycle, 
  MIN(m.datatimestamp) AS CycleBegin ,
  MAX(m.datatimestamp) AS CycleEnd
FROM marks2 m 
GROUP BY m.Cycle;

Example on db<>fiddle

0
Salman A On

You are close. Add an additional step that calculates running sum of NextC ordered by date. This will number each "set" of timestamps; perform a group by on that column.

with cte1 as (
    select datatimestamp, case when datediff(second, lag(datatimestamp) over (order by datatimestamp), datatimestamp) < 100 then 0 else 1 end as nextc
    from (values
        ('2023-12-05 00:05:20'),
        ('2023-12-05 00:05:21'),
        ('2023-12-05 00:05:22'),
        ('2023-12-05 00:10:01'),
        ('2023-12-05 00:10:02'),
        ('2023-12-05 00:10:03')
    ) as t(datatimestamp)
), cte2 as (
    select *, sum(nextc) over (order by datatimestamp) as grpnum
    from cte1
)
select min(datatimestamp), max(datatimestamp)
from cte2
group by grpnum