Stored procedure to find Peaks Separated by Seven Days

78 Views Asked by At

What could be the SQL Server stored procedure to select the highest three peaks separated by seven days.

I have tried the below code:

CREATE PROCEDURE SelectPeaksAndInsertIntoPeakFact 
AS
BEGIN
    -- Initialize starting date
    DECLARE @currentDate DATE = (SELECT MIN(dateTime) FROM Peak);   

    -- Initialize peak type
    DECLARE @peakType VARCHAR(5) = 'Peak1'; 

    WHILE @currentDate <= (SELECT MAX(dateTime) FROM Peak)
    BEGIN
        INSERT INTO Peak_fact (DataDate, peak, peakType)
            SELECT
                p.dateTime AS DataDate,
                MAX(p.SA_data) OVER (ORDER BY p.dateTime ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS peak,
                @peakType AS peakType
            FROM 
                Peak p
            WHERE 
                p.dateTime BETWEEN @currentDate AND DATEADD(DAY, 6, @currentDate);

        SET @currentDate = DATEADD(DAY, 7, @currentDate); -- Skip 7 days 
        SET @peakType = CASE @peakType WHEN 'Peak1' THEN 'Peak2' WHEN 'Peak2' THEN 'Peak3' ELSE 'Peak1' END; -- Cycle peak types
    END;
END;

But the result is incorrect, because for the Table 1

Value Date Notes
12 1/1/2023 Start 1
15 1/2/2023
13 1/3/2023
11 1/4/2023
10 1/5/2023
5 1/6/2023
31 1/7/2023 End 1 / Peak
9 1/8/2023
31 1/9/2023
21 1/10/2023
22 1/11/2023
34 1/12/2023
8 1/13/2023
5 1/14/2023
33 1/15/2023 Start 2 / Peak
22 1/16/2023
9 1/17/2023
18 1/18/2023
19 1/19/2023
22 1/20/2023
29 1/21/2023 End 2
10 1/22/2023
9 1/23/2023 Start 3
8 1/24/2023
29 1/25/2023
33 1/26/2023
55 1/27/2023
81 1/28/2023 Peak
3 1/29/2023 End 3
88 1/30/2023
44 1/31/2023
10 2/1/2023
20 2/2/2023
44 2/3/2023
83 2/4/2023
69 2/5/2023 Start 4
99 2/6/2023 Peak
20 2/7/2023
40 2/8/2023
10 2/9/2023
32 2/10/2023
33 2/11/2023 End 4
83 2/12/2023
63 2/13/2023
92 2/14/2023 Start 5 / Peak
65 2/15/2023
42 2/16/2023
84 2/17/2023
21 2/18/2023
23 2/19/2023
12 2/20/2023 End 5
92 2/21/2023
65 2/22/2023 Start 6
74 2/23/2023
83 2/24/2023 Peak
10 2/25/2023

The output should be what is in Table 2:

date Value Peak
1/7/2023 31 peak1
1/15/2023 33 peak2
1/28/2023 81 peak3
2/6/2023 99 peak1
2/14/2023 92 peak2
2/24/2023 83 peak3
1

There are 1 best solutions below

0
T N On

I have two approaches that you can consider. One fixes up your current stored procedure to produce your posted desired results. The second uses a more traditional definition of a local maximum.

Your original stored procedure had several issues that I have fixed up below.

  1. The original logic did not have a WHERE condition to exclude rows that were not peak rows. This caused all dates in the selected range to be inserted into the results, instead of just the selected peak. To fix this, the MAX() window function logic was moved to a subselect and a WHERE s.SA_data = peak condition was added to the outer. I also added a TOP 1 and ORDER BY to pick just the first peak date if there are multiple peak value matches.
  2. The @CurrentDate logic was adding 7 to the current value instead of calculating a new start based on skipping 7 days following each selected peak. Fixing this requires capturing the selected peak date and using it for both the results insert and the updated @CurrentDate calculation - SET @currentDate = DATEADD(DAY, 8, @Datadate).
  3. The (SELECT MAX(dateTime) FROM Peak) ending value calculation for the WHILE loop was moved to a variable to avoid repeated/redundant queries. This doesn't affect the results, but improves performance slightly.
  4. The MAX() window function logic is already contained in a query that limits the rows to the desired 7-day range, so a separate ROWS BETWEEN ... qualifier is unnecessary and has been dropped. The remaining empty OVER() qualifier then just refers to the entire 7-day range. This doesn't affect the results, but simplifies the code.

The updated stored procedure would then be:

CREATE PROCEDURE SelectPeaksAndInsertIntoPeakFact 
AS
BEGIN
    DECLARE @currentDate DATE = (SELECT MIN(dateTime) FROM Peak); -- Initialize starting date
    DECLARE @lastDate DATE = (SELECT MAX(dateTime) FROM Peak);
    DECLARE @peakType VARCHAR(5) = 'Peak1'; -- Initialize peak type

    WHILE @currentDate <= @lastDate
    BEGIN
        DECLARE @Datadate DATE;
        DECLARE @peak INT;
      
        SELECT TOP 1
            @Datadate = s.dateTime,
            @peak = s.peak
        FROM (
            SELECT
                p.*,
                MAX(p.SA_data) OVER() AS peak
            FROM Peak p
            WHERE p.dateTime BETWEEN @currentDate AND DATEADD(DAY, 6, @currentDate)
        ) s
        WHERE s.SA_data = peak
        ORDER BY s.dateTime;

        INSERT INTO Peak_fact (DataDate, peak, peakType)
        VALUES (@Datadate, @peak, @peakType);

        SET @currentDate = DATEADD(DAY, 8, @Datadate); -- Skip 7 days 
        SET @peakType = CASE @peakType WHEN 'Peak1' THEN 'Peak2' WHEN 'Peak2' THEN 'Peak3' ELSE 'Peak1' END; -- Cycle peak types
    END;
END

Results:

DataDate peak peakType
2023-01-07 31 Peak1
2023-01-15 33 Peak2
2023-01-28 81 Peak3
2023-02-06 99 Peak1
2023-02-14 92 Peak2
2023-02-24 83 Peak3

As an alternative more traditional local peak calculation, you can define a peak as a point where the current value equals or exceeds all other values within a sliding +/- 7-day window (or chose your window size). In the case of level plateaus having one or more adjacent or closely-spaced equal peak values, the earliest can be chosen.

SELECT
    s.dateTime AS DataDate,
    s.SA_data AS peak,
    CONCAT(
        'Peak',
        (((ROW_NUMBER() OVER(ORDER BY s.dateTime)) - 1) % 3 + 1)
        ) AS peakType
FROM (
    SELECT
        *,
        MAX(p.SA_data) OVER(
           ORDER BY p.dateTime ROWS BETWEEN 7 PRECEDING AND 1 PRECEDING
           ) AS Max7Before,
        MAX(p.SA_data) OVER(
           ORDER BY p.dateTime ROWS BETWEEN 1 FOLLOWING AND 7 FOLLOWING
           ) AS Max7After
    FROM Peak p
) s
WHERE s.SA_data > s.Max7Before
AND s.SA_data >= s.Max7After
ORDER BY s.dateTime;

The above logic will not select a peak from among the middle of a (mostly) continuously ascending or descending series of values. Only when the values form a true local maximum (within a +/- 7 day window) will a peak value be selected.

Results:

DataDate peak peakType
2023-01-12 34 Peak1
2023-02-06 99 Peak2
2023-02-14 92 Peak3

Notes:

  • 2023-01-07 (31) was not selected because a larger value (34) exists for 2023-01-12 (+5 days).
  • 2023-01-12 (34) was selected because that is a true local maximum.
  • 2023-01-15 (33) was not selected because a larger value (34) exists for 2023-01-12 (-3 days).
  • 2023-01-28 (81) was not selected because a larger value (88) exists for 2023-01-30 (+2 days).
  • 2023-01-30 (88) was not selected because a larger value (99) exists for 2023-02-06 (+7 days).
  • 2023-02-06 (99) was selected because that is a true local maximum.
  • 2023-02-14 (92) was selected because that is a true local maximum.
  • 2023-02-21 (92) was not selected bacause it is equal to the 2023-02-14 value (-7 days).
  • 2023-02-24 (83) was not selected because a larger value (92) exists for 2023-02-21 (-3 days).

Adjusting the window size to a range other than +/- 7 days may increase or decrease the number of detected local maximums.

See this db<>fiddle for a demo of both approaches.