What could be the SQL Server stored procedure to select the highest three peaks separated by seven days.
I have tried the below code:
CREATE PROCEDURE SelectPeaksAndInsertIntoPeakFact
AS
BEGIN
-- Initialize starting date
DECLARE @currentDate DATE = (SELECT MIN(dateTime) FROM Peak);
-- Initialize peak type
DECLARE @peakType VARCHAR(5) = 'Peak1';
WHILE @currentDate <= (SELECT MAX(dateTime) FROM Peak)
BEGIN
INSERT INTO Peak_fact (DataDate, peak, peakType)
SELECT
p.dateTime AS DataDate,
MAX(p.SA_data) OVER (ORDER BY p.dateTime ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS peak,
@peakType AS peakType
FROM
Peak p
WHERE
p.dateTime BETWEEN @currentDate AND DATEADD(DAY, 6, @currentDate);
SET @currentDate = DATEADD(DAY, 7, @currentDate); -- Skip 7 days
SET @peakType = CASE @peakType WHEN 'Peak1' THEN 'Peak2' WHEN 'Peak2' THEN 'Peak3' ELSE 'Peak1' END; -- Cycle peak types
END;
END;
But the result is incorrect, because for the Table 1
| Value | Date | Notes |
|---|---|---|
| 12 | 1/1/2023 | Start 1 |
| 15 | 1/2/2023 | |
| 13 | 1/3/2023 | |
| 11 | 1/4/2023 | |
| 10 | 1/5/2023 | |
| 5 | 1/6/2023 | |
| 31 | 1/7/2023 | End 1 / Peak |
| 9 | 1/8/2023 | |
| 31 | 1/9/2023 | |
| 21 | 1/10/2023 | |
| 22 | 1/11/2023 | |
| 34 | 1/12/2023 | |
| 8 | 1/13/2023 | |
| 5 | 1/14/2023 | |
| 33 | 1/15/2023 | Start 2 / Peak |
| 22 | 1/16/2023 | |
| 9 | 1/17/2023 | |
| 18 | 1/18/2023 | |
| 19 | 1/19/2023 | |
| 22 | 1/20/2023 | |
| 29 | 1/21/2023 | End 2 |
| 10 | 1/22/2023 | |
| 9 | 1/23/2023 | Start 3 |
| 8 | 1/24/2023 | |
| 29 | 1/25/2023 | |
| 33 | 1/26/2023 | |
| 55 | 1/27/2023 | |
| 81 | 1/28/2023 | Peak |
| 3 | 1/29/2023 | End 3 |
| 88 | 1/30/2023 | |
| 44 | 1/31/2023 | |
| 10 | 2/1/2023 | |
| 20 | 2/2/2023 | |
| 44 | 2/3/2023 | |
| 83 | 2/4/2023 | |
| 69 | 2/5/2023 | Start 4 |
| 99 | 2/6/2023 | Peak |
| 20 | 2/7/2023 | |
| 40 | 2/8/2023 | |
| 10 | 2/9/2023 | |
| 32 | 2/10/2023 | |
| 33 | 2/11/2023 | End 4 |
| 83 | 2/12/2023 | |
| 63 | 2/13/2023 | |
| 92 | 2/14/2023 | Start 5 / Peak |
| 65 | 2/15/2023 | |
| 42 | 2/16/2023 | |
| 84 | 2/17/2023 | |
| 21 | 2/18/2023 | |
| 23 | 2/19/2023 | |
| 12 | 2/20/2023 | End 5 |
| 92 | 2/21/2023 | |
| 65 | 2/22/2023 | Start 6 |
| 74 | 2/23/2023 | |
| 83 | 2/24/2023 | Peak |
| 10 | 2/25/2023 |
The output should be what is in Table 2:
| date | Value | Peak |
|---|---|---|
| 1/7/2023 | 31 | peak1 |
| 1/15/2023 | 33 | peak2 |
| 1/28/2023 | 81 | peak3 |
| 2/6/2023 | 99 | peak1 |
| 2/14/2023 | 92 | peak2 |
| 2/24/2023 | 83 | peak3 |
I have two approaches that you can consider. One fixes up your current stored procedure to produce your posted desired results. The second uses a more traditional definition of a local maximum.
Your original stored procedure had several issues that I have fixed up below.
WHEREcondition to exclude rows that were not peak rows. This caused all dates in the selected range to be inserted into the results, instead of just the selected peak. To fix this, theMAX()window function logic was moved to a subselect and aWHERE s.SA_data = peakcondition was added to the outer. I also added aTOP 1andORDER BYto pick just the first peak date if there are multiple peak value matches.@CurrentDatelogic was adding 7 to the current value instead of calculating a new start based on skipping 7 days following each selected peak. Fixing this requires capturing the selected peak date and using it for both the results insert and the updated@CurrentDatecalculation -SET @currentDate = DATEADD(DAY, 8, @Datadate).(SELECT MAX(dateTime) FROM Peak)ending value calculation for theWHILEloop was moved to a variable to avoid repeated/redundant queries. This doesn't affect the results, but improves performance slightly.MAX()window function logic is already contained in a query that limits the rows to the desired 7-day range, so a separateROWS BETWEEN ...qualifier is unnecessary and has been dropped. The remaining emptyOVER()qualifier then just refers to the entire 7-day range. This doesn't affect the results, but simplifies the code.The updated stored procedure would then be:
Results:
As an alternative more traditional local peak calculation, you can define a peak as a point where the current value equals or exceeds all other values within a sliding +/- 7-day window (or chose your window size). In the case of level plateaus having one or more adjacent or closely-spaced equal peak values, the earliest can be chosen.
The above logic will not select a peak from among the middle of a (mostly) continuously ascending or descending series of values. Only when the values form a true local maximum (within a +/- 7 day window) will a peak value be selected.
Results:
Notes:
Adjusting the window size to a range other than +/- 7 days may increase or decrease the number of detected local maximums.
See this db<>fiddle for a demo of both approaches.