Using "Retain" approach in BigQuery (similar to SAS)

41 Views Asked by At

While defining a variable in BigQuery, how to compare value in a row with the value obtained above (while variable creation is still in process)?

I have a table with ID, start and end dates. I want to create a MaxDate variable which will compare (from 2nd row onwards) if the start date of each row is <= to "lag" end date, and if the condition is true, then MaxDate should be the maximum of the EndDate and the "previous" EndDate immediately above.

ID StartDate EndDate MaxDate
A 2019-10-25 2019-10-31 2019-10-31
A 2019-10-26 2019-10-26 2019-10-31
A 2019-10-28 2019-10-30 2019-10-31
A 2019-10-29 2019-10-29 2019-10-31

The below approach is failing at the second lag row:


WITH
S1 AS
(
SELECT ID, 1 AS COUNT
FROM S0
ORDER BY ID, StartDate, EndDate
)

,S2 AS
(
SELECT *,

CASE WHEN StartDate < LAG(EndDate) OVER (PARTITION BY ID ORDER BY StartDate, EndDate)
THEN
  MAX(EndDate) OVER (PARTITION BY ID ORDER BY StartDate, EndDate
    ROWS BETWEEN 1 PRECEDING AND CURRENT ROW)
      ELSE EndDate
END AS EndDate2
FROM S1
)

,S3 AS
(SELECT *,

CASE WHEN StartDate > LAG(EndDate2) OVER (PARTITION BY ID ORDER BY StartDate, EndDate) + 1
THEN 
  MAX(COUNT) OVER (PARTITION BY ID ORDER BY StartDate, EndDate
    ROWS 1 PRECEDING) + 1
      ELSE 1
END AS Visit_Count

FROM S2
ORDER BY ID, StartDate, EndDate
)

(
SELECT ID, Visit_Count, MIN(StartDate) AS StartDate, MAX(EndDate) AS EndDate
FROM S3
GROUP BY ID, Visit_Count
ORDER BY ID, Visit_Count
)
;

Thanks!

1

There are 1 best solutions below

0
Prathamesh Pathak On

Figured out:

SELECT *,
CASE WHEN StartDate <= MAX(EndDate) 
  OVER (PARTITION BY PATIENT_ID ORDER BY StartDate) 
      THEN MAX(EndDate)
        OVER (PARTITION BY PATIENT_ID ORDER BY StartDate)
            ELSE StartDate
              END AS MaxDate
FROM table
;