Creating a Date Range from Multiple Rows Based on a Single Date

308 Views Asked by At

I have a user table with following fields: User_ID, Email, Used_date.

Original Table

As we can see, a user can switch to multiple emails over period. From used_date field I want to create date range fields (Email_Start_Date and Email_End_Date). They will store a time period when a user has used that email.

Expected Table

Its possible that a user can switch back to an older email. In this case same email will have two date ranges.

Also I want to fill the gap between last day of previous email and start date of current email.

For example if user used [email protected] for a period of 8/28/2020 - 8/31/2020.

Further he switched to [email protected] on 9/3/2020.

Then in output [email protected] will have a date range of 8/28/2020 - 9/2/2020.

This is a case of gaps-and-islands. But I don't know how to implement this.

Thanks everyone!

2

There are 2 best solutions below

2
On BEST ANSWER

I would suggest just the difference of row numbers and aggregation:

select user_id, email, min(used_date) as email_start_date,
       lead(min(used_date)) over (partition by user_id order by min(used_date)) - interval '1 day' as email_end_date
from (select t.*,
             row_number() over (partition by user_id order by used_date) as seqnum,
             row_number() over (partition by user_id, email order by used_date) as seqnum_2
      from t
     ) t
group by user_id, email, (seqnum - seqnum_2);

Actually, you can also do this with lag() and no aggregation:

select user_id, email, min(used_date) as email_start_date,
       lead(used_date) over (partition by user_id order by used_date) - interval '1 day' as email_end_date
from (select t.*,
             lag(email) over (partition by user_id order by used_date) as prev_email
      from t
     ) t
where prev_email is null or prev_email <> email;

This second one is simple. It just keeps the rows where the email changes (or the row where the data starts for the user). It then uses lead() to get the end date.

Here is a db<>fiddle.

1
On

Next time, paste your data as text, so we don't have to type it back ...

Do you mean this? I prefer an "infinite date" instead of a NULL value for the last until-date - and I prefer a "session id" to an "island identifier", they're usually called like that in clickstream and IoT analyses ...

WITH
indata(userid,email,used_dt) AS (
          SELECT 1,'[email protected]' , DATE '2020-08-28'
UNION ALL SELECT 1,'[email protected]' , DATE '2020-08-29'
UNION ALL SELECT 1,'[email protected]' , DATE '2020-08-30'
UNION ALL SELECT 1,'[email protected]' , DATE '2020-08-31'
UNION ALL SELECT 1,'[email protected]', DATE '2020-09-03'
UNION ALL SELECT 1,'[email protected]', DATE '2020-09-05'
UNION ALL SELECT 1,'[email protected]', DATE '2020-09-07'
UNION ALL SELECT 1,'[email protected]',  DATE '2020-09-09'
UNION ALL SELECT 2,'[email protected]'     , DATE '2019-07-12'
UNION ALL SELECT 3,'[email protected]' , DATE '2020-08-08'
)
,
with_change_counter AS (
SELECT 
  userid
, email
, used_dt AS used_from_dt
, CASE 
    WHEN LAG(email,1,'') OVER(
      PARTITION BY userid ORDER BY used_dt
    ) <> email 
    THEN 1
    ELSE 0 
  END AS counter
, LEAD(used_dt,1,'9999-12-31') OVER(
    PARTITION BY userid ORDER BY used_dt
  ) AS used_until_dt
  FROM indata
)
,with_sess_id AS (
  SELECT
    userid
  , email
  , used_from_dt
  , used_until_dt
  , SUM(counter) OVER(PARTITION BY userid ORDER BY used_from_dt) AS sessid
  , counter
  FROM with_change_counter
) 
SELECT
  userid
, MAX(email) AS email
, MIN(used_from_dt) AS email_start_date
, MAX(used_until_dt) AS email_end_date
FROM with_sess_id
GROUP BY
  sessid
, userid
ORDER BY
  userid
, sessid
, email
;
-- out  userid |       email        | email_start_date | email_end_date 
-- out --------+--------------------+------------------+----------------
-- out       1 | [email protected]  | 2020-08-28       | 2020-09-03
-- out       1 | [email protected] | 2020-09-03       | 2020-09-09
-- out       1 | [email protected]  | 2020-09-09       | 9999-12-31
-- out       2 | [email protected]      | 2019-07-12       | 9999-12-31
-- out       3 | [email protected]  | 2020-08-08       | 9999-12-31