Query to find unique time periods from list of time ranges

77 Views Asked by At

I have a table which looks like below:

CREATE TABLE time_records (
  id uuid NOT NULL,
  employee_id uuid NOT NULL,
  starttime timestampt NOT NULL,
  endtime timestampt NOT NULL
)

There will be overlap in times between the records for the same employee_id:

id employee_id starttime endtime
1 1 '2023-09-01 07:00:00' '2023-09-01 09:15:00'
2 1 '2023-09-01 07:00:00' '2023-09-01 15:00:00'
3 1 '2023-09-01 07:00:00' '2023-09-01 15:00:00'
4 1 '2023-09-01 14:00:00' '2023-09-01 15:00:00'
5 1 '2023-09-01 23:45:00' '2023-09-01 23:59:00'
6 1 '2023-09-01 23:45:00' '2023-09-01 23:59:00'

What I'm trying to do is get the time ranges within all of these times:

employee_id starttime endtime ids
1 '2023-09-01 07:00:00' '2023-09-01 15:00:00' [1,2,3,4]
1 '2023-09-01 23:45:00' '2023-09-01 23:29:00' [5,6]

I can get this to work if there's only one set of overlapping time within a day using max/min for the start and end times, but I can't seem to make it work when there are multiple sets of overlapping time in a day:

select timea.employee_id,
       min(timea.starttime) starttime,
       max(timea.endtime)   endtime,
       array_agg(timea.id) ids
from time_records timea
         inner join time_records timea2 on timea.employee_id = timea2.employee_id and
                                           tsrange(timea2.starttime, timea2.endtime, '[]') &&
                                           tsrange(timea.starttime, timea.endtime, '[]')
    and timea.id != timea2.id
group by timea.employee_id;

With results:

employee_id starttime endtime ids
1 '2023-09-01 07:00:00' '2023-09-01 23:59:00' [1,2,3,4,5,6]
2

There are 2 best solutions below

1
Erwin Brandstetter On BEST ANSWER

make it work when there are multiple sets of overlapping time in a day

Plain aggregation with min() and max() cannot solve this. Which rows eventually form a group only becomes evident after merging ranges.

The aggregate function range_agg() makes the task a whole lot simpler. It was added with Postgres 14. Just computing merged ranges is very simple now:

SELECT unnest(range_agg(tsrange(starttime, endtime, '[]'))) AS merged_range
FROM   time_records;

To also get an array of involved IDs, we need to do more. One way is to join back to the underlying table and then aggregate once more (groups are identified by the merged ranges now):

SELECT employee_id, lower(merged) AS starttime, upper(merged) AS endtime
     , array_agg(t.id) AS ids
FROM  (
   SELECT employee_id, unnest(range_agg(tsrange(starttime, endtime, '[]'))) AS merged
   FROM   time_records
   GROUP  BY employee_id
   ) r
JOIN   time_records t USING (employee_id)
WHERE  r.merged @> t.starttime
GROUP  BY r.employee_id, r.merged
ORDER  BY r.employee_id, r.merged;

Another way with a LATERAL subquery:

SELECT r.employee_id, lower(r.merged) AS starttime, upper(r.merged) AS endtime, i.ids
FROM  (
   SELECT employee_id, unnest(range_agg(tsrange(starttime, endtime, '[]'))) AS merged
   FROM   time_records
   GROUP  BY employee_id
   ) r
CROSS  JOIN LATERAL (
   SELECT ARRAY (
      SELECT t.id
      FROM   time_records t
      WHERE  t.employee_id = r.employee_id
      AND    t.starttime <@ r.merged
      ORDER  BY t.id
      )
   ) i (ids)
ORDER  BY r.employee_id, r.merged;

fiddle

Related:

Not sure if either query is also faster than my custom function below, because that only iterates over the whole table once.

Postgres 9.6

While stuck on your outdated version, create a custom set-returning function (once):

CREATE OR REPLACE FUNCTION public.f_merge_ranges()
  RETURNS TABLE (
    employee_id int
  , starttime timestamp
  , endtime timestamp
  , ids int[]
  )
  LANGUAGE plpgsql AS
$func$
DECLARE
   r record;  -- current row
BEGIN
   FOR r IN 
      SELECT t.id, t.employee_id, t.starttime, t.endtime
      FROM   time_records t
      ORDER  BY t.employee_id, t.starttime, t.endtime DESC, t.id  -- better take longer range first
   LOOP
      IF r.employee_id = employee_id THEN  -- works for null in first iteration
         IF r.starttime > endtime THEN
            RETURN NEXT;
            starttime   := r.starttime;
            endtime     := r.endtime;
            ids         := ARRAY[r.id];
         ELSE
            ids         := ids || r.id;
            IF r.endtime > endtime THEN
               endtime  := r.endtime;
            END IF;
         END IF;
      ELSE
         IF employee_id IS NOT NULL THEN  -- catch first iteration
            RETURN NEXT;
         END IF;
         employee_id := r.employee_id;
         starttime   := r.starttime;
         endtime     := r.endtime;
         ids         := ARRAY[r.id];
      END IF;
   END LOOP;

   -- return last row (if any)
   IF FOUND THEN
      RETURN NEXT;
   END IF;
END
$func$;

Call:

SELECT * FROM public.f_merge_ranges();

fiddle

Unlike the queries above, the array in ids is unsorted. You need to do more if you need that.

Related:

0
Ajax1234 On

Using a cte to produce the maximum endtime for each starttime, the largest overlapping intervals can then be found and the original time_records table joined back onto it with an aggregation:

with cte as (
   select t.employee_id, t.starttime, max(t.endtime) m from time_records t
   group by t.employee_id, t.starttime
)
select c.employee_id, c.starttime, c.m, array_agg(t.id)
from cte c join time_records t on c.starttime <= t.starttime and t.endtime <= c.m
where not exists (select 1 from cte t1 where t1.employee_id = c.employee_id and t1.starttime < c.starttime and c.m <= t1.m)
group by c.employee_id, c.starttime, c.m

See fiddle