I want to calculate session intervals for multiple, possibly overlapping sessions per agent. I have a log table like this in a PostgreSQL database:
id dt session_id agent_id state
7 2024-01-25 22:26:57 4 3148 0
7 2024-01-25 22:24:57 4 3148 1
6 2024-01-25 22:23:57 2 3148 0
5 2024-01-25 15:53:30 1 3148 0
4 2024-01-25 15:53:30 3 3148 0
3 2024-01-25 13:53:02 3 3148 1
2 2024-01-25 12:43:10 2 3148 1
1 2024-01-25 12:30:02 1 3148 1
state = 1 indicates the start of a session, state = 0 its end.
For single sessions I use the lag() function:
select agent_id, status, dt, lag(dt) over (partition by (agent_id) order by dt) as pre_status_dt from
(
select *,
lag(status) over (partition by (agent_id) order by dt) as pre_status
from (
select cs.dt, cs.user_id as agent_id, cs.status
from "channel_subscribe" cs
where cs.channel = 14
) as t1
where t1.dt >= '2024-01-01'
) as t2
where t2.status != t2.pre_status
order by dt asc
But that does not work for overlapping sessions.
I would like to get this result:
agent_id start_dt end_dt
3148 2024-01-25 12:30:02 2024-01-25 22:23:57
3148 2024-01-25 22:24:57 2024-01-25 22:26:57
Your sample shows nested sessions: Additional sessions start and end within the first open session of the same agent. Alternatively, sessions might be chained: next session starts within the first session, but ends later.
The following solutions cover both cases: A combined session starts with the first open session, and ends once all open sessions have been closed.
Each assumes consistent data: Every session that gets closed has been opened before (or at the same time, but with am earlier
id). Unfinished sessions are reported as such (with null forend_dt). If your data is not as reliable, you need to define what can go wrong, and handle cases accordingly.Pure SQL
The query in
sub1adds a running count per agent assum_state. A combined session ends whensum_stategoes back to 0.Subquery
sub2forms groups (island) by counting "0"-events per agent. Note the descending sort (DESC) to include each "0"-event in its group.The outer query reports data as requested for every combined session. The added
FILTER (WHERE sum_state = 0)makes sure that unfinished sessions are reported as such.Procedural solution
Once we need multiple subquery levels (with changing sort order), it's probably faster to use a (combined) procedural solution like this PL/pgSQL function:
Call:
fiddle
Related: