Pre-define groups for PostgreSQL query

45 Views Asked by At

my_events table stores event_id, event_date, event_type, event_duration.

There are many event types for example: Event A, Event B, Event C, Event D The query that I have written will get my results but will exclude events that were not in my observation period event_date > '2023-05-01'

How do I get the results so that the grouping will show all Events in the first column?

This query will return only events that have occurred after '2023-05-01' for example:

SELECT event_type,
COUNT(event_id) as num_events,
AVG(event_duration) as avg_duration
FROM my_events
WHERE event_date > '2023-05-01'
GROUP BY event_type
event_type num_events avg_duration
Event B 20 00:33:25
Event C 33 01:04:22

I want the results to show A and D even if they do not occur in my observation period:

event_type num_events avg_duration
Event A 0 00:00:00
Event B 20 00:33:25
Event C 33 01:04:22
Event D 0 00:00:00

Additional information based on responses: The table is very large and I would like to stick to pre-defining a list of rows of events. My next attempt was to create a Common Table Expression and left join my_events onto the CTE as shown

WITH event_list
     AS (SELECT 'Event A' AS event_type
         UNION ALL
         SELECT 'Event B' AS event_type
         UNION ALL
         SELECT 'Event C' AS event_type
         UNION ALL
         SELECT 'Event D' AS event_type)
SELECT el.event_type,
       Count(me.event_id)     AS num_events,
       Avg(me.event_duration) AS avg_duration
FROM   event_list el
       LEFT JOIN my_events me using (event_type)
WHERE  me.event_date > '2023-05-01'
GROUP  BY el.event_type

However, if I run the CTE alone it displays the rows for Events A-D, but after joining it with my_events I still get the same result of just B and C.

2

There are 2 best solutions below

1
Andrey Ivanov On

You can select all distinct event_type's and then use subqueries to collect num_events and avg_duration:

SELECT DISTINCT event_type, 
(SELECT COUNT(*) FROM my_events WHERE event_type = events.event_type AND event_date > '2023-05-01') AS num_events, 
(SELECT AVG(event_duration) FROM my_events WHERE event_type = events.event_type AND event_date > '2023-05-01') AS avg_duration 
FROM my_events AS events;

If you need to get 00:00:00 in avg_duration instead of NULL's, then use COALESCE function:

SELECT DISTINCT event_type, 
(SELECT COUNT(*) FROM my_events WHERE event_type = events.event_type AND event_date > '2023-05-01') AS num_event, 
COALESCE((SELECT AVG(event_duration) FROM my_events WHERE event_type = events.event_type AND event_date > '2023-05-01'), '00:00:00') AS avg_duration 
FROM my_events AS events;
0
JohnH On

The following query returns the requested results:

WITH event_types AS (
  SELECT DISTINCT event_type
    FROM my_events
)
SELECT event_types.event_type,
       COUNT(my_events.event_id) AS num_events,
       COALESCE(AVG(my_events.event_duration), INTERVAL '0' SECOND) AS avg_duration
  FROM event_types
  LEFT JOIN my_events
    ON event_types.event_type = my_events.event_type
      AND my_events.event_date > '2023-05-01'::DATE
 GROUP BY event_types.event_type;