Combine similar EXISTS conditions in PostgreSQL

Question

Combine similar EXISTS conditions in PostgreSQL

52 Views Asked by Philipp Li At 31 August 2023 at 10:59

I would like to know whether multiple similar exists conditions can be combined in a meaningful and performant way.

Let us assume the following example: Different activities can be assigned to a service (n-m). Activities can be grouped independently into activity groups. Activity groups can be assigned to a group type.

If I now want to find all services that have a reference to certain group types and I want to link the condition by OR, then this is relatively simple by combining EXISTS and IN.

select *
from service
where exists (
          select 1
          from activitiy
               join activitiy_activitiy_group
                    on activitiy.id = activitiy_activitiy_group.id_activitiy
                    join activitiy_group
                        on activitiy_activitiy_group.id_activitiy_group = activitiy_group.id
          where (
                        activitiy_group.id_type in (1, 3)
                        and activitiy.id_service = service.id
                    );

If, on the other hand, I want to link the condition by AND, then it is not quite so simple. I could add multiple EXITS conditions:

select *
from service
where exists (
          select 1
          from activitiy
               join activitiy_activitiy_group
                    on activitiy.id = activitiy_activitiy_group.id_activitiy
                    join activitiy_group
                        on activitiy_activitiy_group.id_activitiy_group = activitiy_group.id
          where (
                        activitiy_group.id_type = 1
                        and activitiy.id_service = service.id
                    )
and
exists (
          select 1
          from activitiy
               join activitiy_activitiy_group
                    on activitiy.id = activitiy_activitiy_group.id_activitiy
                    join activitiy_group
                        on activitiy_activitiy_group.id_activitiy_group = activitiy_group.id
          where (
                            activitiy_group.id_type = 3
                        and activitiy.id_service = service.id
                    );

But I wonder if this approach is performant for many filter elements. I experimented a bit and one approach would be with only one subselect, by selecting all distinct activity group type ids related to a service into one array and comparing it with the filter values:

select *
from service
where true =
      (select ARRAY_AGG(activitiy_group.id_type) @> ('{1,3}'::Integer[])
       from activitiy
               join activitiy_activitiy_group
                    on activitiy.id = activitiy_activitiy_group.id_activitiy
                    join activitiy_group
                        on activitiy_activitiy_group.id_activitiy_group = activitiy_group.id
       where ativitiy.id_service = service.id);

But here, too, the question arises as to whether this is really performant. Can anyone assess this, or is there perhaps a more sensible alternative approach? I assumed that the underlying basic problem is a standard problem, but unfortunately could not find any other approach on the internet.

Original Q&A

There are 1 best solutions below

**Erwin Brandstetter** · Accepted Answer · 2023-08-31T12:28:44.410000

I'll pick a setting where the involved tables are big and writes are comparatively rare. Then it makes sense to create an auxiliary MATERIALIZED VIEW (once) to dramatically speed up the query:

CREATE MATERIALIZED VIEW service_activity_types AS
SELECT a.id_service, array_agg(ag.id_type) AS activity_types
FROM  (
   SELECT DISTINCT a.id_service, ag.id_type
   FROM   activitiy                 a
   JOIN   activitiy_activitiy_group aag ON aag.id_activitiy = a.id
   JOIN   activitiy_group           ag  ON ag.id = aag.id_activitiy_group
   ORDER  BY 1, 2
   ) sub
GROUP  BY 1;

Generates a table with unique services with an array of unique activity types.

(It's typically faster to apply DISTINCT and ORDER BY once in a subquery.)

Create a unique index on service_activity_types to allow refreshing the MV CONCURRENTLY:

CREATE UNIQUE INDEX service_activity_type_uni ON service_activity_types (id_service);

Refresh after impactful changes to underlying tables:

REFRESH MATERIALIZED VIEW CONCURRENTLY service_activity_types;

Create a index on the array column to make the query fast. There are various options. For your case, I expect a GIN index using the operator class gin__int_ops from the additional module intarray to be fastest. Install the module once per database first. See:

How to create an index for elements of an array in PostgreSQL?

CREATE INDEX service_activity_type_gin_idx ON service_activity_types USING gin (activity_types gin__int_ops);

Maybe even a multicolumn index. See:

Multicolumn index on 3 fields with heterogenous data types

Also, to get your column statistics started:

ANALYZE service_activity_types;

Then your query can be:

SELECT id_service
FROM   service_activity_types
WHERE  activity_types @> '{1,3}';

And it will be blazingly fast.

Combine similar EXISTS conditions in PostgreSQL

There are 1 best solutions below

Related Questions in SQL

Related Questions in POSTGRESQL

Related Questions in EXISTS

Related Questions in POSTGRESQL-PERFORMANCE

Related Questions in RELATIONAL-DIVISION

Trending Questions

Popular # Hahtags

Popular Questions