I would like to know whether multiple similar exists conditions can be combined in a meaningful and performant way.
Let us assume the following example: Different activities can be assigned to a service (n-m). Activities can be grouped independently into activity groups. Activity groups can be assigned to a group type.
If I now want to find all services that have a reference to certain group types and I want to link the condition by OR, then this is relatively simple by combining EXISTS and IN.
select *
from service
where exists (
select 1
from activitiy
join activitiy_activitiy_group
on activitiy.id = activitiy_activitiy_group.id_activitiy
join activitiy_group
on activitiy_activitiy_group.id_activitiy_group = activitiy_group.id
where (
activitiy_group.id_type in (1, 3)
and activitiy.id_service = service.id
);
If, on the other hand, I want to link the condition by AND, then it is not quite so simple. I could add multiple EXITS conditions:
select *
from service
where exists (
select 1
from activitiy
join activitiy_activitiy_group
on activitiy.id = activitiy_activitiy_group.id_activitiy
join activitiy_group
on activitiy_activitiy_group.id_activitiy_group = activitiy_group.id
where (
activitiy_group.id_type = 1
and activitiy.id_service = service.id
)
and
exists (
select 1
from activitiy
join activitiy_activitiy_group
on activitiy.id = activitiy_activitiy_group.id_activitiy
join activitiy_group
on activitiy_activitiy_group.id_activitiy_group = activitiy_group.id
where (
activitiy_group.id_type = 3
and activitiy.id_service = service.id
);
But I wonder if this approach is performant for many filter elements. I experimented a bit and one approach would be with only one subselect, by selecting all distinct activity group type ids related to a service into one array and comparing it with the filter values:
select *
from service
where true =
(select ARRAY_AGG(activitiy_group.id_type) @> ('{1,3}'::Integer[])
from activitiy
join activitiy_activitiy_group
on activitiy.id = activitiy_activitiy_group.id_activitiy
join activitiy_group
on activitiy_activitiy_group.id_activitiy_group = activitiy_group.id
where ativitiy.id_service = service.id);
But here, too, the question arises as to whether this is really performant. Can anyone assess this, or is there perhaps a more sensible alternative approach? I assumed that the underlying basic problem is a standard problem, but unfortunately could not find any other approach on the internet.
I'll pick a setting where the involved tables are big and writes are comparatively rare. Then it makes sense to create an auxiliary
MATERIALIZED VIEW(once) to dramatically speed up the query:Generates a table with unique services with an array of unique activity types.
(It's typically faster to apply
DISTINCTandORDER BYonce in a subquery.)Create a unique index on
service_activity_typesto allow refreshing the MVCONCURRENTLY:Refresh after impactful changes to underlying tables:
Create a index on the array column to make the query fast. There are various options. For your case, I expect a GIN index using the operator class
gin__int_opsfrom the additional moduleintarrayto be fastest. Install the module once per database first. See:Maybe even a multicolumn index. See:
Also, to get your column statistics started:
Then your query can be:
And it will be blazingly fast.