We are working on a platform right now and I'm having difficulties optimizing one of our queries, it's actually a view.
When explaining the query a full table scan is performed on my user_role
table.
The query behind the view looks like what I've posted below (I used * for select instead of different columns of the different tables, just to showcase my issue). All results are based on that simplified query.
SELECT t.*
FROM task t
LEFT JOIN project p ON t.project_id = p.id
LEFT JOIN company comp ON comp.id = p.company_id
LEFT JOIN company_users cu ON cu.companies_id = comp.id
LEFT JOIN user_table u ON u.email= cu.users_email
LEFT JOIN user_role role ON u.email= role.user_email
WHERE lower(t.type) = 'project'
AND t.project_id IS NOT NULL
AND role.role::text in ('SOME_ROLE_1', 'SOME_ROLE_2')
Basically this query runs ok like this. If I explain the query it uses all my index in place, nice! But.. As soon as I start added to extra where clauses the issue arises. I simply add this:
and u.email = '[email protected]'
and comp.id = 4
and p.id = 3
And all of the sudden on tables company_users
and user_role
a full table scan is performed. No on table project
.
The full query plan is:
Nested Loop (cost=0.98..22.59 rows=1 width=97) (actual time=0.115..4.448 rows=189 loops=1)
-> Nested Loop (cost=0.84..22.02 rows=1 width=632) (actual time=0.099..3.091 rows=252 loops=1)
-> Nested Loop (cost=0.70..20.10 rows=1 width=613) (actual time=0.082..1.774 rows=252 loops=1)
-> Nested Loop (cost=0.56..19.81 rows=1 width=621) (actual time=0.068..0.919 rows=252 loops=1)
-> Nested Loop (cost=0.43..19.62 rows=1 width=101) (actual time=0.058..0.504 rows=63 loops=1)
-> Index Scan using task_project_id_index on task t (cost=0.28..11.43 rows=1 width=97) (actual time=0.041..0.199 rows=63 loops=1)
Index Cond: (project_id IS NOT NULL)
Filter: (lower((type)::text) = 'project'::text)
-> Index Scan using project_id_uindex on project p (cost=0.15..8.17 rows=1 width=8) (actual time=0.003..0.003 rows=1 loops=63)
Index Cond: (id = t.project_id)
-> Index Scan using company_users_companies_id_index on company_users cu (cost=0.14..0.17 rows=1 width=520) (actual time=0.002..0.004 rows=4 loops=63)
Index Cond: (companies_id = p.company_id)
-> Index Only Scan using company_id_index on company comp (cost=0.14..0.29 rows=1 width=4) (actual time=0.002..0.002 rows=1 loops=252)
Index Cond: (id = p.company_id)
Heap Fetches: 252
-> Index Only Scan using user_table_email_index on user_table u (cost=0.14..1.81 rows=1 width=19) (actual time=0.004..0.004 rows=1 loops=252)
Index Cond: (email = (cu.users_email)::text)
Heap Fetches: 252
-> Index Scan using user_role_user_email_index on user_role role (cost=0.14..0.56 rows=1 width=516) (actual time=0.004..0.004 rows=1 loops=252)
Index Cond: ((user_email)::text = (u.email)::text)
Filter: ((role)::text = ANY ('{COMPANY_ADMIN,COMPANY_USER}'::text[]))
Rows Removed by Filter: 0
Planning time: 2.581 ms
Execution time: 4.630 ms
The explanation for company_users
in particular is:
SEQ_SCAN (Seq Scan) table: company_users; 1 1.44 0.0 Parent Relationship = Inner;
Parallel Aware = false;
Alias = cu;
Plan Width = 520;
Filter = ((companies_id = 4) AND ((users_email)::text = '[email protected]'::text));
However I already created an index on the company_users
table: create index if not exists company_users_users_email_companies_id_index on company_users (users_email, companies_id);
.
Same counts for the user_role
table.
The explanation is
SEQ_SCAN (Seq Scan) table: user_role; 1 1.45 0.0 Parent Relationship = Inner;
Parallel Aware = false;
Alias = role;
Plan Width = 516;
Filter = (((role)::text = ANY ('{SOME_ROLE_1,SOME_ROLE_2}'::text[])) AND ((user_email)::text = '[email protected]'::text));
And thus for user_role
I also have an index in place on columns role
and user_email
: create index if not exists user_role_role_user_email_index on user_role (role, user_email);
Did not expect it to change anything but anyways I did try to adapt the query to include only a single role in the where class and also tried with OR statements instead of an IN but all makes no difference!
The weird thing to me is that without my extra 3 where filters it works perfect, and as soon as I add them it does not work, but I do have indexes created for the fields mentioned in the explanations...
We are using exactly the same algo in a few other queries and they all suffer the same issue on those two fields... So the big question is, how can I further improve my queries and make it use the index instead of a full table scan?
Create an index and calculate statistics:
That should improve the estimate, so that PostgreSQL chooses a better join strategy.