Postgres index issue

128 Views Asked by At

We are working on a platform right now and I'm having difficulties optimizing one of our queries, it's actually a view.

When explaining the query a full table scan is performed on my user_role table. The query behind the view looks like what I've posted below (I used * for select instead of different columns of the different tables, just to showcase my issue). All results are based on that simplified query.

SELECT t.*
FROM task t
         LEFT JOIN project p ON t.project_id = p.id
         LEFT JOIN company comp ON comp.id = p.company_id
         LEFT JOIN company_users cu ON cu.companies_id = comp.id
         LEFT JOIN user_table u ON u.email= cu.users_email
         LEFT JOIN user_role role ON u.email= role.user_email
WHERE lower(t.type) = 'project'
  AND t.project_id IS NOT NULL
  AND role.role::text in ('SOME_ROLE_1', 'SOME_ROLE_2')

Basically this query runs ok like this. If I explain the query it uses all my index in place, nice! But.. As soon as I start added to extra where clauses the issue arises. I simply add this:

  and u.email = '[email protected]'
  and comp.id = 4
  and p.id = 3

And all of the sudden on tables company_users and user_role a full table scan is performed. No on table project.

The full query plan is:

Nested Loop  (cost=0.98..22.59 rows=1 width=97) (actual time=0.115..4.448 rows=189 loops=1)                                                                                 
  ->  Nested Loop  (cost=0.84..22.02 rows=1 width=632) (actual time=0.099..3.091 rows=252 loops=1)                                                                          
        ->  Nested Loop  (cost=0.70..20.10 rows=1 width=613) (actual time=0.082..1.774 rows=252 loops=1)                                                                    
              ->  Nested Loop  (cost=0.56..19.81 rows=1 width=621) (actual time=0.068..0.919 rows=252 loops=1)                                                              
                    ->  Nested Loop  (cost=0.43..19.62 rows=1 width=101) (actual time=0.058..0.504 rows=63 loops=1)                                                         
                          ->  Index Scan using task_project_id_index on task t  (cost=0.28..11.43 rows=1 width=97) (actual time=0.041..0.199 rows=63 loops=1)               
                                Index Cond: (project_id IS NOT NULL)                                                                                                        
                                Filter: (lower((type)::text) = 'project'::text)                                                                                             
                          ->  Index Scan using project_id_uindex on project p  (cost=0.15..8.17 rows=1 width=8) (actual time=0.003..0.003 rows=1 loops=63)                  
                                Index Cond: (id = t.project_id)                                                                                                             
                    ->  Index Scan using company_users_companies_id_index on company_users cu  (cost=0.14..0.17 rows=1 width=520) (actual time=0.002..0.004 rows=4 loops=63)
                          Index Cond: (companies_id = p.company_id)                                                                                                         
              ->  Index Only Scan using company_id_index on company comp  (cost=0.14..0.29 rows=1 width=4) (actual time=0.002..0.002 rows=1 loops=252)                      
                    Index Cond: (id = p.company_id)                                                                                                                         
                    Heap Fetches: 252                                                                                                                                       
        ->  Index Only Scan using user_table_email_index on user_table u  (cost=0.14..1.81 rows=1 width=19) (actual time=0.004..0.004 rows=1 loops=252)                     
              Index Cond: (email = (cu.users_email)::text)                                                                                                                  
              Heap Fetches: 252                                                                                                                                             
  ->  Index Scan using user_role_user_email_index on user_role role  (cost=0.14..0.56 rows=1 width=516) (actual time=0.004..0.004 rows=1 loops=252)                         
        Index Cond: ((user_email)::text = (u.email)::text)                                                                                                                  
        Filter: ((role)::text = ANY ('{COMPANY_ADMIN,COMPANY_USER}'::text[]))                                                                                               
        Rows Removed by Filter: 0                                                                                                                                           
Planning time: 2.581 ms                                                                                                                                                     
Execution time: 4.630 ms                                                                                                                                                    

The explanation for company_users in particular is:

SEQ_SCAN (Seq Scan)  table: company_users;  1   1.44    0.0 Parent Relationship = Inner;
Parallel Aware = false;
Alias = cu;
Plan Width = 520;
Filter = ((companies_id = 4) AND ((users_email)::text = '[email protected]'::text));

However I already created an index on the company_users table: create index if not exists company_users_users_email_companies_id_index on company_users (users_email, companies_id);.

Same counts for the user_role table. The explanation is

SEQ_SCAN (Seq Scan)  table: user_role;  1   1.45    0.0 Parent Relationship = Inner;
Parallel Aware = false;
Alias = role;
Plan Width = 516;
Filter = (((role)::text = ANY ('{SOME_ROLE_1,SOME_ROLE_2}'::text[])) AND ((user_email)::text = '[email protected]'::text));

And thus for user_role I also have an index in place on columns role and user_email: create index if not exists user_role_role_user_email_index on user_role (role, user_email);

Did not expect it to change anything but anyways I did try to adapt the query to include only a single role in the where class and also tried with OR statements instead of an IN but all makes no difference!

The weird thing to me is that without my extra 3 where filters it works perfect, and as soon as I add them it does not work, but I do have indexes created for the fields mentioned in the explanations...

We are using exactly the same algo in a few other queries and they all suffer the same issue on those two fields... So the big question is, how can I further improve my queries and make it use the index instead of a full table scan?

1

There are 1 best solutions below

2
On

Create an index and calculate statistics:

CREATE INDEX ON task (lower(type)) WHERE project_id IS NOT NULL;

ANALYZE task;

That should improve the estimate, so that PostgreSQL chooses a better join strategy.