How to improve this PostgreSQL query's performance in index scan

81 Views Asked by At

In this execution plan, the following query is spending a lot of time, 3.780 seconds, to perform only a index scan on order_line table (line 11 of execution plan).

Primary keys of tables:

Customer PK - c_w_id, c_d_id, c_id
OOrder PK - o_w_id, o_d_id, o_id
Order_line PK - ol_w_id, ol_d_id, ol_o_id, ol_number
Nation PK - n_nationkey

Query:

select   c_id, c_last, sum(ol_amount) as revenue, c_city, c_phone, n_name
from     customer, oorder, order_line, nation
where    c_id = o_c_id
     and c_w_id = o_w_id
     and c_d_id = o_d_id
     and ol_w_id = o_w_id
     and ol_d_id = o_d_id
     and ol_o_id = o_id
     and o_entry_d >= '2007-01-02 00:00:00.000000'
     and o_entry_d <= ol_delivery_d
     and n_nationkey = ascii(substr(c_state,1,1))
group by c_id, c_last, c_city, c_phone, n_name
order by revenue desc

How can I improve the performance of this query? What materialized view do you recommend? Is it a good option?

CREATE MATERIALIZED VIEW mview AS
select c_id, c_last, ol_amount, c_city, c_phone, o_entry_d, ol_delivery_d, c_state
from     customer, oorder, order_line
where    c_id = o_c_id
     and c_w_id = o_w_id
     and c_d_id = o_d_id
     and ol_w_id = o_w_id
     and ol_d_id = o_d_id
     and ol_o_id = o_id;
2

There are 2 best solutions below

0
On

You may try to use expression index for ascii(substr(c_state,1,1)). You have sequential scan on customers.

1
On
  1. Never use such column names.
  2. If not 1) then never write such column names without table prefixes in a query (no need to add prefixes in column names).
  3. It's better to write joins in "join" clause, not in "where" clause, thus you can manipulate with hierarchy of joins.

Finally, have you run

analyse customer; 
analyse order_line; 
analyse oorder; 
analyse nation;

after you added index on ascii(substr(c_state,1,1))? If not - run it.

Also you can get n_name as a subquery in select clause:

select c_id, c_last, revenue, c_city, c_phone, (select x.n_name from nation x where x.n_nationkey = ascii(c_statecut)) as n_name 
  from (
select   c_id, c_last, sum(ol_amount) as revenue, c_city, c_phone, substr(c_state,1,1) as c_statecut
from     customer, oorder, order_line
where    c_id = o_c_id
     and c_w_id = o_w_id
     and c_d_id = o_d_id
     and ol_w_id = o_w_id
     and ol_d_id = o_d_id
     and ol_o_id = o_id
     and o_entry_d >= '2007-01-02 00:00:00.000000'
     and o_entry_d <= ol_delivery_d
group by c_id, c_last, c_city, c_phone, c_statecut
) finale
order by revenue desc

Nation table is an easy lookup table, but it affects overall plan for aggregation.

Regarding materialised view, preaggregation of order_line table alone is the fastest way of getting totals, but as far as you have "o_entry_d <= ol_delivery_d" condition in query you can't calculate aggregates using order_line table alone. Try this as a materialised (and indexed) view to be joined with the clients and nations later.

select o_c_id, ol_w_id, ol_d_id, ol_o_id, sum(ol_amount)
  from order_line 
  join oorder
    on     ol_w_id = o_w_id
       and ol_d_id = o_d_id
       and ol_o_id = o_id
       and o_entry_d <= ol_delivery_d -- (very strange condition, which looks redundant... it could be better if you can remove it)
 where o_entry_d >= '2007-01-02 00:00:00.000000'
 group by o_c_id, ol_w_id, ol_d_id, ol_o_id

If this query takes around 3 seconds, then materialize it. If less than 3 seconds, then you can join it with the rest on-the-fly.