Why using same field when filtering cause different execution time? (different index usage)

191 Views Asked by At

When I run query and filter by agreement_id it is slow,
but when I filter by an alias id it is fast. (Look at the end of the query)

Why using same field when filtering cause different execution time?

Links to explain analyze:

slow1, slow2
fast1, fast2

Difference start at #20: Where different indexes are used:
Index Cond: (o.sys_period @> sys_time()) VS Index Cond: (o.agreement_id = 38)

PS. It would be nice if I can contact to developer of this feature (I have one more similar problem)

UPD I did some experiments. when I remove window functions from my query it works fast in any case. So why window function stop index usage in some cases? How to escape/workaround that?

dbfiddle with minimal test case
Server version is v13.1
Full query:

WITH gconf AS
  -- https://www.postgresql.org/docs/current/queries-with.html#QUERIES-WITH-SELECT
  NOT MATERIALIZED -- force it to be merged into the parent query
  -- it gives a net savings because each usage of the WITH query needs only a small part of the WITH query's full output.
( SELECT
  ocd.*, 
  tstzrange( '2021-05-01', '2021-05-01', '[]') AS acc_period,

  (o).agreement_id      AS id,       -- Required to passthrough WINDOW FUNCTION
  (o).id                AS order_id,
  (ic).consumed_period  AS consumed_period,
  dense_rank()  OVER ( PARTITION BY (o).agreement_id, (o).id ORDER BY (ic).consumed_period )  AS nconf,
  row_number()  OVER ( wconf ORDER BY (c).sort_order NULLS LAST                            )  AS nitem,

  (sum( ocd.item_cost )  OVER wconf)::numeric( 10, 2) AS conf_cost,
  max((ocd.ic).consumed) OVER wconf                   AS consumed,
  CASE WHEN true
    THEN (sum( ocd.item_suma )  OVER wconf)::numeric( 10, 2 )
    ELSE (sum( ocd.item_cost )  OVER wconf)::numeric( 10, 2 )
  END AS conf_suma
FROM order_cost_details( tstzrange( '2021-05-01', '2021-05-01', '[]') ) ocd
WHERE true  OR  (ocd.ic).consumed_period @> lower( tstzrange( '2021-05-01', '2021-05-01', '[]') )

WINDOW wconf AS ( PARTITION BY (o).agreement_id, (o).id, (ic).consumed_period )
),
gorder AS (
SELECT *,
  (conf_suma/6)::numeric( 10, 2 ) as conf_nds,
  sum( conf_suma ) FILTER (WHERE nitem = 1) OVER worder AS order_suma
FROM gconf
WINDOW worder AS ( PARTITION BY gconf.id, (o).id )
-- TODO: Ask PG developers: Why changing to (o).agreement_id slows down query?
-- WINDOW worder AS ( PARTITION BY (o).agreement_id, (o).id )
)
SELECT
  u.id, consumed_period, nconf, nitem,
  (c).id                                                as item_id,
  COALESCE( (c).sort_order,   pd.sort_order          )  as item_order,
  COALESCE( st.display, st.name, rt.display, rt.name )  as item_name,
  COALESCE( item_qty,         (c).amount/rt.unit     )  as item_qty,
  COALESCE( (p).label,        rt.label               )  as measure,
  item_price, item_cost, item_suma,
  conf_cost, consumed, conf_suma, conf_nds, order_suma,
  (order_suma/6)::numeric( 10, 2 )                      as order_nds,
  sum( conf_suma                        ) FILTER (WHERE nitem = 1                ) OVER wagreement  AS total_suma,
  sum( (order_suma/6)::numeric( 10, 2 ) ) FILTER (WHERE nitem = 1  AND  nconf = 1) OVER wagreement  AS total_nds,

  pkg.id                            as package_id,
  pkg.link_1c_id                    as package_1c_id,
  COALESCE( pkg.display, pkg.name ) as package,

  acc_period

FROM gorder u
LEFT JOIN resource_type rt ON rt.id  = (c).resource_type_id
LEFT JOIN service_type  st ON st.id  = (c).service_type_id
LEFT JOIN package      pkg ON pkg.id = (o).package_id
LEFT JOIN package_detail     pd  ON pd.package_id = (o).package_id
  AND pd.resource_type_id IS NOT DISTINCT FROM (c).resource_type_id
  AND pd.service_type_id  IS NOT DISTINCT FROM (c).service_type_id

-- WHERE (o).agreement_id = 38   --  slow
WHERE u.id = 38                  --  fast
WINDOW wagreement AS ( PARTITION BY (o).agreement_id )
1

There are 1 best solutions below

0
Eugen Konkov On

As problem workaround we can additionally SELECT an alias for column used at PARTITION BY expression. Then PG apply optimization and use index.

The answer to the question could be: PG does not apply optimization if composite type is used. Notice as it works:

PARTITION | FILTER | IS USED?
------------------------------
ALIAS     | ORIG   | NO
ALIAS     | ALIAS  | YES
ORIG      | ALIAS  | NO
ORIG      | ORIG   | NO

See this dbfiddle

create table agreement ( ag_id int, name text, cost numeric(10,2) );
create index ag_idx on agreement (ag_id);
insert into agreement (ag_id, name, cost) values ( 1, '333', 22 ),
(1,'333', 33), (1, '333', 7), (2, '555', 18 ), (2, '555', 2), (3, '777', 4);
select * from agreement;

create function initial () 
returns table( agreement_id int, ag agreement ) language sql stable AS $$
select ag_id, t from agreement t;
$$;
select * from initial() t;

explain( analyze, costs, buffers, verbose ) with totals_by_ag as (
  select 
    *,
    sum( (t.ag).cost ) over ( partition by agreement_id ) as total
  from initial() t
)
select * from totals_by_ag t
where (t.ag).ag_id = 1; -- index is NOT USED

explain( analyze, costs, buffers, verbose ) with totals_by_ag as (
  select 
    *,
    sum( (t.ag).cost ) over ( partition by agreement_id ) as total
  from initial() t
)
select * from totals_by_ag t
where agreement_id = 1; -- index is used when alias for column is used

explain( analyze, costs, buffers, verbose ) with totals_by_ag as (
  select 
    *,
    sum( (t.ag).cost ) over ( partition by (t.ag).ag_id ) as total --renamed
  from initial() t
)
select * from totals_by_ag t
where agreement_id = 1; -- index is NOT USED because grouping by original column

explain( analyze, costs, buffers, verbose ) with totals_by_ag as (
  select 
    *,
    sum( (t.ag).cost ) over ( partition by (t.ag).ag_id ) as total --renamed
  from initial() t
)
select * from totals_by_ag t
where (t.ag).ag_id = 1; -- index is NOT USED even if at both cases original column