Query is slow with indexes - how to understand execution plans?

66 Views Asked by At

I need help to understand why my query is slower when I use index than without any index. I ran explain analyze command, and below are execution plans option 1 - with index, and option 2 - without index.

Can someone explain to me why index makes performances worse in those execution plans?

PS. When I add 10 million rows to table (original size 2M), situation is turning in favor of index, and in that case query with index is 3x faster).

OPTION 1 WITH INDEX FOR LEFT JOIN invoice_id+acct_level ON TABLE cost_invoice_facepage AND CONDITION (cdb.invoice_id = invoice_id) AND (acct_level = 1)

Append  (cost=48.87..38583.97 rows=163773 width=371) (actual time=1.269..1516.564 rows=379129 loops=1)
  ->  Nested Loop  (cost=48.87..10520.11 rows=36504 width=362) (actual time=1.268..5.986 rows=579 loops=1)
        ->  Hash Left Join  (cost=44.66..9918.22 rows=507 width=322) (actual time=1.160..5.497 rows=579 loops=1)
              Hash Cond: (cd.gl_string_id = gs.id)
              ->  Nested Loop Left Join  (cost=0.85..9873.07 rows=507 width=262) (actual time=0.485..4.473 rows=579 loops=1)
                    Filter: ((c.gl_rule_type IS NULL) OR ((cd.charge_id IS NOT NULL) AND (c.gl_rule_type_id <> ALL ('{60,70}'::integer[]))))
                    ->  Index Scan using cost_invoice_charge_invoice_id_idx on cost_invoice_charge c  (cost=0.43..1204.53 rows=1188 width=243) (actual time=0.467..2.664 rows=579 loops=1)
                          Index Cond: (invoice_id = 14517)
                          Filter: ((chg_amt <> '0'::numeric) AND ((gl_rule_type IS NULL) OR (gl_rule_type_id <> ALL ('{60,70}'::integer[]))))
                          Rows Removed by Filter: 3364
                    ->  Index Scan using "gl_charge_detail.charge_id->cost_invoice_info_only.id" on gl_charge_detail cd  (cost=0.42..7.28 rows=1 width=27) (actual time=0.002..0.002 rows=1 loops=579)
                          Index Cond: (c.id = charge_id)
              ->  Hash  (cost=31.69..31.69 rows=969 width=64) (actual time=0.657..0.657 rows=969 loops=1)
                    Buckets: 1024  Batches: 1  Memory Usage: 103kB
                    ->  Seq Scan on gl_strings gs  (cost=0.00..31.69 rows=969 width=64) (actual time=0.026..0.389 rows=969 loops=1)
        ->  Materialize  (cost=4.22..145.78 rows=72 width=44) (actual time=0.000..0.000 rows=1 loops=579)
              ->  Hash Left Join  (cost=4.22..145.42 rows=72 width=44) (actual time=0.100..0.102 rows=1 loops=1)
                    Hash Cond: (f.vendor_id = vn.id)
                    ->  Nested Loop  (cost=0.57..141.57 rows=72 width=31) (actual time=0.027..0.029 rows=1 loops=1)
                          ->  Index Scan using cost_invoice_header_id_idx on cost_invoice_header ch  (cost=0.29..8.31 rows=1 width=4) (actual time=0.012..0.013 rows=1 loops=1)
                                Index Cond: (id = 14517)
                                Filter: (status_code <> ALL ('{100,101,102,490}'::integer[]))
                          ->  Index Scan using "invoice_id+acct_level" on cost_invoice_facepage f  (cost=0.29..132.55 rows=72 width=31) (actual time=0.013..0.013 rows=1 loops=1)
                                Index Cond: ((invoice_id = 14517) AND (acct_level = 1))
                    ->  Hash  (cost=2.73..2.73 rows=73 width=17) (actual time=0.061..0.061 rows=73 loops=1)
                          Buckets: 1024  Batches: 1  Memory Usage: 12kB
                          ->  Seq Scan on appdata_vendor_common vn  (cost=0.00..2.73 rows=73 width=17) (actual time=0.020..0.038 rows=73 loops=1)
  ->  Hash Left Join  (cost=2276.48..25607.26 rows=127269 width=374) (actual time=204.117..1486.717 rows=378550 loops=1)
        Hash Cond: (f_1.vendor_id = vn_1.id)
        ->  Nested Loop Left Join  (cost=2272.84..25250.14 rows=127269 width=361) (actual time=204.072..1328.491 rows=378550 loops=1)
              ->  Gather  (cost=2272.55..24385.32 rows=2663 width=338) (actual time=204.055..335.965 rows=378550 loops=1)
                    Workers Planned: 2
                    Workers Launched: 2
                    ->  Hash Left Join  (cost=1272.55..23119.02 rows=1110 width=338) (actual time=127.365..321.126 rows=126183 loops=3)
                          Hash Cond: (cdb.gl_string_id = gs_1.id)
                          ->  Hash Join  (cost=1228.74..23072.30 rows=1110 width=278) (actual time=126.126..263.315 rows=126183 loops=3)
                                Hash Cond: (cdb.charge_id = c_1.id)
                                ->  Parallel Seq Scan on gl_charge_detail_ban cdb  (cost=0.00..20581.15 rows=480915 width=43) (actual time=0.270..109.543 rows=384732 loops=3)
                                ->  Hash  (cost=1194.13..1194.13 rows=2769 width=239) (actual time=7.232..7.232 rows=3929 loops=3)
                                      Buckets: 4096  Batches: 1  Memory Usage: 635kB
                                      ->  Index Scan using cost_invoice_charge_invoice_id_idx on cost_invoice_charge c_1  (cost=0.43..1194.13 rows=2769 width=239) (actual time=0.070..4.686 rows=3929 loops=3)
                                            Index Cond: (invoice_id = 14517)
                                            Filter: (chg_amt <> '0'::numeric)
                                            Rows Removed by Filter: 14
                          ->  Hash  (cost=31.69..31.69 rows=969 width=64) (actual time=1.127..1.127 rows=969 loops=3)
                                Buckets: 1024  Batches: 1  Memory Usage: 103kB
                                ->  Seq Scan on gl_strings gs_1  (cost=0.00..31.69 rows=969 width=64) (actual time=0.165..0.714 rows=969 loops=3)
              ->  Index Scan using "invoice_id+acct_level" on cost_invoice_facepage f_1  (cost=0.29..0.31 rows=1 width=31) (actual time=0.001..0.002 rows=1 loops=378550)
                    Index Cond: ((cdb.invoice_id = invoice_id) AND (acct_level = 1))
        ->  Hash  (cost=2.73..2.73 rows=73 width=17) (actual time=0.035..0.035 rows=73 loops=1)
              Buckets: 1024  Batches: 1  Memory Usage: 12kB
              ->  Seq Scan on appdata_vendor_common vn_1  (cost=0.00..2.73 rows=73 width=17) (actual time=0.014..0.021 rows=73 loops=1)
Planning Time: 3.636 ms
Execution Time: 1550.844 ms

and OPTION 2 WITHOUT INDEXES

Append  (cost=48.58..43257.20 rows=163773 width=371) (actual time=7.965..831.408 rows=379129 loops=1)
  ->  Nested Loop  (cost=48.58..12251.68 rows=36504 width=362) (actual time=7.965..14.476 rows=579 loops=1)
        ->  Hash Left Join  (cost=44.66..9918.22 rows=507 width=322) (actual time=0.588..6.245 rows=579 loops=1)
              Hash Cond: (cd.gl_string_id = gs.id)
              ->  Nested Loop Left Join  (cost=0.85..9873.07 rows=507 width=262) (actual time=0.245..5.442 rows=579 loops=1)
                    Filter: ((c.gl_rule_type IS NULL) OR ((cd.charge_id IS NOT NULL) AND (c.gl_rule_type_id <> ALL ('{60,70}'::integer[]))))
                    ->  Index Scan using cost_invoice_charge_invoice_id_idx on cost_invoice_charge c  (cost=0.43..1204.53 rows=1188 width=243) (actual time=0.231..3.003 rows=579 loops=1)
                          Index Cond: (invoice_id = 14517)
                          Filter: ((chg_amt <> '0'::numeric) AND ((gl_rule_type IS NULL) OR (gl_rule_type_id <> ALL ('{60,70}'::integer[]))))
                          Rows Removed by Filter: 3364
                    ->  Index Scan using "gl_charge_detail.charge_id->cost_invoice_info_only.id" on gl_charge_detail cd  (cost=0.42..7.28 rows=1 width=27) (actual time=0.003..0.003 rows=1 loops=579)
                          Index Cond: (c.id = charge_id)
              ->  Hash  (cost=31.69..31.69 rows=969 width=64) (actual time=0.331..0.331 rows=969 loops=1)
                    Buckets: 1024  Batches: 1  Memory Usage: 103kB
                    ->  Seq Scan on gl_strings gs  (cost=0.00..31.69 rows=969 width=64) (actual time=0.017..0.183 rows=969 loops=1)
        ->  Materialize  (cost=3.93..1877.35 rows=72 width=44) (actual time=0.013..0.013 rows=1 loops=579)
              ->  Hash Left Join  (cost=3.93..1876.99 rows=72 width=44) (actual time=7.370..7.698 rows=1 loops=1)
                    Hash Cond: (f.vendor_id = vn.id)
                    ->  Nested Loop  (cost=0.29..1873.14 rows=72 width=31) (actual time=7.307..7.635 rows=1 loops=1)
                          ->  Index Scan using cost_invoice_header_id_idx on cost_invoice_header ch  (cost=0.29..8.31 rows=1 width=4) (actual time=0.011..0.013 rows=1 loops=1)
                                Index Cond: (id = 14517)
                                Filter: (status_code <> ALL ('{100,101,102,490}'::integer[]))
                          ->  Seq Scan on cost_invoice_facepage f  (cost=0.00..1864.12 rows=72 width=31) (actual time=7.293..7.619 rows=1 loops=1)
                                Filter: ((invoice_id = 14517) AND (acct_level = 1))
                                Rows Removed by Filter: 40340
                    ->  Hash  (cost=2.73..2.73 rows=73 width=17) (actual time=0.045..0.045 rows=73 loops=1)
                          Buckets: 1024  Batches: 1  Memory Usage: 12kB
                          ->  Seq Scan on appdata_vendor_common vn  (cost=0.00..2.73 rows=73 width=17) (actual time=0.022..0.028 rows=73 loops=1)
  ->  Hash Left Join  (cost=4248.29..28548.92 rows=127269 width=374) (actual time=234.692..789.334 rows=378550 loops=1)
        Hash Cond: (cdb.invoice_id = f_1.invoice_id)
        ->  Gather  (cost=2272.55..24385.32 rows=2663 width=338) (actual time=216.507..376.349 rows=378550 loops=1)
              Workers Planned: 2
              Workers Launched: 2
              ->  Hash Left Join  (cost=1272.55..23119.02 rows=1110 width=338) (actual time=128.932..389.669 rows=126183 loops=3)
                    Hash Cond: (cdb.gl_string_id = gs_1.id)
                    ->  Hash Join  (cost=1228.74..23072.30 rows=1110 width=278) (actual time=127.984..308.092 rows=126183 loops=3)
                          Hash Cond: (cdb.charge_id = c_1.id)
                          ->  Parallel Seq Scan on gl_charge_detail_ban cdb  (cost=0.00..20581.15 rows=480915 width=43) (actual time=0.163..117.001 rows=384732 loops=3)
                          ->  Hash  (cost=1194.13..1194.13 rows=2769 width=239) (actual time=8.779..8.779 rows=3929 loops=3)
                                Buckets: 4096  Batches: 1  Memory Usage: 635kB
                                ->  Index Scan using cost_invoice_charge_invoice_id_idx on cost_invoice_charge c_1  (cost=0.43..1194.13 rows=2769 width=239) (actual time=0.050..5.563 rows=3929 loops=3)
                                      Index Cond: (invoice_id = 14517)
                                      Filter: (chg_amt <> '0'::numeric)
                                      Rows Removed by Filter: 14
                    ->  Hash  (cost=31.69..31.69 rows=969 width=64) (actual time=0.829..0.829 rows=969 loops=3)
                          Buckets: 1024  Batches: 1  Memory Usage: 103kB
                          ->  Seq Scan on gl_strings gs_1  (cost=0.00..31.69 rows=969 width=64) (actual time=0.184..0.534 rows=969 loops=3)
        ->  Hash  (cost=1804.87..1804.87 rows=13670 width=44) (actual time=18.101..18.101 rows=13705 loops=1)
              Buckets: 16384  Batches: 1  Memory Usage: 1198kB
              ->  Hash Left Join  (cost=3.64..1804.87 rows=13670 width=44) (actual time=0.075..14.009 rows=13705 loops=1)
                    Hash Cond: (f_1.vendor_id = vn_1.id)
                    ->  Seq Scan on cost_invoice_facepage f_1  (cost=0.00..1763.26 rows=13670 width=31) (actual time=0.017..6.216 rows=13705 loops=1)
                          Filter: (acct_level = 1)
                          Rows Removed by Filter: 26636
                    ->  Hash  (cost=2.73..2.73 rows=73 width=17) (actual time=0.052..0.052 rows=73 loops=1)
                          Buckets: 1024  Batches: 1  Memory Usage: 12kB
                          ->  Seq Scan on appdata_vendor_common vn_1  (cost=0.00..2.73 rows=73 width=17) (actual time=0.013..0.027 rows=73 loops=1)
Planning Time: 3.365 ms
Execution Time: 863.941 ms
1

There are 1 best solutions below

3
On BEST ANSWER

Look at the line that is driving the iteration over the index scan:

Gather  (cost=2272.55..24385.32 rows=2663 width=338) (actual time=204.055..335.965 rows=378550 loops=1)

It thinks the index scan will get iterated 2663 times (with a different value of invoice_id for each one) but it really gets iterated 378550 times, (this latter number is where the 'loops' field on the index scan comes from), a difference of 140 fold. Every time you hit the index, you need to re-descend from the root to the leaf, locking and unlocking pages as you go. While this is not terribly expensive, it does add up if you do it 378550 times. It gets to be faster to process the table in bulk into a private hash table. But since the estimated row count is so wrong, PostgreSQL doesn't realize that in this case.