How to fix this slow Postgres select query

226 Views Asked by pyr0t0n At 27 August 2020 at 11:02

I need a little bit of help with a PostgreSQL query. I have the following SELECT query, which takes about 30 seconds to run on tables with around 100,000 and 200,000 entries.

SELECT i.id, i.debit_nr, i.pat_id, i.pat_name, i.invoice_id, i.invoice_date, i.due_date, i.client_short, i.payment, i.payment_option, i.marker, i.comment, sum(t.Sum) AS i_sum, i.import_date 
FROM invoices AS i 
   LEFT JOIN invoice_items AS t ON t.invoice_id = i.id 
   JOIN jobs AS j ON i.job_id = j.id 
GROUP BY i.id

I figured out that the slow part seems to be just the SELECT on the invoices table, because if I run

SELECT i.id, i.debit_nr, i.pat_id, i.pat_name, i.invoice_id, i.invoice_date, 
i.due_date, i.client_short, i.payment, i.payment_option, i.marker, i.comment, i.import_date 
FROM invoices AS i

it takes almost the same time.

GroupAggregate  (cost=63048.71..65737.16 rows=110203 width=76) (actual time=1421.792..1785.528 rows=110203 loops=1)
  Group Key: i.id
  ->  Sort  (cost=63048.71..63577.52 rows=211523 width=76) (actual time=1421.772..1573.998 rows=211527 loops=1)
        Sort Key: i.id
        Sort Method: external merge  Disk: 19944kB
        ->  Hash Right Join  (cost=24793.35..34938.02 rows=211523 width=76) (actual time=473.877..1010.362 rows=211527 loops=1)
              Hash Cond: (t.invoice_id = i.id)
              ->  Seq Scan on invoice_items t  (cost=0.00..3878.23 rows=211523 width=12) (actual time=0.035..112.034 rows=211523 loops=1)
              ->  Hash  (cost=22123.81..22123.81 rows=110203 width=72) (actual time=472.566..472.566 rows=110203 loops=1)
                    Buckets: 65536  Batches: 4  Memory Usage: 3592kB
                    ->  Hash Join  (cost=777.49..22123.81 rows=110203 width=72) (actual time=7.784..334.883 rows=110203 loops=1)
                          Hash Cond: (i.job_id = j.id)
                          ->  Seq Scan on invoices i  (cost=0.00..19831.03 rows=110203 width=76) (actual time=0.005..170.120 rows=110203 loops=1)
                          ->  Hash  (cost=705.55..705.55 rows=5755 width=8) (actual time=7.707..7.707 rows=5755 loops=1)
                                Buckets: 8192  Batches: 1  Memory Usage: 289kB
                                ->  Seq Scan on jobs j  (cost=0.00..705.55 rows=5755 width=8) (actual time=0.004..4.741 rows=5755 loops=1)
Planning time: 0.874 ms
Execution time: 1824.846 ms

The problem is that it makes no difference whether I add an index on the id field or on all the fields needed in this select.

How can I speed it up?

PS: It's PostgreSQL 9.0 on a Windows Server.


There is 1 best solution below

Gordon Linoff On 27 August 2020 at 11:06

Try writing the query using a correlated subquery:

SELECT i.*,
       (SELECT SUM(it.Sum) 
        FROM invoice_items it
        WHERE it.invoice_id = i.id
       ) as i_sum
FROM invoices i ;

Avoiding the outer aggregation might help performance (although Postgres has a good optimizer, so that is not always true). You want an index on invoice_items(invoice_id, sum). I left jobs out of the query because it does not seem to be used.
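
The suggested index could be created roughly as follows. This is only a sketch: the index name is made up for illustration, and it assumes the column written as Sum in the query was created unquoted, so Postgres stores it as lowercase sum. Running the rewritten query under EXPLAIN ANALYZE afterwards shows whether the planner actually uses the index. Note that index-only scans were only added in PostgreSQL 9.2, so on 9.0 the index can speed up the lookup by invoice_id but the table rows are still visited.

```sql
-- Hypothetical index name; the columns follow the answer's suggestion.
-- Assumes "sum" is the stored (lowercased) name of the Sum column.
CREATE INDEX invoice_items_invoice_id_sum_idx
    ON invoice_items (invoice_id, "sum");

-- Refresh planner statistics for the table after creating the index.
ANALYZE invoice_items;

-- Check the plan and timing of the correlated-subquery version.
EXPLAIN ANALYZE
SELECT i.*,
       (SELECT SUM(it."sum")
        FROM invoice_items it
        WHERE it.invoice_id = i.id
       ) AS i_sum
FROM invoices i;
```

Comparing this plan with the GroupAggregate plan in the question shows whether the Sort step (which spilled to disk there) is gone and whether the total execution time drops.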
