I need a little bit of help with a PostgreSQL query. I have the following SELECT query, which takes about 30 seconds to run on tables with around 100,000 and 200,000 rows.
SELECT i.id, i.debit_nr, i.pat_id, i.pat_name, i.invoice_id, i.invoice_date, i.due_date, i.client_short, i.payment, i.payment_option, i.marker, i.comment, sum(t.Sum) AS i_sum, i.import_date
FROM invoices AS i
LEFT JOIN invoice_items AS t ON t.invoice_id = i.id
JOIN jobs AS j ON i.job_id = j.id
GROUP BY i.id
I found that the slow part seems to be the SELECT on the invoices table alone, because if I run
SELECT i.id, i.debit_nr, i.pat_id, i.pat_name, i.invoice_id, i.invoice_date,
i.due_date, i.client_short, i.payment, i.payment_option, i.marker, i.comment, i.import_date
FROM invoices AS i
it takes almost the same time.
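One way to check whether the time goes into executing the query or into transferring and displaying the roughly 110,000 result rows is to run the simpler query under EXPLAIN ANALYZE, which executes it on the server but does not send the rows to the client. If the reported time is far below what the client shows, the bottleneck is probably outside the query itself. This is a suggested diagnostic sketch, not something from the original post.

```sql
-- EXPLAIN ANALYZE runs the query on the server and discards the result rows,
-- so the timing it reports ("Total runtime" on 9.0 to 9.3, "Execution time"
-- on 9.4 and later) does not include network transfer or client rendering.
EXPLAIN ANALYZE
SELECT i.id, i.debit_nr, i.pat_id, i.pat_name, i.invoice_id, i.invoice_date,
       i.due_date, i.client_short, i.payment, i.payment_option, i.marker,
       i.comment, i.import_date
FROM invoices AS i;
```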
EXPLAIN ANALYZE output of the full query:
GroupAggregate  (cost=63048.71..65737.16 rows=110203 width=76) (actual time=1421.792..1785.528 rows=110203 loops=1)
  Group Key: i.id
  ->  Sort  (cost=63048.71..63577.52 rows=211523 width=76) (actual time=1421.772..1573.998 rows=211527 loops=1)
        Sort Key: i.id
        Sort Method: external merge  Disk: 19944kB
        ->  Hash Right Join  (cost=24793.35..34938.02 rows=211523 width=76) (actual time=473.877..1010.362 rows=211527 loops=1)
              Hash Cond: (t.invoice_id = i.id)
              ->  Seq Scan on invoice_items t  (cost=0.00..3878.23 rows=211523 width=12) (actual time=0.035..112.034 rows=211523 loops=1)
              ->  Hash  (cost=22123.81..22123.81 rows=110203 width=72) (actual time=472.566..472.566 rows=110203 loops=1)
                    Buckets: 65536  Batches: 4  Memory Usage: 3592kB
                    ->  Hash Join  (cost=777.49..22123.81 rows=110203 width=72) (actual time=7.784..334.883 rows=110203 loops=1)
                          Hash Cond: (i.job_id = j.id)
                          ->  Seq Scan on invoices i  (cost=0.00..19831.03 rows=110203 width=76) (actual time=0.005..170.120 rows=110203 loops=1)
                          ->  Hash  (cost=705.55..705.55 rows=5755 width=8) (actual time=7.707..7.707 rows=5755 loops=1)
                                Buckets: 8192  Batches: 1  Memory Usage: 289kB
                                ->  Seq Scan on jobs j  (cost=0.00..705.55 rows=5755 width=8) (actual time=0.004..4.741 rows=5755 loops=1)
Planning time: 0.874 ms
Execution time: 1824.846 ms
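The Sort node in this plan reports an external merge with about 20 MB written to disk, and the Hash over invoices needed 4 batches; both indicate that those steps did not fit in work_mem. A session-level experiment to test whether the spill matters could look like this; 64MB is an arbitrary example value, not a recommendation.

```sql
-- Raise work_mem for the current session only (example value).
SET work_mem = '64MB';

-- Re-run the original query to see whether the Sort Method changes to an
-- in-memory sort and the Hash drops to a single batch.
EXPLAIN ANALYZE
SELECT i.id, i.debit_nr, i.pat_id, i.pat_name, i.invoice_id, i.invoice_date,
       i.due_date, i.client_short, i.payment, i.payment_option, i.marker,
       i.comment, sum(t.sum) AS i_sum, i.import_date
FROM invoices AS i
LEFT JOIN invoice_items AS t ON t.invoice_id = i.id
JOIN jobs AS j ON i.job_id = j.id
GROUP BY i.id;

-- Go back to the server's configured value.
RESET work_mem;
```

Note that the plan above already reports about 1.8 seconds of execution time, well below the 30 seconds observed, so this only addresses the server-side part.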
The problem is that it makes no difference whether I add an index on the id column or on all the columns this SELECT needs.
How can I speed it up?
PS: It's PostgreSQL 9.0 on a Windows Server.
Try writing the query using a correlated subquery for the sum instead of a GROUP BY.
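A minimal sketch of such a rewrite, using the table and column names from the question: each invoice's total is computed by a subquery, so there is no GROUP BY over the joined rows. The commented-out EXISTS clause is an addition for the case where invoices without a matching job must still be excluded as the original inner join does; it assumes jobs.id is unique.

```sql
-- Correlated-subquery version: the item total for each invoice is computed
-- by a subquery instead of joining all items and grouping by i.id.
SELECT i.id, i.debit_nr, i.pat_id, i.pat_name, i.invoice_id, i.invoice_date,
       i.due_date, i.client_short, i.payment, i.payment_option, i.marker,
       i.comment,
       (SELECT sum(t.sum)
          FROM invoice_items AS t
         WHERE t.invoice_id = i.id) AS i_sum,  -- NULL for invoices without items, as with the LEFT JOIN
       i.import_date
FROM invoices AS i
-- Optional (assumes jobs.id is unique): uncomment to keep the original
-- inner join's effect of dropping invoices that have no matching job.
-- WHERE EXISTS (SELECT 1 FROM jobs AS j WHERE j.id = i.job_id)
;
```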
Avoiding the outer aggregation might help performance (although Postgres has a good optimizer, so that is not always true). You want an index on invoice_items(invoice_id, sum). I left jobs out of the query because it does not seem to be used; this is only equivalent if every invoice has a matching row in jobs, because the inner join drops invoices that have none.
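The suggested index could be created roughly as follows; the index name is hypothetical. The invoice_id column lets each subquery find one invoice's items directly. Index-only scans, which could answer sum() from the index alone, only arrived in PostgreSQL 9.2, so on 9.0 the table rows are still read.

```sql
-- Hypothetical index name. invoice_id supports the lookup in the correlated
-- subquery; including sum allows index-only scans on 9.2 and later.
CREATE INDEX invoice_items_invoice_id_sum_idx
    ON invoice_items (invoice_id, sum);

-- Refresh planner statistics for the table.
ANALYZE invoice_items;
```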