I am trying to reduce the execution time of the query below. It joins three tables to pull data from some very large Postgres tables. I have added what I believe are all the necessary indexes on the relevant tables, but the query is still taking too long. The total size of the database is around 2 TB. Query:
EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS)
WITH au AS (
    SELECT tbl2.client, tbl2.uid
    FROM tbl2
    WHERE tbl2.client = '123kkjk444kjkhj3ddd'
      AND (tbl2.property->>'num') IN ('1', '2', '3', '31', '12a', '45', '78', '99')
)
SELECT tbl1.id,
       CASE WHEN tbl3.displayname IS NOT NULL THEN tbl3.displayname ELSE tbl1.name END AS name,
       tbl1.tbl3number, tbl3.originalname AS orgtbl3
FROM table_1 tbl1
INNER JOIN au tbl2 ON tbl2.client = '123kkjk444kjkhj3ddd' AND tbl2.uid = tbl1.uid
LEFT JOIN tbl3 ON tbl3.client = '123kkjk444kjkhj3ddd' AND tbl3.originalname = tbl1.name
WHERE tbl1.client = '123kkjk444kjkhj3ddd'
  AND tbl1.date_col BETWEEN '2021-08-01T05:32:40Z' AND '2021-08-29T05:32:40Z'
ORDER BY tbl1.date_col DESC, tbl1.sid, tbl1.tbl3number
LIMIT 50000;
The query runs, but execution is still very slow even though the plan uses index scans. I have attached the query plan. Query plan:
-> Limit (cost=7272.83..7272.86 rows=14 width=158) (actual time=40004.140..40055.737 rows=871 loops=1)
Output: tbl1.id, (CASE WHEN (tbl3.displayname IS NOT NULL) THEN tbl3.displayname ELSE tbl1.name END), tbl1.tbl3number, tbl3.originalname, tbl1.date_col
Buffers: shared hit=249656881 dirtied=32
-> Sort (cost=7272.83..7272.86 rows=14 width=158) (actual time=40004.139..40055.671 rows=871 loops=1)
Output: tbl1.id, (CASE WHEN (tbl3.displayname IS NOT NULL) THEN tbl3.displayname ELSE tbl1.name END), tbl1.tbl3number, tbl3.originalname, tbl1.date_col
Sort Key: tbl1.date_col DESC, tbl1.id, tbl1.tbl3number
Sort Method: quicksort Memory: 142kB
Buffers: shared hit=249656881 dirtied=32
-> Gather (cost=1001.39..7272.56 rows=14 width=158) (actual time=9147.574..40055.005 rows=871 loops=1)
Output: tbl1.id, (CASE WHEN (tbl3.displayname IS NOT NULL) THEN tbl3.displayname ELSE tbl1.name END), tbl1.tbl3number, tbl3.originalname, tbl1.date_col
Workers Planned: 4
Workers Launched: 4
Buffers: shared hit=249656881 dirtied=32
-> Nested Loop Left Join (cost=1.39..6271.16 rows=4 width=158) (actual time=3890.074..39998.436 rows=174 loops=5)
Output: tbl1.id, CASE WHEN (tbl3.displayname IS NOT NULL) THEN tbl3.displayname ELSE tbl1.name END, tbl1.tbl3number, tbl3.originalname, tbl1.date_col
Inner Unique: true
Buffers: shared hit=249656881 dirtied=32
Worker 0: actual time=1844.246..39996.744 rows=182 loops=1
Buffers: shared hit=49568277 dirtied=5
Worker 1: actual time=3569.032..39997.124 rows=210 loops=1
Buffers: shared hit=49968461 dirtied=10
Worker 2: actual time=2444.911..39997.561 rows=197 loops=1
Buffers: shared hit=49991521 dirtied=2
Worker 3: actual time=2445.013..39998.065 rows=110 loops=1
Buffers: shared hit=49670445 dirtied=10
-> Nested Loop (cost=1.12..6269.94 rows=4 width=610) (actual time=3890.035..39997.924 rows=174 loops=5)
Output: tbl1.id, tbl1.name, tbl1.tbl3number, tbl1.date_col
Inner Unique: true
Buffers: shared hit=249655135 dirtied=32
Worker 0: actual time=1844.200..39996.206 rows=182 loops=1
Buffers: shared hit=49567912 dirtied=5
Worker 1: actual time=3568.980..39996.522 rows=210 loops=1
Buffers: shared hit=49968040 dirtied=10
Worker 2: actual time=2444.872..39996.987 rows=197 loops=1
Buffers: shared hit=49991126 dirtied=2
Worker 3: actual time=2444.965..39997.712 rows=110 loops=1
Buffers: shared hit=49670224 dirtied=10
-> Parallel Index Only Scan using idx_sv_cuf8_110523 on public.table_1_110523 tbl1 (cost=0.69..5692.16 rows=220 width=692) (actual time=0.059..1458.129 rows=2922506 loops=5)
Output: tbl1.client, tbl1.id, tbl1.tbl3number, tbl1.date_col, tbl1.id, tbl1.name
Index Cond: ((tbl1.client = '123kkjk444kjkhj3ddd'::text) AND (tbl1.date_col >= '2021-08-01 05:32:40+00'::timestamp with time zone) AND (tbl1.date_col <= '2021-08-29 05:32:40+00'::timestamp with time zone))
Heap Fetches: 0
Buffers: shared hit=538663
Worker 0: actual time=0.059..1479.907 rows=2912875 loops=1
Buffers: shared hit=107477
Worker 1: actual time=0.100..1475.863 rows=2930306 loops=1
Buffers: shared hit=107817
Worker 2: actual time=0.054..1481.032 rows=2925849 loops=1
Buffers: shared hit=107812
Worker 3: actual time=0.058..1477.443 rows=2897544 loops=1
Buffers: shared hit=107047
-> Index Scan using tbl2_pkey_102328 on public.tbl2_102328 tbl2_1 (cost=0.43..2.63 rows=1 width=25) (actual time=0.013..0.013 rows=0 loops=14612531)
Output: tbl2_1.id
Index Cond: (((tbl2_1.id)::text = (tbl1.id)::text) AND ((tbl2_1.client)::text = '123kkjk444kjkhj3ddd'::text))
Filter: ((tbl2_1.property ->> 'num'::text) = ANY ('{"1","2","3","31","12a","45","78","99"}'::text[]))
Rows Removed by Filter: 1
Buffers: shared hit=249116472 dirtied=32
Worker 0: actual time=0.013..0.013 rows=0 loops=2912875
Buffers: shared hit=49460435 dirtied=5
Worker 1: actual time=0.013..0.013 rows=0 loops=2930306
Buffers: shared hit=49860223 dirtied=10
Worker 2: actual time=0.013..0.013 rows=0 loops=2925849
Buffers: shared hit=49883314 dirtied=2
Worker 3: actual time=0.013..0.013 rows=0 loops=2897544
Buffers: shared hit=49563177 dirtied=10
-> Index Scan using tbl3_unikey_104219 on public.tbl3_104219 tbl3 (cost=0.27..0.30 rows=1 width=52) (actual time=0.002..0.002 rows=0 loops=871)
Output: tbl3.client, tbl3.originalname, tbl3.displayname
Index Cond: (((tbl3.client)::text = '123kkjk444kjkhj3ddd'::text) AND ((tbl3.originalname)::text = (tbl1.name)::text))
Buffers: shared hit=1746
Worker 0: actual time=0.002..0.002 rows=0 loops=182
Buffers: shared hit=365
Worker 1: actual time=0.002..0.002 rows=0 loops=210
Buffers: shared hit=421
Worker 2: actual time=0.002..0.002 rows=0 loops=197
Buffers: shared hit=395
Worker 3: actual time=0.002..0.002 rows=0 loops=110
Buffers: shared hit=221
Planning Time: 0.589 ms
Execution Time: 40071.485 ms
(89 rows)
Time: 40072.986 ms (00:40.073)
Can this query be further optimized to reduce the query execution time? Thank you in advance for your input.
The table definitions are as follows:
Table "public.tbl1"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
-------------------+-----------------------------+-----------+----------+---------+----------+--------------+-------------
client | character varying(32) | | not null | | extended | |
sid | character varying(32) | | not null | | extended | |
uid | character varying(32) | | | | extended | |
id | character varying(32) | | | | extended | |
tbl3number | integer | | not null | | plain | |
name | character varying(255) | | | | extended | |
date_col | timestamp without time zone | | | | plain | |
Indexes:
"idx_sv_cuf8_110523" btree (client, date_col DESC, sid, tbl3number)
Table "public.tbl2"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
--------------------------+-----------------------------+-----------+----------+-------------------------+----------+--------------+-------------
id | character varying(32) | | not null | | extended | |
uid | character varying(255) | | | NULL::character varying | extended | |
client | character varying(32) | | not null | | extended | |
property | jsonb | | | | extended | |
Indexes:
"tbl2_pkey" PRIMARY KEY, btree (uid, client)
--
Table "public.tbl3"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
--------------------+------------------------+-----------+----------+---------+----------+--------------+-------------
client | character varying(500) | | not null | | extended | |
originalname | character varying(500) | | | | extended | |
displayname | character varying(500) | | | | extended | |
Indexes:
"tbl3_unikey" UNIQUE CONSTRAINT, btree (client, originalname)
Your query applies a filter (restriction) to a value stored inside the JSON structure:

(tbl2.property->>'num') IN ('1', '2', '3', '31', '12a', '45', '78', '99')

As written, this part of the WHERE predicate is not "sargable": no index covers the expression property->>'num', so the only way to answer it is to fetch every candidate row and evaluate the JSON document it contains. The work is therefore a kind of cross product between the row cardinality of the table and the parts of the JSON that must be parsed; in your plan it shows up as the index scan on tbl2_pkey executing 14,612,531 times and filtering nearly every row back out (Rows Removed by Filter: 1).
As long as the predicate stays in that form there is no way to optimize it: every time you introduce an object (here, a JSON document) with iterative behaviour into a dataset that could otherwise be retrieved with set-based algorithms (indexes, parallelism...), the result is to scan, and scan, and scan.
However, contrary to a common belief, JSON values in PostgreSQL can be indexed. A jsonb column supports GIN indexes for containment queries, and an extracted value such as property->>'num' can be covered by an expression (functional) btree index, which makes this predicate sargable. (XML is a different story: PostgreSQL has no specialized XML index type, unlike DB2, Oracle, or SQL Server.)
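As a minimal sketch, assuming the anonymized table and column names from the question (the index name itself is made up), an expression index matching the predicate exactly could look like this:

-- Hypothetical expression index; the indexed expression must be written
-- exactly as it appears in the WHERE clause: (property ->> 'num').
-- Including uid lets the scan return the join column without heap fetches.
CREATE INDEX CONCURRENTLY tbl2_client_num_uid_idx
    ON tbl2 (client, (property ->> 'num'), uid);

After an ANALYZE on tbl2, the planner can resolve the CTE with a single index scan on this index (the IN list is evaluated as a = ANY condition directly on the indexed expression) instead of probing tbl2_pkey 14.6 million times. Since the column is jsonb according to the table definition, a GIN index is an alternative if you filter on many different keys, for example:

-- Hypothetical GIN index for containment queries on the whole document.
CREATE INDEX CONCURRENTLY tbl2_property_gin_idx
    ON tbl2 USING gin (property jsonb_path_ops);

but that only helps if the predicate is rewritten with containment operators such as property @> '{"num": "1"}', so the expression index is the more direct fit for this query.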