The query below takes far too long to execute.
Number of records in each table:
- stories - 416,355 records
- event_relations - 24,050,862 records
- districts - 10 records
- industries - 100 records
- industry_districts - 40 records
- subsectors - 200 records
- supplys - 2,869 records
select s.id, s.description, count(distinct et.id) from
"district s" s
inner join
(
select id.id from "industry_districts" id
) as id2 on id2.id = s.id
inner join
(
select id, industry_districts_id from "industries" i
) as i2 on i2.industry_districts_id = id2.id
inner join
(
select id, industries_id from "subsectors" si
) as si2 on si2.industries_id = i2.id
inner join
(
select id, subsectors_id from "supplys" rs
) as rs2 on rs2.subsectors_id = si2.id
inner join
(
select id, supplys_id, events_id from "event_relations" ers
) as ers2 on ers2.supplys_id = rs2.id
left join (
select id, events_id from "stories"
) et on et.events_id = ers2.events_id
group by s.id
limit 10 offset 0
**Explain analyze gives the following result:**
Limit (cost=0.99..20719301.15 rows=10 width=130) (actual time=2107.465..214891.582 rows=10 loops=1)
-> GroupAggregate (cost=0.99..24863161.18 rows=12 width=130) (actual time=2107.462..214891.567 rows=10 loops=1)
Group Key: s.id
-> Nested Loop Left Join (cost=0.99..23466091.79 rows=279413855 width=126) (actual time=2.073..141429.178 rows=104217089 loops=1)
-> Nested Loop (cost=0.57..607232.85 rows=11644034 width=126) (actual time=2.055..62770.206 rows=8174966 loops=1)
-> Nested Loop (cost=0.14..3598.24 rows=1389 width=126) (actual time=2.009..56.815 rows=927 loops=1)
Join Filter: (si.id = rs.subsectors_id)
Rows Removed by Join Filter: 226384
-> Nested Loop (cost=0.14..117.68 rows=77 width=126) (actual time=1.004..2.502 rows=80 loops=1)
Join Filter: (i.id = si.industries_id)
Rows Removed by Join Filter: 5309
-> Nested Loop (cost=0.14..32.60 rows=34 width=126) (actual time=0.513..0.775 rows=34 loops=1)
Join Filter: (ig.id = i.industry_districts_id)
Rows Removed by Join Filter: 671
-> Nested Loop (cost=0.14..18.13 rows=12 width=126) (actual time=0.443..0.507 rows=11 loops=1)
Join Filter: (s.id = ig.id)
Rows Removed by Join Filter: 126
-> Index Scan using "PK_Sectors" on "Sectors" s (cost=0.14..12.31 rows=12 width=122) (actual time=0.402..0.417 rows=11 loops=1)
-> Materialize (cost=0.00..1.38 rows=25 width=4) (actual time=0.002..0.004 rows=12 loops=11)
-> Seq Scan on "Industry_Groups" ig (cost=0.00..1.25 rows=25 width=4) (actual time=0.011..0.019 rows=25 loops=1)
-> Materialize (cost=0.00..2.05 rows=70 width=8) (actual time=0.001..0.013 rows=64 loops=11)
-> Seq Scan on "Industries" i (cost=0.00..1.70 rows=70 width=8) (actual time=0.007..0.023 rows=70 loops=1)
-> Materialize (cost=0.00..4.38 rows=159 width=8) (actual time=0.010..0.028 rows=158 loops=34)
-> Seq Scan on "Subindustries" si (cost=0.00..3.59 rows=159 width=8) (actual time=0.345..0.386 rows=159 loops=1)
-> Materialize (cost=0.00..174.03 rows=2869 width=8) (actual time=0.007..0.307 rows=2841 loops=80)
-> Seq Scan on "supplys" rs (cost=0.00..159.69 rows=2869 width=8) (actual time=0.578..4.098 rows=2869 loops=1)
-> Index Scan using "IXFK_event_relations_supplys" on "event_relations" ers (cost=0.44..350.07 rows=8451 width=8) (actual time=0.103..65.007 rows=8819 loops=927)
Index Cond: (supplys_id = rs.id)
-> Index Scan using "IXFK_Event_Tweets_Katalyst_Events" on "Event_Tweets" (cost=0.42..1.40 rows=56 width=8) (actual time=0.003..0.006 rows=12 loops=8174966)
Index Cond: (katalyst_events_id = ers.katalyst_events_id)
Planning Time: 16.839 ms
Execution Time: 214891.868 ms
**explain(analyze, verbose, buffers) gives the following result:**
Limit (cost=0.99..20719301.15 rows=10 width=130) (actual time=2751.614..222343.984 rows=10 loops=1)
Output: s.id, s.description, (count(DISTINCT "stories".id))
Buffers: shared hit=29275971 read=6562620, temp read=331820 written=332375
-> GroupAggregate (cost=0.99..24863161.18 rows=12 width=130) (actual time=2751.601..222342.173 rows=10 loops=1)
Output: s.id, s.description, count(DISTINCT "stories".id)
Group Key: s.id
Buffers: shared hit=29275971 read=6562620, temp read=331820 written=332375
-> Nested Loop Left Join (cost=0.99..23466091.79 rows=279413855 width=126) (actual time=2.785..147622.650 rows=104217089 loops=1)
Output: s.id, s.description, "stories".id
Buffers: shared hit=29275971 read=6562620
-> Nested Loop (cost=0.57..607232.85 rows=11644034 width=126) (actual time=2.755..65731.047 rows=8174966 loops=1)
Output: s.id, s.description, ers.events_id
Buffers: shared hit=2084415 read=6047838
-> Nested Loop (cost=0.14..3598.24 rows=1389 width=126) (actual time=2.013..60.815 rows=927 loops=1)
Output: s.id, s.description, rs.id
Join Filter: (si.id = rs.subsectors_id)
Rows Removed by Join Filter: 226384
Buffers: shared hit=5 read=132
-> Nested Loop (cost=0.14..117.68 rows=77 width=126) (actual time=0.984..2.532 rows=80 loops=1)
Output: s.id, s.description, si.id
Join Filter: (i.id = si.industries_id)
Rows Removed by Join Filter: 5309
Buffers: shared hit=5 read=1
-> Nested Loop (cost=0.14..32.60 rows=34 width=126) (actual time=0.840..1.093 rows=34 loops=1)
Output: s.id, s.description, i.id
Join Filter: (ig.id = i.industry_districts_id)
Rows Removed by Join Filter: 671
Buffers: shared hit=3 read=1
-> Nested Loop (cost=0.14..18.13 rows=12 width=126) (actual time=0.759..0.820 rows=11 loops=1)
Output: s.id, s.description, ig.id
Inner Unique: true
Join Filter: (s.id = ig.id)
Rows Removed by Join Filter: 126
Buffers: shared hit=2 read=1
-> Index Scan using "PK_districts" on public."districts" s (cost=0.14..12.31 rows=12 width=122) (actual time=0.379..0.392 rows=11 loops=1)
Output: s.id, s.description, s.active_flag, s.shared_district_id, s.additional_text, s.assets
Buffers: shared hit=1 read=1
-> Materialize (cost=0.00..1.38 rows=25 width=4) (actual time=0.031..0.034 rows=12 loops=11)
Output: ig.id
Buffers: shared hit=1
-> Seq Scan on public."industry_districts" ig (cost=0.00..1.25 rows=25 width=4) (actual time=0.043..0.048 rows=25 loops=1)
Output: ig.id
Buffers: shared hit=1
-> Materialize (cost=0.00..2.05 rows=70 width=8) (actual time=0.003..0.013 rows=64 loops=11)
Output: i.industry_districts_id, i.id
Buffers: shared hit=1
-> Seq Scan on public."industries" i (cost=0.00..1.70 rows=70 width=8) (actual time=0.021..0.033 rows=70 loops=1)
Output: i.industry_districts_id, i.id
Buffers: shared hit=1
-> Materialize (cost=0.00..4.38 rows=159 width=8) (actual time=0.001..0.017 rows=158 loops=34)
Output: si.industries_id, si.id
Buffers: shared hit=2
-> Seq Scan on public."subdistricts" si (cost=0.00..3.59 rows=159 width=8) (actual time=0.018..0.054 rows=159 loops=1)
Output: si.industries_id, si.id
Buffers: shared hit=2
-> Materialize (cost=0.00..174.03 rows=2869 width=8) (actual time=0.007..0.315 rows=2841 loops=80)
Output: rs.subsectors_id, rs.id
Buffers: shared read=131
-> Seq Scan on public."supplys" rs (cost=0.00..159.69 rows=2869 width=8) (actual time=0.505..3.618 rows=2869 loops=1)
Output: rs.subsectors_id, rs.id
Buffers: shared read=131
-> Index Scan using "IXFK_event_relations_supplys" on public."event_relations" ers (cost=0.44..350.07 rows=8451 width=8) (actual time=0.112..67.905 rows=8819 loops=927)
Output: ers.id, ers.events_id, ers.supplys_id, ers.additional_information
Index Cond: (ers.supplys_id = rs.id)
Buffers: shared hit=2084410 read=6047706
-> Index Scan using "IXFK_stories_events" on public."stories" (cost=0.42..1.40 rows=56 width=8) (actual time=0.003..0.006 rows=12 loops=8174966)
Output: "stories".id, "stories".events_id,"stories".published_at
Index Cond: ("stories".events_id = ers.events_id)
Buffers: shared hit=27191556 read=514782
Planning:
Buffers: shared hit=57 read=23
Planning Time: 30.368 ms
Execution Time: 222346.279 ms
**Indexes created:**
CREATE INDEX "IXFK_event_relations_events" ON public."event_relations" USING btree (events_id);
CREATE INDEX "IXFK_event_relations_supplys" ON public."event_relations" USING btree (supplys_id);
CREATE UNIQUE INDEX "PK_event_relations" ON public."event_relations" USING btree (id);
CREATE INDEX "IXFK_stories_events" ON public."stories" USING btree (events_id);
CREATE UNIQUE INDEX "PK_stories" ON public."stories" USING btree (id);
CREATE INDEX "IXFK_industries_industry_districts" ON public."industries" USING btree (industry_districts_id);
CREATE UNIQUE INDEX "PK_industries" ON public."industries" USING btree (id);
CREATE INDEX "IXFK_industry_districts_district s" ON public."industry_districts" USING btree (district s_id);
CREATE UNIQUE INDEX "PK_industry_districts" ON public."industry_districts" USING btree (id);
CREATE UNIQUE INDEX "PK_district s" ON public."district s" USING btree (id);
CREATE INDEX "IXFK_subsectors_industries" ON public."subsectors" USING btree (industries_id);
CREATE UNIQUE INDEX "PK_subsectors" ON public."subsectors" USING btree (id);
CREATE TABLE public."events" (
id serial4 NOT NULL,
updated_at timestamptz NULL,
created_at timestamptz NOT NULL,
title text NOT NULL,
CONSTRAINT "PK_events" PRIMARY KEY (id)
);
CREATE TABLE public."districts" (
id serial4 NOT NULL,
description varchar(50) NOT NULL,
CONSTRAINT "PK_districts" PRIMARY KEY (id)
);
CREATE TABLE public."stories" (
id serial4 NOT NULL,
events_id int4 NOT NULL,
created_at timestamptz NULL,
CONSTRAINT "PK_stories" PRIMARY KEY (id),
CONSTRAINT "FK_stories_events" FOREIGN KEY (events_id) REFERENCES public."events"(id)
);
CREATE INDEX "IXFK_stories_events" ON public."stories" USING btree (events_id);
CREATE TABLE public."industry_districts" (
id serial4 NOT NULL,
districts_id int4 NOT NULL,
description varchar(50) NOT NULL,
CONSTRAINT "PK_industry_districts" PRIMARY KEY (id),
CONSTRAINT "FK_industry_districts_districts" FOREIGN KEY (districts_id) REFERENCES public."districts"(id)
);
CREATE INDEX "IXFK_industry_districts_districts" ON public."industry_districts" USING btree (districts_id);
CREATE TABLE public."industries" (
id serial4 NOT NULL,
industry_districts_id int4 NOT NULL,
description varchar(50) NOT NULL,
CONSTRAINT "PK_industries" PRIMARY KEY (id),
CONSTRAINT "FK_industries_industry_districts" FOREIGN KEY (industry_districts_id) REFERENCES public."industry_districts"(id)
);
CREATE INDEX "IXFK_industries_industry_districts" ON public."industries" USING btree (industry_districts_id);
CREATE TABLE public."subsector" (
id serial4 NOT NULL,
industries_id int4 NOT NULL,
description varchar(50) NOT NULL,
CONSTRAINT "PK_subsector" PRIMARY KEY (id),
CONSTRAINT "FK_subsector_industries" FOREIGN KEY (industries_id) REFERENCES public."industries"(id)
);
CREATE INDEX "IXFK_subsector_industries" ON public."subsector" USING btree (industries_id);
CREATE TABLE public."supplys" (
id serial4 NOT NULL,
subsector_id int4 NOT NULL,
description text NOT NULL,
CONSTRAINT "PK_supplys" PRIMARY KEY (id),
CONSTRAINT "FK_supplys_subsector" FOREIGN KEY (subsector_id) REFERENCES public."subsector"(id)
);
CREATE INDEX "IXFK_supplys_subsector" ON public."supplys" USING btree (subsector_id);
CREATE TABLE public."event_relations" (
id serial4 NOT NULL,
events_id int4 NOT NULL,
supplys_id int4 NOT NULL,
CONSTRAINT "PK_event_relations" PRIMARY KEY (id),
CONSTRAINT "FK_event_relations_events" FOREIGN KEY (events_id) REFERENCES public."events"(id),
CONSTRAINT "FK_event_relations_supplys" FOREIGN KEY (supplys_id) REFERENCES public."supplys"(id)
);
CREATE INDEX "IXFK_event_relations_events" ON public."event_relations" USING btree (events_id);
CREATE INDEX "IXFK_event_relations_supplys" ON public."event_relations" USING btree (supplys_id);
I am using PostgreSQL version 14.0.
How do I improve the performance of this query?
You could get rid of the subselects and allow PostgreSQL to use its internal optimizations while joining:
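For example (an untested sketch that keeps the join conditions exactly as they appear in your original query, only without the wrapping subselects):

```sql
SELECT s.id, s.description, count(DISTINCT st.id)
FROM "districts" s
INNER JOIN "industry_districts" id2 ON id2.id = s.id
INNER JOIN "industries" i ON i.industry_districts_id = id2.id
INNER JOIN "subsectors" si ON si.industries_id = i.id
INNER JOIN "supplys" rs ON rs.subsectors_id = si.id
INNER JOIN "event_relations" ers ON ers.supplys_id = rs.id
LEFT JOIN "stories" st ON st.events_id = ers.events_id
GROUP BY s.id
LIMIT 10 OFFSET 0;
```

Joining the base tables directly lets the planner push conditions down and pick join orders freely instead of treating each subselect as a unit.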
I have no contextual knowledge about your query, so excuse me if I have made a typo.
EDIT
You could run a cron job that periodically executes the query above (waiting for it to finish) and stores the results somewhere: a cache, server files, a regular table, or a materialized view, whichever you prefer.
Then, whenever users request these values, they read the stored results instead of waiting for the full query to run on every access.
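A materialized-view sketch of that idea (untested; the name `district_story_counts` is hypothetical):

```sql
-- Hypothetical materialized view holding the precomputed counts
CREATE MATERIALIZED VIEW district_story_counts AS
SELECT s.id, s.description, count(DISTINCT st.id) AS story_count
FROM "districts" s
INNER JOIN "industry_districts" id2 ON id2.id = s.id
INNER JOIN "industries" i ON i.industry_districts_id = id2.id
INNER JOIN "subsectors" si ON si.industries_id = i.id
INNER JOIN "supplys" rs ON rs.subsectors_id = si.id
INNER JOIN "event_relations" ers ON ers.supplys_id = rs.id
LEFT JOIN "stories" st ON st.events_id = ers.events_id
GROUP BY s.id;

-- A unique index is required for REFRESH ... CONCURRENTLY,
-- which lets readers keep querying the view during the refresh
CREATE UNIQUE INDEX ON district_story_counts (id);

-- Run this from the cron job as often as freshness requires
REFRESH MATERIALIZED VIEW CONCURRENTLY district_story_counts;
```

User-facing requests then become a trivial `SELECT * FROM district_story_counts LIMIT 10`, at the cost of the data being only as fresh as the last refresh.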