Excessive Planning Time for PostgreSQL Query Involving Union of 5000 Tables

Question

Asked by David H. J. on 23 October 2023 at 09:59 · 128 views

I am working with a PostgreSQL database where I have a query that unions 5000 tables with identical structures. The query is saved as a view. I've noticed that the planning time for this query is significantly longer than the execution time—around 20 seconds for planning versus less than 1 second for execution.

Here's a simplified example of my query:

CREATE OR REPLACE VIEW my_view AS
SELECT * FROM table1
UNION ALL
SELECT * FROM table2
-- ... (up to 5000 tables)
;

Questions:

Why is the planning time taking so long compared to the execution time?
Are there any optimization techniques to reduce the planning time?
Since the query is saved as a view, is there a way to cache or skip the planning process for subsequent runs, given that the query itself doesn't change?

Any insights or suggestions would be greatly appreciated.

Software version:

PostgreSQL v15.4
Docker 4.24

There are 3 best solutions below

**Zegarek** · Answer 1 · 2023-10-23T11:17:08.880000

Analyzing the structure of 5k tables and coming up with a plan to read them takes some CPU, even if the tables are simple and empty. The resulting plan is trivial, and I suspect the tables are some combination of simple, tiny and empty, so execution isn't a challenge (demo1).
PostgreSQL caches and considers re-using some query plans by default (plan_cache_mode=auto). You can PREPARE a select from this view and then EXECUTE it multiple times, re-using some of the work put into processing the statement, possibly even the plan (demo2). Note that the plan cache and the prepared statement belong to a single session and are not shared, so each client has to run its own PREPARE first.

In a test with 5k empty tables on PostgreSQL v15, each repeated select * from my_view over the view that unioned them all took about 10s to plan. Once I PREPAREd the statement and EXECUTEd it once, subsequent EXECUTEs took 4ms to plan.
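A minimal sketch of the PREPARE/EXECUTE approach described above. The two temporary tables and their columns are stand-ins for the question's 5000 tables, and the statement name read_my_view is made up for illustration; whether the cached plan is actually reused is up to the server.

```sql
-- Stand-in for the question's view: two tables instead of 5000 (assumed schema).
CREATE TEMP TABLE table1 (id int, payload text);
CREATE TEMP TABLE table2 (id int, payload text);
CREATE TEMP VIEW my_view AS
SELECT * FROM table1
UNION ALL
SELECT * FROM table2;

-- Prepared statements are session-local: every new connection must PREPARE again.
-- "read_my_view" is a hypothetical name.
PREPARE read_my_view AS
SELECT * FROM my_view;

EXECUTE read_my_view;  -- the first call still pays the planning cost
EXECUTE read_my_view;  -- later calls in this session may reuse the cached plan

-- EXPLAIN ANALYZE on an EXECUTE reports "Planning Time" for the prepared statement.
EXPLAIN (ANALYZE) EXECUTE read_my_view;

-- List the statements currently prepared in this session.
SELECT name, statement FROM pg_prepared_statements;

-- Drop the prepared statement when it is no longer needed.
DEALLOCATE read_my_view;
```

With a connection pooler, the prepared statement only helps if the same server session is reused, so this fits long-lived connections best.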
Make this view a materialized view to cache the result: demo3. This can be shared by multiple sessions.

Remember to run refresh materialized view whenever you want to discard the old cache. If you want it to refresh on its own, you can set up statement-level triggers on the source tables that issue a refresh materialized view concurrently whenever there's a change to be cascaded. There is also a pg_ivm extension that lets you refresh it incrementally.
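The materialized view and trigger-driven refresh could look roughly like the sketch below. The stand-in tables, the names my_view_cached and refresh_my_view_cached, and the assumption that id is unique across all source tables are mine, not the answer's. REFRESH ... CONCURRENTLY needs a unique index on the materialized view, so it only works if such a unique column exists.

```sql
-- Stand-in source tables and view (assumed schema; the real view has 5000 branches).
CREATE TABLE IF NOT EXISTS table1 (id int, payload text);
CREATE TABLE IF NOT EXISTS table2 (id int, payload text);
CREATE OR REPLACE VIEW my_view AS
SELECT * FROM table1
UNION ALL
SELECT * FROM table2;

-- Cache the result once; readers query my_view_cached instead of my_view.
CREATE MATERIALIZED VIEW my_view_cached AS
SELECT * FROM my_view;

-- Required for REFRESH ... CONCURRENTLY. Assumes id is unique across all tables.
CREATE UNIQUE INDEX my_view_cached_id_idx ON my_view_cached (id);

-- Hypothetical trigger function that rebuilds the cache after a change.
CREATE FUNCTION refresh_my_view_cached() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
  REFRESH MATERIALIZED VIEW CONCURRENTLY my_view_cached;
  RETURN NULL;  -- the return value is ignored for AFTER ... FOR EACH STATEMENT
END;
$$;

-- One statement-level trigger per source table; shown here for table1 only.
CREATE TRIGGER table1_refresh_cache
AFTER INSERT OR UPDATE OR DELETE OR TRUNCATE ON table1
FOR EACH STATEMENT EXECUTE FUNCTION refresh_my_view_cached();
```

Each refresh re-reads the whole view and so pays the full planning cost again. With frequent writes, a scheduled refresh may be a better fit than a trigger on every source table.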

**Marmite Bomber** · Answer 2 · 2023-10-23T18:31:46.940000

You have very probably overlooked table partitioning in your design.

Example of a list-partitioned table with 1001 partitions:

create table my_table  (a int, b text, c text, d text, e text, f text, g text, h text, i text, j timestamp) PARTITION BY LIST (a);

create table my_table_0 PARTITION of  my_table FOR VALUES IN (0);
...
create table my_table_1000 PARTITION of  my_table FOR VALUES IN (1000);
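Assuming the elided statements follow the same pattern as the first and last one, the partitions can be generated in a loop instead of being typed out. This is only a sketch of that idea; the parent table definition is repeated so the block runs on its own.

```sql
-- Parent table, same definition as in the example above.
CREATE TABLE IF NOT EXISTS my_table (
  a int, b text, c text, d text, e text,
  f text, g text, h text, i text, j timestamp
) PARTITION BY LIST (a);

-- Create my_table_0 .. my_table_1000, one partition per value of a.
DO $$
BEGIN
  FOR n IN 0..1000 LOOP
    EXECUTE format(
      'CREATE TABLE IF NOT EXISTS %I PARTITION OF my_table FOR VALUES IN (%s)',
      'my_table_' || n,  -- partition name, quoted as an identifier
      n                  -- the single list value this partition accepts
    );
  END LOOP;
END;
$$;
```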

On an empty table, the query is close to instant:

explain (analyze, buffers, format text)  select * from my_table;
Append  (cost=0.00..14064.05 rows=270270 width=268) (actual time=9.176..9.460 rows=0 loops=1)
  ->  Seq Scan on my_table_0 my_table_1  (cost=0.00..12.70 rows=270 width=268) (actual time=0.028..0.028 rows=0 loops=1)
...

Planning Time: 24.782 ms
Execution Time: 13.456 ms

Your design, with 1001 tables behind a union view, gives the following in the same empty setup:

Planning Time: 1290.083 ms
Execution Time: 16.412 ms
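To reproduce this comparison, the standalone tables and the union view can be generated as well. The sketch below uses made-up names (union_table_N, union_view) and a reduced two-column schema, so the timings it produces will not match the numbers above exactly.

```sql
DO $$
DECLARE
  n_tables constant int := 1001;  -- same table count as the partitioned example
  view_sql text;
BEGIN
  -- Standalone tables with identical structure (hypothetical names and columns).
  FOR n IN 1..n_tables LOOP
    EXECUTE format('CREATE TABLE IF NOT EXISTS %I (a int, b text)',
                   'union_table_' || n);
  END LOOP;

  -- Build "SELECT * FROM union_table_1 UNION ALL SELECT * FROM union_table_2 ...".
  SELECT string_agg(format('SELECT * FROM %I', 'union_table_' || g.i),
                    ' UNION ALL ' ORDER BY g.i)
    INTO view_sql
    FROM generate_series(1, n_tables) AS g(i);

  EXECUTE 'CREATE OR REPLACE VIEW union_view AS ' || view_sql;
END;
$$;

-- EXPLAIN without ANALYZE plans the query but does not run it;
-- SUMMARY adds the "Planning Time" line to the output.
EXPLAIN (SUMMARY) SELECT * FROM union_view;
```

Running the same EXPLAIN (SUMMARY) against the partitioned my_table puts the two planning times side by side.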

**jjanes** · Answer 3 · 2023-10-23T18:57:07.283000

I think this problem was fixed in v16, with commit e42e312430279d: Avoid O(N^2) cost when pulling up lots of UNION ALL subqueries (which is not mentioned in the release notes).

Before that commit, I get OOM errors. But if I had enough memory not to get errors, I assume it would be slow.
