I am working with a PostgreSQL database where I have a query that unions 5000 tables with identical structures. The query is saved as a view. I've noticed that the planning time for this query is significantly longer than the execution time—around 20 seconds for planning versus less than 1 second for execution.
Here's a simplified example of my query:
CREATE OR REPLACE VIEW my_view AS
SELECT * FROM table1
UNION ALL
SELECT * FROM table2
-- ... (up to 5000 tables)
;
Questions:
- Why is the planning time taking so long compared to the execution time?
- Are there any optimization techniques to reduce the planning time?
- Since the query is saved as a view, is there a way to cache or skip the planning process for subsequent runs, given that the query itself doesn't change?
Any insights or suggestions would be greatly appreciated.
Software version:
- PostgreSQL v15.4
- Docker 4.24
Analyzing the structure and coming up with a plan to read 5k tables takes some CPU even if they are simple and empty. The resulting plan is trivial and I suspect tables are a combination of simple, tiny and empty, so execution isn't a challenge. demo1
PostgreSQL caches and considers re-using some query plans by default (
plan_cache_mode=auto). You can try toPREPAREa select from this view, and thenEXECUTEit multiple times, re-using some of the work put into processing the statement - possibly even the plan: demo2. Note that this plan cache and prepared statement are owned by and exclusively available to a session, not shared, so each client would have to initially run their ownPREPARE.In a test involving 5k empty tables on a PostgreSQLv15, planning repeatedly took
10sfor a series of repeatedselect * from my_viewthat unioned them all. Once IPREPAREd andEXECUTEd the statement once, its subsequentEXECUTEs took4msto plan.Make this view a
materialized viewto cache the result: demo3. This can be shared by multiple sessions.Remember to
refresh materialized viewwhenever you want to discard the old cache. If you want it to automatically refresh on its own, you can set up (statement level) triggers on the source tables that issue arefresh materialized view concurrentlywhenever there's a change to be cascaded. There's apg_ivmextension that lets you make it refresh incrementally.