I am using Postgres 14.x.
I have a complex query that can be observed, next to it's EXPLAIN ANALYZE, here.
select *
from (select * from public.select_version_of_projects('2024-03-08T08:31:08.280Z')) as "project_version"
where "project_version"."id" = 'fd18211b-a400-49ed-a723-9648ab05ca4f';
CREATE OR REPLACE FUNCTION public.select_version_of_projects(version_p TIMESTAMP WITH TIME ZONE)
RETURNS TABLE
(
id UUID,
folder_id UUID,
name TEXT,
description TEXT,
picture TEXT,
files JSON,
keywords TSVECTOR,
scene JSON,
view JSON,
credits JSON,
published JSON,
imports JSON,
settings JSON,
studio_version TEXT,
readonly BOOLEAN,
deleted BOOLEAN,
created_at TIMESTAMP WITH TIME ZONE,
updated_at TIMESTAMP WITH TIME ZONE
)
AS
$$
SELECT COALESCE(past.id, present.id) AS id,
COALESCE(past.folder_id, present.folder_id) AS folder_id,
COALESCE(past.name, present.name) AS name,
COALESCE(past.description, present.description) AS description,
COALESCE(past.picture, present.picture) AS picture,
COALESCE(past.files, present.files) AS files,
COALESCE(past.keywords, present.keywords) AS keywords,
COALESCE(past.scene, present.scene) AS scene,
COALESCE(past.view, present.view) AS view,
COALESCE(past.credits, present.credits) AS credits,
COALESCE(past.published, present.published) AS published,
COALESCE(past.imports, present.imports) AS imports,
COALESCE(past.settings, present.settings) AS settings,
COALESCE(past.studio_version, present.studio_version) AS studio_version,
COALESCE(past.readonly, present.readonly) AS readonly,
COALESCE(past.deleted, present.deleted) AS deleted,
COALESCE(past.created_at, present.created_at) AS created_at,
COALESCE(past.updated_at, present.updated_at) AS updated_at
FROM (SELECT *
FROM project) AS present
FULL OUTER JOIN
(SELECT history.project.id,
(ARRAY_AGG(history.project.folder_id ORDER BY history.project.recorded_at) FILTER (WHERE history.project.folder_id IS NOT NULL))[1] AS folder_id,
(ARRAY_AGG(history.project.name ORDER BY history.project.recorded_at) FILTER (WHERE history.project.name IS NOT NULL))[1] AS name,
(ARRAY_AGG(history.project.description ORDER BY history.project.recorded_at) FILTER (WHERE history.project.description IS NOT NULL))[1] AS description,
(ARRAY_AGG(history.project.picture ORDER BY history.project.recorded_at) FILTER (WHERE history.project.picture IS NOT NULL))[1] AS picture,
(ARRAY_AGG(history.project.files ORDER BY history.project.recorded_at) FILTER (WHERE history.project.files IS NOT NULL))[1] AS files,
(ARRAY_AGG(history.project.keywords ORDER BY history.project.recorded_at) FILTER (WHERE history.project.keywords IS NOT NULL))[1] AS keywords,
(ARRAY_AGG(history.project.scene ORDER BY history.project.recorded_at) FILTER (WHERE history.project.scene IS NOT NULL))[1] AS scene,
(ARRAY_AGG(history.project.view ORDER BY history.project.recorded_at) FILTER (WHERE history.project.view IS NOT NULL))[1] AS view,
(ARRAY_AGG(history.project.credits ORDER BY history.project.recorded_at) FILTER (WHERE history.project.credits IS NOT NULL))[1] AS credits,
(ARRAY_AGG(history.project.published ORDER BY history.project.recorded_at) FILTER (WHERE history.project.published IS NOT NULL))[1] AS published,
(ARRAY_AGG(history.project.imports ORDER BY history.project.recorded_at) FILTER (WHERE history.project.imports IS NOT NULL))[1] AS imports,
(ARRAY_AGG(history.project.settings ORDER BY history.project.recorded_at) FILTER (WHERE history.project.settings IS NOT NULL))[1] AS settings,
(ARRAY_AGG(history.project.studio_version ORDER BY history.project.recorded_at) FILTER (WHERE history.project.studio_version IS NOT NULL))[1] AS studio_version,
(ARRAY_AGG(history.project.readonly ORDER BY history.project.recorded_at) FILTER (WHERE history.project.readonly IS NOT NULL))[1] AS readonly,
(ARRAY_AGG(history.project.deleted ORDER BY history.project.recorded_at) FILTER (WHERE history.project.deleted IS NOT NULL))[1] AS deleted,
(ARRAY_AGG(history.project.created_at ORDER BY history.project.recorded_at) FILTER (WHERE history.project.created_at IS NOT NULL))[1] AS created_at,
(ARRAY_AGG(history.project.updated_at ORDER BY history.project.recorded_at) FILTER (WHERE history.project.updated_at IS NOT NULL))[1] AS updated_at
FROM history.project
WHERE history.project.recorded_at > version_p
GROUP BY history.project.id) AS past
ON present.id = past.id
WHERE COALESCE(past.created_at, present.created_at) < version_p;
$$
LANGUAGE sql
STABLE;
The purpose of the history table is to record the changes to each row in the public table so that we can always know the past states. And the function basically gives you a table representing the state of the public table at the timestamp provided.
Calculating what the whole table looked like in the past is very expensive, but my intention is that by using the function alongside a filter for the ID of the row I want then just that row will be calculated. But that is not happening.
It is very slow. It should take a few milliseconds. But instead of executing the filter "id" = 'fd18211b-a400-49ed-a723-9648ab05ca4f' at the first scans to vastly reduce the number of rows fetched so that the aggregation node would be much faster, it is doing it at the end.
If I replace COALESCE(past.id, present.id) AS id to either past.id AS id or present.id AS id, it's fix as can be seen here, but it's not correct because past.id or present.id could be NULL.
But if past.id is NULL it means all of past.* is NULL. Same with present. So it is valid to apply the filter to the subqueries still.
So how can I keep the COALESCE while informing the planner that it should apply the filter to the subqueries?
Unless Postgres can inline the function in the calling query (which is often not possible for a set-returning function), it is a black box to the query planner. Everything inside the function is planned and executed separately from the outer query, and the result is materialized before proceeding with the outer query. A worst-case scenario for you.
To apply the all-important filter on ID as early as possible, pass it to the function, and apply it at the root(s).
Plus, don't
array_agg()all rows, when you just need the latest not-null value. I show a solution usingUNION ALLinstead of yourFULL OUTER JOIN, order once in the subquery, and aggregate with the functionfirst()from the additional first_last_agg module, which you must have installed first.The extension is not available on Amazon RDS Postgres. But you can create the aggregate function yourself. It's simple, see:
Details and alternatives in this related answer:
Call:
Your filters on
created_atandupdated_atwere not all clear to me. You may have to do more there.Be sure to have a multicolumn index on
history.project(id, updated_at)!Not sure about
public.project- there an index on just(id)should suffice, and should already exist as PK index.Pass a
timestamptzvalue as declared or you may get surprising conversions.