The Postgres documentation gives clear instructions on how to load a very large table quickly (COPY, adjusting WAL settings, etc.), but I don't see clear guidance on how to speed up updates of large tables. I recognize it would be faster to just re-create the table, but that is not an option. The query that needs to be executed and the DB specs are below.
PostgreSQL 16.2 w/ GIS extension on RDS db.m6i.2xlarge
Query:
UPDATE test.table_1
SET    datetime = remap_output_1.datetime
FROM   data_agg.remap_output_1
WHERE  table_1.source_id      = remap_output_1.source_id
  AND  table_1.l_id           = remap_output_1.l_id
  AND  table_1.datetime_local = remap_output_1.datetime_local
Explain output:
Update on table_1  (cost=1826063.94..34047845.24 rows=0 width=0)
  ->  Hash Join  (cost=1826063.94..34047845.24 rows=11501270 width=20)
        Hash Cond: ((table_1.source_id = remap_output_1.source_id) AND (table_1.l_id = remap_output_1.l_id) AND (table_1.datetime_local = remap_output_1.datetime_local))
        ->  Seq Scan on table_1  (cost=0.00..6313237.12 rows=144108512 width=22)
        ->  Hash  (cost=767054.16..767054.16 rows=41837416 width=38)
              ->  Seq Scan on remap_output_1  (cost=0.00..767054.16 rows=41837416 width=38)
The likely problem here is that the hash join is spilling to disk. By itself that would not be a problem, as hashes spilling to disk can be handled pretty efficiently. The problem is that once the join spills to disk, it returns tuples to the Update node in an unfriendly order, which leads to the Update hopping all over the disk doing random IO. If you were just doing a SELECT this wouldn't matter, because all the necessary data gets spilled to disk and the real tuple never needs to be visited again. But with an UPDATE, at some point the real tuple needs to get updated, not just a temp copy of it.
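You can confirm that before changing anything: run just the join under EXPLAIN (ANALYZE, BUFFERS) and look at the "Batches" figure on the Hash node; anything above 1 means it spilled, and the temp buffer counts show how much. A minimal sketch against your table names (note it actually executes the join, so it will take a while):

-- Run only the join side of the UPDATE to see whether the hash spills.
-- "Batches: 1" on the Hash node means it fit in memory; more than 1 means it spilled.
EXPLAIN (ANALYZE, BUFFERS)
SELECT t.ctid, r.datetime
FROM   test.table_1 AS t
JOIN   data_agg.remap_output_1 AS r
  ON   t.source_id      = r.source_id
  AND  t.l_id           = r.l_id
  AND  t.datetime_local = r.datetime_local;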
You would think it should really be sequential IO, which would still suck since it reads through the table sequentially as many times as there are hash batches spilled to disk. But in practice it seems to be much worse than that, for reasons I've never figured out.
Unfortunately there isn't much you can do to fix it completely. Switching to a merge join just brings up the same problem in a different form. A nested loop wouldn't necessarily be better either: the update part might be more efficient, but figuring out which rows to update and what to update them to could then be much worse.
One workaround is to increase work_mem so much that the hash doesn't need to spill to disk at all, or at least spills into so few batches that the problem is reduced. Another is to divide the update into batches run in separate transactions. Getting the whole thing done would still be slow, but at least the work done by one batch would not be lost just because a later batch got interrupted. A rough sketch of both is below.
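In these sketches the memory figure and the date boundaries are placeholders to be tuned, not recommendations (and remember that hash nodes can actually use work_mem times hash_mem_multiplier, which defaults to 2.0 on v15+):

-- Option 1: raise work_mem for this session only, sized from the memory the
-- Hash node actually needs (the 4GB here is just a guess).
SET work_mem = '4GB';

-- Option 2: run the update one slice of remap_output_1 at a time, each in its
-- own transaction. The datetime_local boundaries are made-up placeholders;
-- any column that splits the work evenly will do.
BEGIN;
UPDATE test.table_1
SET    datetime = remap_output_1.datetime
FROM   data_agg.remap_output_1
WHERE  table_1.source_id      = remap_output_1.source_id
  AND  table_1.l_id           = remap_output_1.l_id
  AND  table_1.datetime_local = remap_output_1.datetime_local
  AND  remap_output_1.datetime_local >= '2020-01-01'
  AND  remap_output_1.datetime_local <  '2020-02-01';
COMMIT;
-- repeat with the next datetime_local range until everything is covered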
I think a "fix" to PostgreSQL would be to inject a Sort node (by ctid) between the Update and the Hash Join, so that data would be fed to the Update in an optimal order. Unfortunately getting even a rough draft of that (to prove it was actually effective) exceeds my ability.
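You can approximate that ordering by hand, though: since v14 the planner has a TID Range Scan, so you can drive the update over table_1 in physical (ctid) order, one block range per transaction. This is only a sketch under those assumptions; the slice size is arbitrary, and it re-joins against remap_output_1 for every slice, so it trades the random writes for repeated reads:

-- Walk table_1 in physical order, updating one range of blocks per transaction.
DO $$
DECLARE
    n_blocks  bigint := pg_relation_size('test.table_1') / current_setting('block_size')::bigint;
    slice     bigint := 500000;   -- blocks per batch; arbitrary, tune it
    start_blk bigint := 0;
BEGIN
    WHILE start_blk < n_blocks LOOP
        UPDATE test.table_1 AS t
        SET    datetime = r.datetime
        FROM   data_agg.remap_output_1 AS r
        WHERE  t.ctid >= format('(%s,0)', start_blk)::tid
          AND  t.ctid <  format('(%s,0)', start_blk + slice)::tid
          AND  t.source_id      = r.source_id
          AND  t.l_id           = r.l_id
          AND  t.datetime_local = r.datetime_local;
        start_blk := start_blk + slice;
        COMMIT;   -- allowed here because a top-level DO block runs in a non-atomic context
    END LOOP;
END $$;

Rows updated early in the run can get new ctids in later blocks and be matched again by a later slice; that just rewrites them with the same value, so it is wasteful but not incorrect. Adding t.datetime IS DISTINCT FROM r.datetime to the WHERE clause would skip those rewrites.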