I'm running into an issue where I have 24 insert queries which all insert into different tables (shards), and they run slowly when the inserts are performed concurrently. When I run them sequentially, they're pretty fast. Strange thing, these all insert into other physical tables, so I don't understand where this severe contention would come from. I'm using citus, so I'd really like to see a performance gain inserting in parallel into different shards. But I'm 99% sure the problem is not citus-specific.
All 24 queries have the following form:
INSERT INTO schema.rtsepezocoav_p_452_119046 AS citus_table_alias (se_id, pe_id, zo_id, co_id, value_num, ad_id) SELECT ...
INSERT INTO schema.rtsepezocoav_p_452_119040 AS citus_table_alias (se_id, pe_id, zo_id, co_id, value_num, ad_id) SELECT ...
rtsepezocoav_p_452 is a partition of the parent table rtsepezocoav (declarative partitioning)
They all insert around 100k new records into a table (shard) which contains around 2Mio records.
All tables have 3 btree indexes on them. I understand inserting into tables with indexes slows things down, but what I don't understand is the severe degradation in performance when performing these inserts concurrently...
When I run them sequentially, they all take less than 2s:
Timing is on.
INSERT 0 94052
Time: 1860.400 ms (00:01.860)
Timing is on.
INSERT 0 95136
Time: 1798.271 ms (00:01.798)
...
When I fire off the 24 queries in parallel (through psql command ended with '&' in the terminal), they take much longer: around 13s.
INSERT 0 94052
Time: 12234.630 ms (00:12.235)
INSERT 0 95136
Time: 13207.503 ms (00:13.208)
INSERT 0 96049
Time: 13481.625 ms (00:13.482)
INSERT 0 93804
Time: 13684.117 ms (00:13.684)
INSERT 0 97342
Time: 14657.759 ms (00:14.658)
When instead of inserting, I just count(1) in parallel, I don't see any concurrency issues, so the problem is in writing.
I'm on postgres12, on a machine with 250GB ram and 12 cores.
I've tried playing with my postgres settings, to no avail however. Some of the settings which are probably most relevant:
shared_buffers 64000MB
temp_buffers 80MB
work_mem 100MB
maintenance_work_mem 2GB
wal_buffers 16MB
effective_cache_size 192000MB
min_wal_size 1GB
max_wal_size 4GB
synchronous_commit off
The last tweak seems to improve performance when I run standard pgbench tests. But for my actual queries, I don't see any difference.
Is there some setting I'm missing ? I'm really hoping Postgres can handle synchronous inserts smoothly when they don't touch the same table where the inserts end up...