Postgres merge silently ignore unique constraint violation

129 Views Asked by At

The following quote is from here: https://pganalyze.com/blog/5mins-postgres-15-merge-vs-insert-on-conflict

The comment that he essentially made is that the downside of MERGE's handling of concurrency is that when you concurrently INSERT, so at the same time as you're executing the MERGE statement, there is another INSERT going on, then MERGE might not notice that. MERGE would go into its INSERT logic, and then it would get a unique violation.

I recall that, back when he initially designed the INSERT ON CONFLICT feature, that because of this constraint of how MVCC works in Postgres, you cannot implement MERGE with its general syntax and options to work in the same reliable way. If you want the generality of MERGE, you have to accept the fact that you might get unique constraint violations, when there are concurrent inserts, versus with INSERT ON CONFLICT, the way it's designed with its speculative insertions, guarantees that you either get an INSERT or an UPDATE and that is true even if there are concurrent inserts. You might want to choose INSERT ON CONFLICT if you need the guarantee.

Pretend then that we're doing a really huge merge that will take a long time, and in the process of the merge, there happens this concurrent insert described in the quote above that causes unique constraint violation during the insert logic. What exactly happens in this case? Does it lead to the whole merge query failing and throwing a unique constraint exception?

Or, perhaps, if the unique constraint violation does occur for a row or rows in the final insert logic, only those rows that have the unique constraint violation are just silently ignored, and the rest of the insert proceeds as normal?

I would reckon that postgres' merge functionality doesn't support this (the latter - silently ignoring unique constraint violations), but I do think this would be a valuable addition to the merge functionality, because there will definitely be users (me currently) who wouldn't want their entire merge insert (or update) to fail just because one or a couple of rows had a unique constraint violation.

1

There are 1 best solutions below

0
On

I did run some tests using the following code, and I discovered, as expected, that if a unique violation exception occurs during the insert of the merge command, this unique violation exception bubbles up and tanks the whole merge query.

That means, for example, if you're doing a long running batch merge operation, the merge might perform 99% of the work, but then if even the last insert row causes a unique violation exception, the whole merge query gets rolled back and all the work lost with it.

In many foreseeable uses cases for merge I would expect this to be very disappointing and far less than ideal. In my opinion, it would be much better for merge to offer the ability to simply ignore the rows that cause unique violation exceptions instead of tanking the whole query.

I am of the opinion that this additional functionality (to ignore rows the cause unique violation exception during the insert query) should be added to both the sql standard and to postgres.

Code used for testing

In one sql process run the following:

create table tmp_table (col int);

insert into tmp_table (col)
select *
from generate_series(1, 10000000) as s(i);

CREATE TABLE tmp_table2 (col int unique);

merge into tmp_table2 tt2
using (
  select *
  from tmp_table
) as tt1
on tt2.col = tt1.col
when matched then do nothing
when not matched then
  insert (col)
  values (tt1.col);

In a second sql process, run the following immediately after running the merge command in the first sql process:

insert into tmp_table2 (col)
values (9000000);