MERGE update with duplicates

214 Views Asked by Дрвосеча At 10 August 2023 at 20:25

I am trying to UPDATE target table but in source table I have duplicate data. I am trying in Azure SQL Server database. All will be ok with HASHBYTES but if I remove HASHBYTES it will raise an error:

drop table if exists #source
drop table if exists #target

CREATE TABLE #target(id int IDENTITY(1,1) NOT NULL,
  a int NOT NULL,
  b int NOT NULL,
  c varchar(20) NOT NULL,
  d decimal(10,3) NULL,
  PRIMARY KEY (id))

CREATE TABLE #source(a int NOT NULL,
  b int NOT NULL,
  c varchar(20) NOT NULL,
  d decimal(10,3) NULL)
INSERT #target (a, b, c, d) VALUES (1, 1, 'TEXT', NULL)
INSERT #source (a, b, c, d) VALUES (2, 2, 'TEST1', NULL), (2, 2, 'TEST2', NULL)
go

select a, b, count(*) as NumberOfRows
from #source
group by a, b
having count(*)>1

MERGE #target t
USING #source s ON t.a = s.a AND t.b = s.b
WHEN NOT MATCHED BY TARGET THEN
   INSERT (a, b, c, d) VALUES(s.a, s.b, s.c, s.d)
WHEN MATCHED and hashbytes('SHA2_512', CONCAT(t.c, t.d)) != hashbytes('SHA2_512', CONCAT(s.c, s.d))
THEN UPDATE
   set t.c=s.c
   , t.d=s.d 
;

select * from #target

I expect to have an error when source contain duplicate data. When you run code first time all will be good, but when you ran MERGE again with HASHBYTES you need to have an error. With HASHBYTES function MERGE execute successful without error:

The MERGE statement attempted to UPDATE or DELETE the same row more than once. This happens when a target row matches more than one source row. A MERGE statement cannot UPDATE/DELETE the same row of the target table multiple times. Refine the ON clause to ensure a target row matches at most one source row, or use the GROUP BY clause to group the source rows.

Do you know why I don't get error with HASHBYTES function?

Thanks in advance.

Original Q&A

There are 1 best solutions below

Kiran Ramaswamy On 10 August 2023 at 20:55

The problem is that the first half of your MERGE statement is saying "match records based on a and b". source has two records that have the same value for a and b, while target has no record matching, which means both records will be inserted into target.

The second half handles what happens when there is a matching record, and the second time you run the MERGE statement, it hits that half. MERGE looks at the incoming row from source, and finds two matching rows in target; without the HASHBYTES function, it has no way of knowing which of those target rows to update, and therefore throws an error.

When including HASHBYTES, it adds an extra criteria to the matching - now, it looks at those two rows in target, and determines that neither row satisfy the extra criteria, and so it simply doesn't do any update at all. Note that in this specific case HASHBYTES is overkill - you could easily have said WHEN MATCHED AND CONCAT(t.c, t.d) != CONCAT(s.c, s.d).

All that said - I question whether you're really solving your original problem. This approach will not dedupe source. You may want to consider cleaning up source before doing your MERGE.

MERGE update with duplicates

There are 1 best solutions below

Related Questions in SQL

Related Questions in SQL-SERVER

Related Questions in AZURE

Related Questions in AZURE-SQL-DATABASE

Related Questions in SQL-MERGE

Trending Questions

Popular # Hahtags

Popular Questions