I am trying to UPDATE target table but in source table I have duplicate data. I am trying in Azure SQL Server database. All will be ok with HASHBYTES but if I remove HASHBYTES it will raise an error:
drop table if exists #source
drop table if exists #target
CREATE TABLE #target(id int IDENTITY(1,1) NOT NULL,
a int NOT NULL,
b int NOT NULL,
c varchar(20) NOT NULL,
d decimal(10,3) NULL,
PRIMARY KEY (id))
CREATE TABLE #source(a int NOT NULL,
b int NOT NULL,
c varchar(20) NOT NULL,
d decimal(10,3) NULL)
INSERT #target (a, b, c, d) VALUES (1, 1, 'TEXT', NULL)
INSERT #source (a, b, c, d) VALUES (2, 2, 'TEST1', NULL), (2, 2, 'TEST2', NULL)
go
select a, b, count(*) as NumberOfRows
from #source
group by a, b
having count(*)>1
MERGE #target t
USING #source s ON t.a = s.a AND t.b = s.b
WHEN NOT MATCHED BY TARGET THEN
INSERT (a, b, c, d) VALUES(s.a, s.b, s.c, s.d)
WHEN MATCHED and hashbytes('SHA2_512', CONCAT(t.c, t.d)) != hashbytes('SHA2_512', CONCAT(s.c, s.d))
THEN UPDATE
set t.c=s.c
, t.d=s.d
;
select * from #target
I expect to have an error when source contain duplicate data. When you run code first time all will be good, but when you ran MERGE again with HASHBYTES you need to have an error. With HASHBYTES function MERGE execute successful without error:
The MERGE statement attempted to UPDATE or DELETE the same row more than once. This happens when a target row matches more than one source row. A MERGE statement cannot UPDATE/DELETE the same row of the target table multiple times. Refine the ON clause to ensure a target row matches at most one source row, or use the GROUP BY clause to group the source rows.
Do you know why I don't get error with HASHBYTES function?
Thanks in advance.
The problem is that the first half of your
MERGEstatement is saying "match records based onaandb".sourcehas two records that have the same value foraandb, whiletargethas no record matching, which means both records will be inserted intotarget.The second half handles what happens when there is a matching record, and the second time you run the
MERGEstatement, it hits that half.MERGElooks at the incoming row fromsource, and finds two matching rows intarget; without theHASHBYTESfunction, it has no way of knowing which of thosetargetrows to update, and therefore throws an error.When including
HASHBYTES, it adds an extra criteria to the matching - now, it looks at those two rows intarget, and determines that neither row satisfy the extra criteria, and so it simply doesn't do any update at all. Note that in this specific caseHASHBYTESis overkill - you could easily have saidWHEN MATCHED AND CONCAT(t.c, t.d) != CONCAT(s.c, s.d).All that said - I question whether you're really solving your original problem. This approach will not dedupe
source. You may want to consider cleaning upsourcebefore doing yourMERGE.