ingore duplicate in merge SQL on Snowflake

63 Views Asked by At

I have two tables:

src:

ID model accy delivery_date ETA call_off department style duration plant
123abc xxyy MM 2022-12-14T00:00:00.000Z 2022-10-20T00:00:00.000Z 2023-01-17T00:00:00.000Z paint pink 3.3 dd
123abc xxyy MM 2022-12-14T00:00:00.000Z 2022-10-20T00:00:00.000Z 2022-10-20T00:00:00.000Z paint pink 3.3 dd
123abc xxyy MM 2022-12-14T00:00:00.000Z 2022-10-20T00:00:00.000Z 2023-02-06T00:00:00.000Z paint pink 3.3 dd

dest:

ID model accy delivery_date ETA call_off department style duration plant
123abc xxyy MM 2022-12-14T00:00:00.000Z 2022-10-20T00:00:00.000Z 2023-01-17T00:00:00.000Z paint pink 3.3 dd

the dest has this row which is the same as the first row in the src table.

whenever I try to merge I get this error:

100090 (42P18): Duplicate row detected during DML action
Row Values: ["123abc", "xxyy", "MM", "2022-12-14T00:00:00.000Z", "2022-10-20T00:00:00.000Z", "2023-01-17T00:00:00.000Z", "paint", "pink", 3.3, "dd"]

My query:

merge into cleaned_delivery dest
using ( 
SELECT distinct
  RECORD_CONTENT:ID::varchar AS ID,
  RECORD_CONTENT:model::varchar AS model,
  RECORD_CONTENT:accy::varchar AS accy,
  RECORD_CONTENT:delivery_date::varchar AS delivery_date,
  RECORD_CONTENT:ETA::varchar AS ETA,
  RECORD_CONTENT:call_off::varchar AS call_off,
  RECORD_CONTENT:department::varchar AS department,
  RECORD_CONTENT:style::varchar AS style, 
  RECORD_CONTENT:duration::float AS duration,    
  RECORD_CONTENT:plant::varchar AS plant

FROM raw_delivery where RECORD_CONTENT:ID

) as src (ID, model, accy, delivery_date, ETA, call_off, department, style, duration, plant)
on dest.ID = src.ID and dest.model = src.model and dest.accy = src.accy 
when matched
then update set
  dest.delivery_date = src.delivery_date,
  dest.ETA = src.ETA,
  dest.call_off = src.call_off,
  dest.department = src.department,
  dest.style = src.style,
  dest.duration = src.duration
when not matched then insert (ID, model, accy, delivery_date, ETA, call_off, department, style, duration, plant)
  values (ID, model, accy, delivery_date, ETA, call_off, department, style, duration, plant);

I tried adding this line after the when matched:

when matched and (src.delivery_date != dest.delivery_date or src.ETA!= dest.ETA or src.call_off!= dest.call_off)

But still the same error. How can I merge these two tables and ignore rows that already exists (duplicate)

2

There are 2 best solutions below

0
Lukasz Szozda On BEST ANSWER

It is possible to allow nondeterministic MERGE by setting ERROR_ON_NONDETERMINISTIC_MERGE

ALTER SESSION SET ERROR_ON_NONDETERMINISTIC_MERGE = FALSE;

Hiding the fact that source provides multiple matching rows is not recommended course of action. A better way is to explicitly state which row should be taken - unique per columns defined on on dest.ID = src.ID and dest.model = src.model and dest.accy = src.accy :

merge into cleaned_delivery dest
using ( 
SELECT
  RECORD_CONTENT:ID::varchar AS ID,
  RECORD_CONTENT:model::varchar AS model,
  RECORD_CONTENT:accy::varchar AS accy,
  RECORD_CONTENT:delivery_date::varchar AS delivery_date,
  RECORD_CONTENT:ETA::varchar AS ETA,
  RECORD_CONTENT:call_off::varchar AS call_off,
  RECORD_CONTENT:department::varchar AS department,
  RECORD_CONTENT:style::varchar AS style, 
  RECORD_CONTENT:duration::float AS duration,    
  RECORD_CONTENT:plant::varchar AS plant  
FROM raw_delivery 
QUALIFY ROW_NUMBER() OVER(PARTITION BY ID, model, accy 
                          ORDER BY <here_goes_columns>) = 1  
) AS src
 on dest.ID = src.ID 
and dest.model = src.model 
and dest.accy = src.accy 
0
Keyboard Corporation On

The error show because it seems that there are duplicate rows in the cleaned_delivery that match the rows from the raw_delivery. If that so, you might need to adjust the condition in the ON, so that it doesn't match duplicate rows in the cleaned_delivery.

MERGE INTO cleaned_delivery dest
USING ( 
 SELECT DISTINCT
 RECORD_CONTENT:ID::varchar AS ID,
 RECORD_CONTENT:model::varchar AS model,
 RECORD_CONTENT:accy::varchar AS accy,
 RECORD_CONTENT:delivery_date::varchar AS delivery_date,
 RECORD_CONTENT:ETA::varchar AS ETA,
 RECORD_CONTENT:call_off::varchar AS call_off,
 RECORD_CONTENT:department::varchar AS department,
 RECORD_CONTENT:style::varchar AS style, 
 RECORD_CONTENT:duration::float AS duration, 
 RECORD_CONTENT:plant::varchar AS plant
 FROM raw_delivery where RECORD_CONTENT:ID
) as src (ID, model, accy, delivery_date, ETA, call_off, department, style, duration, plant)
ON dest.ID = src.ID AND dest.model = src.model AND dest.accy = src.accy AND dest.delivery_date = src.delivery_date AND dest.ETA = src.ETA AND dest.call_off = src.call_off
WHEN MATCHED AND (src.delivery_date != dest.delivery_date OR src.ETA != dest.ETA OR src.call_off != dest.call_off) THEN 
 UPDATE SET 
 dest.delivery_date = src.delivery_date,
 dest.ETA = src.ETA,
 dest.call_off = src.call_off,
 dest.department = src.department,
 dest.style = src.style,
 dest.duration = src.duration
WHEN NOT MATCHED THEN 
 INSERT (ID, model, accy, delivery_date, ETA, call_off, department, style, duration, plant)
 VALUES (ID, model, accy, delivery_date, ETA, call_off, department, style, duration, plant);