Duplicates in Snowflake Stream

458 Views Asked by At

With the setting SHOW_INITIAL_ROWS = TRUE, we created a stream on top of a view (which has many joins). We created a Stored procedure with a single merge statement that ingests all of the data from the stream into a target table. Following is the merge statement used by the stored procedure.

merge into target tgt
using
(
select id,fname,metadata$action,metadata$isupdate
from emp_stream where not(metadata$action = 'DELETE' and metadata$isupdate = 'TRUE')
) src
on src.id = tgt.id
when matched and metadata$action = 'DELETE' and metadata$isupdate = 'FALSE' then delete

when matched and metadata$action = 'INSERT' and metadata$isupdate = 'TRUE' then update
set  tgt.id = src.id
     ,tgt.fname = src.fname

when not matched and metadata$action = 'INSERT' and metadata$isupdate = 'FALSE' then
insert (id,fname) values (src.id,src.fname);

A task was created to run the stored procedure for every 8 hours. It ran successfully for the first time, i.e. the full load, which inserts all of the records from the view into the target table. However, the second load failed due to a duplicate row error. When we queried the stream, we found two records with the same PK(i.e., id) but different metadata$rowids and metadata$actions of insert and delete, with metadata$isupdate set to false for each.

If it is an update, the metadata$isupdate parameter should be set to true which is not the case here. Could someone please assist us with this?

Trying to do Incremental load using Streams in snowflake but facing duplicate row error.

0

There are 0 best solutions below