Delete using SQL Merge statement for subsequent Incremental Runs?

54 Views Asked by At

I have 2 tables in SQL Server - staging_table and final_table.

Data is pulled from an API and goes through staging_table and from there it is merged into the final table.

The final_table contains 20 million rows accumulated over time. The staging table is a temporary table that gets refreshed during each data pipeline run, and may contain about 10000 records at a time. It is then merged into final table for updates, deletes and inserts.

This is what my SQL merge query currently looks like:

WITH trg AS 
(
    SELECT * 
    FROM final_table
    WHERE transaction_id IN (SELECT transaction_id FROM staging_table)
)
MERGE trg
USING staging_table AS src ON trg.transaction_id = src.transaction_id

WHEN NOT MATCHED BY TARGET THEN
    INSERT (practice_id...)
    VALUES (practice_id...)
                
-- matching records in source and target - update
WHEN MATCHED AND 
                (ISNULL(trg.[practice_id], '') != ISNULL(src.[practice_id], '') 
                    ...) 
THEN 
    UPDATE 
        SET [practice_id] = src.[practice_id],
                    ...;

-- exists in target but not in source - delete
WHEN NOT MATCHED BY SOURCE THEN
    DELETE;

I'm using a CTE for target table trg so that during deletion, it will only delete against the 10000 or so records in staging table if not matched by source. Otherwise it would've deleted millions of existing records. However there is obviously issue here because if the API deleted 1 record from their end, this record won't even be in staging table but it exists in the final table and I'm wondering how I can delete this.

Please help.

1

There are 1 best solutions below

2
Mike Petri On

If you can get the API to add a flag in staging for "deleted_flag" and get those records inserted into the staging table, you can simplify your MERGE statement to only do UPSERTs. Completely gut the portion doing DELETE and handle it outside the MERGE.

MERGE final_table AS trg
USING staging_table AS src ON trg.transaction_id = src.transaction_id AND (src.deleted_flag = 0 OR src.deleted_flag IS NULL)

WHEN NOT MATCHED BY TARGET THEN
    INSERT (practice_id...)
    VALUES (practice_id...)
                
-- matching records in source and target - update
WHEN MATCHED AND 
                (ISNULL(trg.[practice_id], '') != ISNULL(src.[practice_id], '') 
                    ...) 
THEN 
    UPDATE 
        SET [practice_id] = src.[practice_id],
                    ...;

Then, handle the delete outside of the MERGE.

DELETE  trg
FROM    final_table AS trg
WHERE   EXISTS (
                   SELECT   1
                   FROM     staging_table AS src
                   WHERE    trg.transaction_id = src.transaction_id
                            AND src.delete_flag = 1
               );

Also, make sure you've got an good indexes on transaction_id for both the staging and final table. Include any other evaluated columns as indexed or INCLUDE to avoid key lookups.

CREATE INDEX NCI_TransactionID ON final_table (transaction_id)
CREATE INDEX NCI_TransactionID ON staging_table (transaction_id,deleted_flag)