Slow changing dimension | SCD type 1 deleting rows from prior date data

171 Views Asked by At

I have implemented SCD type 1 using merge into statement in azure databricks.

When I am doing first load, it’s loading the data that is expected with some just say 5000 rows for ingest date 28 Nov, 2023.

Now when I am doing the load today and picking up only 29 Nov, 2023 data from source, my SCD type 1 is loading the correct records/row count of 1300 that is correct as per source data count but now for 28 Nov, 2023, the records count got changed to 4500.

This is unexpected as it should not do any change to my older date records.

Can anyone please help me understand this issue?

I am using simple Merge into statement where I am updating the rows in case of match found between my key columns and inserting records in case of no match found between my key columns.

1

There are 1 best solutions below

0
On

As @NotFound mentioned, when you update the table and filter by time, you get a count smaller than the original, as some rows have been updated. However, the count may be even smaller than expected.

In the example below, I have 30 rows and am updating 6 rows. When filtering out to get rows before this timestamp, the expected count is 24, but it is showing 21.

Source

enter image description here

After performing the merge:

enter image description here

The expected count is 24, but it is showing 21. This is because the Delta merge rewrites some rows without modifying them.

In the image below, you can see numTargetRowsCopied is 3 and numOutputRows is 9. Three records are rewritten without being modified, which results in a different count.

enter image description here

You can refer to this solution for more information.

My suggestion is to use Delta versioning. You can get the previous records' data using either version or timestamp, and then you will get correct results.

%sql
select count(*) from target version as of 0 where _metadata.file_modification_time<"2023-12-02T03:57:00"

enter image description here