What should happen to a SCD 2 table record if the record becomes inactive?

160 Views Asked by At

Let's say we have a table that has these values:

id| name| country| start_dt| end_dt| is_current| current_dt
1001| John| CA| 2001-01-10| 2012-06-01| TRUE| 2012-06-01

Next day, the same record comes from upstream and the SCD2 record modifies to:

id| name| country| start_dt| end_dt| is_current| partition_dt
1001| John| CA| 2001-01-10| 2012-06-01| FALSE| 2012-06-02
1001| John| US| 2012-06-01| 2016-06-02| TRUE| 2012-06-02

But on the third day, this record stops coming from upstream table.

In this case, what should happen in the SCD2 table?

Scenario 1: The record remains as it is.

id| name| country| start_dt| end_dt| is_current| partition_dt
1001| John| CA| 2001-01-10| 2012-06-01| FALSE| 2012-06-03
1001| John| US| 2012-06-01| 9999-12-31| TRUE| 2012-06-03

Scenario 2: The record becomes inactive in the SCD2 table

id| name| country| start_dt| end_dt| is_current| partition_dt
1001| John| CA| 2001-01-10| 2012-06-01| FALSE| 2012-06-03
1001| John| US| 2012-06-01| 2012-06-03| FALSE| 2012-06-03
1

There are 1 best solutions below

0
On

You leave the last version as IS_CURRENT=TRUE.

The last-known-good version of an SCD row is sometimes used instead of the time-sensitive version. So you can write

select d.Name, sum(f.someval) totalVal
from somefact f
join somedim d
  on f.dimid = d.id 
  and d.is_current = 'TRUE'

for joins to the dimension that treat it as a non-SCD.