In SSIS, if an incoming dataset has multiple records for the same Business Key, how do I load it to the dimensions table with SCD type 2 without using the SCD Wizard.

Sample dataset

Customer ID   Name      Segment     Postal Code
1             James     Corporate   50026
2             Andrew    Consumer    33311
3             Steven    Consumer    90025
2             Andrew    Consumer    33306
3             Steven    Consumer    90032
1             James     Corporate   50087
3             Steven    Consumer    90000

In my case, if I try Loading the dimension table with other SSIS components (Lookup/Conditional Split) all the record show up a new row in the table because they are all coming in all at the same time.

I have ‘CurrentFlag’ as the indicator of the current record.

In SSIS, if I have an incoming dataset that has multiple records for the same Business Key, How do I get to recognize these, and set the CurrentFlag as necessary, whether or not a record in the target table has that Business Key already?

Thanks.

1

There are 1 best solutions below

0
J Weezy On

OK, this is a massive simplification because SCD's are very challenging to correctly implement. You will need to sit down and think critically about this. My answer below only handles ongoing daily processing - it does not explain how to handle historical files being re-processed, which could potentially result in duplicate records with different EffectiveStart and End Dates.

By definition, you will have an existing record source component (i.e., query from the database table) and an incoming data source component (i.e., a *.csv flatfile). You will need to perform a merge join to identify new records versus existing records. For existing records, you will need to determine if any of the columns have changed (do this in a Derived Column transformation).

You will need to also include two columns for EffectiveStartDate and EffectiveEndDate.

IncomingEffectiveStartDate = FileDate
IncomingEffectiveEndDate = 12-31-9999
ExistingEffectiveEndDate = FileDate - 1

Note on 12-31-9999: This is effectively the Y10K bug. But, it allows users to query the database between date ranges without having to consciously add ISNULL(GETDATE()) in the WHERE clause of a query in the event that they are querying between date ranges.

This will prevent the dates on the columns from overlapping, which could potentially result in multiple records being returned for a given date.

To determine if a record has changed, create a new column called RecordChangedInd of type Bit.

(ISNULL(ExistingColumn1, 0) != ISNULL(IncomingColumn1, 0) ||
 ISNULL(ExistingColumn2, 0) != ISNULL(IncomingColumn2, 0) ||

....

ISNULL(ExistingColumn_N, 0) != ISNULL(IncomingColumn_N, 0) ? 1 : 0)

Then, in your split condition you can create two outputs: RecordHasChanged (this will be an INSERT) and RecordHasNotChanged (this will be an UPDATE to deactivate the exiting record and an INSERT).

You can conceivably route both inputs to the same INSERT destination. But, you will need to be careful suppress the update record's ExistingEffectiveEndDate value that deactivates the date.