Joining Tables for Databricks Delta Live Tables SCD Type 2

183 Views Asked by At

Quite a niche question, I think, but my organisation has started using Delta Live Tables in Databricks for data modelling, recently. One of the dimensions I am trying to model takes data from 3 existing tables in our data lake. It also needs to be a type 2 slowly changing dimension (SCD).

This appears to be quite simple if you have just one table, according to the databricks documentation (https://docs.databricks.com/en/delta-live-tables/cdc.html), but I'm struggling when trying to make the final dimension a composite of 3 source tables.

For example, let's say the tables are: person, personDetail, and job.

person:

PersonID Name
1 Name1
2 Name2

personDetail:

PersonID JobID detail
1 1 detail1
2 2 detail2

job:

jobID jobName
1 job1
2 job2

The final dimension needs to have personID, detail, and jobName, and show any changes to any of the 3 tables.

Other than making a really complicated union in pyspark at the beginning of the code, I can't think of a way to make this work, perhaps creating each table as a separate streaming table and then joining them? Am I missing something obvious?

Any help much appreciated.

1

There are 1 best solutions below

0
On

You need to independently apply SCD Type 2 to each of the three tables, and then create a final view that joins the three tables together.