Data Vault Model and Lookup Tables

1.4k Views Asked by At

I am designing a data warehouse that uses the Data Vault model. There is an entity in my data warehouse called Specialty. There is a Lookup Table for these Specialties based on their codes that has a one-to-one mapping from Specialty_CD to Description. There is a history of data entries for this Lookup Table (as Specialty Codes can change meaning) and all of the Satellite Tables in my vault.

I've come across a curious case where I want to link another entity in the Data Vault called "Professional" to the Specialty entity, where a professional can have multiple specialties. However, there is no Hub for the Specialty Entity. My Current solution simply has a Professional Hub which holds Business Keys and Business Key Hashes for each professional, a link table that maps Professional BK Hashes to the Specialty Codes of the specialties that the professional practices, and a lookup table from Specialty Codes to description. The catch here is that the link table is connecting a hub to a lookup table, instead of another hub. I can't help but feel like this breaks the Data Vault model. Does this break the rules of the Data Vault model? Does this ruin the normalization of my whole model?

I feel like Creating a Hub for The Specialties would result in an extra join for data retrieval. Since the Specialty Codes are already unique identifiers of specialty descriptions, I'd rather just have a single Lookup Table for this specialty entity as opposed to a hub and extra satellites for the Specialty entity that I don't need. Especially because it is only in the data warehouse to serve the purpose of definition and description of a specialty code.

Any suggestions for this situation are welcomed. Is the data vault ruined in this scenario by linking a hub to a lookup table? Is it worth the overhead and additional joins on retrieval to create a hub and satellites for this Specialty Entity?

Thank you!

2

There are 2 best solutions below

0
On

I think your model should look like this

  • Specialty_HUB (HUB_ID(pk),Specialty_CD)
  • Specialty_SAT (SAT_ID(pk),HUB_ID(fk),Load_Date,Description)
  • Professional_HUB (HUB_ID(pk),Professional_PK)
  • Professional_SAT (SAT_ID(pk),(HUB_ID(fk), Professional details...)
  • Profesional_X_Specialty_LNK (LNK_ID(pk), Load_Date, End_Date, Professional_HUB_ID(fk),Specialty_HUB_ID(fk))

NOTE that in this design you have a requirement for an End_Date in the link table as you may need to retire a single association independently of the status of the professional or specialty and you will need to ensure that your ETL logic picks up the 'deletions'. If the Codes related to specialty can also change then you will need another layer of abstraction between the link table and the specialty SAT to ensure that the links remain stable.

0
On

You have answered your own question: "Specialty Codes of the specialties that the professional practices"

Specialties is clearly a Core Business Concept and is a Hub. The fact that it has only 2 fields (Code and Description) is irrelevant.

Data Vault works on patterns. It's not "only in the database to serve the purpose of definition". If it was removed would the database still work? For example is I removed the postcode from an address the database would continue to function. I'll guess that if you removed Specialty there could be problems with some of the reports.

You need to get used to the fact that you will have MANY more tables (around 7 is the general metric) when you create a Raw Data Vault.

pcd