Data Vault Modelling

519 Views Asked by At

Assuming the following data architecture: Source Systems -> Data Warehouse (using the data vault model) -> Data Virtualization -> Consumption Layer (e.g., BI Tools & reporting)

I read that for data vault, one of the key principles is to load raw data and keeping records from all sources - so no de-dupping or transformations for traceability/auditing purposes. If this is true, where would the transformations happen?

2

There are 2 best solutions below

0
On BEST ANSWER

Yes, it is true, the "raw" data vault keeps records as it was on source system when it was loaded.

But there's another concept, the "business" data vault. This is where all the logic and transformation happens. The business data vault is not a full copy of the raw data vault, but you create hub/link/sat/pit/bridge to implement the logic to suit your needs.

That way, it helps you in the long run. If, for example, you need to change a business rule next year, you still have the original data for a particular source system at a particular time in the past. If your logic has a bug, you still have the original data.

0
On

From my experience usually you have this architecture:

  1. Raw Source (Copy from your OLTP data sources)
  2. Staging (nowadays as Persistent Staging Area in a Datalake, because it is cheaper than a Relational DB)
  3. Raw Vault (applying so called Hard Rules, like data type changes)
  4. Business Vault (applying so called Soft Rules, all your Business Logic, Aggregations, Concatenation, ...)
  5. Information Mart (Data Mart sometimes virtualized, but not always ... usually Star/Snowflake Schema)
  6. Cube/Tabular Model
  7. BI Tool

More information about the difference between Raw Vault and Business Vault you can find here: Datavault - hard rules (rawvault) vs soft rules (businessvault)