How to handle dimensions with similar attributes?

272 Views Asked by At

I have the following entities in my data mart:

  • Location: Contains information like Country, City, etc.
  • Users: Contains information about a user this includes their address, which includes Country and City

Both these dimensions contain attributes regarding country and city. Storing the attributes only in the location dimension would make more sense, so we don't duplicate data and we can retrieve the user's location information by joining the tables surrogate keys together through the fact table. However, doing it this way would increase the overhead because we would need to perform multiple joins.

I’m inexperienced with dimensional modelling and data warehousing, so I don’t know which one is better practice:

  1. Let both of the dimensions contain the same attributes
  2. Put the attributes in the table that it belongs to (in this case the Location dimension)
1

There are 1 best solutions below

5
On

Option 1 is nearer to the Kimball way. Dimensional modelling doesn't prioritise avoiding duplication of data. In this case, they'd argue that the data 'belongs to' multiple tables. Users have information about their address, so that that's information about a user- it belongs to the user.

The important thing is that the data is conformed in the naming of the fields and the nature of the content- its up to you as the person in charge of ETL/the DW to do the work to make sure this is the case.

A lot of people go with Option 2. For this way, I don't know the nature of your fact (so I might be wrong) but it sounds like if you wanted to know the Location of a User you would have a key from the dimension rather than the fact- which would be a 'snowflake' structure. These aren't always wrong, but don't allow you to get the most out of a dimensional model.