I have a project to extract data from an OLTP db to a Stagging star schema and then Datawarehouse db. I have a relationship : playlist<---(1-N)PlaylistTRack----->(1-N)Track--->(N-1)Sales. There is a concern in the group that duplicates will be created if I put ttables Playlist/playlistTrack/Track in 1 dimtable in Stagging bd, but i dont get how that comes about or if there really is a problem.
As we have searched, we found that we have to do a bridge table. But does that solve the problem if there is an actual one?
This is a valid concern. Your playlist and your track are two different conceptual things that relate to each other in a many-to-many way (a playlist can have multiple tracks and a track can appear in more than one playlist). You most certainly would not want to collapse this into a single table or you will have duplicates.
The bridge table that solves this is your
playlisttrack
table. The grain of that table is one track on one playlist, and the key is a combination of the keys to both the playlist and track dimensions.Now, if you have a different table playing the true role of a fact table (sales), then this doesn't change what's needed. Your
sales
fact table would have a single key toplaylist
:Your product dimension has been split into three tables in a pattern known as "snowflaking" but that's unavoidable. In this situation, while your
sales
table still acts as the real fact table, the bridge table (playlisttrack
) is in a way a fact table of its own with its own granularity. It's important to realize this because you have to be careful when joining fromsales
through the bridge table totrack
, as that changes the granularity of your query. Unless your query filters on a single track, you would need to use carefulGROUP BY
logic with appropriate aggregation (MAX
vsSUM
on sales facts until grouped at the sales fact grain, thenSUM
only above that grouping, etc..) in any such query to avoid overcounting your sales facts.