Currently facing a problem where I am trying to create a login utilization report for a web application. To describe the report a bit, users in our system are tagged with different metadata about the user. For example, I could be tagged with "New York City" and "Software Engineer", while other users may be tagged with different locations and job titles. The utilization report is essentially the following:
- Time period (quarterly)
- Total number of logins
- Unique logins
- Total users
- "Engagement percentage" (Unique logins / Total users)
The catch is, the report needs to be a bit dynamic. I need to be able to be apply any combination of job titles and locations and have each of the numbers reflect the applied metadata. The time period also needs to be able to be easily adjusted to support weekly, monthly, and yearly as well. Ideally, I can create a view in Redshift that allows our BI software users to run this report whenever they see fit.
My question is, what is an ideal strategy to design a data model to support this report? I currently have an atomic fact table that contains all logins with this schema:
- User ID
- Login ID
- Login Timestamp
- Job Title Group ID (MD5 hash of job titles to support multi valued)
- Location Group ID (MD5 hash of locations to support multi valued)
The fact table allows me to easily write a query to aggregate on total (count of login id) and unique (distinct count of user id).
How can I supplement the data I have to include a count of total users? Is what I currently have the best approach?
Hierarchical, fixed-depth many-to-one (M:1) relationships between attributes are typically denormalized or collapsed into a flattened dimension table. If you’ve spent most of your career designing entity-relationship models for transaction processing systems, you’ll need to resist your instinctive tendency to normalize or snowflake a M:1 relationship into smaller subdimensions; dimension denormalization is the name of the game in dimensional modeling.
It is relatively common to have multiple M:1 relationships represented in a single dimension table. One-to-one relationships, like a unique product description associated with a product code, are also handled in a dimension table. Occasionally many-to-one relationships are resolved in the fact table, such as the case when the detailed dimension table has millions of rows and its roll-up attributes are frequently changing. However, using the fact table to resolve M:1 relationships should be done sparingly.
In your case I recommend you to have this following design as a solution :