Is combining a Junk Dimension with a Conformed Dimension in a single table advisable for efficient data management?

58 Views Asked by At

In the process of optimizing the product dimension in the data lake for PowerBI analytics, a challenge has emerged where each department has a distinct product dimension with specific features. This results in complications when generating cross-department reports due to varying SKs assigned to the same product.

To address this issue, a proposed solution involves creating a master product dimension for a unified source of truth, which actually turns out to be what Kimball calls Conformed Dimensions. However, concerns have been raised regarding potential complexity, particularly because of the low-cardinality in sales and marketing that drive us also to a kind of Junk Dimension, and the inclusion of finance products not present in other departments.

I undertook the following exercise to gain insight into the potential behavior of this master data. As you can observe below, the number of lines is anticipated to be substantial. This is due to the fact that, for every combination of product and features, an individual line would be generated, distinguished by a unique SKU identifier.

How can scalability and management issues be mitigated in a data system that incorporates tables with mixed 'Junk' and 'Conformed' dimensions? Are there established best practices for handling such data structures, or is this approach generally discouraged based on factual evidence?

enter image description here

Here is my ddl code example with a synthetic data:

CREATE TABLE #TEMP_PRODUCT_SALES (
    sk INT IDENTITY(1,1) PRIMARY KEY,
    product_id INT,
    SLS_feature1 VARCHAR(255),
    SLS_feature2 VARCHAR(255)
);


INSERT INTO #TEMP_PRODUCT_SALES (product_id, SLS_feature1, SLS_feature2)
VALUES
    (1, 'A', 'X'),
(1, 'A', 'Y'),
(1, 'A', 'Z'),
(2, 'B', 'X'),
(2, 'B', 'Y'),
(2, 'B', 'Z'),
(3, 'C', 'X'),
(3, 'C', 'Y'),
(3, 'C', 'Z');


CREATE TABLE #TEMP_PRODUCT_marketing (
    sk INT IDENTITY(1,1) PRIMARY KEY,
    product_id INT,
    mkt_feature1 VARCHAR(255),
    mkt_feature2 VARCHAR(255)
);


INSERT INTO #TEMP_PRODUCT_marketing (product_id, mkt_feature1, mkt_feature2)
VALUES
    (1, 'M1', 'N1'),
    (1, 'M1', 'N2'),
    (1, 'M1', 'N3'),
    (2, 'M2', 'N2'),
    (3, 'M3', 'N3');

-- Create PRODUCT_finance temporary table
CREATE TABLE #TEMP_PRODUCT_finance (
    sk INT IDENTITY(1,1) PRIMARY KEY,
    product_id INT,
    fin_feature1 VARCHAR(255),
    fin_feature2 VARCHAR(255)
);


INSERT INTO #TEMP_PRODUCT_finance (product_id, fin_feature1, fin_feature2)
VALUES
    (1, 'F1', 'G1'),
    (2, 'F2', 'G2'),
    (3, 'F3', 'G3'),
    (4, 'F2', 'G2');


CREATE TABLE #TEMP_Conformed_Dimension (
    sk INT IDENTITY(1,1) PRIMARY KEY,
    product_id INT,
    SLS_feature1 VARCHAR(255),
    SLS_feature2 VARCHAR(255),
    mkt_feature1 VARCHAR(255),
    mkt_feature2 VARCHAR(255),
    fin_feature1 VARCHAR(255),
    fin_feature2 VARCHAR(255)
);



INSERT INTO #TEMP_Conformed_Dimension (product_id, SLS_feature1, SLS_feature2, mkt_feature1, mkt_feature2, fin_feature1, fin_feature2)
SELECT
    p.product_id,
    ps.SLS_feature1,
    ps.SLS_feature2,
    pm.mkt_feature1,
    pm.mkt_feature2,
    pf.fin_feature1,
    pf.fin_feature2
FROM
    (SELECT DISTINCT product_id FROM #TEMP_PRODUCT_SALES) p
    LEFT JOIN #TEMP_PRODUCT_SALES ps ON p.product_id = ps.product_id
    LEFT JOIN #TEMP_PRODUCT_marketing pm ON p.product_id = pm.product_id
    LEFT JOIN #TEMP_PRODUCT_finance pf ON p.product_id = pf.product_id;


INSERT INTO #TEMP_Conformed_Dimension (product_id, SLS_feature1, SLS_feature2, mkt_feature1, mkt_feature2, fin_feature1, fin_feature2)
SELECT
    product_id,
    SLS_feature1,
    SLS_feature2,
    NULL AS mkt_feature1,
     NULL AS mkt_feature2,
     NULL AS fin_feature1,
    NULL AS fin_feature2
FROM
    #TEMP_PRODUCT_SALES
WHERE
    product_id NOT IN (SELECT product_id FROM #TEMP_Conformed_Dimension)
UNION
SELECT
    product_id,
    NULL AS SLS_feature1,
    NULL AS SLS_feature2,
     mkt_feature1,
     mkt_feature2,
    NULL AS fin_feature1,
    NULL AS fin_feature2
FROM
    #TEMP_PRODUCT_marketing
WHERE
    product_id NOT IN (SELECT product_id FROM #TEMP_Conformed_Dimension)
UNION
SELECT
    product_id,
    NULL AS SLS_feature1,
    NULL AS SLS_feature2,
    NULL AS mkt_feature1,
    NULL AS mkt_feature2,
    fin_feature1,
     fin_feature2
FROM
    #TEMP_PRODUCT_finance
WHERE
    product_id NOT IN (SELECT product_id FROM #TEMP_Conformed_Dimension);


SELECT * FROM #TEMP_Conformed_Dimension;
0

There are 0 best solutions below