Aggregation Tables with less data than facts table - Is it a problem?

67 Views Asked by At

I've recently enter the big data BI world and have a very specific question.

This is my model:

data model

I have a fact table with last 4 years (2020-2023) of data (2 billion rows) and an aggregated table with the last 2 years (2022-2023) of data (20M rows). So, the aggregated table as less years of data than the facts table.

When I use the date slicer and pick a date prior to 2022, I get a blank visual because the measure always hits the AGG (which does not have data for this time period) and not the facts table as expected. Have you guys faced a common issue? Is there a workaround?

The dax measure:

measure =
    CALCULATE(
        COUNT(cci_gpmt_ecommerce_global[article]),
        cci_gpmt_ecommerce_global[is_unique] = 1
    )

Thank you in advance.

2

There are 2 best solutions below

2
On

The aggregated table by default doesn't contain data for dates prior to 2022 that's why you have blank visuals for earlier dates.

  Measure =
    VAR SelectedStartDate = MIN('Date'[Date])  
    VAR SelectedEndDate = MAX('Date'[Date])    
    VAR UseAggregatedTable = SelectedStartDate >= DATE(2022, 1, 1) && SelectedEndDate <= DATE(2023, 12, 31)
    RETURN
        IF(
            UseAggregatedTable,
            CALCULATE(
                COUNT(cci_gpmt_ecommerce_global[article]),
                cci_gpmt_ecommerce_global[is_unique] = 1
            ),
            CALCULATE(
                COUNT(factTable[article]),  
                factTable[is_unique] = 1
            )
        )
0
On

I've come to a workaround, mainly because, as far as I could understand the engine does not work how I imagined it.

Your aggregation table must encompass the full dataset of the facts table. Even though the engine is very good, he's not able of switching between AGG and Facts table, if the AGG does not have all of the required timeframe.

The solution was to create a data model based on a hot and cold direct query:

enter image description here

The HOT DQ will encompass data from the last 2 years, while the COLD DQ will encompass data prior to the last 2 years.

I've had to make a few adjustments on the measures I've created, for example:

Measure =
    DIVIDE(
        CALCULATE(
            COUNT(HOT_cci_gpmt_ecommerce_global[article]),
            HOT_cci_gpmt_ecommerce_global[has_discount] = 1
            ),
        CALCULATE(COUNT(HOT_cci_gpmt_ecommerce_global[article]))
    )
    +
    IF(
        MIN(Dim_Calendar[date]) < [_min_updated_date],
        DIVIDE(
        CALCULATE(
            COUNT(COLD_cci_gpmt_ecommerce_global[article]),
            COLD_cci_gpmt_ecommerce_global[has_discount] = 1
            ),
        CALCULATE(COUNT(COLD_cci_gpmt_ecommerce_global[article]))
    )
    )

The measure containing the _threshold_date, required to evaluate if the consulting of the COLD DQ is required:

_threshold_date = 
    CALCULATE(
                MIN(HOT_cci_gpmt_ecommerce_global[report_date]),
                ALL(Dim_Calendar)
            ) 

This way:

  • if the user selects the last 2 years, the AGG is hit;
  • if the user selects a timeframe prior to the last 2 years, the COLD DQ is hit;
  • if the user selects a timeframe encompassing the last 4 years (for instance), both the AGG and COLD DQ are hit.

Hope i was able to explain it well, and thank you for the help!