How to resolve nested aggregate function error?

569 Views Asked by At

I used case function to bucket aggregate counts of an item and want to divide the sum total of all counts into the value for each bucket (looking to have each bucket shown as a % of the total). However, I am getting an error that I cannot nest aggregate functions, which I understand, but need some help finding an alternate solution to achieve my goal.

Error:

Aggregate functions cannot be nested: [COUNT("values".CASE_AGE_CATEGORY)] nested in [SUM(COUNT("values".CASE_AGE_CATEGORY))]

Code:

SELECT Case_Age_Category, COUNT(Case_Age_Category)/sum(count(Case_Age_Category)) as Volume
FROM
(
SELECT DISTINCT(c.CASE_ID),c.CLOSED_AT,
    CASE
        WHEN TIMEDIFF('HOUR', ASSET_CHECKED_IN_AT, STATUS_CHANGED_TO_COMPLETE_HERE_AT) >0
        AND TIMEDIFF('HOUR', ASSET_CHECKED_IN_AT, STATUS_CHANGED_TO_COMPLETE_HERE_AT) <24 then '0-24 HOURS'
    
        WHEN TIMEDIFF('HOUR', ASSET_CHECKED_IN_AT, STATUS_CHANGED_TO_COMPLETE_HERE_AT) >24
        AND TIMEDIFF('HOUR', ASSET_CHECKED_IN_AT, STATUS_CHANGED_TO_COMPLETE_HERE_AT) <48 then '24-48 HOURS'
    
        WHEN TIMEDIFF('HOUR', ASSET_CHECKED_IN_AT, STATUS_CHANGED_TO_COMPLETE_HERE_AT) >48
        AND TIMEDIFF('HOUR', ASSET_CHECKED_IN_AT, STATUS_CHANGED_TO_COMPLETE_HERE_AT) <72 then '48-72 HOURS'
        
        WHEN TIMEDIFF('HOUR', ASSET_CHECKED_IN_AT, STATUS_CHANGED_TO_COMPLETE_HERE_AT) >72
        AND TIMEDIFF('HOUR', ASSET_CHECKED_IN_AT, STATUS_CHANGED_TO_COMPLETE_HERE_AT) <96 then '72-96 HOURS'
        
        WHEN TIMEDIFF('HOUR', ASSET_CHECKED_IN_AT, STATUS_CHANGED_TO_COMPLETE_HERE_AT) >96
        AND TIMEDIFF('HOUR', ASSET_CHECKED_IN_AT, STATUS_CHANGED_TO_COMPLETE_HERE_AT) <120 then '96-120 HOURS'
        
        else '>5 DAYS'
    End as Case_Age_Category
FROM TEST_DB.STAGING.DW_DECISIV_CASES c inner join DB.Seed.DEALER_MAPPING d on c.DEALER_ID = d.DECISIVDEALERID
WHERE d.DIVISION = 'K' 
    and RO_NUMBER is not NULL 
    and (d.DEALERCATEGORY ILIKE 'DEALER' OR d.DEALERCATEGORY ILIKE 'RTC') 
    and d.DEALERUSAGE ILIKE 'PRODUCTION' 
    and d.OWNERGROUPCODE !='S040'    
)

WHERE CLOSED_AT >= '2021-01-01 00:00:00.000'
GROUP BY Case_Age_Category
ORDER BY Case_Age_Category ASC

Screenshot of current output:

Screenshot of current output

2

There are 2 best solutions below

0
Lukasz Szozda On BEST ANSWER

looking to have each bucket shown as a % of the total

Combining COUNT(...) with windowed SUM() OVER() to get the total across all groups:

SELECT Case_Age_Category, 
      DIV0(COUNT(Case_Age_Category), SUM(COUNT(Case_Age_Category)) OVER()) as Volume
FROM
(
  -- ...
) sub
WHERE CLOSED_AT >= '2021-01-01 00:00:00.000'
GROUP BY Case_Age_Category
ORDER BY Case_Age_Category ASC

db<>fiddle demo

0
Simeon Pilgrim On

Your time bucketing has a few problems with it.

  • you do not catch negative or zero hours, so those will be classified as ">5"
  • you do not match the 24, 48, 72, 96 hours anywhere
  • a time difference of 2 seconds, is marked as 1 hour by TIMEDIFF maybe not a problem
SELECT 
    ASSET_CHECKED_IN_AT, 
    STATUS_CHANGED_TO_COMPLETE_HERE_AT,
    TIMEDIFF('HOUR', ASSET_CHECKED_IN_AT, STATUS_CHANGED_TO_COMPLETE_HERE_AT) as hour_diff,
    CASE
        WHEN hour_diff >0 AND hour_diff <24 then '0-24 HOURS'
        WHEN hour_diff >24 AND hour_diff <48 then '24-48 HOURS'
        WHEN hour_diff >48 AND hour_diff <72 then '48-72 HOURS'
        WHEN hour_diff >72 AND hour_diff <96 then '72-96 HOURS'
        WHEN hour_diff >96 AND hour_diff <120 then '96-120 HOURS'
        else '>5 DAYS'
    End as Case_Age_Category,
    (date_part(epoch_second, STATUS_CHANGED_TO_COMPLETE_HERE_AT::timestamp_ntz)-date_part(epoch_second, ASSET_CHECKED_IN_AT::timestamp_ntz))/3600 as hour_diff_2
FROM VALUES 
    ('2021-01-23 13:45:00','2021-01-23 13:45:00'),
    ('2021-01-23 13:45:00','2021-01-23 14:40:00'),
    ('2021-01-23 13:45:00','2021-01-23 14:44:00'),
    ('2021-01-23 13:59:59','2021-01-23 14:00:01'),
    ('2021-01-23 13:45:00','2021-01-24 13:45:00'),
    ('2021-01-23 13:45:00','2021-01-27 13:45:00'),
    ('2021-01-24 13:45:00','2021-01-23 13:45:00')
    v(ASSET_CHECKED_IN_AT, STATUS_CHANGED_TO_COMPLETE_HERE_AT);

gives:

ASSET_CHECKED_IN_AT STATUS_CHANGED_TO_COMPLETE_HERE_AT HOUR_DIFF CASE_AGE_CATEGORY HOUR_DIFF_2
2021-01-23 13:45:00 2021-01-23 13:45:00 0 >5 DAYS 0
2021-01-23 13:45:00 2021-01-23 14:40:00 1 0-24 HOURS 0.916667
2021-01-23 13:45:00 2021-01-23 14:44:00 1 0-24 HOURS 0.983333
2021-01-23 13:59:59 2021-01-23 14:00:01 1 0-24 HOURS 0.000556
2021-01-23 13:45:00 2021-01-24 13:45:00 24 >5 DAYS 24
2021-01-23 13:45:00 2021-01-27 13:45:00 96 >5 DAYS 96
2021-01-24 13:45:00 2021-01-23 13:45:00 -24 >5 DAYS -24

next the order of WHEN clauses been evaluated in a CASE is the order presented, so you can use this to drop half the evaluations, AND if you put the time difference as a top level column, you SQL is more read able.

given this is going into a another layer of SQL, extra columns that are just works, is a rather nice way to make the SQL more readable.

SELECT 
    ASSET_CHECKED_IN_AT, 
    STATUS_CHANGED_TO_COMPLETE_HERE_AT,
    TIMEDIFF('HOUR', ASSET_CHECKED_IN_AT, STATUS_CHANGED_TO_COMPLETE_HERE_AT) as hour_diff,
    CASE
        WHEN hour_diff <0 then '-ve HOURS'
        WHEN hour_diff <24 then '0-24 HOURS'
        WHEN hour_diff <48 then '24-48 HOURS'
        WHEN hour_diff <72 then '48-72 HOURS'
        WHEN hour_diff <96 then '72-96 HOURS'
        WHEN  hour_diff <120 then '96-120 HOURS'
        else '>=5 DAYS'
    End as Case_Age_Category
FROM VALUES 
    ('2021-01-23 13:45:00','2021-01-23 13:45:00'),
    ('2021-01-23 13:45:00','2021-01-23 14:40:00'),
    ('2021-01-23 13:45:00','2021-01-23 14:44:00'),
    ('2021-01-23 13:59:59','2021-01-23 14:00:01'),
    ('2021-01-23 13:45:00','2021-01-24 13:45:00'),
    ('2021-01-23 13:45:00','2021-01-27 13:45:00'),
    ('2021-01-24 13:45:00','2021-01-23 13:45:00')
    v(ASSET_CHECKED_IN_AT, STATUS_CHANGED_TO_COMPLETE_HERE_AT);
ASSET_CHECKED_IN_AT STATUS_CHANGED_TO_COMPLETE_HERE_AT HOUR_DIFF CASE_AGE_CATEGORY
2021-01-23 13:45:00 2021-01-23 13:45:00 0 0-24 HOURS
2021-01-23 13:45:00 2021-01-23 14:40:00 1 0-24 HOURS
2021-01-23 13:45:00 2021-01-23 14:44:00 1 0-24 HOURS
2021-01-23 13:59:59 2021-01-23 14:00:01 1 0-24 HOURS
2021-01-23 13:45:00 2021-01-24 13:45:00 24 24-48 HOURS
2021-01-23 13:45:00 2021-01-27 13:45:00 96 96-120 HOURS
2021-01-24 13:45:00 2021-01-23 13:45:00 -24 -ve HOURS

So now to present another way to calculate the percentage, RATIO_TO_REPORT

so you can use the DIV0 method Lukasz shows, which I have deconstructed below, but RETIO_TO_REPORT does the lifting, and I show it in a number of forms to show it's happy with deconstructed/composed versions:

With data AS (
    SELECT * 
    FROM VALUES 
    ('2021-01-23 13:45:00','2021-01-23 13:45:00'),
    ('2021-01-23 13:45:00','2021-01-23 14:40:00'),
    ('2021-01-23 13:45:00','2021-01-23 14:44:00'),
    ('2021-01-23 13:59:59','2021-01-23 14:00:01'),
    ('2021-01-23 13:45:00','2021-01-24 13:45:00'),
    ('2021-01-23 13:45:00','2021-01-27 13:45:00'),
    ('2021-01-24 13:45:00','2021-01-23 13:45:00')
    v(ASSET_CHECKED_IN_AT, STATUS_CHANGED_TO_COMPLETE_HERE_AT)
), cat_data AS (
    SELECT 
        ASSET_CHECKED_IN_AT, 
        STATUS_CHANGED_TO_COMPLETE_HERE_AT,
        TIMEDIFF('HOUR', ASSET_CHECKED_IN_AT, STATUS_CHANGED_TO_COMPLETE_HERE_AT) as hour_diff,
        CASE
            WHEN hour_diff <0 then '-ve HOURS'
            WHEN hour_diff <24 then '0-24 HOURS'
            WHEN hour_diff <48 then '24-48 HOURS'
            WHEN hour_diff <72 then '48-72 HOURS'
            WHEN hour_diff <96 then '72-96 HOURS'
            WHEN  hour_diff <120 then '96-120 HOURS'
            else '>=5 DAYS'
        End as Case_Age_Category
    FROM data
)
SELECT Case_Age_Category
    ,COUNT(1) as cat_count
    ,SUM(cat_count) OVER() as total_count_a
    ,DIV0(cat_count, total_count_a) as percentage
    ,RATIO_TO_REPORT(cat_count) over () as percentage_2
    ,RATIO_TO_REPORT(COUNT(1)) over () as percentage_3
    ,RATIO_TO_REPORT(COUNT(Case_Age_Category)) over () as percentage_4
FROM cat_data
GROUP BY 1;

gives:

CASE_AGE_CATEGORY CAT_COUNT TOTAL_COUNT_A PERCENTAGE PERCENTAGE_2 PERCENTAGE_3 PERCENTAGE_4
0-24 HOURS 4 7 0.571428 0.571429 0.571429 0.571429
24-48 HOURS 1 7 0.142857 0.142857 0.142857 0.142857
-ve HOURS 1 7 0.142857 0.142857 0.142857 0.142857
96-120 HOURS 1 7 0.142857 0.142857 0.142857 0.142857