Using CROSS APPLY to more easily "UNPIVOT" a wide table in Snowflake?

324 Views Asked by At

My company just switched from Microsoft SQL Server to Snowflake, and I have a bunch of scripts where I unpivoted a wide table to a long one using code like this:

SELECT 
    CONGLOM_IDc,
    DESTINATION_IDc,
    COHORTc,
    COHORT_PASS_TYPEc,
    ACCESS_SEASON,
    PASS_TYPE,
    First_Cohortc
FROM #PassTypeByYear2 M
CROSS APPLY
    (
    VALUES
        (CONGLOM_ID, DESTINATION_ID, COHORT, COHORT_PASS_TYPE,'18/19', "18/19_Pass", First_Cohort),
        (CONGLOM_ID, DESTINATION_ID, COHORT, COHORT_PASS_TYPE,'19/20', "19/20_Pass", First_Cohort), 
        (CONGLOM_ID, DESTINATION_ID, COHORT, COHORT_PASS_TYPE,'20/21', "20/21_Pass", First_Cohort),
        (CONGLOM_ID, DESTINATION_ID, COHORT, COHORT_PASS_TYPE,'21/22', "21/22_Pass", First_Cohort),
        (CONGLOM_ID, DESTINATION_ID, COHORT, COHORT_PASS_TYPE,'22/23', "22/23_Pass", First_Cohort)
    ) c (CONGLOM_IDc, DESTINATION_IDc, COHORTc, COHORT_PASS_TYPEc, ACCESS_SEASON, PASS_TYPE, First_Cohortc)

Is there a way to replicate this in Snowflake without using UNPIVOT a bunch of times? I have read the following post, but I can't seem to adjust CROSS JOIN to this situation. Any help is much appreciated.

An example of the "starting" table (#PassTypeByYear2) is below:

CONGLOM_ID DESTINATION_ID COHORT COHORT_PASS_TYPE 18/19_Pass 19/20_Pass 20/21_Pass 21/22_Pass 22/23_Pass First_Cohort Consistent_Pass_All_Years
101781679 196 22/23 Paid Employee Employee Employee Paid Paid NO No
101781679 196 21/22 Paid Employee Employee Employee Paid Paid NO No
101781679 196 20/21 Employee Employee Employee Employee Employee Paid NO No
101781679 196 18/19 Employee Employee Employee Employee Employee Paid YES No
101781679 196 19/20 Employee Employee Employee Employee Employee Paid NO No
101781679 227 19/20 Employee NULL Employee NULL NULL NULL YES Yes

and the output I want is below (EDIT: I just realized I did a SELECT TOP 10 * from this table instead of *. It should have a row for every unique cominbation of unpivoted values. I will re-run this query and upload the full table tomorrow - the table is about 5M rows so it takes a long time to run):

CONGLOM_ID DESTINATION_ID COHORT COHORT_PASS_TYPE ACCESS_SEASON PASS_TYPE First_Cohort Consistent_Pass_All_Years
101781679 196 22/23 Paid 18/19 Employee NO No
101781679 196 22/23 Paid 19/20 Employee NO No
101781679 196 22/23 Paid 20/21 Employee NO No
101781679 196 22/23 Paid 21/22 Paid NO No
101781679 196 22/23 Paid 22/23 Paid NO No
101781679 196 20/21 Employee 18/19 Employee NO No
101781679 196 20/21 Employee 19/20 Employee NO No
101781679 196 20/21 Employee 20/21 Employee NO No
101781679 196 20/21 Employee 21/22 Paid NO No
101781679 196 20/21 Employee 22/23 Paid NO No

CROSS APPLY in Oracle/SQL Server substitute in Snowflake

I tried converting the rest of the code to SNOWFLAKE, and I expected that I would end up with a long table (instead of a wide one) where the "18/19_Pass" Column turned into a value in the "18/19" ACCESS_SEASON row. Instead, I am just hitting an error:

SQL Error [2014] [22000]: SQL compilation error: Invalid expression [M.CONGLOM_ID] in VALUES clause

2

There are 2 best solutions below

1
NickW On

I think this SQL should give you what you want. I renamed some of the columns so that they were valid identifiers that didn't need to be "quoted" in order to use them - just to make things simpler.

CREATE TABLE PASS_BY_YEAR2 (
CONGLOM_ID TEXT
, DESTINATION_ID TEXT
, COHORT TEXT
, COHORT_PASS_TYPE TEXT
, PASS_18_19 TEXT
, PASS_19_20 TEXT
, PASS_20_21 TEXT
, PASS_21_22 TEXT
, PASS_22_23 TEXT
, FIRST_COHORT TEXT
, CONSISTENT_PASS_ALL_YEARS TEXT);

insert into pass_by_year2 (CONGLOM_ID, DESTINATION_ID, COHORT, COHORT_PASS_TYPE, PASS_18_19, PASS_19_20, PASS_20_21, PASS_21_22, PASS_22_23, First_Cohort, Consistent_Pass_All_Years) values 
('101781679', '196', '22/23', 'Paid', 'Employee', 'Employee', 'Employee', 'Paid', 'Paid', 'NO', 'No'),
('101781679', '196', '21/22', 'Paid', 'Employee', 'Employee', 'Employee', 'Paid', 'Paid', 'NO', 'No'),
('101781679', '196', '20/21', 'Employee', 'Employee', 'Employee', 'Employee', 'Employee', 'Paid', 'NO', 'No'),
('101781679', '196', '18/19', 'Employee', 'Employee', 'Employee', 'Employee', 'Employee', 'Paid', 'YES', 'No'),
('101781679', '196', '19/20', 'Employee', 'Employee', 'Employee', 'Employee', 'Employee', 'Paid', 'NO', 'No'),
('101781679', '227', '19/20', 'Employee', 'NULL', 'Employee', 'NULL', 'NULL', 'NULL', 'YES', 'Yes')
;

SELECT
CONGLOM_ID  
,DESTINATION_ID 
,COHORT 
,COHORT_PASS_TYPE   
,RIGHT(ACCESS_SEASON,LEN(ACCESS_SEASON)-5) ACCESS_SEASON    
,PASS_TYPE  
,First_Cohort   
,Consistent_Pass_All_Years
FROM PASS_BY_YEAR2
UNPIVOT (PASS_TYPE FOR ACCESS_SEASON IN (PASS_18_19, PASS_19_20, PASS_20_21, PASS_21_22, PASS_22_23))
;
0
DanielOnMSE On

See below for one way of doing this. Note that you could probably also build a JSON OBJECT (key-value pairs) instead of an ARRAY. Which would remove the need for the CASE statement in the final select, as you could directly reference the KEY from the JSON OBJECT (which you would naturally set to be the appropriate name).

No idea how this will perform on large data... Hopefully the optimiser is smart enough to see it is building an array just to unpack it again... But just test it out and see if it works. Otherwise, it does the trick for small data.

Considering your question has multiple columns, I guess you'd have to LATERAL JOIN, and then LATERAL JOIN FLATTEN once for every column you want to unpivot (keeping track of the array INDEX), giving you one rowset for each column. You'd then have to INNER JOIN these rowsets together ON ID = ID AND INDEX = INDEX.

So maybe it isn't any better than UNPIVOT... Sure makes you miss SQL Server.

--Environment dependent values
--USE ROLE "";
--USE DATABASE "";
--USE SCHEMA "" ;
--USE WAREHOUSE "";

--Create some test data
CREATE OR REPLACE TEMP TABLE TMP_FRUIT_STOCK
AS
SELECT 1 AS ID
    , 3 AS NUMBER_OF_APPLES
    , 4 AS NUMBER_OF_ORANGES
    , 2 AS NUMBER_OF_BANANAS
UNION ALL
SELECT 2 AS ID
    , 1 AS NUMBER_OF_APPLES
    , 1 AS NUMBER_OF_ORANGES
    , 1 AS NUMBER_OF_BANANAS
UNION ALL
SELECT 2 AS ID
    , 5 AS NUMBER_OF_APPLES
    , 2 AS NUMBER_OF_ORANGES
    , 7 AS NUMBER_OF_BANANAS;

--I have my doubts about how performant this will be on a big data set, but who knows try it out...
SELECT fs.ID
    , fn.VALUE AS NUMBER
    , CASE 
        WHEN fn.INDEX = 0 
            THEN 'APPLES'
        WHEN fn.INDEX = 1
            THEN 'ORANGES'
        WHEN fn.INDEX = 2
            THEN 'BANANAS'
    END AS FRUIT_TYPE
FROM TMP_FRUIT_STOCK fs
JOIN LATERAL(
    --Unfortunately looks like you can only JOIN LATERAL one row. So let's make an array
    SELECT ARRAY_CONSTRUCT(NUMBER_OF_APPLES --index 0
                            , NUMBER_OF_ORANGES --index 1
                            , NUMBER_OF_BANANAS --index 2
                            )
    FROM TMP_FRUIT_STOCK
    WHERE fs.ID = ID
        ) X(FRUIT_ARRAY)
--But now we can unpack that array with the FLATTEN command (feel like this should be called UNFLATTEN?)
JOIN LATERAL FLATTEN(X.FRUIT_ARRAY) fn
ORDER BY ID
    , fn.INDEX

Returns:

ID NUMBER FRUIT_TYPE
1 3 APPLES
1 4 ORANGES
1 2 BANANAS
2 1 APPLES
2 5 APPLES
2 1 APPLES
2 5 APPLES
2 1 ORANGES
2 2 ORANGES
2 1 ORANGES
2 2 ORANGES
2 1 BANANAS
2 7 BANANAS
2 1 BANANAS
2 7 BANANAS