Using CROSS APPLY to more easily "UNPIVOT" a wide table in Snowflake?

Question

Using CROSS APPLY to more easily "UNPIVOT" a wide table in Snowflake?

324 Views Asked by Eli Rush Kallison At 27 June 2023 at 20:20

My company just switched from Microsoft SQL Server to Snowflake, and I have a bunch of scripts where I unpivoted a wide table to a long one using code like this:

SELECT 
    CONGLOM_IDc,
    DESTINATION_IDc,
    COHORTc,
    COHORT_PASS_TYPEc,
    ACCESS_SEASON,
    PASS_TYPE,
    First_Cohortc
FROM #PassTypeByYear2 M
CROSS APPLY
    (
    VALUES
        (CONGLOM_ID, DESTINATION_ID, COHORT, COHORT_PASS_TYPE,'18/19', "18/19_Pass", First_Cohort),
        (CONGLOM_ID, DESTINATION_ID, COHORT, COHORT_PASS_TYPE,'19/20', "19/20_Pass", First_Cohort), 
        (CONGLOM_ID, DESTINATION_ID, COHORT, COHORT_PASS_TYPE,'20/21', "20/21_Pass", First_Cohort),
        (CONGLOM_ID, DESTINATION_ID, COHORT, COHORT_PASS_TYPE,'21/22', "21/22_Pass", First_Cohort),
        (CONGLOM_ID, DESTINATION_ID, COHORT, COHORT_PASS_TYPE,'22/23', "22/23_Pass", First_Cohort)
    ) c (CONGLOM_IDc, DESTINATION_IDc, COHORTc, COHORT_PASS_TYPEc, ACCESS_SEASON, PASS_TYPE, First_Cohortc)

Is there a way to replicate this in Snowflake without using UNPIVOT a bunch of times? I have read the following post, but I can't seem to adjust CROSS JOIN to this situation. Any help is much appreciated.

An example of the "starting" table (#PassTypeByYear2) is below:

CONGLOM_ID	DESTINATION_ID	COHORT	COHORT_PASS_TYPE	18/19_Pass	19/20_Pass	20/21_Pass	21/22_Pass	22/23_Pass	First_Cohort	Consistent_Pass_All_Years
101781679	196	22/23	Paid	Employee	Employee	Employee	Paid	Paid	NO	No
101781679	196	21/22	Paid	Employee	Employee	Employee	Paid	Paid	NO	No
101781679	196	20/21	Employee	Employee	Employee	Employee	Employee	Paid	NO	No
101781679	196	18/19	Employee	Employee	Employee	Employee	Employee	Paid	YES	No
101781679	196	19/20	Employee	Employee	Employee	Employee	Employee	Paid	NO	No
101781679	227	19/20	Employee	NULL	Employee	NULL	NULL	NULL	YES	Yes

and the output I want is below (EDIT: I just realized I did a SELECT TOP 10 * from this table instead of *. It should have a row for every unique cominbation of unpivoted values. I will re-run this query and upload the full table tomorrow - the table is about 5M rows so it takes a long time to run):

CONGLOM_ID	DESTINATION_ID	COHORT	COHORT_PASS_TYPE	ACCESS_SEASON	PASS_TYPE	First_Cohort	Consistent_Pass_All_Years
101781679	196	22/23	Paid	18/19	Employee	NO	No
101781679	196	22/23	Paid	19/20	Employee	NO	No
101781679	196	22/23	Paid	20/21	Employee	NO	No
101781679	196	22/23	Paid	21/22	Paid	NO	No
101781679	196	22/23	Paid	22/23	Paid	NO	No
101781679	196	20/21	Employee	18/19	Employee	NO	No
101781679	196	20/21	Employee	19/20	Employee	NO	No
101781679	196	20/21	Employee	20/21	Employee	NO	No
101781679	196	20/21	Employee	21/22	Paid	NO	No
101781679	196	20/21	Employee	22/23	Paid	NO	No

CROSS APPLY in Oracle/SQL Server substitute in Snowflake

I tried converting the rest of the code to SNOWFLAKE, and I expected that I would end up with a long table (instead of a wide one) where the "18/19_Pass" Column turned into a value in the "18/19" ACCESS_SEASON row. Instead, I am just hitting an error:

SQL Error [2014] [22000]: SQL compilation error: Invalid expression [M.CONGLOM_ID] in VALUES clause

Original Q&A

There are 2 best solutions below

**NickW** · Answer 1 · 2023-06-29T11:08:08.533000

I think this SQL should give you what you want. I renamed some of the columns so that they were valid identifiers that didn't need to be "quoted" in order to use them - just to make things simpler.

CREATE TABLE PASS_BY_YEAR2 (
CONGLOM_ID TEXT
, DESTINATION_ID TEXT
, COHORT TEXT
, COHORT_PASS_TYPE TEXT
, PASS_18_19 TEXT
, PASS_19_20 TEXT
, PASS_20_21 TEXT
, PASS_21_22 TEXT
, PASS_22_23 TEXT
, FIRST_COHORT TEXT
, CONSISTENT_PASS_ALL_YEARS TEXT);

insert into pass_by_year2 (CONGLOM_ID, DESTINATION_ID, COHORT, COHORT_PASS_TYPE, PASS_18_19, PASS_19_20, PASS_20_21, PASS_21_22, PASS_22_23, First_Cohort, Consistent_Pass_All_Years) values 
('101781679', '196', '22/23', 'Paid', 'Employee', 'Employee', 'Employee', 'Paid', 'Paid', 'NO', 'No'),
('101781679', '196', '21/22', 'Paid', 'Employee', 'Employee', 'Employee', 'Paid', 'Paid', 'NO', 'No'),
('101781679', '196', '20/21', 'Employee', 'Employee', 'Employee', 'Employee', 'Employee', 'Paid', 'NO', 'No'),
('101781679', '196', '18/19', 'Employee', 'Employee', 'Employee', 'Employee', 'Employee', 'Paid', 'YES', 'No'),
('101781679', '196', '19/20', 'Employee', 'Employee', 'Employee', 'Employee', 'Employee', 'Paid', 'NO', 'No'),
('101781679', '227', '19/20', 'Employee', 'NULL', 'Employee', 'NULL', 'NULL', 'NULL', 'YES', 'Yes')
;

SELECT
CONGLOM_ID  
,DESTINATION_ID 
,COHORT 
,COHORT_PASS_TYPE   
,RIGHT(ACCESS_SEASON,LEN(ACCESS_SEASON)-5) ACCESS_SEASON    
,PASS_TYPE  
,First_Cohort   
,Consistent_Pass_All_Years
FROM PASS_BY_YEAR2
UNPIVOT (PASS_TYPE FOR ACCESS_SEASON IN (PASS_18_19, PASS_19_20, PASS_20_21, PASS_21_22, PASS_22_23))
;

**DanielOnMSE** · Answer 2 · 2024-03-22T06:15:31.607000

See below for one way of doing this. Note that you could probably also build a JSON OBJECT (key-value pairs) instead of an ARRAY. Which would remove the need for the CASE statement in the final select, as you could directly reference the KEY from the JSON OBJECT (which you would naturally set to be the appropriate name).

No idea how this will perform on large data... Hopefully the optimiser is smart enough to see it is building an array just to unpack it again... But just test it out and see if it works. Otherwise, it does the trick for small data.

Considering your question has multiple columns, I guess you'd have to LATERAL JOIN, and then LATERAL JOIN FLATTEN once for every column you want to unpivot (keeping track of the array INDEX), giving you one rowset for each column. You'd then have to INNER JOIN these rowsets together ON ID = ID AND INDEX = INDEX.

So maybe it isn't any better than UNPIVOT... Sure makes you miss SQL Server.

--Environment dependent values
--USE ROLE "";
--USE DATABASE "";
--USE SCHEMA "" ;
--USE WAREHOUSE "";

--Create some test data
CREATE OR REPLACE TEMP TABLE TMP_FRUIT_STOCK
AS
SELECT 1 AS ID
    , 3 AS NUMBER_OF_APPLES
    , 4 AS NUMBER_OF_ORANGES
    , 2 AS NUMBER_OF_BANANAS
UNION ALL
SELECT 2 AS ID
    , 1 AS NUMBER_OF_APPLES
    , 1 AS NUMBER_OF_ORANGES
    , 1 AS NUMBER_OF_BANANAS
UNION ALL
SELECT 2 AS ID
    , 5 AS NUMBER_OF_APPLES
    , 2 AS NUMBER_OF_ORANGES
    , 7 AS NUMBER_OF_BANANAS;

--I have my doubts about how performant this will be on a big data set, but who knows try it out...
SELECT fs.ID
    , fn.VALUE AS NUMBER
    , CASE 
        WHEN fn.INDEX = 0 
            THEN 'APPLES'
        WHEN fn.INDEX = 1
            THEN 'ORANGES'
        WHEN fn.INDEX = 2
            THEN 'BANANAS'
    END AS FRUIT_TYPE
FROM TMP_FRUIT_STOCK fs
JOIN LATERAL(
    --Unfortunately looks like you can only JOIN LATERAL one row. So let's make an array
    SELECT ARRAY_CONSTRUCT(NUMBER_OF_APPLES --index 0
                            , NUMBER_OF_ORANGES --index 1
                            , NUMBER_OF_BANANAS --index 2
                            )
    FROM TMP_FRUIT_STOCK
    WHERE fs.ID = ID
        ) X(FRUIT_ARRAY)
--But now we can unpack that array with the FLATTEN command (feel like this should be called UNFLATTEN?)
JOIN LATERAL FLATTEN(X.FRUIT_ARRAY) fn
ORDER BY ID
    , fn.INDEX

Returns:

ID	NUMBER	FRUIT_TYPE
1	3	APPLES
1	4	ORANGES
1	2	BANANAS
2	1	APPLES
2	5	APPLES
2	1	APPLES
2	5	APPLES
2	1	ORANGES
2	2	ORANGES
2	1	ORANGES
2	2	ORANGES
2	1	BANANAS
2	7	BANANAS
2	1	BANANAS
2	7	BANANAS

Using CROSS APPLY to more easily "UNPIVOT" a wide table in Snowflake?

There are 2 best solutions below

Related Questions in SQL

Related Questions in UNPIVOT

Related Questions in SNOWFLAKE-SCHEMA

Related Questions in CROSS-APPLY

Trending Questions

Popular # Hahtags

Popular Questions