Snowflake Equiv of PySpark STACK (LATERAL VIEW)

44 Views Asked by At

I have working code using PySpark which uses the STACK function to turn columns into rows, creating a display_name column in doing so.

I am trying to convert this to a Snowflake query, in theory my thought was that a LATERAL JOIN should be able to do the job.

The below is very similar to what I have in PySpark, but for some reason I cant get it to work. I keep getting the below error:

Invalid expression [T.ACTIVE_SHARE_PCT] in VALUES clause

SELECT
    t.as_of_date,
    t.product_code,
    v.*
    
FROM (
    SELECT
        to_date(dd.date) AS as_of_date,
        pd.product_code AS product_code,
        pv.*
       
    FROM
        integration.product_valuation_accounting_fact_v01 pv
      
        JOIN integration.date_dim_v01 dd
            ON pv.valuation_date_key = dd.date_key

        JOIN integration.product_dim_v01 pd
            ON pv.product_key = pd.product_key
            AND dd.date BETWEEN pd.start_date AND pd.end_date

        LEFT JOIN integration.currency_dim_v01 cd
            ON pv.base_currency_key = cd.currency_key
        
    WHERE
        dd.date = '2023-06-30'
) AS t

CROSS JOIN LATERAL (
    VALUES 
        ('Active Share End Weight', t.active_share_pct, null),
        ('Wtd Avg Market Cap Base', t.weighted_avg_mkt_cap_base, null),
        ('Price to Book Ratio – Latest Reported', t.price_book_ratio_latest_rptd, null)
) AS v (display_name, portfolio, benchmark)


I can't really use a UNPIVOT clause because I need to include any NULL values plus, i would be unpivoting two columns, the port and benchmark.

1

There are 1 best solutions below

0
Simeon Pilgrim On

So if I makes so fake tables (as CTE's) the have the columns you use in your joins, and run this:

WITH product_valuation_accounting_fact_v01(valuation_date_key, product_key, base_currency_key) as (
    select * from values
        (100, 200, 300)
), date_dim_v01(date_key, date) as (
    select * from values
        (100, '2023-06-30')
), product_dim_v01(product_key, start_date, end_date, product_code) as (
    select * from values
        (200, '2023-01-01', '2099-01-01', 12345)
), currency_dim_v01(currency_key) as (
    select * from values
        (300 )
)
SELECT
    t.as_of_date,
    t.product_code,
    v.* 
FROM (
    SELECT
        to_date(dd.date) AS as_of_date,
        pd.product_code AS product_code,
        pv.*
    FROM
        /*integration.*/product_valuation_accounting_fact_v01 as pv
    JOIN /*integration.*/date_dim_v01 as dd
        ON pv.valuation_date_key = dd.date_key
    JOIN /*integration.*/product_dim_v01 as pd
        ON pv.product_key = pd.product_key
            AND dd.date BETWEEN pd.start_date AND pd.end_date
    LEFT JOIN /*integration.*/currency_dim_v01 as cd
        ON pv.base_currency_key = cd.currency_key
    WHERE dd.date = '2023-06-30'
) AS t
CROSS JOIN LATERAL (
    VALUES 
        ('Active Share End Weight', t.active_share_pct, null),
        ('Wtd Avg Market Cap Base', t.weighted_avg_mkt_cap_base, null),
        ('Price to Book Ratio – Latest Reported', t.price_book_ratio_latest_rptd, null)
) AS v (display_name, portfolio, benchmark)

I get:

Error: invalid identifier 'T.ACTIVE_SHARE_PCT' (line 37)

which is different to your error. Given t only has 2 known columns and "all of PV columns" I have to assume this are on PV, so we add those:

WITH product_valuation_accounting_fact_v01(valuation_date_key, product_key, base_currency_key, 
        ACTIVE_SHARE_PCT, WEIGHTED_AVG_MKT_CAP_BASE, price_book_ratio_latest_rptd) as (
    select * from values
        (100, 200, 300, 19.1::number(10, 2), 0.0123::number(20,10), 100::number)
), date_dim_v01(date_key, date) as (
    select * from values
        (100, '2023-06-30')
), product_dim_v01(product_key, start_date, end_date, product_code) as (
    select * from values
        (200, '2023-01-01', '2099-01-01', 12345)
), currency_dim_v01(currency_key) as (
    select * from values
        (300 )
)
SELECT
    t.as_of_date,
    t.product_code,
    v.* 
FROM (
    SELECT
        to_date(dd.date) AS as_of_date,
        pd.product_code AS product_code,
        pv.*
    FROM
        /*integration.*/product_valuation_accounting_fact_v01 as pv
    JOIN /*integration.*/date_dim_v01 as dd
        ON pv.valuation_date_key = dd.date_key
    JOIN /*integration.*/product_dim_v01 as pd
        ON pv.product_key = pd.product_key
            AND dd.date BETWEEN pd.start_date AND pd.end_date
    LEFT JOIN /*integration.*/currency_dim_v01 as cd
        ON pv.base_currency_key = cd.currency_key
    WHERE dd.date = '2023-06-30'
) AS t
CROSS JOIN LATERAL (
    VALUES 
        ('Active Share End Weight', t.active_share_pct, null),
        ('Wtd Avg Market Cap Base', t.weighted_avg_mkt_cap_base, null),
        ('Price to Book Ratio – Latest Reported', t.price_book_ratio_latest_rptd, null)
) AS v (display_name, portfolio, benchmark)

Invalid expression [T.ACTIVE_SHARE_PCT] in VALUES clause

yippie, it seems we have reproduced your problem...

right, now I read the code more, you are wanting to make 3 rows for each input row, and take one of three different values, but those values are coming from the input. Thus the LATERAL, but this is not really going to fly, instead, you should cross join to 3 values, and then use a CASE to access the values you wish like:

WITH product_valuation_accounting_fact_v01(valuation_date_key, product_key, base_currency_key, 
        ACTIVE_SHARE_PCT, WEIGHTED_AVG_MKT_CAP_BASE, price_book_ratio_latest_rptd) as (
    select * from values
        (100, 200, 300, 19.1::number(10, 2), 0.0123::number(20,10), 100::number)
), date_dim_v01(date_key, date) as (
    select * from values
        (100, '2023-06-30')
), product_dim_v01(product_key, start_date, end_date, product_code) as (
    select * from values
        (200, '2023-01-01', '2099-01-01', 12345)
), currency_dim_v01(currency_key) as (
    select * from values
        (300 )
)
SELECT
    t.as_of_date,
    t.product_code,
    v.display_name,
    case  v.type
        when 1 then t.active_share_pct
        when 2 then t.weighted_avg_mkt_cap_base
        when 3 then t.price_book_ratio_latest_rptd
    end as portfolio,
    null as benchmark   
FROM (
    SELECT
        to_date(dd.date) AS as_of_date,
        pd.product_code AS product_code,
        pv.*
    FROM
        /*integration.*/product_valuation_accounting_fact_v01 as pv
    JOIN /*integration.*/date_dim_v01 as dd
        ON pv.valuation_date_key = dd.date_key
    JOIN /*integration.*/product_dim_v01 as pd
        ON pv.product_key = pd.product_key
            AND dd.date BETWEEN pd.start_date AND pd.end_date
    LEFT JOIN /*integration.*/currency_dim_v01 as cd
        ON pv.base_currency_key = cd.currency_key
    WHERE dd.date = '2023-06-30'
) AS t
CROSS JOIN (
    VALUES 
        (1, 'Active Share End Weight'),
        (2, 'Wtd Avg Market Cap Base'),
        (3, 'Price to Book Ratio – Latest Reported')
) AS v(type, display_name)

giving:

enter image description here