Creating Snowflake SDF that can take optional SQL as argument

53 Views Asked by At

I've a simple function that calculates z-scores:

CREATE OR REPLACE FUNCTION z_score(x FLOAT)
  RETURNS FLOAT
  COMMENT = 'Simple z-score calculation.'
  AS 'NVL2(x, (x - AVG(x) OVER()) / STDDEV(x) OVER(), NULL)'
;

How can I extend the above definition to enable end user passing optional PARTITION BY / ORDER BY to the OVER() call so the score can be calculate within a group.

Details

In a non-functional form the one can execute

SELECT 
  NVL2(x, (x - AVG(x) OVER(PARTITION BY group_col)) / STDDEV(x) OVER(
             PARTITION BY group_col), NULL) 
  FROM ...

Is it possible to mimic this behaviour in the function? It would like to make it possible to call function in two ways:

  1. z_score(x) → No grouping argument. Executes:

    NVL2(x, (x - AVG(x) OVER()) / STDDEV(x) OVER(), NULL)
    
  2. z_score(x, PARTITION BY group_col) → With grouping argument, generating statement from example. Executes:

    NVL2(x, (x - AVG(x) OVER(PARTITION BY group_col)) / STDDEV(x) OVER(
          PARTITION BY group_col), NULL)
    

Example

--- Function experiment
CREATE
OR REPLACE TEMPORARY TABLE tmp_some_values (group_col VARCHAR, x_val FLOAT);
INSERT INTO
    tmp_some_values
VALUES
    ('A', 786783),
    ('A', 7653.22),
    ('A', 75553.11),
    ('B', 1.33),
    ('B', 1.1),
    ('B', 1.2);
SELECT
    x_val,
    NVL2(x_val, (x_val - AVG(x_val) OVER()) / STDDEV(x_val) OVER(), NULL) AS z_score,
    NVL2(x_val,(x_val - AVG(x_val) OVER(PARTITION BY group_col)) / STDDEV(x_val) OVER(PARTITION BY group_col), NULL)  AS z_score_group
FROM
    tmp_some_values;

Results

X_VAL Z_SCORE Z_SCORE_GROUP
786783 2.032252526 1.151122095
7653.22 -0.4349140533 -0.654227693
75553.11 -0.2199045084 -0.4968944024
1.33 -0.4591442749 1.040531963
1.1 -0.4591450032 -0.9538209665
1.2 -0.4591446866 -0.08671099695

Workaround solution

I was able to come up with a workaround using IFF

-- Define Z-Score calculating function
CREATE OR REPLACE FUNCTION z_score(x FLOAT, group_col VARCHAR DEFAULT NULL)
    RETURNS FLOAT
    COMMENT = 'Z-score calculation'
    AS
    $$
    IFF(group_col IS NULL,
        NVL2(x, (x - AVG(x) OVER ()) / STDDEV(x) OVER (), NULL),
        NVL2(x, (x - AVG(x) OVER (PARTITION BY group_col)) / STDDEV(x) OVER (PARTITION BY group_col), NULL)
        )
    $$
;

Still I would prefer a more intelligent way of handling optional SQL-like arguments.

0

There are 0 best solutions below