Returning Row Numbers When Conditions Are Met

64 Views Asked by At

I am working with Netezza SQL.

I have the following table:

  name year var1 var2
  John 2001    a    b
  John 2002    a    a
  John 2003    a    b
  Mary 2001    b    a
  Mary 2002    a    b
  Mary 2003    b    a
 Alice 2001    a    b
 Alice 2002    b    a
 Alice 2003    a    b
   Bob 2001    b    a
   Bob 2002    b    b
   Bob 2003    b    a

I want to answer the following question:

  • For each name, when (i.e., which row_num) does var1 change for the first time? Keep the full information for that row so we can see the change in var1_before/var1_after and var2_before/var2_after
  • If a name kept its var1 value throughout - return the full information row for the last available year corresponding to that name (along with the row_number)

I wrote this code to look at how var1 and var2 change year-to-year for each person:

WITH CTE AS (
    SELECT 
        name, 
        year, 
        var1, 
        var2,
        LAG(var1, 1) OVER (PARTITION BY name ORDER BY year ASC) AS var1_before,
        LEAD(var1, 1) OVER (PARTITION BY name ORDER BY year ASC) AS var1_after,
        LAG(var2, 1) OVER (PARTITION BY name ORDER BY year ASC) AS var2_before,
        LEAD(var2, 1) OVER (PARTITION BY name ORDER BY year ASC) AS var2_after,
        ROW_NUMBER() OVER (PARTITION BY name ORDER BY year ASC) AS row_num
    FROM 
        mytable
)
SELECT 
  *
FROM 
    CTE;

But I don't know how to proceed from here. I tried to identify names with changes vs. names with no changes, but I keep getting confused and messing up.

Can someone please show me how I can do this?

The final result would look something like this (I included the first 2 rows as an example, but in reality there will be 1 row for all names in the original data):

 name           category total_number_of_rows year_when_var1_changed var1_before var1_after var2_before var2_after
 John Var1 Never Changed                    3                   NULL           a          a           a          b
 Mary       Var1 Changed                    3                      2           b          a           a          b
2

There are 2 best solutions below

0
Neftali Ramos On

To solve your problem, we need to extend the Common Table Expression (CTE) you've already created with additional logic to detect when var1 changes for the first time for each name, and also to handle cases where var1 does not change throughout the period. This involves using additional window functions and conditional logic in SQL.

some considerations:

  • Use the LAG function to compare the current var1 value with the previous year's var1 value. You've already done this part.
  • Add a column to flag rows where var1 changes.
  • For each name, find the first row where var1 changes.
  • For names where var1 does not change, we'll return the row for the last available year.

The code:

WITH CTE AS (
    SELECT 
        name, 
        year, 
        var1, 
        var2,
        LAG(var1) OVER (PARTITION BY name ORDER BY year) AS var1_before,
        LEAD(var1) OVER (PARTITION BY name ORDER BY year) AS var1_after,
        LAG(var2) OVER (PARTITION BY name ORDER BY year) AS var2_before,
        LEAD(var2) OVER (PARTITION BY name ORDER BY year) AS var2_after,
        ROW_NUMBER() OVER (PARTITION BY name ORDER BY year) AS row_num,
        CASE 
            WHEN LAG(var1) OVER (PARTITION BY name ORDER BY year) IS NOT NULL AND 
                 LAG(var1) OVER (PARTITION BY name ORDER BY year) != var1 THEN 1
            ELSE 0 
        END AS var1_changed_flag
    FROM 
        mytable
),
RankedChanges AS (
    SELECT 
        *,
        ROW_NUMBER() OVER (PARTITION BY name, var1_changed_flag ORDER BY year) AS change_rank
    FROM 
        CTE
    WHERE 
        var1_changed_flag = 1 OR 
        var1_after IS NULL -- This condition helps to include the last row for each name
)
SELECT 
    *
FROM 
    RankedChanges
WHERE 
    change_rank = 1 OR
    (var1_after IS NULL AND var1_changed_flag = 0) -- Select the last row if var1 never changed
ORDER BY 
    name, 
    year;
0
Sohail Aslam On
WITH CTE AS (
    SELECT 
        name, 
        year, 
        var1, 
        var2,
        LAG(var1) OVER (PARTITION BY name ORDER BY year) AS var1_before,
        LEAD(var1) OVER (PARTITION BY name ORDER BY year) AS var1_after,
        LAG(var2) OVER (PARTITION BY name ORDER BY year) AS var2_before,
        LEAD(var2) OVER (PARTITION BY name ORDER BY year) AS var2_after,
        ROW_NUMBER() OVER (PARTITION BY name ORDER BY year) AS row_num,
        CASE WHEN var1 = LAG(var1) OVER (PARTITION BY name ORDER BY year) THEN 'Var1 Never Changed'
             ELSE 'Var1 Changed' END AS category
    FROM 
        mytable
)
SELECT 
    name,
    category,
    COUNT(*) AS total_number_of_rows,
    MAX(CASE WHEN category = 'Var1 Changed' THEN year END) AS year_when_var1_changed,
    MAX(var1_before) AS var1_before,
    MAX(var1_after) AS var1_after,
    MAX(var2_before) AS var2_before,
    MAX(var2_after) AS var2_after
FROM 
    CTE
GROUP BY 
    name,
    category;

Demo: https://dbfiddle.uk/0sALtaU2