I have this table ("colors") in SQL:
CREATE TABLE colors (
color1 VARCHAR(50),
color2 VARCHAR(50),
year INT,
var1 INT,
var2 INT,
var3 INT,
var4 INT
);
INSERT INTO colors (color1, color2, year, var1, var2, var3, var4) VALUES
('red', 'blue', 2010, 1, 2, 1, 2),
('blue', 'red', 2010, 1, 2, 1, 2),
('red', 'blue', 2011, 1, 2, 5, 3),
('blue', 'red', 2011, 5, 3, 1, 2),
('orange', NULL, 2010, 5, 9, NULL, NULL)
('green', 'white', 2010, 5, 9, 6, 3);
The table looks like this:
color1 color2 year var1 var2 var3 var4
red blue 2010 1 2 1 2
blue red 2010 1 2 1 2
red blue 2011 1 2 5 3
blue red 2011 5 3 1 2
orange NULL 2010 5 9 NULL NULL
green white 2010 5 9 6 3
I am trying to do the following:
- For pairs of colors in the same year (e.g. red/blue/2010 and blue/red/2010) - if var1=var3 and var2=var4 : then keep only one pair
- For pairs of colors in the same year - if var1!=var3 OR var2!=var4 : then keep both pairs
- For colors that do not have pairs in the same year : keep those rows as well
The final result should look like this:
color1 color2 year var1 var2 var3 var4
red blue 2010 1 2 1 2
red blue 2011 1 2 5 3
blue red 2011 5 3 1 2
orange NULL 2010 5 9 NULL NULL
green white 2010 5 9 6 3
Here is my attempt to write the SQL code for this:
First I write CTEs to identify pairs - then I verify the OR conditions:
WITH pairs AS (
SELECT *,
CASE
WHEN color1 < color2 THEN color1 || color2 || CAST(year AS VARCHAR(4))
ELSE color2 || color1 || CAST(year AS VARCHAR(4))
END AS pair_id
FROM colors
),
ranked_pairs AS (
SELECT *,
ROW_NUMBER() OVER(PARTITION BY pair_id ORDER BY color1, color2) as row_num
FROM pairs
)
SELECT color1, color2, year, var1, var2, var3, var4
FROM ranked_pairs
WHERE row_num = 1 OR var1 != var3 OR var2 != var4;
The output looks like this:
color1 color2 year var1 var2 var3 var4
orange <NA> 2010 5 9 NA NA
blue red 2010 1 2 1 2
blue red 2011 5 3 1 2
red blue 2011 1 2 5 3
green white 2010 5 9 6 3
Am I doing this correctly? The final result looks correct but I am not confident, e. this code might not work on some fringe cases.
Thanks!
Your ordering colors in
pair_idseems to be wrong in case the same pair represents a different order. Also you are handling null values as equal.Please check version below: