Can the two sides of a SQL Set operation be correlated?

63 Views Asked by At

Is there someway in SQL to correlate the sides of a set operation?

I want to run a query like

SELECT * except(id) FROM mytable where id = 1
EXCEPT DISTINCT
SELECT * except(id) FROM mytable where id = 2

Only I have thousands of pairs of (1,2) which I want to compare ie. (2,3), (3,4)

I could write something like:

with numbers as (
  SELECT n FROM unnest(generate_array(1,100)) n
)

,sample_data as (
  SELECT 
    n as id,
    n2, 
    date_add(current_date(), interval n2 day ) d 
  FROM numbers 
  CROSS JOIN unnest(generate_array(numbers.n, 100)) n2
)

SELECT sample_data.* except(id) 
FROM sample_data 
INNER JOIN numbers on n = id
EXCEPT DISTINCT
SELECT sample_data.* except(id) 
FROM sample_data 
INNER JOIN numbers ON sample_data.id = numbers.n + 1

But this gives only differences for the lowest id as all of the others are in both sets of the except operator.


"Expected result"

I've been asked for sample data and an expected result in the comments. I think that these obscure my question rather than explain it, but here goes.

If the first query is run 100 times, each time it returns one row, as id 1 and id 2 each have 100 rows 99 of which match. I want a result with 100 rows returned for this sample data, with n2 being 1 through 100.

To be really clear, there are a hundred ways to generate the numbers 1 through 100. The question itself includes the best one. Doing so is not part of my question. The question is about correlating the queries on either side of the EXCEPT keyword.


I have plenty of work arounds already. The obvious one is to issue the EXCEPT query many times from a program written outside SQL. The much better performing one is to generate a statement with all of the columns and a lead or lag of each column and filter in a qualify clause if any of the the columns have changed. That column-by-column-lag query is long and hard to read, but my question largely my academic curiosity.

0

There are 0 best solutions below