How can I union values across multiple columns of a result set without repeated table scans in the query plan?

Question

How can I union values across multiple columns of a result set without repeated table scans in the query plan?

87 Views Asked by Sean Werkema At 16 June 2023 at 15:34

I have data in table Foo in columns x1 and x2 and x3, and more data in table Bar in x4. Both tables are primary-keyed by a unique id, and each Bar has a foreign key to a single Foo, as in the schema below:

CREATE TABLE Foo (id INT, x1 INT, x2 INT, x3 INT, ...)
CREATE TABLE Bar (id INT, fooId INT, x4 INT, ...)

CREATE TABLE Qux (x INT, ...)

This is the right schema, and it's properly normalized for its use case.

I need a set of all distinct x values from Foo and Bar where the Foo records match some WHERE clause. I then need to use those values to look up the correct records in another table Qux.

I had solved it with UNION ALL, as in the example below:

WITH CTE_Ids AS (
    SELECT x1 AS x FROM Foo WHERE ...
    UNION ALL SELECT x2 AS x FROM Foo WHERE ...
    UNION ALL SELECT x3 AS x FROM Foo WHERE ...
    UNION ALL SELECT x4 AS x FROM Foo f LEFT OUTER JOIN Bar b ON f.id = b.fooId WHERE ...
),
CTE_UniqueIds AS (
    SELECT DISTINCT x FROM CTE_Ids
)
SELECT q.*
FROM CTE_UniqueIds ids
INNER JOIN Qux q ON ids.x = q.x

This produces the right result set, and I don't mind repeating the WHERE clause in the code — but unfortunately, it's very inefficient, scanning the Foo and Bar tables many times, because SQL Server doesn't realize it could scan the data exactly once. The inefficient query plan is bad enough that we're experiencing significant slowdowns in our production software.

So how can I get the unique set of x values unioned across columns without SQL Server scanning each table multiple times?

Original Q&A

There are 2 best solutions below

**Sean Werkema** · Answer 1 · 2023-06-16T15:34:11.623000

I puzzled over this for a while: It seemed like there ought to be a way to be able to simply write

SELECT
    f.x1, f.x2, f.x3, b.x4
FROM Foo f
    LEFT OUTER JOIN Bar b ON b.fooId = b.id
WHERE ...

and then somehow tell SQL Server to union all of the X columns across each resulting row into a unique result set of X values. I searched awhile, and I eventually found part of a solution to it in @MatBailie's solution for another question, and I expanded his solution to the answer below.

The key to unioning horizontally across columns is by abusing the OUTER APPLY operator, which can produce multiple result rows for each input row, and using it and UNION ALL to combine each result row with itself multiple times:

SELECT u.x
FROM Foo f
    LEFT OUTER JOIN Bar b ON b.fooId = b.id
    OUTER APPLY (
        SELECT f.x1 AS x
        UNION ALL SELECT f.x2 AS x
        UNION ALL SELECT f.x3 AS x
        UNION ALL SELECT b.x4 AS x
    ) AS u
WHERE ...
GROUP BY u.x

You can use DISTINCT at the top or GROUP BY at the bottom (I prefer GROUP BY, as SQL Server can sometimes optimize that better) to produce the unique set of x values if there are duplicates.

The full query would be structured something like this:

WITH CTE_Ids AS (
    SELECT u.x
    FROM Foo f
        LEFT OUTER JOIN Bar b ON b.fooId = b.id
        OUTER APPLY (
            SELECT f.x1 AS x
            UNION ALL SELECT f.x2 AS x
            UNION ALL SELECT f.x3 AS x
            UNION ALL SELECT b.x4 AS x
        ) AS u
    WHERE ...
    GROUP BY u.x
)
SELECT q.*
FROM CTE_Ids ids
INNER JOIN Qux q ON ids.x = q.x

The query plan for the above will only scan each of the correct Foo and Bar records exactly once, and then just perform some sorting and filtering in memory on the result, before then joining the unique set of resulting x values to Qux.

**siggemannen** · Answer 2 · 2023-06-16T15:46:19.023000

A little example of the values construct:

declare @foo table (id int primary key, x1 int, x2 int, x3 int)
declare @bar table (id int primary key, fooid int, x4 int)

insert into @foo values(1, 10,20,30)
,   (2, 5, 10, 40)
,   (3, 1, 1, 1)
,   (4, 10,20,20)

insert into @bar
values  (1, 1, 100)
,   (2, 1, 1337)
,   (3, 2, 66)
,   (4, 3, 1000)
,   (5, 4, 1)


select distinct z.x1
from @foo f
left join @bar b
    ON  b.fooid = f.id
cross apply (
    values(f.x1), (f.x2), (f.x3), (b.x4)
    ) z
where z.x1 between 5 and 60

I think it's often much more terser than UNION ALL. But both do the job indeed!

How can I union values across multiple columns of a result set without repeated table scans in the query plan?

There are 2 best solutions below

Related Questions in SQL-SERVER

Related Questions in T-SQL

Related Questions in SQL-EXECUTION-PLAN

Related Questions in UNION-ALL

Related Questions in OUTER-APPLY

Trending Questions

Popular # Hahtags

Popular Questions