I have data in table Foo in columns x1 and x2 and x3, and more data in table Bar in x4. Both tables are primary-keyed by a unique id, and each Bar has a foreign key to a single Foo, as in the schema below:
CREATE TABLE Foo (id INT, x1 INT, x2 INT, x3 INT, ...)
CREATE TABLE Bar (id INT, fooId INT, x4 INT, ...)
CREATE TABLE Qux (x INT, ...)
This is the right schema, and it's properly normalized for its use case.
I need a set of all distinct x values from Foo and Bar where the Foo records match some WHERE clause. I then need to use those values to look up the correct records in another table Qux.
I had solved it with UNION ALL, as in the example below:
WITH CTE_Ids AS (
SELECT x1 AS x FROM Foo WHERE ...
UNION ALL SELECT x2 AS x FROM Foo WHERE ...
UNION ALL SELECT x3 AS x FROM Foo WHERE ...
UNION ALL SELECT x4 AS x FROM Foo f LEFT OUTER JOIN Bar b ON f.id = b.fooId WHERE ...
),
CTE_UniqueIds AS (
SELECT DISTINCT x FROM CTE_Ids
)
SELECT q.*
FROM CTE_UniqueIds ids
INNER JOIN Qux q ON ids.x = q.x
This produces the right result set, and I don't mind repeating the WHERE clause in the code — but unfortunately, it's very inefficient, scanning the Foo and Bar tables many times, because SQL Server doesn't realize it could scan the data exactly once. The inefficient query plan is bad enough that we're experiencing significant slowdowns in our production software.
So how can I get the unique set of x values unioned across columns without SQL Server scanning each table multiple times?
I puzzled over this for a while: It seemed like there ought to be a way to be able to simply write
and then somehow tell SQL Server to union all of the X columns across each resulting row into a unique result set of X values. I searched awhile, and I eventually found part of a solution to it in @MatBailie's solution for another question, and I expanded his solution to the answer below.
The key to unioning horizontally across columns is by abusing the
OUTER APPLYoperator, which can produce multiple result rows for each input row, and using it andUNION ALLto combine each result row with itself multiple times:You can use
DISTINCTat the top orGROUP BYat the bottom (I preferGROUP BY, as SQL Server can sometimes optimize that better) to produce the unique set ofxvalues if there are duplicates.The full query would be structured something like this:
The query plan for the above will only scan each of the correct
FooandBarrecords exactly once, and then just perform some sorting and filtering in memory on the result, before then joining the unique set of resultingxvalues toQux.