I'm having a problem bringing back a truly DISTINCT
set of records using ROW_NUMBER()
with a LEFT JOIN
whose joined results are in turn concatenated into a comma-delimited list with the FOR XML PATH
trick.
The objective is to implement paging in a web application because of the large number of records that can be returned and only bringing back a subset each time depending on what page number was selected.
Consider the following SQL:
SELECT DISTINCT IndustryCode,
STUFF (
(SELECT ',' + FE_LocationCode
FROM tblLegacy_Codes i2
WHERE c.IndustryCode = i2.IndustryCode
FOR XML PATH(''))
,1,1,'') AS LegacyList
FROM Common_Clli c
LEFT JOIN tblLegacy_Codes legacy ON c.IndustryCode = legacy.IndustryCode
WHERE FE_LocationCode LIKE 'AUS%'
It brings back a very nice DISTINCT
list as expected:
IndustryCode LegacyList
AUSTTX85 AUST.GET,AUST.LDD,AUST.UU4
AUSTTXTE AUST.TE,AUST.TEH
AUSUTX78 AUST.AX3
SCHWAS01 AUSC01C1,AUSC01UT
SZBGASAH AUSB01C1,AUSB01OB,AUSB01TA
SZBGASAI AUSB01SN,AUVI01SN,AUVI02SN
SZBGASAK AUSB03C1,AUSB03V1
BUT - when the ROW_NUMBER syntax is added to the first line of the query above so that it looks like this:
SELECT DISTINCT ROW_NUMBER() OVER(ORDER BY IndustryCode) AS rowNum, IndustryCode,...
The results now include a row for each IndustryCode for as many are in the LegacyList because DISTINCT
of course now includes rowNum. Without including the entire result set, here is what is seen just for the first IndustryCode in the list because of its 3 associated child records:
rowNum IndustryCode LegacyList
1 AUSTTX85 AUST.GET,AUST.LDD,AUST.UU4
2 AUSTTX85 AUST.GET,AUST.LDD,AUST.UU4
3 AUSTTX85 AUST.GET,AUST.LDD,AUST.UU4
And of course the final implementation of paging is to be able to wrap all the SQL mentioned with
SELECT * FROM (
[ the SQL you've already seen ]
WHERE rowNum BETWEEN x and y
For the moment I've had to resort to executing the SQL first without ROW_NUMBER() and putting the results into a temp table and then getting the final product from there. And unfortunately that has totally defeated the intention of minimizing load by only getting a subset of records every time.
I'm fairly convinced there's just something I don't know. It's always a safe bet. Many regards for help.
The final product that satisfies all the requirements ends up being the SQL below. While both the first 2 solutions offered were helpful, neither addressed the need to NOT house the entire results of the query first in a temporary style table, whether that table be a common table expression or something else. This StackOverflow thread led me to a complete resolution.
SELECT RowNum, IndustryCode, FEList FROM
(
SELECT ROW_NUMBER() OVER(ORDER BY IndustryCode) AS RowNum, IndustryCode, FEList
FROM
(
SELECT DISTINCT IndustryCode,
STUFF (
(SELECT ',' + FE_LocationCode
FROM tblLegacyCodes i2
WHERE c.IndustryCode = i2.IndustryCode
FOR XML PATH(''))
,1,1,'') AS LegacyList
FROM Common_Clli c
LEFT JOIN tblLegacyCodes legacy ON c.IndustryCode = legacy.IndustryCode
WHERE FE_LocationCode LIKE 'AUS%'
) subInner
) subOuter
WHERE RowNum BETWEEN x AND y
What about