Convert comma separated rows into a table without using CTE or split_string

85 Views Asked by At

I've hit an interesting problem in our system. A design decision was made to include IDs in a comma separated string which are referenced in the day-to-day functioning of the system. It wasn't causing any problems initially but, now we have a large amount of data, the query to extract information is taking 30-60 seconds. This is mainly down to it checking every row's string for a particular ID and using a function to split each string. I expect that extrapolating this information into an indexed view would speed things up significantly but, although I'm able to create the query to do the work, I can't put it into an indexed view because it uses a cross apply.

The data is currently in this form

ProductId OtherProductId
123 123, 234, 456
456 987, 987, 876, 512

and I need to get it into an indexed view like this

ProductId OtherProductId
123 123
123 234
123 456
456 987
456 876
456 512

I can then join by OtherProductId and get the ProductIds associated.

The following will provide the data in a form I can use but takes 30+ seconds to run

SELECT DISTINCT ProductId, 
                LTRIM(RTRIM(m.n.value('.[1]', 'varchar(10)'))) AS OtherProductId
FROM (
        SELECT ProductId,
             CAST('<XMLRoot><RowData>'
             + REPLACE(OtherProductIds, ',', '</RowData><RowData>')
             + '</RowData></XMLRoot>' AS XML) AS x
        FROM MySourceTable
     ) t
     CROSS APPLY x.nodes('/XMLRoot/RowData') m(n)

I'd like to convert this into an indexed view so I can join it to my actual query and speed things up.

We're using SQL Server 2008 compatibility mode so are unable to make use of split_string and all the other newer features. Changing the code would require too much upheaval and have far-reaching issues in implementing but I am able to change and update the database as necessary.

2

There are 2 best solutions below

0
The Betpet On BEST ANSWER

Thanks to Thom A for pointing me to this in his comment. The solution I've used is

CREATE VIEW dbo.ExplodedProductIds
WITH SCHEMABINDING
AS
WITH tmp (ProductId, DataItem, String)
AS (
   SELECT ProductId,
          CAST(LEFT(OtherProductIds, CHARINDEX(',', IC.AdditionalProductTestIds + ',') - 1) AS VARCHAR(100)) AS DataItem,
          CAST(STUFF(OtherProductIds, 1, CHARINDEX(',', IC.OtherProductIds + ','), '') AS VARCHAR(100)) AS String
   FROM ProductSalesTable IC
   UNION ALL
   SELECT ProductId,
          CAST(LEFT(String, CHARINDEX(',', String + ',') - 1) AS VARCHAR(100)),
          CAST(STUFF(String, 1, CHARINDEX(',', String + ','), '') ASVARCHAR(100))
   FROM tmp
   WHERE String > '')
SELECT DISTINCT
       ProductId,
       DataItem
FROM tmp
WHERE ProductId IS NOT NULL
      AND DataItem IS NOT NULL;
GO

The query has dropped to around 5 seconds using this. Thanks to everyone for their help!

1
Yitzhak Khabinsky On

It is possible to simulate relationship via SQL Server XQuery capabilities.

The proposed solution is using a so called Quantified Expression.

For the reference: Quantified Expressions (XQuery)

Please test it in your environment to check if the performance is acceptable.

SQL

-- DDL and sample data population, start
DECLARE @tbl TABLE (id INT PRIMARY KEY, productID_List VARCHAR(256));
INSERT @tbl (id, productID_List) VALUES
(123, '123,234,456'),
(456, '987,876,512'),
(777, '987,999,512');

DECLARE @tbl2 TABLE (id INT PRIMARY KEY, description VARCHAR(256));
INSERT @tbl2 (id, description) VALUES
(234, 'Car'),
(999, 'Bike');
-- DDL and sample data population, end

DECLARE @separator CHAR(1) = ',';

SELECT * 
FROM @tbl AS t
CROSS APPLY (SELECT CAST('<root><r><![CDATA[' + 
    REPLACE(productID_List, @separator, ']]></r><r><![CDATA[') + 
    ']]></r></root>' AS XML)) AS t1(c)
INNER JOIN @tbl2 AS t2 ON c.value('some $r in /root/r/text()
          satisfies sql:column("t2.id") =  $r', 'BIT') = 1;

Output

id productID_List id description
123 123,234,456 234 Car
777 987,999,512 999 Bike