How do I return identical values between two string aggregated columns in SQL?

Question

How do I return identical values between two string aggregated columns in SQL?

89 Views Asked by Yosh07 At 04 September 2023 at 13:43

I have 2 columns A & B in a SQL Server table that contain a string aggregated list of codes.

The codes in the lists are separated by a semicolon.

The string aggregated lists can take any length and the codes can be in any order.

What I would like to do is to create a new column C which contains the codes that appear somewhere in the same row of the lists of Column A and Column B.

Example:

Column_A	Column_B	Column_C
a;b;c;d;e	c;a;e;i;k	c;a;e
d;e;f;g	e;h;i;j;d	e;d

The example above returns "c", "a" and "e" for the first row of column C, because these codes are present in the same row of both Column A and Column B.

The same for the second row, here "e" and "d" are overlapping in Column A and B and thus returned in Column C.

I have tried something that works, but it does not seem like the best solution in terms of efficiency and performance. Especially because I have many (1m+) rows to check this for and the length of code lists to compare can be very long.

SELECT 
    STRING_AGG(CAST([value] AS NVARCHAR(MAX)),'; ') AS Overlapping_Code 
FROM
    (SELECT a.value 
     FROM MyTable t 
     CROSS APPLY STRING_SPLIT(t.Column_A, ';') a

     INTERSECT

     SELECT b.value 
     FROM MyTable t 
     CROSS APPLY STRING_SPLIT(t.Column_B, ';') b 
) ab

I am looking for a better solution in terms of performance and elegance to compare the string aggregated lists for two columns across (many) rows.

Original Q&A

There are 4 best solutions below

geek45 On 04 September 2023 at 14:05

Try to use CHARINDEX to check which characters are in A and B.

SELECT
    Column_A,
    Column_B,
    (
        SELECT
            STRING_AGG(code, ';') WITHIN GROUP (ORDER BY code) AS common_codes
        FROM (
            SELECT DISTINCT value AS code
            FROM STRING_SPLIT(Column_A, ';')
            WHERE CHARINDEX(';' + value + ';', ';' + Column_B + ';') > 0
        ) AS common_codes
    ) AS Column_C
FROM yourdata;

Thom A On 04 September 2023 at 14:22

One method would be to split your strings, check if the value is in the other column's dataset and then reaggregate. This differs from your attempt as this doesn't reference the same table twice, just once. This isn't ideal and the order will be arbitrary, as you are on SQL Server 2019 (not 2022):

SELECT *
FROM (VALUES('a;b;c;d;e','c;a;e;i;k','c;a;e'),
            ('d;e;f;g','e;h;i;j;d','e;d'))V(A,B,C)
     CROSS APPLY (SELECT STRING_AGG(BB.value,';') AS C
                  FROM STRING_SPLIT(V.B,';') BB
                  WHERE EXISTS (SELECT 1
                                FROM STRING_SPLIT(V.a,';') AA
                                WHERE BB.value = AA.value))E;

Ideally, as I mentioned, you should be treating your data in a normalised form. I normalise your data using the sample you've given us, and then you can use an EXISTS to get the values that are in both sets:

SELECT *
INTO #YourBadData
FROM (VALUES(1,'a;b;c;d;e','c;a;e;i;k','c;a;e'),
            (2,'d;e;f;g','e;h;i;j;d','e;d'))V(ID,A,B,C);
GO
--Create normalised tables
CREATE TABLE dbo.TableA (ID int, Code char(1));
CREATE TABLE dbo.TableB (ID int, Code char(1));
CREATE TABLE dbo.RelatedIDs (A int, B int);
GO
INSERT INTO dbo.TableA (ID, Code)
SELECT YBD.ID,
       SS.Value
FROM #YourBadData YBD
     CROSS APPLY STRING_SPLIT(YBD.A,';') SS;
GO
INSERT INTO dbo.TableB (ID, Code)
SELECT YBD.ID,
       SS.Value
FROM #YourBadData YBD
     CROSS APPLY STRING_SPLIT(YBD.B,';') SS;

GO
INSERT INTO dbo.RelatedIDs (A,B) --This is overly simplified, as I assume A and B would likely
                                 --have different IDs and come from different sources
SELECT ID, ID
FROM dbo.#YourBadData; 
GO

SELECT B.ID,
       B.Code
FROM dbo.TableB B
WHERE EXISTS (SELECT 1
              FROM dbo.RelatedIDs RI
                   JOIN dbo.TableA A ON RI.A = A.ID
              WHERE RI.B = B.ID
                AND A.Code = B.Code);


GO
--Clean up
DROP TABLE #YourBadData;
DROP TABLE dbo.TableA;
DROP TABLE dbo.TableB;
DROP TABLE RelatedIDs

db<>fiddle

Yitzhak Khabinsky On 04 September 2023 at 14:39

Please try the following solution based on the set operator INTERSECT

SQL

-- DDL and sample data population, start
DECLARE @tbl TABLE (ID INT IDENTITY PRIMARY KEY, colA VARCHAR(20), colB VARCHAR(20));
INSERT @tbl (colA, colB) VALUES
('a;b;c;d;e', 'c;a;e;i;k'),
('d;e;f;g', 'e;h;i;j;d');
-- DDL and sample data population, end

SELECT t.*,
    (
        SELECT STRING_AGG(value, ';') WITHIN GROUP (ORDER BY value) AS common_codes
        FROM (
            SELECT value
            FROM STRING_SPLIT(ColA, ';')
            INTERSECT
            SELECT value
            FROM STRING_SPLIT(ColB, ';')
        ) AS common_codes
    ) AS ColC
FROM @tbl as t;

Output

ID	colA	colB	ColC
1	a;b;c;d;e	c;a;e;i;k	a;c;e
2	d;e;f;g	e;h;i;j;d	d;e

**nvogel** · Accepted Answer · 2023-09-04T20:50:55.480000

Here is another possible solution:

CREATE TABLE MyTable (a VARCHAR(100) NOT NULL, b VARCHAR(100) NOT NULL /* PRIMARY KEY ?? */);

INSERT INTO MyTable VALUES ('a;b;c;d;e','c;a;e;i;k'),('d;e;f;g','e;h;i;j;d');

SELECT *,
    (SELECT STRING_AGG(x.value,';')
    FROM STRING_SPLIT(a,';') AS x
    , STRING_SPLIT(b,';') AS z
    WHERE x.value=z.value)
FROM mytable AS t;



a          b          c          
---------- ---------- -------
a;b;c;d;e  c;a;e;i;k  c;a;e
d;e;f;g    e;h;i;j;d  e;d

How do I return identical values between two string aggregated columns in SQL?

There are 4 best solutions below

Related Questions in SQL

Related Questions in SQL-SERVER

Related Questions in STRING-AGGREGATION

Related Questions in STRING-AGG

Trending Questions

Popular # Hahtags

Popular Questions