Get pairs of distinct values from two many to many columns

54 Views Asked by At

Two columns from the same table with many to many relation. In other words, ID 1 can possibly match with matchID A or B. Meanwhile matchID A can possibly match with ID 1, 2 or 3.

ID matchID
1 A
1 B
2 A
2 B
3 A
3 C

The output should have each ID linked with only one distinct matchID. For example, once ID 1 is linked with mathchID A, no other ID should match with matchID A.

I would like to get the following output:

ID matchID
1 A
2 B
3 C
2

There are 2 best solutions below

1
Jeffrey On
WITH RankedMatches AS (
  SELECT
    ID,
    matchID,
    ROW_NUMBER() OVER (PARTITION BY matchID ORDER BY ID) AS rn
  FROM
    your_table_name
)
SELECT
  ID,
  matchID
FROM
  RankedMatches
WHERE
  rn = 1;

This query assumes that you want to prioritize the minimum ID when there are multiple matches for a given matchID. If you want to prioritize a different column, you can adjust the ORDER BY clause within the ROW_NUMBER() function accordingly

0
T N On

Below is a recursive CTE (common table expression) based solution that will probe the possible assignments to find one that satisfies the uniqueness criteria.

The query will first identify all unique Key1 values and order them so that those with the fewest options are processed first. It will assign all possible mappings and then recursively repeat the process with each additional Key1 value. At each level, the available Key2 values are checked against a list of already-assigned values, and any matches are skipped. Any recursive branches that reach the final Key1 value and find an eligible Key2 mapping will be a considered a successful mapping.

A combined string value of the form "|Key1=Key2|Key1=Key2|...|" is used to store the working mapping as it is constructed down the levels of recursion. This string is used both to check of already-assigned Key2 values and to generate the final results once a successful complete mapping is selected.

There may be multiple possible mappings, but only one will be arbitrarily chosen as the final result. At that point the mappings are unpacked and selected as the result.

If no successful mapping is found, this query will quietly return no results.

Each level of recursion potentially multiplies the possibilities N times, where N is the number of available choices for the current key. This can lead to exponential usage of resources. Ideally, the best performance would occur with a depth-first recursion execution plan (as apposed to breadth-first), but SQL Server does not implement option to select that preferred behavior. Without that option, SQL Server might use either of the two. As a result, performance may be unacceptable for anything but very small data sets. If performance is an issue, I suggest performing the work in a more generalized programming environment.

DECLARE @Delim1 CHAR = '|' -- Choses character must not be present in the Data
DECLARE @Delim2 CHAR = '='

;WITH CTE_Key1List AS (
    -- Get a distinct list of keys. Order the keys by number of options descending,
    -- so that those keys with the fewest options will be processed first. (Later
    -- processing starts with the highest RowNum and words its way down.) This
    -- will hopefully reduce backtracking in a tight scenario.
    SELECT Key1, ROW_NUMBER() OVER(ORDER BY COUNT(*) DESC) AS RowNum
    FROM Data
    GROUP BY Key1
),
CTE_Solver AS (
    -- CTE Bootstrap:
    -- Working from MAX(RowNum) backwards, select the first set of
    -- candidate mappings for the selected Key1 value
    -- Build a map list of the form "|Key1=Key2|Key1=Key2|...|".
    SELECT
        K.RowNum,
        CAST(CONCAT(@Delim1, D.Key1, @Delim2, D.Key2, @Delim1) AS VARCHAR(MAX)) AS Mapping
    FROM CTE_Key1List K
    JOIN Data D ON D.Key1 = K.Key1
    WHERE K.RowNum = (SELECT MAX(RowNum) FROM CTE_Key1List)
    -- CTE recursion:
    -- For each lower RowNum value down to 1, include candidate mappings for
    -- the next Key1 value. Exclude any cases that attempt to reuse an
    -- already-assigned Key2 value.
    UNION ALL
    SELECT
        K.RowNum,
        CONCAT(S.Mapping, D.Key1, @Delim2, D.Key2, @Delim1) AS Mapping
    FROM CTE_Solver S
    JOIN CTE_Key1List K
        ON K.RowNum = S.RowNum - 1 
    JOIN Data D
        ON D.Key1 = K.Key1
        AND CHARINDEX(CONCAT(@Delim2, D.Key2, @Delim1), S.Mapping) = 0 -- Unused
    WHERE S.RowNum > 1
),
CTE_Result AS (
    SELECT R.Key1, R.Key2
    FROM (
        -- Get just one sucessful mapping out of possibly many
        SELECT TOP 1 TRIM(@Delim1 FROM S.Mapping) AS TrimmedMapping
        FROM CTE_Solver S
        WHERE S.RowNum = 1
    ) S
    -- Split the mapping list into individual Key1=Key2 pairs
    CROSS APPLY STRING_SPLIT(S.TrimmedMapping, @Delim1) M
    -- Locate the = delimiter and separate the two values.
    CROSS APPLY (SELECT CHARINDEX(@Delim2, M.value) SplitPos) P
    CROSS APPLY (
        SELECT
            Key1 = CAST(LEFT(M.value, P.SplitPos - 1) AS INT),
            Key2 = STUFF(M.value, 1, P.SplitPos, '')
    ) R
)
--SELECT * FROM CTE_Solver S                          -- All working steps (unordered)
--SELECT * FROM CTE_Solver S ORDER BY MapList         -- All working steps
--SELECT * FROM CTE_Solver S WHERE S.RowNum = 1       -- All solutions
--SELECT TOP 1 * FROM CTE_Solver S WHERE S.RowNum = 1 -- One solution
SELECT * FROM CTE_Result ORDER BY Key1                -- Solution details

Intermediate results can be viewed by selectively commenting and uncommenting the final SELECT statements.

Results:

Key1 Key2
1 A
2 B
3 C

See this db<>fiddle for a demo with some extra test data.