SQL Server: Most efficient way to look for set based on 2 columns

71 Views Asked by At

Edit: I realized I asked the question wrong. What I really meant to ask was, "given a set of values for [Column B], check if there is a value in [Column A] that matches all of the set and no others."


I'm not sure if what I want to do has an official name, so searching for my question has been difficult.

Given a table with 2 columns and a list of values, I want to see if this combination (and only this combination) exists in the table.

For instance, given this table (assume it's the whole table):

|----------|----------|
| Column A | Column B |
|----------|----------|
| 12345    | abcde    |
|----------|----------|
| 12345    | xyz      |
|----------|----------|
| 12345    | abcd     |
|----------|----------|
| 12345    | abbba    |
|----------|----------|

And given this input parameter:

Declare @columnBs Varchar(Max) = '["abcde","xyz","abcd","abbba"]';

For that set, I want to return 12345. So, basically, I want to run a check and see if any value in [Column A] matches all of the values in [Column B] to all of the values in @columnBs AND NO OTHER VALUES.

Without a value for [Column A] as a starting point, I'm having trouble even conceiving of a long-form solution.


If it helps to conceptualize this better, this is a solution for messaging where:

  • [Column A] represents the thread's primary key
  • [Column B] represents a user assigned to the thread

So, if a new message comes in for a set of users, I want to see whether there is an existing thread for all of the users supplied by @columnBs and no other users.

2

There are 2 best solutions below

1
On BEST ANSWER

I read this that you need to find a value in ColumnA that corresponds to exactly, no more and no less, the same values in your query. So, you need to join the search values, ensure all of them exist for a single ColumnA, and then ensure that no more exist. You could do it by cross joining them, but for larger sets of data, that would have awful performance. This may be a little better:

-- Set up the source data.
create table MyTable (
        ColumnA int,
        ColumnB nvarchar(max)
    )
insert MyTable
    (ColumnA, ColumnB)
    Values
    -- Value 1 contains exactly the same values as we'll be looking for
    (1, 'abcde'),
    (1, 'xyz'),
    (1, 'abcd'),
    (1, 'abbba'),
    -- Value 2 contains all of them, plus one more different value
    (2, 'abcde'),
    (2, 'xyz'),
    (2, 'abcd'),
    (2, 'abbba'),
    (2, 'xxxxx'),
    -- Value 3 contains one less value
    (3, 'abcde'),
    (3, 'xyz'),
    (3, 'abcd'),
    -- Value 4 contains one different value
    (4, 'abcde'),
    (4, 'xyz'),
    (4, 'abcd'),
    (4, 'xxxxxxxxx')


-- These are the values we are looking for:
create table #searchValues (
            value nvarchar(max)
        )
insert #searchValues
    (value) values
    ('abcde'),
    ('xyz'), 
    ('abcd'), 
    ('abbba')

declare @valueCount int = (select COUNT(*) from #searchValues)


select  t.ColumnA
    from (
        -- This inner query finds all ColumnA values
        -- that link to all of the specified ColumnB values.
        select  tab.ColumnA
            from #searchValues t
            join MyTable tab on
                t.value = tab.ColumnB
            group by tab.ColumnA
            having COUNT(*) = @valueCount
        ) x
    -- And this outer join and group by will filter out those that 
    -- have all the values, plus some more.
    join MyTable t on
        t.ColumnA = x.ColumnA
    group by t.ColumnA
    having COUNT(*) = @valueCount



drop table #searchValues
drop table MyTable

This will produce just the value 1 as a result, because it matches exactly.

1
On

use STRING_SPLIT() if you are on SQL Server 2016 and later. If you use an earlier version of SQL Server, you can use this http://www.sqlservercentral.com/articles/Tally+Table/72993/ to parse the CSV into column wise result list

declare @table table
(
    ColumnA int,
    ColumnB varchar(6)
)

insert into @table select 12345, 'abcde'
insert into @table select 12345, 'xyz'
insert into @table select 12345, 'abcd'
insert into @table select 12345, 'abbba'

declare @columnBs varchar(max) = 'abcde,xyz,abcd,abbba';

; with cte as
(
    select  value, cnt = count(*) over()
    from    string_split(@columnBs, ',')
)
select  ColumnA
from    cte c
    inner join @table t on  c.value = t.ColumnB
group by ColumnA
having count(*) = max(c.cnt)