In SQL, create pairs based on a grouping and count their frequency

725 Views Asked by At

What I want is to create DISTINCT pairs of column 2 for each ID and arrange them by count.
Let's use this table as an example:

CREATE TABLE mytable
    (`ID` int, `C2` varchar(1), `C3` varchar(2))
;
    
INSERT INTO mytable
    (`ID`, `C2`, `C3`)
VALUES
    (1, 'A',' a1'),
    (1, 'B', 'b1'),
    (2, 'A', 'a2'),
    (3, 'A', 'a3'),
    (3, 'C', 'c3'),
    (3, 'A', 'a4'),
    (4, 'A', 'a1'),
    (4, 'B', 'b4'),
    (4, 'A', 'a2'),
    (4, 'D', 'd1');

For 1, pair would be A-B.
For 2, one wouldn't exist.
For 3, pair would be A-C.
For 4, pairs would be A-B, A-D, and B-D.

So the output would be:

| Pair | Cnt |
| A-B  | 2   |
| A-C  | 1   |
| A-D  | 1   | 
| B-D  | 1   |

Is this something we can do in SQL using something like GROUP_CONCAT?
I've been wrapping my head around this problem for days and still can't think of a simple solution.

Thanks!

2

There are 2 best solutions below

1
Gordon Linoff On BEST ANSWER

I think this is a self-join and count distinct. One method is:

select t1.c2, t2.c2, count(distinct t1.id) as cnt
from t t1 join 
     t t2
     on t1.id = t2.id and t1.c2 < t2.c2
group by t1.c2, t2.c2
order by cnt desc;

Depending on your data, it might be more efficient to remove duplicates first and then join:

with tt as (
      select distinct t.id, t.c2
      from t
     )
select t1.c2, t2.c2, count(t1.id) as cnt
from tt t1 join 
     tt t2
     on t1.id = t2.id and t1.c2 < t2.c2
group by t1.c2, t2.c2
order by cnt desc;
1
GMB On

You can self-join and aggregate:

select t1.c2 c21, t2.c2 c22, count(distinct t1.id) cnt 
from mytable t1
inner join mytable t2
    on  t1.id = t2.id
    and t1.c2  < t2.c2
group by t1.c2, t2.c2