What I have: an equivalence relation on the set {1,...,n}
, given as the full list of equivalent pairs (a,b)
(this is already transitive; no need to compute a transitive closure). n
is large (say, a few millions at most); the total number of equivalent pairs is at most O(n)
(and usually much less), and individual equivalence classes are small (I don't have a figure for this, but say O(log n)
).
What I want: a way to individually enumerate all equivalence classes, i.e. successively get the full set of elements of each equivalence class; with complexity at most O~(n)
(all other computations performed in my code are O(n log(n))
and I definitely don't want to do anything quadratic on that size if possible).
Using an union-find structure would help in constructing the transitive closure (which I don't need) but, I believe, not in constructing the equivalence classes. I can also easily compute (say) the smallest representative in each class (just scan the full equivalence relation until I find it) and even store all of them in time O(n)
(build an identity vector of 1...n then scan the full equivalence relation, replacing by the smallest value if possible); however, this also does not help much in enumerating the equivalence classes.
Does there exist a classical solution to this problem?
You can think of this as a graph search problem. You’re given a list of all the edges that make up a graph, and you want to find its connected components. If you had an adjacency list for the graph, this could be done with a DFS or BFS in time O(n), since there are n nodes and O(n) total edges.
Fortunately, you can construct an adjacency list in time O(n). Make an array of n lists in time O(n). Then, iterate across the edges and for each edge (a, b), append b to list number a.
Overall, this lets you find all the equivalence classes (connected components) in time O(n). This works even if the relation isn’t transitive.