Why is the time complexity of performing n union find (union by size) operations O(n log n)?

30.3k Views Asked by At

In Tree based Implementation of Union Find operation, each element is stored in a node, which contains a pointer to a set name. A node v whose set pointer points back to v is also a set name. Each set is a tree, rooted at a node with a self-referencing set pointer.

To perform a union, we simply make the root of one tree point to the root of the other. To perform a find, we follow set name pointers from the starting node until reaching a node whose set name pointer refers back to itself.

In Union by size -> When performing a union, we make the root of smaller tree point to the root of the larger. This implies O(n log n) time for performing n union find operations. Each time we follow a pointer, we are going to a subtree of size at most double the size of the previous subtree. Thus, we will follow at most O(log n) pointers for any find.

I do not understand how for each union operation, Find operation is always O(log n). Can someone please explain how the worst case complexity is actually computed?

3

There are 3 best solutions below

0
On BEST ANSWER

Let's assume for the moment, that each tree of height h contains at least 2^h nodes. What happens, if you join two such trees?

If they are of different height, the height of the combined tree is the same as the height of the higher one, thus the new tree still has more than 2^h nodes (same height but more nodes).

Now if they are the same height, the resulting tree will increase its height by one, and will contain at least 2^h + 2^h = 2^(h+1) nodes. So the condition will still hold.

The most basic trees (1 node, height 0) also fulfill the condition. It follows, that all trees that can be constructed by joining two trees together fulfill it as well.

Now the height is just the maximal number of steps to follow during a find. If a tree has n nodes and height h (n >= 2^h) this gives immediately log2(n) >= h >= steps.

0
On

You can do n union find (union by rank or size) operations with complexity O(n lg* n) where lg* n is the inverse Ackermann function using path compression optimization.

Note that O(n lg* n) is better than O(n log n)

In the question Why is the Ackermann function related to the amortized complexity of union-find algorithm used for disjoint sets? you can find details about this relation.

0
On

We need to prove that maximum height of trees is log(N) where N is the number of items in UF (1)

In the base case, all trees have a height of 0. (1) of course satisfied

Now assuming all the trees satisfy (1) we need to prove that joining any 2 trees with i, j (i <= j) nodes will create a new tree with maximum height is log(i + j)(2):

Because the joining 2 trees procedure gets root node of the smaller tree and attach it to the root node of the bigger one so the height of the new tree will be:

max(log(j), 1 + log(i)) = max(log(j), log(2i)) <= log(i + j) => (2) proved
  • log(j): height of new tree is still the height of the bigger tree
  • 1 + log(i): when height of 2 trees are the same

See the picture below for more details:

enter image description here

Ref: book Algorithms