Fixing Karger's min cut algorithm with union-find data structure

581 Views Asked by At

I was trying to implement Karger's min cut algorithm in the same way it is explained here but I don't like the fact that at each step of the while loop we can pick an edge with it's two endpoints already in a supernode. More specifically, this part

// If two corners belong to same subset, 
// then no point considering this edge 
       if (subset1 == subset2) 
         continue; 

Is there a quick fix for avoiding this problem?

1

There are 1 best solutions below

5
On BEST ANSWER

It might help to back up and think about why there’s a union-find structure here at all and why it’s worth improving on the continue statement.

Conceptually, each contraction performed changes the graph in the following way:

  1. The nodes contracted get replaced with a single node.
  2. The edges incident to either node get replaced with an edge to the new joint node.
  3. The edges running between the two earlier nodes get removed.

The question, then, is how to actually do this to the graph. The code you’ve found does this lazily. It doesn’t actually change the graph when the contraction is done. Instead, it uses the union-find structure to show which nodes are now equivalent to one another. When it samples a random edge, it then has to check whether that edge is one of the ones that would have been deleted in step (3). If so, it skips it and moves on. This has the effect that early contractions are really fast (the likelihood of picking two edges that are part of contracted nodes is very low when few edges are contracted), but later contractions might be a lot slower (once edges have started being contracted, lots of edges may have been deleted).

Here’s a simple modification you can use to speed this step up. Whenever you pick an edge to contract and find that its endpoints are already connected, discard that edge, and remove it from the list of edges so that it never gets picked again. You can do this by swapping that edge to the end of the list of edges, then removing the last element of the list. This has the effect that every edge processed will never be seen again, so across all iterations of the algorithm every edge will be processed at most once. That gives a runtime of one randomized contraction phase as O(m + nα(n)), where m is the number of edges and n is the number of nodes. The factor of α(n) comes from the use of the union-find structure.

If you truly want to remove all semblance of that continue statement, an alternative approach would be to directly simulate the contraction. After each contraction, iterate over all m edges and adjust each one by seeing whether it needs to remain unchanged, point to the new contracted node, or be removed altogether. This will take time O(m) per contraction for a net cost of O(mn) for the overall min cut calculation.

There are ways to speed things up beyond this. Karger’s original paper suggests generating a random permutation of the edges and using binary search over that array with a clever use of BFS or DFS to find the cut produced in time O(m), which is slightly faster than the O(m + nα(n)) approach for large graphs. The basic idea is the following:

  1. Probe the middle element of the list of edges.
  2. Run a BFS on the graph formed by only using those edges and see if there are exactly two connected components.
  3. If so, great! Those two CCs are the ones you want.
  4. If there is only one CC, discard the back half of the array of edges and try again.
  5. If there is more than one CC, contract each CC into a single node and update a global table indicating which CC each node belongs to. Then discard the first half of the array and try again.

The cost of each BFS is O(m), where m is the number of edges in the graph, and this gives the recurrence T(m) = T(m/2) + O(m) because at each stage we’re throwing away half of the edges. That solves to O(m) total time, though as you can see, it’s a much trickier way of coding this algorithm up!

To summarize:

  1. With a very small modification to the provided code, you can keep the continue statement in but still have a very fast implementation of the randomized contraction algorithm.

  2. To eliminate that continue without sacrificing the runtime of the algorithm, you need to do some major surgery and change approaches to something only marginally asymptotically faster than keeping the continue in.

Hope this helps!