Very different run times of munkres algorithm for datasets of same size

31 Views Asked by At

I have an implementation of the hungarian (munkres) algorithm through the Munkres python library. I have noticed some unexpected behaviour that I am not sure I can explain.

I run the algorithm twice on two datasets of the same size. The first time, this is on "real" data, i.e. a dataframe with values in it. The second time, I run a control, i.e. I feed a matrix of 0s to the algorithm. This is because I am using the algorithm to match an input to an output and calculating how often I can match the correct input to the correct output based on my cost function, and I want to know what the baseline "random" performance is.

I initally noticed that in the controls the assignment does not work, i.e. I cannot input a matrix of 0s as I always get [(1,1),(2,2),(3,3),...], whilst I want a random selection ([(1,1),(2,2),(3,3),...] is the correct assignment, so I was getting 100% performance just because of the matrix structure). To fix this, I add a matrix of random numbers from a uniform distribution (np.random.uniform(0,10**(-7), mx.ravel()).reshape(mx.shape)) to the matrix, so that I do not get the assignment only based on column and row orders. The numbers drawn from the uniform distribution are small enough that they should not impact the "real" assignment but should randomise the control. I then repeat the assignment 10 times.

Mostly, the algorithm is giving the results I expect. What surprises me is that it is about 3x slower on "real" assignment compared to control. What could drive this? Is it because (potentially) the real assignment is harder to find an optimum for?

0

There are 0 best solutions below