Why is this mixed integer program so inefficient to solve?

3.7k Views Asked by At

I'm trying to solve a MIP using GLPK and CBC, and neither solver can efficiently find the solution. The GLPK solver log shows that it quickly finds a solution that is within 0.1% of the true optimum, but then takes forever trying to find that true optimum.

I know I can use the miptol arg to set a tolerance -- my question is, what about this problem causes the solver to be so inefficient finding the true optimum? I routinely solve versions of this problem with slightly different inputs and they solve in less than a second.

Here is a file with my problem specification which can be run in GLPK with glpsol --cpxlp anonymizedlp.lp.

And below is some of the GLPK log -- you'll see that a near optimal MIP solution is found within 54K iterations, and then the branch tree just starts growing and growing:

GLPSOL: GLPK LP/MIP Solver, v4.61
Parameter(s) specified in the command line:
 --cpxlp /var/folders/g6/fs71g8j575v4_whqythqw7h40000gn/T/11446-pulp.lp -o
 /var/folders/g6/fs71g8j575v4_whqythqw7h40000gn/T/11446-pulp.sol
Reading problem data from '/var/folders/g6/fs71g8j575v4_whqythqw7h40000gn/T/11446-pulp.lp'...
664 rows, 781 columns, 2576 non-zeros
443 integer variables, 338 of which are binary
3195 lines were read
GLPK Integer Optimizer, v4.61
664 rows, 781 columns, 2576 non-zeros
443 integer variables, 338 of which are binary
Preprocessing...
213 constraint coefficient(s) were reduced
524 rows, 652 columns, 2246 non-zeros
318 integer variables, 213 of which are binary
Scaling...
 A: min|aij| =  1.282e-01  max|aij| =  1.060e+07  ratio =  8.267e+07
GM: min|aij| =  7.573e-01  max|aij| =  1.320e+00  ratio =  1.744e+00
EQ: min|aij| =  6.108e-01  max|aij| =  1.000e+00  ratio =  1.637e+00
2N: min|aij| =  3.881e-01  max|aij| =  1.355e+00  ratio =  3.491e+00
Constructing initial basis...
Size of triangular part is 524
Solving LP relaxation...
GLPK Simplex Optimizer, v4.61
524 rows, 652 columns, 2246 non-zeros
      0: obj =  -0.000000000e+00 inf =   2.507e+02 (195)
    238: obj =  -5.821025664e+06 inf =   0.000e+00 (0)
*   363: obj =  -7.015287886e+04 inf =   0.000e+00 (0) 1
OPTIMAL LP SOLUTION FOUND
Integer optimization begins...
+   363: mip =     not found yet <=              +inf        (1; 0)
+  8606: mip =     not found yet <=  -7.015289436e+04        (8239; 0)
+ 13027: mip =     not found yet <=  -7.015289436e+04        (12660; 0)
+ 16367: mip =     not found yet <=  -7.015289436e+04        (16000; 0)
+ 19135: mip =     not found yet <=  -7.015289436e+04        (18768; 0)
+ 21523: mip =     not found yet <=  -7.015289436e+04        (21156; 0)
+ 23667: mip =     not found yet <=  -7.015289436e+04        (23300; 0)
+ 25415: mip =     not found yet <=  -7.015289436e+04        (25048; 0)
+ 27260: mip =     not found yet <=  -7.015289436e+04        (26893; 0)
+ 29055: mip =     not found yet <=  -7.015289436e+04        (28688; 0)
+ 30770: mip =     not found yet <=  -7.015289436e+04        (30403; 0)
+ 32362: mip =     not found yet <=  -7.015289436e+04        (31995; 0)
Time used: 60.0 secs.  Memory used: 14.1 Mb.
+ 33876: mip =     not found yet <=  -7.015289436e+04        (33509; 0)
+ 35338: mip =     not found yet <=  -7.015289436e+04        (34971; 0)
+ 36721: mip =     not found yet <=  -7.015289436e+04        (36354; 0)
+ 38080: mip =     not found yet <=  -7.015289436e+04        (37713; 0)
+ 39372: mip =     not found yet <=  -7.015289436e+04        (39005; 0)
+ 40601: mip =     not found yet <=  -7.015289436e+04        (40234; 0)
+ 41837: mip =     not found yet <=  -7.015289436e+04        (41470; 0)
+ 43036: mip =     not found yet <=  -7.015289436e+04        (42669; 0)
+ 44192: mip =     not found yet <=  -7.015289436e+04        (43825; 0)
+ 45350: mip =     not found yet <=  -7.015289436e+04        (44983; 0)
+ 46374: mip =     not found yet <=  -7.015289436e+04        (46007; 0)
+ 47281: mip =     not found yet <=  -7.015289436e+04        (46914; 0)
Time used: 120.0 secs.  Memory used: 20.4 Mb.
+ 48220: mip =     not found yet <=  -7.015289436e+04        (47853; 0)
+ 49195: mip =     not found yet <=  -7.015289436e+04        (48828; 0)
+ 50131: mip =     not found yet <=  -7.015289436e+04        (49764; 0)
+ 51110: mip =     not found yet <=  -7.015289436e+04        (50743; 0)
+ 52131: mip =     not found yet <=  -7.015289436e+04        (51764; 0)
+ 53092: mip =     not found yet <=  -7.015289436e+04        (52725; 0)
+ 53901: >>>>>  -7.015398607e+04 <=  -7.015289436e+04 < 0.1% (53532; 0)
+ 61014: mip =  -7.015398607e+04 <=  -7.015290061e+04 < 0.1% (57365; 3302)
+ 64785: mip =  -7.015398607e+04 <=  -7.015290061e+04 < 0.1% (61136; 3302)
+ 67671: mip =  -7.015398607e+04 <=  -7.015290061e+04 < 0.1% (64022; 3302)
+ 70330: mip =  -7.015398607e+04 <=  -7.015290061e+04 < 0.1% (66681; 3302)
+ 72613: mip =  -7.015398607e+04 <=  -7.015290061e+04 < 0.1% (68964; 3302)
+ 74731: mip =  -7.015398607e+04 <=  -7.015290061e+04 < 0.1% (71082; 3302)
Time used: 180.0 secs.  Memory used: 29.9 Mb.
+ 76685: mip =  -7.015398607e+04 <=  -7.015290061e+04 < 0.1% (73036; 3302)
+ 78508: mip =  -7.015398607e+04 <=  -7.015290061e+04 < 0.1% (74859; 3302)
+ 80220: mip =  -7.015398607e+04 <=  -7.015290061e+04 < 0.1% (76571; 3302)
+ 81852: mip =  -7.015398607e+04 <=  -7.015290061e+04 < 0.1% (78203; 3302)
+ 83368: mip =  -7.015398607e+04 <=  -7.015290061e+04 < 0.1% (79719; 3302)
+ 84815: mip =  -7.015398607e+04 <=  -7.015290061e+04 < 0.1% (81166; 3302)
+ 86126: mip =  -7.015398607e+04 <=  -7.015290061e+04 < 0.1% (82477; 3302)
+ 87358: mip =  -7.015398607e+04 <=  -7.015290061e+04 < 0.1% (83709; 3302)
+ 88612: mip =  -7.015398607e+04 <=  -7.015290061e+04 < 0.1% (84963; 3302)
+ 89821: mip =  -7.015398607e+04 <=  -7.015290061e+04 < 0.1% (86172; 3302)
+ 90989: mip =  -7.015398607e+04 <=  -7.015290061e+04 < 0.1% (87340; 3302)
+ 92031: mip =  -7.015398607e+04 <=  -7.015290061e+04 < 0.1% (88382; 3302)
2

There are 2 best solutions below

0
On BEST ANSWER

SCIP is able to solve the problem within a fraction of a second. Three heuristics (locks, shifting and oneopt) produce increasingly good solutions until the dual bound is hit. It's solved in the root node and without any cutting planes.

Without presolving, which removes half of the constraints and half of the binary variables, SCIP needs a bit longer to solve it. It's still solved in the root node with only very few cutting planes added and the same heuristics find the 3 integer feasible solutions, including the optimal one.

Although this does not answer your question regarding why this problem is hard for GLPK or CBC, it's at least not hard for SCIP, which is also open source and free for academics. Most likely one of the heuristics is not implemented in the other solvers, because disabling heuristics in SCIP makes it much harder to find the solution - branching simply doesn't solve this problem.

You need to have the right heuristic.

0
On

Theory

Even 0-1 integer-programming is NP-hard, which basically means, there is no efficient algorithm (for the general problem), unless P=NP.

What does that mean for your problem:

It means, that there are problems, which can be modelled as MIP, contain only 100 Variables and less and your solver is unable to solve it (to optimum). Let me be more clear: for every MIP-solver you give me, i can construct an instance with maybe 100 variables, which your solver cannot solve (this can be done for every problem which is NP-hard, although we are talking about general integer-programming here).

Approaching NP-hard problems is all about using assumptions about your problem and your data, to be able to prune away the exponentially big search-space (which needs to be traversed for every NP-hard problem). But: P!=NP means, that there is no algorithm which can do that (taking advantage of problem-specific stuff) for all problems in general (partially related: No free lunch theorem)! The consequence: all good MIP-solvers are built to solve common problems, which are important for many people and where because of this good and helpful heuristics (e.g. cutting-planes) were developed.

So now we know, that all successful MIP-solvers are heuristics, tuned to solve more common problems faster and may fail spectacularly on other problems (again: No Free Lunch Theorem). This won't go away given above assumptions. Trying different solvers and tuning different parameters can help (exagerrated: different parameters = different solvers)!

There is at least one more thing to say:

Even if we restrict ourselves to, let'say, a simple bin-packing problem, there are easy instances and hard instances. Some of these will be solved instantly, while others will never stop. It's very difficult to analyze which instances are hard and which are not. But these differences effect in a possibly very high variance in regards to solving-time, which is a natural consequence of NP-hardness!

There exists some interesting (statistical-physics based) research about the SAT-problem, where researchers discovered and quantified a phase-transition phenomenon, which tells us, what problems (in terms of variable/clause ratio) are hard in regards to random-formulas.

(Some introductory blog-entry covering some of this: SAT Solvers: Is SAT Hard or Easy?

Practice (only remarks)

Commercial solvers like Gurobi and CPLEX are much much more powerful than CBC and co. (and CBC is already a very good solver) and they both provide free licenses for academic-work. But they experience the same problems mentioned in this post.

In terms of parameters, one should mention, that parameters in general may be tuned to:

  • find a good solution quickly (high-frequency of heuristics)
  • get good bounds (high-frequency of cutting-planes)
  • prove optimum (i assume h-f of cutting-planes again, but i'm not sure)

These kind of tunings are explained within this excellent paper: "Practical Guideline for Solving Difficult Mixed Integer Linear Programs and you should read it.

Maybe also checkout complete vs. incomplete methods (example in SAT-world: DPLL/CDCL vs. Stochastic search) to tackle optimization problems to understand, why some tunings are better at making some progress = get better objective, while others are better at proving we reached the minimum.