Why is this mixed integer program so inefficient to solve?

Question

Why is this mixed integer program so inefficient to solve?

I am trying to solve MIP using GLPK and CBC and no solver can find a solution efficiently. The GLPK solver journal shows that it quickly finds a solution that is within 0.1% of the true optimum, but then forever tries to find that true optimum.

I know I can use an argument miptol

to set the tolerance - my question is, is the problem with this problem causing the solver to be so inefficient as to find the true optimum? I regularly tackle versions of this problem with slightly different inputs, and they are resolved in less than a second.

Here is my problem spec file that can be run in GLPK with glpsol --cpxlp anonymizedlp.lp

.

And below - this is from GLPK log - you will see that a near-optimal MIP solution is found within 54K iterations and then the branch tree starts growing and growing:

GLPSOL: GLPK LP/MIP Solver, v4.61
Parameter(s) specified in the command line:
 --cpxlp /var/folders/g6/fs71g8j575v4_whqythqw7h40000gn/T/11446-pulp.lp -o
 /var/folders/g6/fs71g8j575v4_whqythqw7h40000gn/T/11446-pulp.sol
Reading problem data from '/var/folders/g6/fs71g8j575v4_whqythqw7h40000gn/T/11446-pulp.lp'...
664 rows, 781 columns, 2576 non-zeros
443 integer variables, 338 of which are binary
3195 lines were read
GLPK Integer Optimizer, v4.61
664 rows, 781 columns, 2576 non-zeros
443 integer variables, 338 of which are binary
Preprocessing...
213 constraint coefficient(s) were reduced
524 rows, 652 columns, 2246 non-zeros
318 integer variables, 213 of which are binary
Scaling...
 A: min|aij| =  1.282e-01  max|aij| =  1.060e+07  ratio =  8.267e+07
GM: min|aij| =  7.573e-01  max|aij| =  1.320e+00  ratio =  1.744e+00
EQ: min|aij| =  6.108e-01  max|aij| =  1.000e+00  ratio =  1.637e+00
2N: min|aij| =  3.881e-01  max|aij| =  1.355e+00  ratio =  3.491e+00
Constructing initial basis...
Size of triangular part is 524
Solving LP relaxation...
GLPK Simplex Optimizer, v4.61
524 rows, 652 columns, 2246 non-zeros
      0: obj =  -0.000000000e+00 inf =   2.507e+02 (195)
    238: obj =  -5.821025664e+06 inf =   0.000e+00 (0)
*   363: obj =  -7.015287886e+04 inf =   0.000e+00 (0) 1
OPTIMAL LP SOLUTION FOUND
Integer optimization begins...
+   363: mip =     not found yet <=              +inf        (1; 0)
+  8606: mip =     not found yet <=  -7.015289436e+04        (8239; 0)
+ 13027: mip =     not found yet <=  -7.015289436e+04        (12660; 0)
+ 16367: mip =     not found yet <=  -7.015289436e+04        (16000; 0)
+ 19135: mip =     not found yet <=  -7.015289436e+04        (18768; 0)
+ 21523: mip =     not found yet <=  -7.015289436e+04        (21156; 0)
+ 23667: mip =     not found yet <=  -7.015289436e+04        (23300; 0)
+ 25415: mip =     not found yet <=  -7.015289436e+04        (25048; 0)
+ 27260: mip =     not found yet <=  -7.015289436e+04        (26893; 0)
+ 29055: mip =     not found yet <=  -7.015289436e+04        (28688; 0)
+ 30770: mip =     not found yet <=  -7.015289436e+04        (30403; 0)
+ 32362: mip =     not found yet <=  -7.015289436e+04        (31995; 0)
Time used: 60.0 secs.  Memory used: 14.1 Mb.
+ 33876: mip =     not found yet <=  -7.015289436e+04        (33509; 0)
+ 35338: mip =     not found yet <=  -7.015289436e+04        (34971; 0)
+ 36721: mip =     not found yet <=  -7.015289436e+04        (36354; 0)
+ 38080: mip =     not found yet <=  -7.015289436e+04        (37713; 0)
+ 39372: mip =     not found yet <=  -7.015289436e+04        (39005; 0)
+ 40601: mip =     not found yet <=  -7.015289436e+04        (40234; 0)
+ 41837: mip =     not found yet <=  -7.015289436e+04        (41470; 0)
+ 43036: mip =     not found yet <=  -7.015289436e+04        (42669; 0)
+ 44192: mip =     not found yet <=  -7.015289436e+04        (43825; 0)
+ 45350: mip =     not found yet <=  -7.015289436e+04        (44983; 0)
+ 46374: mip =     not found yet <=  -7.015289436e+04        (46007; 0)
+ 47281: mip =     not found yet <=  -7.015289436e+04        (46914; 0)
Time used: 120.0 secs.  Memory used: 20.4 Mb.
+ 48220: mip =     not found yet <=  -7.015289436e+04        (47853; 0)
+ 49195: mip =     not found yet <=  -7.015289436e+04        (48828; 0)
+ 50131: mip =     not found yet <=  -7.015289436e+04        (49764; 0)
+ 51110: mip =     not found yet <=  -7.015289436e+04        (50743; 0)
+ 52131: mip =     not found yet <=  -7.015289436e+04        (51764; 0)
+ 53092: mip =     not found yet <=  -7.015289436e+04        (52725; 0)
+ 53901: >>>>>  -7.015398607e+04 <=  -7.015289436e+04 < 0.1% (53532; 0)
+ 61014: mip =  -7.015398607e+04 <=  -7.015290061e+04 < 0.1% (57365; 3302)
+ 64785: mip =  -7.015398607e+04 <=  -7.015290061e+04 < 0.1% (61136; 3302)
+ 67671: mip =  -7.015398607e+04 <=  -7.015290061e+04 < 0.1% (64022; 3302)
+ 70330: mip =  -7.015398607e+04 <=  -7.015290061e+04 < 0.1% (66681; 3302)
+ 72613: mip =  -7.015398607e+04 <=  -7.015290061e+04 < 0.1% (68964; 3302)
+ 74731: mip =  -7.015398607e+04 <=  -7.015290061e+04 < 0.1% (71082; 3302)
Time used: 180.0 secs.  Memory used: 29.9 Mb.
+ 76685: mip =  -7.015398607e+04 <=  -7.015290061e+04 < 0.1% (73036; 3302)
+ 78508: mip =  -7.015398607e+04 <=  -7.015290061e+04 < 0.1% (74859; 3302)
+ 80220: mip =  -7.015398607e+04 <=  -7.015290061e+04 < 0.1% (76571; 3302)
+ 81852: mip =  -7.015398607e+04 <=  -7.015290061e+04 < 0.1% (78203; 3302)
+ 83368: mip =  -7.015398607e+04 <=  -7.015290061e+04 < 0.1% (79719; 3302)
+ 84815: mip =  -7.015398607e+04 <=  -7.015290061e+04 < 0.1% (81166; 3302)
+ 86126: mip =  -7.015398607e+04 <=  -7.015290061e+04 < 0.1% (82477; 3302)
+ 87358: mip =  -7.015398607e+04 <=  -7.015290061e+04 < 0.1% (83709; 3302)
+ 88612: mip =  -7.015398607e+04 <=  -7.015290061e+04 < 0.1% (84963; 3302)
+ 89821: mip =  -7.015398607e+04 <=  -7.015290061e+04 < 0.1% (86172; 3302)
+ 90989: mip =  -7.015398607e+04 <=  -7.015290061e+04 < 0.1% (87340; 3302)
+ 92031: mip =  -7.015398607e+04 <=  -7.015290061e+04 < 0.1% (88382; 3302)

+3

linear-programming glpk integer-programming

thomaskeefe May 26 '17 at 18:42

source to share

2 answers

Theory

Even 0-1 integer programming is NP-hard , which basically means there is no efficient algorithm (for a common problem) if P=NP

.

What does this mean for your problem:

This means that there are problems that can be modeled as MIPs, contain only 100 variables or less, and your solver is unable to solve it (up to optimal). Let me be clear: for every MIP solver you give me, I can build an instance with maybe 100 variables that your solver cannot solve (this can be done for every problem that is NP-hard, although we are talking about general integer programming).

Approaching NP-hard problems is the use of assumptions about your problem and your data to trim the exponentially large search space (which has to be traversed on each NP-hard problem). But: P!=NP

means there is no algorithm that can do this (using problem-specific things) for all problems in general (partially related: No free lunch theorem )! Corollary: All good MIP solvers are built to solve common problems that are important to many people and where because of these good and useful heuristics (like cutting planes) have been developed.

So now we know that all successful MIP heuristic solvers are tuned to more quickly solve more common problems and may inefficiently handle other problems (again: no free lunch theorem). It doesn't go away when you consider the above assumptions. Trying to use different solvers and tweaking different parameters may help (exagherated: different parameters = different solvers)!

There is at least one more thing to say:

Even if we restrict ourselves to, say, a simple bean packing problem, there are simple instances and rigid instances. Some of them will be resolved instantly, while others will never stop. It is very difficult to analyze which examples are difficult and which are not. But these differences contribute to a possibly very high variance with respect to solution time, which is a natural consequence of NP-hardness!

There are several interesting (statistical physics-based) studies of the SAT problem where researchers have discovered and quantified the phase transition phenomenon that tells us which problems (in terms of variable / sentence ratios) are difficult with respect to random formulas.

(Some introductory blog entries covering some of these: SAT Solvers: SAT Hard or Easy?

Practice (remarks only)

Commercial solvers like Gurobi and CPLEX are much more efficient than CBC and co. (and CBC is already a very good solver) and both provide free licenses for academic work. But they are experiencing the same problems mentioned in this post.

In terms of parameters, it should be noted that parameters in general can be set to:

find a good solution quickly (high-frequency heuristic)
get good grades (high frequency cutting planes)
prove optimality (I believe the hf cutting planes again, but I'm not sure)

This kind of customization is explained in this excellent article, "A Practical Guide to Solving Difficult Mixed Whole Linear Programs," and you should read that.

It is also possible to check complete or incomplete methods (example in the SAT world: DPLL / CDCL vs. Stochastic search) to solve optimization problems, to understand why some settings are better at making certain progress = get a better target while others prove better that we have reached a minimum.

+3

sascha May 26 '17 at 19:26

source to share

mattmilten · Accepted Answer · 2017-05-29T16:15:12+0000

SCIP is able to solve the problem in a split second. Three heuristics (locks, shifts, and one-shot) give all the better solutions until double-binding hits. It is solved at the root node and without any cutting planes.

Without a preset that removes half of the constraints and half of the binary variables, SCIP takes a little longer to resolve. It is still solved at the root node with only very few cutting planes added, and the same heuristics find 3 integer feasible solutions including the optimal one.

While this does not answer your question as to why , this problem is difficult for GLPK or CBC, it is at least not easy for SCIP, which is also open source and free for scientists. Most likely, one of the heuristics is not implemented in other solvers, because disabling heuristics in SCIP makes it difficult to find a solution - forking simply does not solve this problem.

You must have the correct heuristic.

Why is this mixed integer program so inefficient to solve?

Theory

Practice (remarks only)

More articles: