A tutorial on using Column Generation to solve MIP problems (Part 1)
Mixed Integer Problems (MIP) are central to solving most of the decision-making problems both in academia and industry. Although being used in the majority of decision-making problem formulations, MIPs are notorious for requiring high computation power and time to solve for practical-sized problems. For sure, recent advancements in solvers like Gurobi, FICO, Cplex, etc. have made cutting edge improvements in solving time for MIPs but the exponential nature of the problem makes it difficult to scale. As such the use of advanced decomposition algorithms like Column Generation (CG) can be useful. In this series of articles, we will look at the use of column generation to solve some of the most frequently encountered problems in decision making.
This article can be helpful to you if:
- You have used MIPs before to solve your problems. You have proof of concept models but when you try to solve a larger problem the solving time or memory crosses the roof.
- You have tried to learn more about CG but always get stuck halfway through when you want to implement it. You start reading about CG but almost 90% of the literature out there explains about “Cutting Stock Problem” and you have no clue how to fit it into your problem.
- You get stuck when the sub-problem kicks in. Most literature assumes you already know about the sub-problem but it is hard to find a proper tutorial on those materials.
- Last but not least, you don’t have access to commercial solvers and open-source solvers can’t solve your problem beyond the small examples. Your institution doesn’t want to invest in a solver or it is your personal project and you want to get your product out using the open-source solvers.
Although this article doesn’t intend to introduce all the theoretical knowledge related to CG, the intention is to make readers able to apply CG technique to their problems on their own. The concepts are introduced with some of the most frequently encountered problems like crew scheduling and vehicle routing problem. The codes are also accompanied. I plan to use the SCIP as the solver for linear and MIP problems and use C++ or Python (undecided as of now, will go with the popular result from readers, so please leave a comment) as the programming language. The article will be a series rather than a single one so that readers get to breathe before gulping all of it at once. Also, it will help me to mold the articles as required so that the concepts will be more clear.
CG applicable problems frequently used in industries
Some of the most frequently encountered problems in my career that are suitable for using CG are airline crew scheduling, aircraft routing and maintenance scheduling, and vehicle routing problems to name a few. Most of these problems can be formulated as a set covering or partitioning problem. (There are also other formulations for example two/three indexed formulation for vehicle routing problems. But those formulations have weaker bounds of the problem and yada yada yada. Don’t worry about these mathematical jargons, as the intention of these articles is to avoid such for now. They will be the topic of discussions for separate or later articles).
These problems have something in common, i.e. their structure. The problem can be decomposed into two problems. It is better to explain this with the help of an example. Let's take a simple Vehicle Routing Problem (VRP)for example. We can consider any variant of the VRP for example, VRP with Time Windows, VRP with Pickup and Delivery etc. but for simplicity let focus on just VRP for now (I promise, unlike most of other literature and tutorial we will look at more complex problems but as a building block lets focus on a simple problem for now).
Towards Master Problem
If you think about VRP formulation, the first decision you will think about is should I go from point A to B to C or should I go from A to C to B. Obviously, you want to minimize the travel time or distance or satisfy some other constraint. The natural instinct is to formulate the problem as a two-index problem where the decision variable is to whether consider going from A to B or not. But let us think this way for a while. Let us consider, someone provides you with a set of paths (for our purpose, a path is a series of points that a vehicle visits). Now you have all or some of these paths and your only job is to find the best subset of these paths that will minimize your objective value. Your problem reduces to the following:
where I is the set of the path that someone provided you. A path is of the following structure {A, B, D, E} which signifies that a vehicle visits point A, B, D, and E in that order by a single-vehicle. I consists of more of these types of paths along with the cost of each path (cost is denoted by c_i for path i in the formulation). Another set J consists of all points required to be visited. Equation 2 states that each of the points should be included in at least one vehicle's path. The d_ij is an incidence relationship specifying if or not i is visited in path j. Equation 3 puts an upper bound on the number of vehicles. The decision variable x_i takes a non-negative value. The objective function (1) minimizes the cost of all vehicle paths. (If you view the problem in matrix form, each path represents the column in a matrix formed by the problem. Each constraint forms a row).
Few things to pay attention to here are, the decision variables are not binary as your instinct would suggest. The reason being we are only solving the linear relaxation of the binary problem here. The advantage of solving the linear relaxation is twofold. First, linear problems unlike integer problems have strong duality property (Yes, duality. Another mathematical jargon, but this is a very important property that we cannot overlook. Nevertheless, we can certainly not go into depth and only surfacely look at how to utilize this property). The duality property helps our problem to be guided towards optimality. Second, the open-source solvers are almost as good as commercial solvers in solving linear problems, unlike integer problems.
Another thing to pay attention to in the formulation is that we only explicitly specify the decision variable x_i to be non-negative. No explicit constraint to limit it from 0 to 1 has been specified as one would think in the natural lineralization of the binary problem. This is because constraints (4) in combination with constraints (2) and (1) will satisfy this bound helping us to get better dual values. I found this the hard way when I coded my first CG algorithm and want to emphasize it here.
Towards Subproblem
Now the problem looks easier to solve. Right, but only when that “someone” who provided the paths (let's call paths as columns from now to be general) did his job right. In reality that “someone” is no other than you. And if you ponder for a while, the set of columns can be enormous for a decent number of points to be visited. In VRP, if there are no constraints of time windows, every point is accessible for all other points. Therefore, for a mere 10 points, the number of possible columns is over 1000. Increase the points to 15 and you end up with more than 32,000 columns. This is the power of exponent and a curse for this problem.
The creation of the columns in this problem is the sub-problem. Solving a sub-problem in CG is a very important skill or art and the efficiency of the CG depends upon the art of solving the sub-problem. For most of the problems, the sub-problem is a graph-based network problem and involves finding the shortest path in the graph. However, popular algorithms like Dijkstra and Bellman-Ford may be unsuitable due to negative arc cost and negative cost cycles. In such, you will need to come up with tailored shortest path problems (resource-constrained shortest path problems). These problems also tend to require exponential time for the solution so heuristics to solve sub-problem can be helpful and important.
Putting master problem and subproblem together
We saw two problems in isolation, now let us reinforce our knowledge by putting the two problems together. First, we said the number of columns for VRP can be exponential, and coming up with all the possible columns will be impossible for even about 30 points. Also, even if we magically came up with all the columns, the solver might be strained in finding a handful of columns from a set of billions of columns (solving Master Problem in Figure 1). The remedy is to consider only a restricted set of columns (set J in Figure 1) and solve the problem. Let's call this Restricted Master Problem (RMP). Obviously, the optimal solution to RMP may not be optimal for the global problem but can give us some useful information as to which other columns to generate (hence the name CG) and add it to our RMP and resolve to improve our solution.
As a side note, for some problems like aircraft routing for a day, workforce scheduling, etc enumerating all the columns may be possible, and then solving the Master Problem (without linear relaxation) with a few hundred thousand or even a million columns can be possible using commercial solvers. So if your problem size is limited and you have commercial solvers, you can solve the problem to optimality. In fact, this may be the only way followed in most industries. But as problem size increases and/or you don't have the commercial solver, CG starts to shine.
The useful information I mentioned above is the dual values. A lot of readers may get lost if I want to explain the dual here. So I would save that for later and just introduce how duals can be utilized to solve the problem. Keep in my mind, duals can easily be obtained from the linear solvers once the restricted master problem is solved to optimality. For each set of constraints in the MP in Figure 1, we will have corresponding dual variables. From constraint (2) we will have |J| number of dual variables (let's call them u_j), one for each point. And from constraint (3) we will have 1 dual variable (let’s call it v).
Now for the subproblem (which is the shortest path problem, and I have not described yet), the cost of the arcs is reduced by the dual variables obtained from the RMP. With the new arc costs, you should be able to apply the shortest path algorithm and find a new shortest path. If that shortest path has a negative reduced cost, add that column to the RMP and resolve the RMP and get new duals and find new columns. Once no more shortest path with negative reduced cost can be found, viola you found the optimal solution to your problem.
In the next part of this series (Part 2), we will look at the algorithm and greater detail about the subproblem.