Makespan Minimization for Unrelated Parallel Machines with Maintenance Windows
It’s a bit late, but last year during the Christmas break, Kaggle had a very interesting competition for the season. The story was that the toy list at Santa’s workshop was long, and elves had to make all the toys, with some constraints on their productivity rating and rest intervals. This is an adaptation of the classic problem of minimizing makespan, though with additional constraints. A good leaderboard position could be achieved using offline methods since the entire list of toys was supplied in advance, but as usual I pursued online solutions for fun and speed.
The problem they described can be reformulated as online makespan optimization for unrelated parallel machines with maintenance windows. For the moment we’ll just consider some basic approaches in the offline case, and I may write something else in the future which covers more online methods.
The inspiration for the proofs below and notation, as well as a broader overview of these topics, is found in lecture notes from a class Albert and Souza gave on Combinatorial Algorithms at Humboldt-Universität zu Berlin Institut für Informatik.
Problem statement
Let’s start by formalizing the problem a bit so we have a clear understanding of what’s happening.
Assume we have a set of machines $M := \{1, \ldots, m\}$ and a set of jobs $J := \{1, \ldots, n\}$ where job $j \in J$ on machine $i \in M$ requires $p_{ij}$ time units for processing. Further, let $J_i$ be the set of all jobs processed on machine $i$ and $c_j$ be the time at which job $j$ is completed. Now define the load on machine $i$ as $\ell_i := \sum_{j \in J_i} p_{ij}$. Also note the starting time of machine $i$ before the start of job $j$ as $s_j$. The makespan is then equivalent to the maximum load $\ell_{max} = c_j = max_{i \in M} \ell_i$.
Our basic objective is to find an assignment of jobs such that the makespan is minimized.
Furthermore, assume that that we must produce items without preemption, which means once an item has begun production on a given machine that it must continue until completion on the same machine.
In addition to this, the competition also had further constraints on intervals in which machines could operate, and their productivity bonuses (or penalties) for operating at different times. We’ll get to those later.
Also note that an algorithm is an $\alpha$-approximation for a problem if and only if for every instance of the problem the algorithm produces a solution which is within a factor $\alpha$ of the optimal solution. In our context we are dealing with minimization problems, so $\alpha \ge 1$ means that the solution found by the algorithm is at most $\alpha$ times the optimal solution.
For now, let’s consider the most basic formulation of the problem in the offline environment where we are presented with the complete job list.
Base case: Identical Parallel Machines
In this case, the machines are identical so the production cost/time of an item is the same regardless of which machine produces the item. This basic formulation is known to be NP-Hard, even for two machines. We can examine two approaches to this problem, the basic List Scheduling heuristic and next the Sorted List Scheduling/Least Processing Time (LPT) heuristic.
List Scheduling
This is approach is straightforward. Take the set of jobs $J$ in any order, and for each job $j \in J$ allocate the job to the machine $i \in M$ which has the smallest load $\ell_i$.
THEOREM: The List Scheduling algorithm is a 2-approximation for makespan scheduling on identical parallel machines.
PROOF: Let $T_{OPT}$ be the optimal makespan for a given instance of the problem. We can show that $s_j \le T_{OPT}$ for all $j \in J$. The implication then is that $c_j = s_j + p_j \le T_{OPT} + p_j \le 2 \cdot T_{OPT}$ for all $j \in J$ since it is necessarily true that for all $j \in J$ we have that $p_j \le T_{OPT}$.
By way of contradiction, assume there exists some $s_j \gt T_{OPT}$. Then the load on machine $i$ before the assignment of job $j$ is $\ell_i \gt T_{OPT}$ for all $i \in M$. Then all jobs $J^\prime \subseteq J$ scheduled before $j$ by the algorithm have total processing time $\sum_{j^\prime \in J^\prime}p_{j^\prime} \gt m \cdot T_{OPT}$. However, since the optimum schedule finishes all jobs $J$ at time $T_{OPT}$ we knows that $\sum_{j \in J} p_j \le m \cdot T_{OPT}$, a contradiction leading to the conclusion that the List Scheduling algorithm must start all jobs such that for all $j \in J$ we have $s_j \le T_{OPT}.$ $\blacksquare$
Sorted List Scheduling
This algorithms is similar to the one above, except that instead of simply taking the set of jobs $J$ in any order, we sort them in decreasing order by length.
THEOREM: The Sorted List Scheduling algorithm is a $3/2$-approximation for makespan scheduling on identical parallel machines.
PROOF: Let $T_{OPT}$ be the optimal makespan for a given instance of the problem. Then partition the jobs into a set of large jobs $J_L = \{j \in J \colon p_j \gt \frac{T_{OPT}}{2} \}$ and a set of small jobs $J_S = J - J_L$. Now notice that $|J_L| \le m$. Assume by way of contradiction that there are more than $m$ such jobs, then in any schedule (including the optimal one) there must be at least two such jobs scheduled on the same machine. Since the length a large job is more than $\frac{T_{OPT}}{2}$ this contradicts that $T_{OPT}$ is the optimal makespan.
Since there are at most $m$ large jobs and the algorithm schedules them first and on individual machines, we know that $\forall j \in J_L, c_j \le T_{OPT}$. Therefore, if a job completes later than $T_{OPT}$ it must be a small job with length at most $\frac{T_{OPT}}{2}$. Since all jobs start not later than $T_{OPT}$ we have that $\forall j \in J_S, c_j \le T_{OPT} + p_j \le \frac{3}{2} \cdot T_{OPT}$. $\blacksquare$
Extended case: Unrelated Parallel Machines
This case is the same as before, except that machines may not have the same rates of production. This means that a given job $j$ the processing time may be different depending on which machine processes the job, and $p_{ij}$ denotes the processing time for job $j$ on machine $i$.
The basic approach for this case is to relax the integer programming constraints to linear ones, use binary search to find a smallest feasible solution to the linear program, and then round to obtain the integer programming solution.
The range for the binary search is established by first greedily scheduling each job on the machine with the smallest load $\ell_i$. If $\alpha$ is the makespan of such a schedule, then the range for binary search is $[ \frac{\alpha}{m}, \alpha ]$.
Though this approach is more complicated, it can be shown that this is also a 2-approximation for the problem of makespan minimization on unrelated parallel machines.
The above examples are appropriate for the offline version of the problem where the list of items for production is known in advance, but does not touch at all upon the online case where the makespan must be optimized in a more streaming fashion.
Additional problem constraints
In the competition as designed, there were more constraints on the problem, notably that elves can only work during certain hours, with penalties incurred for time spend working outside these ours and increases in productivity for time spent working within those hours. This can be considered equivalent to the concept of maintenance windows for machines.
The winning solution
The winners of the competition posted posted a writeup (PDF) of their solution with lots of depth and explanation. They used a combination of techniques in addition to some observations about the data. Definitely recommend reading for further details.