Multifit algorithm

The multifit algorithm is an algorithm for multiway number partitioning, originally developed for the problem of identical-machines scheduling. It was developed by Coffman, Garey and Johnson.^[1] Its novelty is that it uses an algorithm for another well-known problem, the bin packing problem, as a subroutine.

The algorithm

The input to the algorithm is a set S of numbers, and a parameter n. The required output is a partition of S into n subsets, such that the largest subset sum (also called the makespan) is as small as possible.

The algorithm uses first-fit-decreasing bin packing (FFD) as a subroutine. FFD takes as input the same set S of numbers, and a bin capacity C. It heuristically packs the numbers into bins such that the sum of numbers in each bin is at most C, aiming to use as few bins as possible. Multifit runs FFD multiple times, each time with a different capacity C, until it finds some C such that FFD with capacity C packs S into at most n bins. To find such a C, it uses binary search as follows.

Let L := max ( sum(S) / n, max(S) ). Note, with bin-capacity smaller than L, every packing must use more than n bins.
Let U := max ( 2 sum(S) / n, max(S) ). Note, with bin-capacity at least U, FFD uses at most n bins. Proof: suppose by contradiction that some input s_i did not fit into any of the first n bins. Clearly this is possible only if i ≥ n+1. If s_i > C/2, then, since the inputs are ordered in descending order, the same inequality holds for all the first n+1 inputs in S. This means that sum(S) > (n+1)C/2 > n U/2, a contradiction to the definition of U. Otherwise, s_i ≤ C/2. So the sum of each of the first n bins is more than C/2. This again implies sum(S) > n C/2 > n U/2, contradiction.
Iterate k times (where k is a precision parameter):
- Let C := (L+U)/2. Run FFD on S with capacity C.
  - If FFD needs at most n bins, then decrease U by letting U := C.
  - If FFD needs more than n bins, then increase L by letting L := C.
Finally, run FFD with capacity U. It is guaranteed to use at most n bins. Return the resulting schedule.
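The steps above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function names `ffd` and `multifit`, the default `k=20`, and the use of `Fraction` for exact capacities are assumptions of this sketch, and the item sizes are assumed to be positive integers.

```python
from fractions import Fraction


def ffd(items, capacity):
    """First-fit-decreasing: place each item, largest first, into the
    first bin where it fits; open a new bin if it fits nowhere."""
    bins = []
    for x in sorted(items, reverse=True):
        for b in bins:
            if sum(b) + x <= capacity:
                b.append(x)
                break
        else:
            bins.append([x])
    return bins


def multifit(items, n, k=20):
    """Binary search on the bin capacity, using FFD as the feasibility test.
    Assumes integer item sizes (hypothetical restriction of this sketch)."""
    total, largest = sum(items), max(items)
    lower = max(Fraction(total, n), Fraction(largest))      # L
    upper = max(Fraction(2 * total, n), Fraction(largest))  # U
    for _ in range(k):
        c = (lower + upper) / 2
        if len(ffd(items, c)) <= n:
            upper = c   # FFD succeeded: try a smaller capacity
        else:
            lower = c   # FFD failed: a larger capacity is needed
    bins = ffd(items, upper)
    # Pad with empty subsets so that exactly n subsets are returned.
    bins += [[] for _ in range(n - len(bins))]
    return bins


if __name__ == "__main__":
    print(multifit([4, 5, 6, 7, 8], 2))
```

For the input `[4, 5, 6, 7, 8]` with n = 2, FFD already succeeds at capacity 15, so the search converges toward 15 and the returned partition has makespan 15.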

Performance

Multifit is a constant-factor approximation algorithm. It always finds a partition in which the makespan is at most a constant factor larger than the optimal makespan. To find this constant, we must first analyze FFD. While the standard analysis of FFD considers approximation w.r.t. the number of bins when the capacity is constant, here we need to analyze approximation w.r.t. the capacity when the number of bins is constant. Formally, for every input set S and integer n, let $OPT(S,n)$ be the smallest capacity such that S can be packed into n bins of this capacity. Note that $OPT(S,n)$ is the value of the optimal solution to the original scheduling instance.
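To make the definition of $OPT(S,n)$ concrete, the following sketch computes it by exhaustive search. The function name `opt_makespan` is hypothetical, and the approach tries all n^|S| assignments, so it is only usable on very small instances.

```python
from itertools import product


def opt_makespan(items, n):
    """Smallest capacity c such that `items` can be packed into n bins of
    capacity c, found by trying every assignment of items to bins."""
    best = sum(items)  # putting everything in one bin is always feasible
    for assignment in product(range(n), repeat=len(items)):
        loads = [0] * n
        for x, b in zip(items, assignment):
            loads[b] += x
        best = min(best, max(loads))
    return best


if __name__ == "__main__":
    print(opt_makespan([4, 5, 6, 7, 8], 2))
```

Comparing this value with the makespan of a heuristic on small inputs is one way to observe an approximation ratio empirically.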

Let $r_{n}$ be the smallest real number such that, for every input S, FFD with capacity $r_{n}\cdot OPT(S,n)$ uses at most n bins. Coffman, Garey and Johnson prove the following upper bounds on $r_{n}$ :^[1]

$r_{n}\leq 8/7\approx 1.14$ for n = 2;
$r_{n}\leq 15/13\approx 1.15$ for n = 3;
$r_{n}\leq 20/17\approx 1.176$ for n = 4,5,6,7;
$r_{n}\leq 122/100=1.22$ for all n ≥ 8.

During the MultiFit algorithm, the lower bound L is either its initial value, which is at most $OPT(S,n)$ , or a capacity at which FFD needed more than n bins. In both cases, $L\leq r_{n}\cdot OPT(S,n)$ . Initially, the difference $U-L$ is at most sum(S) / n, which is at most $OPT(S,n)$ . After the MultiFit algorithm runs for k iterations, the difference has been halved k times, so $U-L\leq (1/2)^{k}\cdot OPT(S,n)$ . Therefore, $U\leq (r_{n}+(1/2)^{k})\cdot OPT(S,n)$ , and the schedule returned by MultiFit has makespan at most $r_{n}+(1/2)^{k}$ times the optimal makespan. When $k$ is sufficiently large, the approximation factor of MultiFit can be made arbitrarily close to $r_{n}$ , which is at most 1.22.
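As a small illustration of how the precision parameter k enters the bound, the sketch below computes the guaranteed factor $r_{n}+(1/2)^{k}$ and the smallest k for which the additive term drops to a chosen tolerance. The helper names and the tolerance parameter `eps` are assumptions of this sketch, not part of the original analysis.

```python
from fractions import Fraction


def guaranteed_ratio(r_n, k):
    """Upper bound r_n + (1/2)**k on the approximation factor after k iterations."""
    return r_n + Fraction(1, 2 ** k)


def iterations_for(eps):
    """Smallest k such that (1/2)**k <= eps."""
    k, gap = 0, Fraction(1)
    while gap > eps:
        gap /= 2
        k += 1
    return k


if __name__ == "__main__":
    k = iterations_for(Fraction(1, 1000))
    print(k, float(guaranteed_ratio(Fraction(122, 100), k)))
```

With a tolerance of 1/1000, ten iterations suffice, because $(1/2)^{10}=1/1024$.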

Later papers performed a more detailed analysis of MultiFit, and proved that its approximation ratio is at most 6/5=1.2,^[2] and later, at most 13/11≈1.182.^[3] The original proof of the 13/11 bound missed some cases; Cao^[4] presented a complete and simpler proof. The bound 13/11 cannot be improved: there is an instance with n=13 in which the approximation ratio is exactly 13/11.^[3]

Proof idea

Minimal counterexamples

The upper bounds on $r_{n}$ are proved by contradiction. For any integers p ≥ q, if $r_{n}>p/q$ , then there exists a (p/q)-counterexample, defined as an instance S and a number n of bins such that

S can be packed into n bins with capacity q;
FFD does not manage to pack S into n bins with capacity p.

If there exists such a counterexample, then there also exists a minimal (p/q)-counterexample, which is a (p/q)-counterexample with a smallest number of items in S and a smallest number of bins n. In a minimal (p/q)-counterexample, FFD packs all items in S except the last (smallest) one into n bins with capacity p. Given a minimal (p/q)-counterexample, denote by P₁,...,P_n the (incomplete) FFD packing into these n bins with capacity p, by P_{n+1} the bin containing the single smallest item, and by Q₁,...,Q_n the (complete) optimal packing into n bins with capacity q. The following lemmas can be proved:

No union of k subsets from {Q_1,...,Q_n} is dominated by a union of k subsets from {P_1,...,P_{n+1}} ("dominated" means that each item in the dominated subset is mapped to a weakly-larger item in the dominating subset). Otherwise we could get a smaller counterexample as follows. [1] Delete all items in the k dominating subsets P_i. Clearly, the incomplete FFD packing now needs n-k bins, and still the smallest item (or an entire bin) remains unpacked. [2] In the optimal packing, exchange each item of the k dominated subsets Q_i with its dominating item. Now, the k subsets Q_i are larger (possibly larger than q), but all other n-k subsets are smaller (in particular, at most q). Therefore, after deleting all items in the k subsets P_i, the remaining items can be packed into at most n-k bins of size q.
Each of Q_1,...,Q_n contains at least 3 items. Otherwise we would have domination and, by the previous lemma, could get a smaller counterexample. This is because [a] each Q_i with a single item is dominated by the P_j that contains that item; [b] for each Q_i with two items x and y, if both x and y are in the same P_j, then Q_i is dominated by this P_j; [c] suppose x≥y, x is in some P_j, and y is in some P_k to its right. This means that y did not fit into P_j. But x+y ≤ q. This means that P_j must contain some item z ≥ y. So Q_i is dominated by P_j. [d] Suppose x≥y, x is in some P_j, and y is in some P_k to its left. This means that P_k must contain a previous item z ≥ x. So Q_i is dominated by P_k.
Each of P_1,...,P_n contains at least 2 items. This is because, if some P_i contains only a single item, this implies that the last (smallest) item does not fit into it. This means that this single item must be alone in an optimal bundle, contradicting the previous lemma.
Let s be the size of the smallest item. Then $s>{\frac {n}{n-1}}(p-q)$ . Proof: Since s does not fit into the first n bundles, we have $sum(P_{i})+s>p$ , so $\sum _{i=1}^{n}sum(P_{i})+n\cdot s>n\cdot p$ . On the other hand, since all items fit into n bins of capacity q, we have $\sum _{i=1}^{n}sum(P_{i})+s\leq n\cdot q$ . Subtracting the inequalities gives $s>{\frac {n}{n-1}}(p-q)$ .
The size of every item is at most $q-2s$ . This is because there are at least 3 items in each optimal bin (with capacity q).
The sum of items in every bin P_1,...,P_n is larger than $p-s$ ; otherwise we could add the smallest item.
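The numeric consequences of the last three lemmas can be tabulated with a short helper. This is an illustrative sketch: the function name `lemma_bounds` and the dictionary keys are hypothetical, and the inputs describe a hypothetical minimal (p/q)-counterexample with n bins and smallest item size s.

```python
from fractions import Fraction


def lemma_bounds(n, p, q, s):
    """Bounds implied by the lemmas for a minimal (p/q)-counterexample."""
    s_lower = Fraction(n, n - 1) * (p - q)  # the lemma requires s > s_lower
    if s <= s_lower:
        raise ValueError("s must exceed n/(n-1) * (p-q)")
    return {
        "max_item_size": q - 2 * s,       # every item is at most q - 2s
        "min_bin_sum_exclusive": p - s,   # every FFD bin sums to more than p - s
    }


if __name__ == "__main__":
    print(lemma_bounds(8, 122, 100, 26))
```

For example, with n = 8, p = 122, q = 100 and s = 26, every item has size at most 48 while every FFD bin must sum to more than 96, so no FFD bin can consist of only two items.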

Loose upper bound

From the above lemmas, it is already possible to prove a loose upper bound $r_{n}\leq 5/4=1.25$ . Proof. Let S, n be a minimal (5/4)-counterexample. The above lemmas imply that:

$s>{\frac {n}{n-1}}(5-4)>1$ . Since the optimal capacity is 4, no optimal bin can contain 4 or more items. Therefore, each optimal bin must contain at most 3 items, and the number of items is at most 3n.
The size of each item is at most $4-2s$ , and the sum of each FFD bin is more than $5-s$ . If some FFD bin contained only two items, its sum would be at most $8-4s=5+(3-3s)-s<5-s$ (since s>1); so each FFD bin must contain at least 3 items. But then the n FFD bins hold at least 3n items, and together with the unpacked smallest item there are at least 3n+1 items, contradicting the bound of 3n on the number of items.

Structure of FFD packing

To prove tighter bounds, one needs to take a closer look at the FFD packing of the minimal (p/q)-counterexample. The items and FFD bins P₁,...,P_n are termed as follows:

A regular item is an item added to some bin P_i, before the next bin P_{i+1} was opened. Equivalently, a regular item is an item in P_i which is at least as large as every item in every bin P_j for j>i.
A fallback item is an item added to some bin P_i, after the next bin P_{i+1} was opened. Equivalently, a fallback item is an item in P_i which is smaller than the largest item in P_{i+1}.
A regular k-bin is a bin that contains k regular items and no fallback items.
A fallback k-bin is a bin that contains k regular items and some fallback items.
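The definitions above can be illustrated by instrumenting FFD so that each item is labelled when it is placed. This is a sketch under the first (operational) definition: an item is regular if the bin receiving it is the most recently opened bin at that moment, and fallback otherwise. The names `ffd_with_types` and `bin_type` are hypothetical.

```python
def ffd_with_types(items, capacity):
    """Run FFD and label every item 'regular' or 'fallback'."""
    bins = []  # each bin is a list of (size, label) pairs
    for x in sorted(items, reverse=True):
        for i, b in enumerate(bins):
            if sum(size for size, _ in b) + x <= capacity:
                # Regular only if no later bin has been opened yet.
                label = "regular" if i == len(bins) - 1 else "fallback"
                b.append((x, label))
                break
        else:
            bins.append([(x, "regular")])  # first item of a new bin
    return bins


def bin_type(b):
    """Return (kind, k): kind is 'regular' or 'fallback', k counts regular items."""
    k = sum(1 for _, label in b if label == "regular")
    has_fallback = any(label == "fallback" for _, label in b)
    return ("fallback" if has_fallback else "regular", k)


if __name__ == "__main__":
    packing = ffd_with_types([6, 5, 4, 3, 2], 10)
    print([bin_type(b) for b in packing])
```

On the items 6, 5, 4, 3, 2 with capacity 10, the first bin receives 6 and later 4 (a fallback item, placed after the second bin was opened), and the second bin receives 5, 3, 2. The first bin is therefore a fallback 1-bin and the second a regular 3-bin.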

The following lemmas follow immediately from these definitions and the operation of FFD.

If k₁<k₂, then all k₁-bins are to the left of all k₂-bins.
If P_i is a k-bin, then the sum of the k regular items in P_i is larger than ${\frac {k}{k+1}}\cdot p$ , since otherwise we could add another item before opening a new bin.
If P_i and P_{i+1} are both k-bins, then the sum of the k regular items in P_i is at least as large as in P_{i+1} (this is because the items are ordered by decreasing size).
All regular k-bins are to the left of all fallback k-bins.

In a minimal counterexample, there are no regular 1-bins (since each bin contains at least 2 items), so by the above lemmas, the FFD bins P₁,...,P_n are ordered by type:

Zero or more fallback 1-bins;
Then, zero or more regular 2-bins;
Then, zero or more fallback 2-bins;
Then, zero or more regular 3-bins;
Then, zero or more fallback 3-bins;
and so on.

Tighter upper bound

The upper bound $r_{n}\leq 1.22$ ^[1] is proved by assuming a minimal (122/100)-counterexample, analyzing the sizes of items in the different types of FFD bundles, and deriving a contradiction. By the lemmas above, we know that:

The size of the smallest item satisfies s > p-q = 22, so s = 22+D for some D>0.
The size of every item is at most q-2s = 56-2D.
The sum in each FFD bin is larger than p-s = 100-D.
This D must be at most 4. This is because, if D>4, the size of each item is larger than 26, so each optimal bin (with capacity 100) must contain at most 3 items, and there are at most 3n items. Each item is at most 56-2D and each FFD bin has a sum larger than 100-D, so each FFD bin must contain at least 3 items. Therefore, the n FFD bins together with the unpacked smallest item contain at least 3n+1 items, a contradiction.
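The threshold D > 4 in the last step can be checked directly from the two preceding bounds. A bin with only two items has sum at most 2(56-2D), and this falls below the required bin sum 100-D exactly when D exceeds 4:

```latex
\begin{align*}
2(56-2D) < 100-D
  &\iff 112-4D < 100-D \\
  &\iff 12 < 3D \\
  &\iff D > 4 .
\end{align*}
```

Similarly, when D > 4 every item is larger than 26, so four items would sum to more than 104 > 100 and cannot share an optimal bin.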

The upper bound $r_{n}\leq 13/11\approx 1.182$ ^[3] is proved by assuming a minimal ((120-3d)/100)-counterexample, with some d<20/33, and deriving a contradiction. By the lemmas above, we know that:

The size of the smallest item satisfies s > p-q = 20-3d, so s = 20-3d+D for some D>0.

References

[1] Coffman, Jr., E. G.; Garey, M. R.; Johnson, D. S. (1978-02-01). "An Application of Bin-Packing to Multiprocessor Scheduling". SIAM Journal on Computing. 7 (1): 1–17. doi:10.1137/0207001. ISSN 0097-5397.
[2] Friesen, Donald K. (1984-02-01). "Tighter Bounds for the Multifit Processor Scheduling Algorithm". SIAM Journal on Computing. 13 (1): 170–181. doi:10.1137/0213013. ISSN 0097-5397.
[3] Yue, Minyi (1990-12-01). "On the exact upper bound for the multifit processor scheduling algorithm". Annals of Operations Research. 24 (1): 233–259. doi:10.1007/BF02216826. ISSN 1572-9338.
[4] Cao, Feng (1995), Du, Ding-Zhu; Pardalos, Panos M. (eds.), "Determining the Performance Ratio of Algorithm Multifit for Scheduling", Minimax and Applications, Nonconvex Optimization and Its Applications, Boston, MA: Springer US, pp. 79–96, doi:10.1007/978-1-4613-3557-3_5, ISBN 978-1-4613-3557-3, retrieved 2021-08-23
