Greedy number partitioning

In computer science, greedy number partitioning is a greedy algorithm for multiway number partitioning. It was first analyzed by Ronald Graham in the 1960s in the context of the identical-machines scheduling problem.[1][2]: sec.5  In this context, it is often called Longest Processing Time (LPT).

The input to the algorithm is a set S of numbers, and a parameter k. The required output is a partition of S into k subsets, such that the sums in the subsets are as nearly equal as possible.

Algorithm

The standard algorithm first sorts the numbers in descending order, and then iteratively adds the next-largest number to a set in which the current sum is smallest. For example, if the input set is S = {4,5,6,7,8} and k=2, then the resulting partition is {8,5,4}, {7,6}; if k=3, then the resulting 3-way partition is {8}, {7, 4}, {6, 5}.

The running time of this algorithm is dominated by the sorting, which takes O(n log n) time.
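
The following sketch implements this rule for general k, keeping the current subset sums in a binary heap so that the smallest-sum subset is always at the root (the function name and the heap-based bookkeeping are illustrative choices, not part of the original formulation):

import heapq

def greedy_partition(numbers, k):
    """Greedily partition `numbers` into k subsets with nearly-equal sums."""
    heap = [(0, i) for i in range(k)]        # one entry (current sum, index) per subset
    subsets = [[] for _ in range(k)]
    for x in sorted(numbers, reverse=True):  # largest numbers first
        total, i = heapq.heappop(heap)       # the subset with the smallest current sum
        subsets[i].append(x)
        heapq.heappush(heap, (total + x, i))
    return subsets

On the example above, this sketch returns [[8, 5, 4], [7, 6]] for k=2 and [[8], [7, 4], [6, 5]] for k=3; besides the sorting, the heap operations add only O(n log k) time.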

The algorithm might not find the optimal partition. For example, in the above instance with k=2, the optimal partition is {8,7}, {6,5,4}, where both sums equal 15. However, its suboptimality is bounded both in the worst case and in the average case.

Guarantees

Worst-case maximum sum

In the worst case, the largest sum in the greedy partition is at most (4k-1)/(3k) = 4/3 - 1/(3k) times the optimal (minimum) largest sum.[2]: sec.5  In particular, when k=2 this ratio is 7/6.
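
For k=2 this bound is attained, for example, by the classic tight instance {3,3,2,2,2} (shown here only for illustration): greedy builds {3,2,2} and {3,2}, with largest sum 7, while the optimum is {3,3} and {2,2,2}, with largest sum 6. A small check of the arithmetic:

from fractions import Fraction

greedy_max = 3 + 2 + 2   # largest sum in the greedy partition {3,2,2}, {3,2}
optimal_max = 3 + 3      # largest sum in the optimal partition {3,3}, {2,2,2}
assert Fraction(greedy_max, optimal_max) == Fraction(7, 6) == Fraction(4, 3) - Fraction(1, 3 * 2)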

Worst-case minimum sum

In the worst case, the smallest sum in the returned partition is at least 3/4 times the optimal (maximum) smallest sum.[3]

The proof is by contradiction. We consider a minimal counterexample, that is, a counterexample with a smallest k and fewest input numbers. Denote the greedy partition by P1,...,Pk, and the optimal partition by Q1,...,Qk. Some properties of a minimal counterexample are:

  • The min-sum in the optimal partition is 4, and the min-sum in the greedy partition is less than 3 (this is just normalization - it is without loss of generality).
  • The max-sum in the greedy partition is more than 4 (since the total sum in both partitions is the same, and it is at least 4k).
  • If sum(Pi)≥3 for some greedy bin Pi, then Pi is not dominated by any optimal bin Qj. Proof: if Pi is dominated by Qj, then we can construct a smaller counterexample by decreasing k to k-1 and removing the items in Pi. The min-sum in the greedy partition remains less than 3. In the optimal partition, the items in Pi can be replaced by their dominating items in Qj, so the min-sum remains at least 4.
  • If sum(Pi)≥3 for some greedy bin Pi, then Pi contains at least two numbers. Proof: if Pi contained only one number x, then it would be dominated by the optimal bin Qj that contains x.
  • All inputs are smaller than 3. Proof: suppose some input x is at least 3, and the greedy algorithm puts it in some Pi. Then, since there is always a bundle with sum less than 3, the greedy algorithm will not put any other input in Pi, contradicting the previous lemma.
  • Every greedy bin Pi contains at most one input weakly-larger than 3/2. Proof: Let Pi be the first greedy bin which is assigned two such inputs. Since inputs are assigned in descending order, Pi is the first greedy bin assigned two inputs. This means that it must contain the smallest two inputs from among the largest k+1 inputs. Moreover, since the sum of these two items is at least 3/2+3/2=3, Pi is not assigned any other input. On the other hand, by the pigeonhole principle, there must be some optimal bin Qj that contains some two inputs from among the largest k+1 inputs; so Pi is dominated by Qj.
  • During the run of the greedy algorithm, the sum in every bin Pi becomes at least 8/3 before the sum of any bin exceeds 4. Proof: Let y be the first input added to some bin Pi which made its sum larger than 4. Before y was added, Pi had the smallest sum, which by assumption was smaller than 8/3; this means that y>4/3. Let T denote the set of all inputs from the first one down to y; all these inputs are larger than 4/3 too. Since Pi was smaller than 8/3, it contained exactly one item x from T. So now Pi contains exactly 2 items {x,y}, and remains with these 2 items until the algorithm ends. Let r be the number of items from the first one down to x. We now show a contradiction by counting the items in T in two ways.
    • First, consider the k optimal bins. If any such bin contains an item at least as large as x, then it cannot contain any other item of T, since otherwise it dominates {x,y}. Moreover, no optimal bin can contain three items from T, since the sum of any two of them is larger than 8/3, which is larger than x, and the third one is at least y, so together they dominate {x,y}. Therefore, the number of items in T is at most 1*r + 2*(k-r) = 2k-r.
    • Now, consider the k greedy bins. When y is added to the bundle containing x, that bundle has the smallest sum. Therefore, every element of T that is smaller than x must be in a greedy bin with at least one other item of T. The same is true for x and y. Therefore, the number of items in T is at least (r-1)+2*(k-r+1) = 2k-r+1 - a contradiction.
  • We can assume, without loss of generality, that all inputs are either smaller than 1/3, or at least 1. Proof: Suppose some input x is in [1/3,1). We replace x with 1. This obviously does not decrease the optimal min-sum. We show that it does not change the greedy min-sum. We know that some greedy bundle Pi has a final sum larger than 4. Before the last input was added into Pi, its sum was smaller than 3; so Pi became larger than 4 when some input larger than 1 was added to it. By the previous lemma, at that point the sum of all other greedy bundles was at least 8/3. The algorithm arrives at x afterwards. Once the algorithm adds x to some bin Pj, the sum of Pj becomes at least 8/3+1/3=3, so no more items are added into Pj. So Pj contains only one input with size in [1/3,1). Once x is replaced with 1, it is still inserted into Pj, and its sum is still above 3. So the greedy min-sum does not change.
  • We can now partition the inputs into small (less than 1/3) and large (at least 1). The set of small items in Pi is denoted by Si. Note that, when the algorithm starts processing small items, the sum in all bundles is at least 8/3.

The proof that a minimal counterexample does not exist uses a weighting scheme. Each input x is assigned a weight w(x) according to its size and greedy bundle Pi:

  • If x is a large item:
    • If x is the single large item in Pi, then w(x)=8/3.
    • If Pi contains exactly two items {x,y} and both of them are large, and x>y, and sum(Pi)≥3, then w(x)=8/3.
    • Otherwise, w(x)=4/3.
  • If x is a small item:
    • if sum(Pi)≥3, then w(x) = 4x/(3 sum(Si)); so w(Si) = 4/3.
    • if sum(Pi)<3, then w(x) = 2x/(3 sum(Si)); so w(Si) = 2/3.

This weighting scheme has the following properties:

  • If x≥2, then w(x)=8/3. Proof: x is large. Suppose it is in Pi. If Pi contains another large item y, then x+y≥3 so there is no other item in Pi. Moreover, x>y since there is at most one item larger than 3/2. So w(x)=8/3.
  • If x<1/3, then w(x) > 2x. Proof: x is small. Suppose it is in Pi.
    • If sum(Pi)≥3 then, since sum(Pi) was smaller than 3 before x was added to it, it is now smaller than 10/3. But when the algorithm started processing small items, sum(Pi) was at least 8/3. This means that sum(Si) < 2/3, so w(x) = 4x/(3 sum(Si)) > 2x.
    • If sum(Pi)<3 then sum(Si) < 3-8/3=1/3, so w(x) = 2x/(3 sum(Si)) > 2x.
  • The weight of every greedy bin Pi is at most 4, and the weight of at least one greedy bin is at most 10/3. Proof:
    • If all inputs in Pi are large, then it contains either a single input with weight 8/3, two inputs with weights 8/3+4/3, or three inputs with weights 4/3+4/3+4/3.
    • If some inputs in Pi are small, then their total weight is at most 4/3. There are at most two large inputs, and their weights are either 8/3 or 4/3+4/3.
    • Finally, the weight of the greedy bin with sum smaller than 3 is at most 8/3 (if it has only large inputs) or 10/3 (if it has some small inputs).
  • The weight of every optimal bin Qj is at least 4. Proof:
    • If Qj contains only small items, then each of them satisfies w(x) > 2x, so w(Qj) > 2 sum(Qj) ≥ 8.
    • If Qj contains exactly one large item x, then it must contain some small items whose sum is at least 4-x and weight at least 8-2x. Then, either x<2 and the weight of the small items is at least 8-4=4, or x is in [2,3) and w(x)=8/3 and the weight of the small items is at least 8-6=2. In both cases the total weight is at least 4.
    • Suppose Qj contains exactly two large items x>y. If x≥2, then their weight is at least 8/3+4/3=4 and we are done. If x+y≤10/3, then the sum of small items must be at least 2/3, so the total weight is at least 4/3+4/3+2*2/3=4. Otherwise, x>5/3, so x was the first input in some greedy bin Pm. Let z be the second input added into Pm. If x+z≥3, then there are no more inputs in Pm, so w(x)=8/3 and we are done. Otherwise, x+z<3. Let v be the smallest input in some greedy bin whose sum exceeds 4. Since x<8/3, z must have been processed before v, so z≥v. Consider now any small item t in Qj, and suppose it is in some greedy bin Pi.
      • If sum(Pi)<3, then the fact that v was not put in Pi implies that v > 4-sum(large-items-in-Pi) > 1+sum(small-items-in-Pi). Therefore, 1+sum(Si)+x < v+x ≤ z+x < 3 and sum(Si) < 2-x. This means that 2*sum(Si) < 4-2x ≤ 4-x-y ≤ sum(small-items-in-Qj). So w(t) = 2t/(3sum(Si)) > 4t/(3sum(small-items-in-Qj)).
      • If sum(Pi)≥3, and sum(Si)≤1, then w(t)=4/3 and we are done. Since sum(Pi) was less than 3 before t was added into it, sum(Pi)<3+sum(Si)/2. The fact that v was not put in Pi implies that v > 4-sum(large-items-in-Pi) > 1+sum(small-items-in-Pi)/2. Similarly to the previous paragraph, w(t) > 4t/(3sum(small-items-in-Qj)).
      • Therefore, the total weight of all small items in Qj is at least 4/3, so the total weight of Qj is at least 4/3+8/3=4.
    • If Qj contains exactly three or more large items, then its total weight is at least 4/3+4/3+4/3=4.
  • The last two claims are contradictory, since the former implies that the weight of all inputs is at most 4k-2/3, and the latter implies that the weight of all inputs is at least 4k. Therefore, a counterexample does not exist.
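
Spelled out, with W denoting the total weight of all inputs, the two preceding bounds read (this is only a restatement of the counting above):

W ≤ 4*(k-1) + 10/3 = 4k - 2/3   (greedy bins: each weighs at most 4, and at least one weighs at most 10/3)
W ≥ 4*k                         (optimal bins: each weighs at least 4)

and 4k - 2/3 < 4k, so the two bounds are incompatible.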

A more sophisticated analysis shows that the smallest sum in the greedy partition is at least (3k-1)/(4k-2) times the optimal smallest sum,[4][5] and that this bound is tight.[3] In particular, when k=2 the ratio is 5/6.
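
For k=2, the same illustrative instance {3,3,2,2,2} used above attains this bound as well: the greedy smallest sum is 5 (the bundle {3,2}), while the optimal smallest sum is 6. A small check of the arithmetic:

from fractions import Fraction

greedy_min = 3 + 2    # smallest sum in the greedy partition {3,2,2}, {3,2}
optimal_min = 3 + 3   # smallest sum in the optimal partition {3,3}, {2,2,2}
assert Fraction(greedy_min, optimal_min) == Fraction(5, 6) == Fraction(3 * 2 - 1, 4 * 2 - 2)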


Average-case maximum sum

In the average case, if the input numbers are distributed uniformly in [0,1], then the ratio between the largest sum in the greedy partition and the optimal largest sum converges to 1, both almost surely and in expectation.[6]
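
An illustrative simulation of this behaviour (not taken from the cited source; the simple lower bound max(total/k, largest input) stands in for the optimum, so the printed ratios only over-estimate the true ratio):

import heapq
import random

def lpt_max_sum(numbers, k):
    """Largest subset sum produced by the greedy (LPT) rule with k subsets."""
    heap = [(0.0, i) for i in range(k)]      # (current sum, subset index)
    for x in sorted(numbers, reverse=True):
        s, i = heapq.heappop(heap)           # subset with the smallest current sum
        heapq.heappush(heap, (s + x, i))
    return max(s for s, _ in heap)

random.seed(0)
k = 5
for n in (10, 100, 1000, 10000):
    xs = [random.random() for _ in range(n)]   # inputs distributed uniformly in [0,1]
    lower_bound = max(sum(xs) / k, max(xs))    # no partition can have a smaller largest sum
    print(n, round(lpt_max_sum(xs, k) / lower_bound, 4))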

Implementation

Below is an example, written in Python, of the greedy algorithm for k=2.

def find_partition(numbers):
    """Split the given numbers into two subsets with approximately equal sums.

    Args:
        numbers: a collection of numbers, for example a list of integers.

    Returns:
        Two lists of numbers.
    """
    A, B = [], []
    sum_A, sum_B = 0, 0
    # Go over the numbers in descending order, adding each one to the subset
    # whose current sum is smaller (ties go to B, as in the example below).
    for n in sorted(numbers, reverse=True):
        if sum_A < sum_B:
            A.append(n)
            sum_A += n
        else:
            B.append(n)
            sum_B += n
    return (A, B)

Example

>>> find_partition([1, 2, 3, 4, 5])
([4, 3], [5, 2, 1])

An exact algorithm

The complete greedy algorithm (CGA) is an exact algorithm, i.e., it always finds an optimal solution. It works in the following way. After sorting the numbers in descending order, it constructs a k-ary tree. Each level corresponds to a number, and each of the k branches corresponds to a different set in which the current number can be put. Traversing the tree in depth-first order requires only O(n) space, but might take O(k^n) time. The runtime can be improved by using the greedy heuristic: in each level, develop first the branch in which the current number is put in the set with the smallest sum. This algorithm finds the greedy solution first, but then proceeds to look for better solutions.
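
A minimal sketch of CGA (illustrative: it minimizes the largest subset sum, the identifiers are not from the source, and the bound-based pruning in the inner loop is a standard branch-and-bound addition rather than part of the basic description above):

def complete_greedy_partition(numbers, k):
    """Exact k-way partition minimizing the largest subset sum (CGA sketch).

    Depth-first search over the k-ary tree described above; the branch that
    follows the greedy rule is explored first, so the first leaf reached is
    exactly the greedy solution.
    """
    numbers = sorted(numbers, reverse=True)
    sums = [0] * k
    subsets = [[] for _ in range(k)]
    best_max = float("inf")
    best_subsets = None

    def dfs(depth):
        nonlocal best_max, best_subsets
        if depth == len(numbers):
            if max(sums) < best_max:
                best_max = max(sums)
                best_subsets = [list(s) for s in subsets]
            return
        x = numbers[depth]
        # Greedy branch ordering: try the subset with the smallest current sum first.
        for i in sorted(range(k), key=lambda j: sums[j]):
            if sums[i] + x >= best_max:   # prune branches that cannot improve the best solution
                continue
            sums[i] += x
            subsets[i].append(x)
            dfs(depth + 1)
            subsets[i].pop()
            sums[i] -= x

    dfs(0)
    return best_subsets

On the instance from the Algorithm section, complete_greedy_partition([4, 5, 6, 7, 8], 2) returns [[8, 7], [6, 5, 4]], the optimal 2-way partition.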

Several additional heuristics can be used in the case k=2 to improve the runtime:[7]

  • In a node in which the current sum-difference is at least the sum of all remaining numbers, the remaining numbers can just be put in the smallest-sum subset.
  • If we reach a leaf in which the sum-difference is 0 or 1, then the algorithm can terminate since this is the optimum.
  • If the subset sums in the current node are equal, then we can put the current number only in one subset, thus reducing the size of the subtree by half.
  • The last number can be assigned only to the subset with the smaller sum.

Generalizations

In the fair item allocation problem, there are n items and k people, each of whom assigns a possibly different value to each item. The goal is to partition the items among the people in as fair a way as possible. The natural generalization of the greedy number partitioning algorithm is the envy-graph algorithm (also called envy-cycle elimination). It guarantees that the allocation is envy-free up to at most one item (EF1). Moreover, if the instance is ordered (i.e., all agents rank the items in the same order), then the outcome is EFX, and guarantees to each agent at least 2/3 of his maximin share. If the items are chores, then a similar algorithm guarantees MMS.[8]
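
A rough sketch of the envy-graph (envy-cycle elimination) procedure, assuming additive valuations; the identifiers are illustrative and not from the cited source. When all valuations are identical and the items are processed in descending order, the unenvied agent is always one holding a smallest-sum bundle, so the procedure coincides with greedy number partitioning.

def envy_cycle_allocation(items, values):
    """Give each item to an agent whom nobody envies, rotating bundles along an
    envy cycle whenever no such agent exists.

    values[i][item] is agent i's value for the item; items is the processing order.
    """
    n = len(values)
    bundles = [[] for _ in range(n)]

    def val(i, bundle):
        return sum(values[i][it] for it in bundle)

    def envies(i, j):
        return val(i, bundles[j]) > val(i, bundles[i])

    for item in items:
        while True:
            unenvied = next((j for j in range(n)
                             if not any(envies(i, j) for i in range(n) if i != j)), None)
            if unenvied is not None:
                break
            # Every agent is envied, so the envy graph contains a cycle: walk
            # backwards along "is envied by" edges until an agent repeats.
            seen, cur = [], 0
            while cur not in seen:
                seen.append(cur)
                cur = next(i for i in range(n) if i != cur and envies(i, cur))
            cycle = seen[seen.index(cur):]    # cycle[t+1] envies cycle[t]; cycle[0] envies cycle[-1]
            old = [bundles[a] for a in cycle]
            for t, a in enumerate(cycle):
                bundles[a] = old[t - 1]       # every agent in the cycle takes the bundle it envies
        bundles[unenvied].append(item)
    return bundles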

Online settings

Often, the inputs come online, and their sizes become known only when they arrive. In this case, it is not possible to sort them in advance. List scheduling is a similar algorithm that takes a list in any order, not necessarily sorted. Its approximation ratio is 2 - 1/k.
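
A minimal sketch of list scheduling (illustrative names; the only difference from the greedy sketch in the Algorithm section is that the inputs are processed in arrival order instead of being sorted first):

import heapq

def list_scheduling(stream, k):
    """Assign each arriving number to the subset with the smallest current sum."""
    heap = [(0, i) for i in range(k)]   # (current sum, subset index)
    subsets = [[] for _ in range(k)]
    for x in stream:                    # arrival order; sorting is not possible online
        total, i = heapq.heappop(heap)
        subsets[i].append(x)
        heapq.heappush(heap, (total + x, i))
    return subsets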

A more sophisticated adaptation of LPT to an online setting attains an approximation ratio of 3/2.[9]

References

  1. ^ Graham, R. L. (November 1966). "Bounds for Certain Multiprocessing Anomalies". Bell System Technical Journal. 45 (9): 1563–1581. doi:10.1002/j.1538-7305.1966.tb01709.x.
  2. ^ a b Graham, R. L. (March 1969). "Bounds on Multiprocessing Timing Anomalies". SIAM Journal on Applied Mathematics. 17 (2): 416–429. doi:10.1137/0117039.
  3. ^ a b Deuermeyer, Bryan L.; Friesen, Donald K.; Langston, Michael A. (June 1982). "Scheduling to Maximize the Minimum Processor Finish Time in a Multiprocessor System". SIAM Journal on Algebraic Discrete Methods. 3 (2): 190–196. doi:10.1137/0603019.
  4. ^ Csirik, János; Kellerer, Hans; Woeginger, Gerhard (June 1992). "The exact LPT-bound for maximizing the minimum completion time". Operations Research Letters. 11 (5): 281–287. doi:10.1016/0167-6377(92)90004-M.
  5. ^ Wu, Bang Ye (December 2005). "An analysis of the LPT algorithm for the max–min and the min–ratio partition problems". Theoretical Computer Science. 349 (3): 407–419. doi:10.1016/j.tcs.2005.08.032.
  6. ^ Frenk, J.B.G.; Kan, A.H.G.Rinnooy (June 1986). "The rate of convergence to optimality of the LPT rule". Discrete Applied Mathematics. 14 (2): 187–197. doi:10.1016/0166-218X(86)90060-0. hdl:1765/11698.
  7. ^ Korf, Richard E. (20 August 1995). "From approximate to optimal solutions: A case study of number partitioning". IJCAI'95. pp. 266–272. ISBN 9781558603639.
  8. ^ Barman, Siddharth; Krishnamurthy, Sanath Kumar (21 April 2020). "Approximation Algorithms for Maximin Fair Division". ACM Transactions on Economics and Computation. 8 (1): 1–28. arXiv:1703.01851. doi:10.1145/3381525. S2CID 217191332.
  9. ^ Chen, Bo; Vestjens, Arjen P. A. (1 November 1997). "Scheduling on identical machines: How good is LPT in an on-line setting?". Operations Research Letters. 21 (4): 165–169. doi:10.1016/S0167-6377(97)00040-0.