Greedy number partitioning

Last updated September 15, 2024

In computer science, greedy number partitioning is a class of greedy algorithms for multiway number partitioning. The input to the algorithm is a set S of numbers, and a parameter k. The required output is a partition of S into k subsets, such that the sums in the subsets are as nearly equal as possible. Greedy algorithms process the numbers sequentially, and insert the next number into a bin in which the sum of numbers is currently smallest.

Approximate algorithms

The simplest greedy partitioning algorithm is called list scheduling . It just processes the inputs in any order they arrive. It always returns a partition in which the largest sum is at most $2-{\frac {1}{k}}$ times the optimal (minimum) largest sum.^[1] This heuristic can be used as an online algorithm, when the order in which the items arrive cannot be controlled.

An improved greedy algorithm is called LPT scheduling. It processes the inputs by descending order of value, from large to small. Since it needs to pre-order the inputs, it can be used only as an offline algorithm. It guarantees that the largest sum is at most ${\frac {4k-1}{3k}}$ times the optimal (minimum) largest sum, and the smallest sum is at least ${\frac {3k-1}{4k-2}}$ times the optimal (maximum) smallest sum. See LPT scheduling for more details.

Complete greedy algorithm

The complete greedy algorithm (CGA) is an exact algorithm, i.e., it always finds an optimal solution. It works in the following way. After sorting the numbers in descending order (as in LPT), it constructs a k-ary tree. Each level corresponds to a number, and each of the k branches corresponds to a different set in which the current number can be put. Traversing the tree in depth-first order requires only O(n) space, but might take O(kⁿ) time. The runtime can be improved by using the greedy heuristic: in each level, develop first the branch in which the current number is put in the set with the smallest sum. This algorithm finds the greedy (LPT) solution first, but then proceeds to look for better solutions.

Several additional heuristics can be used to improve the runtime:^[2]

In a node in which the current sum-difference is at least the sum of all remaining numbers, the remaining numbers can just be put in the smallest-sum subset.
If we reach a leaf in which the sum-difference is 0 or 1, then the algorithm can terminate since this is the optimum.
If two or more subset sums in the current node are equal, then we can put the current number only in one of these subsets, thus reducing the size of the subtree by at least half.
The last number can be assigned only to the subset with the smaller sum.

Generalizations

In the fair item allocation problem, there are n items and k people, each of which assigns a possibly different value to each item. The goal is to partition the items among the people in as fair way as possible. The natural generalization of the greedy number partitioning algorithm is the envy-graph algorithm. It guarantees that the allocation is envy-free up to at most one item (EF1). Moreover, if the instance is ordered (- all agents rank the items in the same order), then the outcome is EFX, and guarantees to each agent at least ${\frac {2n}{3n-1}}$ of his maximin share. If the items are chores, then a similar algorithm guarantees ${\frac {4n-1}{3n}}$ MMS.^[3]

Implementations

Python: The prtpy library contains implementations of the greedy algorithm and complete greedy algorithm.

Related Research Articles

The knapsack problem is the following problem in combinatorial optimization:

The subset sum problem (SSP) is a decision problem in computer science. In its most general formulation, there is a multiset $of integers and a target-sum, and the question is to decide whether any subset of the integers sum to precisely . The problem is known to be NP-complete. Moreover, some restricted variants of it are NP-complete too, for example:$

The bin packing problem is an optimization problem, in which items of different sizes must be packed into a finite number of bins or containers, each of a fixed given capacity, in a way that minimizes the number of bins used. The problem has many applications, such as filling up containers, loading trucks with weight capacity constraints, creating file backups in media, splitting a network prefix into multiple subnets, and technology mapping in FPGA semiconductor chip design.

In computer science, a selection algorithm is an algorithm for finding the $th smallest value in a collection of ordered values, such as numbers. The value that it finds is called the th order statistic. Selection includes as special cases the problems of finding the minimum, median, and maximum element in the collection. Selection algorithms include quickselect, and the median of medians algorithm. When applied to a collection of values, these algorithms take linear time, as expressed using big O notation. For data that is already structured, faster algorithms may be possible; as an extreme case, selection in an already-sorted array takes time .$

<span class="mw-page-title-main">Set cover problem</span> Classical problem in combinatorics

The set cover problem is a classical question in combinatorics, computer science, operations research, and complexity theory.

List scheduling is a greedy algorithm for Identical-machines scheduling. The input to this algorithm is a list of jobs that should be executed on a set of m machines. The list is ordered in a fixed order, which can be determined e.g. by the priority of executing the jobs, or by their order of arrival. The algorithm repeatedly executes the following steps until a valid schedule is obtained:

In number theory and computer science, the partition problem, or number partitioning, is the task of deciding whether a given multiset S of positive integers can be partitioned into two subsets S₁ and S₂ such that the sum of the numbers in S₁ equals the sum of the numbers in S₂. Although the partition problem is NP-complete, there is a pseudo-polynomial time dynamic programming solution, and there are heuristics that solve the problem in many instances, either optimally or approximately. For this reason, it has been called "the easiest hard problem".

Matroid partitioning is a problem arising in the mathematical study of matroids and in the design and analysis of algorithms. Its goal is to partition the elements of a matroid into as few independent sets as possible. An example is the problem of computing the arboricity of an undirected graph, the minimum number of forests needed to cover all of its edges. Matroid partitioning may be solved in polynomial time, given an independence oracle for the matroid. It may be generalized to show that a matroid sum is itself a matroid, to provide an algorithm for computing ranks and independent sets in matroid sums, and to compute the largest common independent set in the intersection of two given matroids.

In computational geometry, a maximum disjoint set (MDS) is a largest set of non-overlapping geometric shapes selected from a given set of candidate shapes.

Maximin share (MMS) is a criterion of fair item allocation. Given a set of items with different values, the 1-out-of-n maximin-share is the maximum value that can be gained by partitioning the items into $parts and taking the part with the minimum value. An allocation of items among agents with different valuations is called MMS-fair if each agent gets a bundle that is at least as good as his/her 1-out-of- n maximin-share. MMS fairness is a relaxation of the criterion of proportionality - each agent gets a bundle that is at least as good as the equal split (of every resource). Proportionality can be guaranteed when the items are divisible, but not when they are indivisible, even if all agents have identical valuations. In contrast, MMS fairness can always be guaranteed to identical agents, so it is a natural alternative to proportionality even when the agents are different.$

In computer science, multiway number partitioning is the problem of partitioning a multiset of numbers into a fixed number of subsets, such that the sums of the subsets are as similar as possible. It was first presented by Ronald Graham in 1969 in the context of the identical-machines scheduling problem. The problem is parametrized by a positive integer k, and called k-way number partitioning. The input to the problem is a multiset S of numbers, whose sum is k*T.

In computer science, the largest differencing method is an algorithm for solving the partition problem and the multiway number partitioning. It is also called the Karmarkar–Karp algorithm after its inventors, Narendra Karmarkar and Richard M. Karp. It is often abbreviated as LDM.

In the bin covering problem, items of different sizes must be packed into a finite number of bins or containers, each of which must contain at least a certain given total size, in a way that maximizes the number of bins used.

The multifit algorithm is an algorithm for multiway number partitioning, originally developed for the problem of identical-machines scheduling. It was developed by Coffman, Garey and Johnson. Its novelty comes from the fact that it uses an algorithm for another famous problem - the bin packing problem - as a subroutine.

The multiple subset sum problem is an optimization problem in computer science and operations research. It is a generalization of the subset sum problem. The input to the problem is a multiset $of n integers and a positive integer m representing the number of subsets. The goal is to construct, from the input integers, some m subsets. The problem has several variants:$

Identical-machines scheduling is an optimization problem in computer science and operations research. We are given n jobs J₁, J₂, ..., J_n of varying processing times, which need to be scheduled on m identical machines, such that a certain objective function is optimized, for example, the makespan is minimized.

Longest-processing-time-first (LPT) is a greedy algorithm for job scheduling. The input to the algorithm is a set of jobs, each of which has a specific processing-time. There is also a number m specifying the number of machines that can process the jobs. The LPT algorithm works as follows:

Order the jobs by descending order of their processing-time, such that the job with the longest processing time is first.
Schedule each job in this sequence into a machine in which the current load is smallest.

Balanced number partitioning is a variant of multiway number partitioning in which there are constraints on the number of items allocated to each set. The input to the problem is a set of n items of different sizes, and two integers m, k. The output is a partition of the items into m subsets, such that the number of items in each subset is at most k. Subject to this, it is required that the sums of sizes in the m subsets are as similar as possible.

The Karmarkar–Karp (KK) bin packing algorithms are several related approximation algorithm for the bin packing problem. The bin packing problem is a problem of packing items of different sizes into bins of identical capacity, such that the total number of bins is as small as possible. Finding the optimal solution is computationally hard. Karmarkar and Karp devised an algorithm that runs in polynomial time and finds a solution with at most $bins, where OPT is the number of bins in the optimal solution. They also devised several other algorithms with slightly different approximation guarantees and run-time bounds.$

References

↑ Graham, Ron L. (1969-03-01). "Bounds on Multiprocessing Timing Anomalies". SIAM Journal on Applied Mathematics. 17 (2): 416–429. doi:10.1137/0117039. ISSN 0036-1399.
↑ Korf, Richard E. (August 1995). "From approximate to optimal solutions: A case study of number partitioning". In Mellish, Chris S. (ed.). IJCAI'95: Proceedings of the 14th international joint conference on Artificial intelligence. pp. 266–272. ISBN 978-1-55860-363-9.
↑ Barman, Siddharth; Krishnamurthy, Sanath Kumar (21 April 2020). "Approximation Algorithms for Maximin Fair Division". ACM Transactions on Economics and Computation. 8 (1): 1–28. arXiv: 1703.01851 . doi:10.1145/3381525. S2CID 217191332.