In computer science, multiway number partitioning is the problem of partitioning a multiset of numbers into a fixed number of subsets, such that the sums of the subsets are as similar as possible. It was first presented by Ronald Graham in 1969 in the context of the identical-machines scheduling problem. [1]: Sec.5 The problem is parametrized by a positive integer k, and is called k-way number partitioning. [2] The input to the problem is a multiset S of numbers (usually integers) whose sum is k·T.
The associated decision problem is to decide whether S can be partitioned into k subsets such that the sum of each subset is exactly T. There is also an optimization problem: find a partition of S into k subsets such that the k sums are "as near as possible". The exact optimization objective can be defined in several ways: minimizing the largest sum, maximizing the smallest sum, or minimizing the difference between the largest and the smallest sum. These three objective functions are equivalent when k=2, but they are all different when k≥3. [6] [7]
All these problems are NP-hard, [8] but there are various algorithms that solve them efficiently in many cases.
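As a concrete illustration (not taken from the cited sources), the following brute-force Python sketch enumerates every 3-way partition of a small example multiset and evaluates the three objectives. It is exponential in the number of inputs and is only meant to make the definitions concrete; all names are illustrative.

from itertools import product

def all_partitions(items, k):
    # Yield every assignment of the items to k subsets (k**len(items) assignments).
    for assignment in product(range(k), repeat=len(items)):
        subsets = [[] for _ in range(k)]
        for item, part in zip(items, assignment):
            subsets[part].append(item)
        yield subsets

def subset_sums(partition):
    return [sum(s) for s in partition]

S = [9, 8, 7, 6, 5, 4, 3, 2, 1]   # sum = 45, so T = 15 for k = 3
k = 3

# The three optimization objectives, each computed by exhaustive search:
min_largest  = min(max(subset_sums(p)) for p in all_partitions(S, k))
max_smallest = max(min(subset_sums(p)) for p in all_partitions(S, k))
min_diff     = min(max(subset_sums(p)) - min(subset_sums(p)) for p in all_partitions(S, k))

print(min_largest, max_smallest, min_diff)   # 15 15 0: this instance has a perfect partition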
Some closely-related problems are the partition problem (the special case k=2), the 3-partition problem (in which the number of subsets is not fixed and each subset must contain exactly three numbers), and the bin packing problem (in which the subset sums are bounded from above and the number of subsets is to be minimized).
There are various algorithms that obtain a guaranteed approximation of the optimal solution in polynomial time; different approximation algorithms exist for the different objectives.
The approximation ratio in this context is the largest sum in the solution returned by the algorithm, divided by the largest sum in the optimal solution (the ratio is larger than 1). Most algorithms below were developed for identical-machines scheduling.
Several polynomial-time approximation schemes (PTASs) have been developed for this objective.
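As a baseline for comparison (this is Graham's greedy/LPT heuristic rather than one of the approximation schemes mentioned above), the following sketch always places the next-largest input into the subset with the currently smallest sum; for minimizing the largest sum, its approximation ratio is at most 4/3 - 1/(3k).

import heapq

def greedy_partition(items, k):
    # Largest-first greedy (LPT): sort the inputs in decreasing order and put
    # each one into the subset whose current sum is smallest.
    heap = [(0, i, []) for i in range(k)]          # (current sum, tie-breaker, subset)
    heapq.heapify(heap)
    for x in sorted(items, reverse=True):
        total, i, subset = heapq.heappop(heap)
        heapq.heappush(heap, (total + x, i, subset + [x]))
    return [subset for total, i, subset in heap]

parts = greedy_partition([9, 8, 7, 6, 5, 4, 3, 2, 1], 3)
print(sorted(sum(p) for p in parts))               # [14, 15, 16]; the optimal largest sum is 15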
The approximation ratio in this context is the smallest sum in the solution returned by the algorithm, divided by the smallest sum in the optimal solution (the ratio is less than 1).
Jin [13] studies a problem in which the goal is to maximize the sum, over all sets i in 1,...,k, of the product of the numbers in set i. In a more general variant, each set i may have a weight wi, and the goal is to maximize the weighted sum of products. This problem has an exact solution that runs in time O(n²).
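The following sketch only illustrates the weighted sum-of-products objective by exhaustive search; it is not Jin's O(n²) algorithm, and it allows empty subsets (which contribute the empty product, 1), which may differ from the original formulation.

from itertools import product
from math import prod

def max_weighted_sum_of_products(items, weights):
    # Exhaustively maximize the sum over i of weights[i] * product(subset i).
    k = len(weights)
    best_value, best_partition = float("-inf"), None
    for assignment in product(range(k), repeat=len(items)):
        subsets = [[] for _ in range(k)]
        for item, part in zip(items, assignment):
            subsets[part].append(item)
        value = sum(w * prod(s) for w, s in zip(weights, subsets))
        if value > best_value:
            best_value, best_partition = value, subsets
    return best_value, best_partition

print(max_weighted_sum_of_products([2, 3, 4], [2, 1]))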
Let Ci (for i between 1 and k) be the sum of subset i in a given partition. Instead of minimizing the objective function max(Ci), one can minimize the objective function max(f(Ci)), where f is any fixed function. Similarly, one can minimize the objective function sum(f(Ci)), or maximize min(f(Ci)), or maximize sum(f(Ci)). Alon, Azar, Woeginger and Yadid [14] presented general PTASs (generalizing the PTASs of Sahni, Hochbaum and Shmoys, and Woeginger) for these four problems. Their algorithm works for any f which satisfies the following two conditions:
The runtime of their PTASs is linear in n (the number of inputs), but exponential in the approximation precision. The PTAS for minimizing sum(f(Ci)) is based on some combinatorial observations:
The PTAS uses an input rounding technique. Given the input sequence S = (v1,...,vn) and a positive integer d, the rounded sequence S#(d) is defined as follows:
In S#(d), all inputs are integer multiples of L/d². Moreover, the above two observations hold for S#(d) too:
Based on these observations, all inputs in S#(d) are of the form h·L/d², where h is an integer in the range d,...,2d². Therefore, the input can be represented as an integer vector n, whose coordinate n_h is the number of inputs equal to h·L/d² in S#(d). Moreover, each subset can be represented as an integer vector t, whose coordinate t_h is the number of inputs equal to h·L/d² in the subset. The subset sum is then C(t) = sum over h of t_h·h·L/d². Denote by T the set of vectors t with L#/2 < C(t) < 2L#. Since the sum of the elements in such a vector is at most 2d, the number of such vectors is at most (2d²+1)^(2d), so |T| is a constant that depends only on d and not on n.
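To make the vector representation concrete, the following sketch (with illustrative names; it assumes the inputs have already been rounded to multiples of L/d² as described above) enumerates the configuration set T.

from collections import Counter
from itertools import combinations_with_replacement

def configuration_set(h_values, d, L_sharp, unit):
    # Enumerate all configuration vectors t, represented here as Counters over
    # the indices h, such that the subset contains at most 2*d inputs and its
    # sum C(t) lies strictly between L#/2 and 2*L#.
    # 'unit' stands for L/d**2, so an input with index h has size h*unit.
    T = []
    for size in range(1, 2 * d + 1):
        for combo in combinations_with_replacement(sorted(h_values), size):
            c_t = sum(h * unit for h in combo)
            if L_sharp / 2 < c_t < 2 * L_sharp:
                T.append(Counter(combo))
    return T

# Example (illustrative numbers): d = 2, unit = 1, inputs of sizes 2, 3 and 4:
# print(len(configuration_set({2, 3, 4}, d=2, L_sharp=4, unit=1)))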
There are two different ways to find an optimal solution to S#(d). One way uses dynamic programming: its run-time is a polynomial whose exponent depends on d. The other way uses Lenstra's algorithm for integer linear programming.
Define VAL(k, n) as the optimal (minimum) value of the objective function sum(f(Ci)), when the input vector is n and it has to be partitioned into k subsets, among all partitions in which all subset sums are strictly between L#/2 and 2L#.
It can be solved by the following recurrence relation, in which t ranges over configuration vectors in T that are coordinate-wise at most n:
VAL(1, n) = f(C(n)) if n is in T, and infinity otherwise;
VAL(k, n) = minimum over all t in T with t ≤ n of [ f(C(t)) + VAL(k-1, n-t) ].
For each k and n, the recurrence relation requires checking at most |T| vectors. The total number of vectors n to check is at most (n+1)^(2d²), where n is the original number of inputs. Therefore, the run-time of the dynamic programming algorithm is in O(k·|T|·(n+1)^(2d²)): polynomial in n for any fixed d, but with an exponent that grows with d.
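A direct, memoized implementation of the recurrence above might look as follows (illustrative names; configurations and remaining inputs are encoded as tuples of counts, one entry per rounded input size):

from functools import lru_cache

def min_sum_f(n_vec, k, T, sizes, f):
    # VAL(k, n): partition the multiset encoded by n_vec (n_vec[i] copies of
    # sizes[i]) into k subsets whose configuration vectors all belong to T,
    # minimizing sum(f(Ci)).  T is a set of tuples of the same length as n_vec.
    INF = float("inf")

    def c(t):                          # C(t): the sum of the subset encoded by t
        return sum(count * size for count, size in zip(t, sizes))

    @lru_cache(maxsize=None)
    def val(j, remaining):
        if j == 1:                     # the last subset must take everything left
            return f(c(remaining)) if remaining in T else INF
        best = INF
        for t in T:
            if all(a <= b for a, b in zip(t, remaining)):
                rest = tuple(b - a for a, b in zip(t, remaining))
                best = min(best, f(c(t)) + val(j - 1, rest))
        return best

    return val(k, tuple(n_vec))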
For each vector t in T, introduce a variable x_t denoting the number of subsets with this configuration. Minimizing sum(f(Ci)) can be attained by solving the following ILP:
minimize the sum, over all t in T, of x_t·f(C(t)),
subject to: the sum of all x_t equals k (exactly k subsets are used); the sum, over all t in T, of x_t·t equals n (coordinate-wise, so that every input is used exactly once); and every x_t is a non-negative integer.
The number of variables is at most |T|, and the number of equations is at most 2d²+1 (one for each possible input size, plus one for the number of subsets) - both are constants independent of n, k. Therefore, Lenstra's algorithm can be used. Its run-time is exponential in the dimension (the number of variables, |T|), but polynomial in the binary representation of the coefficients, which are in O(log(n)). Constructing the ILP itself takes time O(n).
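The theoretical guarantee relies on Lenstra's algorithm; in practice the same configuration ILP can simply be handed to a generic mixed-integer solver. A sketch using SciPy's milp (available since SciPy 1.9; names are illustrative):

import numpy as np
from scipy.optimize import Bounds, LinearConstraint, milp

def solve_configuration_ilp(n_vec, k, T, sizes, f):
    # One integer variable x_t per configuration t in T.  Minimize
    # sum_t x_t * f(C(t)) subject to sum_t x_t = k and, for every rounded size,
    # sum_t x_t * t_h = n_h (so every input is used exactly once).
    T = list(T)
    costs = np.array([f(sum(cnt * s for cnt, s in zip(t, sizes))) for t in T])
    # Equality constraints: one row per rounded size, plus one row for "k subsets".
    A = np.array([[t[h] for t in T] for h in range(len(sizes))] + [[1] * len(T)],
                 dtype=float)
    b = np.array(list(n_vec) + [k], dtype=float)
    result = milp(c=costs,
                  constraints=LinearConstraint(A, b, b),   # A @ x == b
                  integrality=np.ones(len(T)),             # all variables are integers
                  bounds=Bounds(0, np.inf))                # x_t >= 0
    return result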
The following lemmas relate the partitions of the rounded instance S#(d) and the original instance S.
Given a desired approximation precision ε>0, let δ>0 be the constant corresponding to ε/3, whose existence is guaranteed by Condition F*. Choose the rounding parameter d sufficiently large as a function of δ (on the order of 1/δ). It is possible to show that the converted partition of S has a total cost of at most (1+ε) times the optimal cost, so the approximation ratio is 1+ε.
In contrast to the above result, if we take f(x) = 2^x, or f(x) = (x-1)², then no PTAS for minimizing sum(f(Ci)) exists unless P=NP. [14]: Sec.4 Note that these functions are convex, but they do not satisfy Condition F* above. The proof is by reduction from the partition problem.
There are exact algorithms that always find the optimal partition. Since the problem is NP-hard, such algorithms might take exponential time in general, but they may be practically usable in certain cases.
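As an illustration of the exact approach (a minimal branch-and-bound in the spirit of a complete greedy search, not any specific algorithm from the literature), the following sketch minimizes the largest sum by assigning the inputs in decreasing order and pruning branches that cannot beat the best partition found so far:

def min_largest_sum(items, k):
    # Branch and bound: try each of the k subsets for every input, largest
    # inputs first; prune as soon as a subset sum reaches the best value found.
    items = sorted(items, reverse=True)
    best = sum(items)                       # trivial upper bound: everything in one subset
    sums = [0] * k

    def extend(i):
        nonlocal best
        if i == len(items):
            best = min(best, max(sums))
            return
        seen = set()
        for j in range(k):
            if sums[j] in seen:             # identical sums lead to symmetric branches
                continue
            seen.add(sums[j])
            if sums[j] + items[i] < best:   # prune branches that cannot improve
                sums[j] += items[i]
                extend(i + 1)
                sums[j] -= items[i]

    extend(0)
    return best

print(min_largest_sum([9, 8, 7, 6, 5, 4, 3, 2, 1], 3))   # 15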
The bin packing problem has many fast solvers. A BP solver can be used to find an optimal number partitioning. [25] The idea is to use binary search to find the optimal makespan. To initialize the binary search, we need a lower bound and an upper bound: a simple lower bound is max(sum(S)/k, max(S)), and an upper bound can be taken from any heuristic solution, for example the largest sum in a greedy partition. Given a lower and an upper bound, run the BP solver with bin size middle := (lower+upper)/2. If it finds a packing into at most k bins, then the optimal makespan is at most middle, so the upper bound can be decreased to middle; otherwise, the optimal makespan is larger than middle, so the lower bound can be increased. The search ends when the two bounds meet.
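A minimal sketch of this binary search for integer inputs; the exhaustive feasibility check below stands in for a real bin-packing solver, which would be far more sophisticated:

def can_pack(items, capacity, k):
    # Exact feasibility check: can the items be packed into at most k bins of
    # the given capacity?  Depth-first search, largest items first.
    items = sorted(items, reverse=True)
    if items and items[0] > capacity:
        return False
    bins = []

    def place(i):
        if i == len(items):
            return True
        for j in range(len(bins)):
            if bins[j] + items[i] <= capacity:
                bins[j] += items[i]
                if place(i + 1):
                    return True
                bins[j] -= items[i]
        if len(bins) < k:               # open a new bin for this item
            bins.append(items[i])
            if place(i + 1):
                return True
            bins.pop()
        return False

    return place(0)

def optimal_largest_sum(items, k):
    # Binary search over the bin capacity, as described above.
    lower = max(-(-sum(items) // k), max(items))   # ceil(sum/k) and the largest input
    upper = sum(items)
    while lower < upper:
        middle = (lower + upper) // 2
        if can_pack(items, middle, k):
            upper = middle              # a packing into k bins exists: optimum <= middle
        else:
            lower = middle + 1          # no such packing: optimum > middle
    return lower

print(optimal_largest_sum([9, 8, 7, 6, 5, 4, 3, 2, 1], 3))   # 15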
In the balanced number partitioning problem, there are constraints on the number of items that can be allocated to each subset (these are called cardinality constraints).
Another variant is multidimensional number partitioning. [26]
One application of the partition problem is manipulation of elections. Suppose there are three candidates (A, B and C). A single candidate should be elected using the veto voting rule, i.e., each voter vetoes a single candidate, and the candidate with the fewest vetoes wins. If a coalition wants to ensure that C is elected, they should partition their vetoes between A and B so as to maximize the smallest number of vetoes each of them gets. If the votes are weighted, then the problem can be reduced to the partition problem, and thus it can be solved efficiently using CKK. For k=2, the same is true for any other voting rule that is based on scoring. However, for k>2 and other voting rules, some other techniques are required. [3]
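For a tiny illustration of the weighted case (hypothetical numbers), the following brute force splits a coalition's weighted vetoes between A and B so that the smaller veto total is as large as possible, which is exactly 2-way max-min partitioning; an efficient method such as CKK would replace the exhaustive search.

from itertools import product

def best_veto_split(weights):
    # Assign each weighted veto to candidate A or B, maximizing the smaller of
    # the two veto totals (2-way max-min number partitioning).
    best_value, best_assignment = -1, None
    for assignment in product("AB", repeat=len(weights)):
        a = sum(w for w, c in zip(weights, assignment) if c == "A")
        b = sum(weights) - a
        if min(a, b) > best_value:
            best_value, best_assignment = min(a, b), assignment
    return best_value, best_assignment

print(best_veto_split([4, 3, 3, 2]))   # (6, ...): A receives 4+2 and B receives 3+3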