Partition problem

Last updated January 10, 2024

In number theory and computer science, the partition problem, or number partitioning,^[1] is the task of deciding whether a given multiset S of positive integers can be partitioned into two subsets S₁ and S₂ such that the sum of the numbers in S₁ equals the sum of the numbers in S₂. Although the partition problem is NP-complete, there is a pseudo-polynomial time dynamic programming solution, and there are heuristics that solve the problem in many instances, either optimally or approximately. For this reason, it has been called "the easiest hard problem".^[2]^[3]

There is an optimization version of the partition problem, which is to partition the multiset S into two subsets S₁, S₂ such that the difference between the sum of elements in S₁ and the sum of elements in S₂ is minimized. The optimization version is NP-hard, but can be solved efficiently in practice.^[4]

The partition problem is a special case of two related problems:

In the subset sum problem, the goal is to find a subset of S whose sum is a certain target number T given as input (the partition problem is the special case in which T is half the sum of S).
In multiway number partitioning, there is an integer parameter k, and the goal is to decide whether S can be partitioned into k subsets of equal sum (the partition problem is the special case in which k = 2).

However, it is quite different from the 3-partition problem: in that problem, the number of subsets is not fixed in advance; it should be |S|/3, where each subset must have exactly 3 elements. 3-partition is much harder than partition: it has no pseudo-polynomial time algorithm unless P = NP.^[5]

Examples

Given S = {3,1,1,2,2,1}, a valid solution to the partition problem is the two sets S₁ = {1,1,1,2} and S₂ = {2,3}. Both sets sum to 5, and they partition S. Note that this solution is not unique. S₁ = {3,1,1} and S₂ = {2,2,1} is another solution.

Not every multiset of positive integers has a partition into two subsets with equal sum. An example of such a set is S = {2,5}.
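The two examples above can be checked mechanically. The following is a minimal sketch, not taken from the cited sources, of a pseudo-polynomial reachability check; the function name `can_partition` and the bitmask encoding are illustrative assumptions.

```python
def can_partition(numbers):
    """Decide whether a multiset of positive integers splits into two equal-sum parts."""
    total = sum(numbers)
    if total % 2 == 1:
        return False  # an odd total can never be split evenly
    target = total // 2
    # Bit i of `reachable` is set when some sub-multiset sums to exactly i.
    reachable = 1  # only the empty sum 0 is reachable at the start
    for x in numbers:
        reachable |= reachable << x  # either skip x or add it to every known sum
    return bool((reachable >> target) & 1)


print(can_partition([3, 1, 1, 2, 2, 1]))  # True
print(can_partition([2, 5]))              # False
```

The work grows with the sum of the input numbers rather than with the length of their encoding, which is why this approach counts as pseudo-polynomial rather than polynomial.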

Computational hardness

The partition problem is NP-hard. This can be proved by reduction from the subset sum problem.^[6] An instance of SubsetSum consists of a set S of positive integers and a target sum T; the goal is to decide if there is a subset of S with sum exactly T.

Given such an instance, construct an instance of Partition in which the input set contains the original set plus two elements: z₁ and z₂, with z₁ = sum(S) and z₂ = 2T. The sum of this input set is sum(S) + z₁ + z₂ = 2 sum(S) + 2T, so the target sum for Partition is sum(S) + T.

Suppose there exists a solution S′ to the SubsetSum instance. Then sum(S′) = T, so sum(S′ ∪ {z₁}) = sum(S) + T, so S′ ∪ {z₁} is a solution to the Partition instance.
Conversely, suppose there exists a solution S′′ to the Partition instance. Then S′′ must contain exactly one of z₁ and z₂, since their sum, sum(S) + 2T, is more than sum(S) + T. If S′′ contains z₁, then it must contain elements from S with a sum of exactly T, so S′′ minus z₁ is a solution to the SubsetSum instance. If S′′ contains z₂, then it must contain elements from S with a sum of exactly sum(S) − T, so the elements of S that are not in S′′ sum to T and form a solution to the SubsetSum instance.
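The construction in the proof can be sketched in a few lines. The helper names below are hypothetical, and the two brute-force checkers are exponential and only meant to confirm the equivalence on tiny inputs, assuming 1 ≤ T ≤ sum(S).

```python
from itertools import combinations


def subset_sum_to_partition(s, t):
    """Build the Partition instance: S plus z1 = sum(S) and z2 = 2T."""
    return list(s) + [sum(s), 2 * t]


def has_subset_sum(s, t):
    """Brute-force SubsetSum check (exponential; tiny demos only)."""
    return any(sum(c) == t
               for r in range(len(s) + 1)
               for c in combinations(s, r))


def has_equal_partition(numbers):
    """Brute-force Partition check (exponential; tiny demos only)."""
    total = sum(numbers)
    if total % 2:
        return False
    return has_subset_sum(numbers, total // 2)


s = [3, 5, 8]
print(subset_sum_to_partition(s, 11))                       # [3, 5, 8, 16, 22]
print(has_equal_partition(subset_sum_to_partition(s, 11)))  # True, since 3 + 8 = 11
print(has_equal_partition(subset_sum_to_partition(s, 4)))   # False, no subset sums to 4
```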

Approximation algorithms

As mentioned above, the partition problem is a special case of multiway-partitioning and of subset-sum. Therefore, it can be solved by algorithms developed for each of these problems. Algorithms developed for multiway number partitioning include:

Greedy number partitioning – loops over the numbers, and puts each number in the set whose current sum is smallest. If the numbers are not sorted, then the runtime is O(n) and the approximation ratio is at most 3/2 ("approximation ratio" means the larger sum in the algorithm output, divided by the larger sum in an optimal partition). Sorting the numbers increases the runtime to O(n log n) and improves the approximation ratio to 7/6. If the numbers are distributed uniformly in [0,1], then the approximation ratio is at most $1+O\left({\frac {\log {\log {n}}}{n}}\right)$ almost surely, and $1+O\left({\frac {1}{n}}\right)$ in expectation.
Largest Differencing Method (also called the Karmarkar–Karp algorithm) sorts the numbers in descending order and repeatedly replaces numbers by their differences. The runtime complexity is O(n log n). In the worst case, its approximation ratio is similar – at most 7/6. However, in the average case it performs much better than the greedy algorithm: when numbers are distributed uniformly in [0,1], its approximation ratio is at most $1+1/n^{\Theta (\log {n})}$ in expectation. It also performs better in simulation experiments.
The Multifit algorithm uses binary search combined with an algorithm for bin packing. In the worst case, its approximation ratio is 8/7.
The subset sum problem has an FPTAS which can be used for the partition problem as well, by setting the target sum to sum(S)/2.
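The first two heuristics in the list above are short enough to sketch directly. This is an illustrative sketch under stated assumptions: the function names are hypothetical, ties in the greedy rule go to the first set, and the differencing routine returns only the final difference, not the two subsets. The input [8, 7, 6, 5, 4] is an example chosen here, not one from the article.

```python
import heapq


def greedy_partition(numbers, presort=True):
    """Put each number into the part whose current sum is smallest."""
    items = sorted(numbers, reverse=True) if presort else list(numbers)
    parts, sums = ([], []), [0, 0]
    for x in items:
        i = 0 if sums[0] <= sums[1] else 1  # ties go to the first part
        parts[i].append(x)
        sums[i] += x
    return parts


def ldm_difference(numbers):
    """Largest differencing method: repeatedly replace the two largest
    numbers by their difference and return the last remaining value."""
    heap = [-x for x in numbers]  # heapq is a min-heap, so store negated values
    heapq.heapify(heap)
    while len(heap) > 1:
        a = -heapq.heappop(heap)
        b = -heapq.heappop(heap)
        heapq.heappush(heap, -(a - b))
    return -heap[0] if heap else 0


print(greedy_partition([8, 7, 6, 5, 4]))  # ([8, 5, 4], [7, 6])
print(ldm_difference([8, 7, 6, 5, 4]))    # 2
```

On this input the greedy sums are 17 and 13 (difference 4) and differencing reaches a difference of 2, while the split {8, 7} and {6, 5, 4} has difference 0, so both methods are approximate here.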

Exact algorithms

There are exact algorithms that always find the optimal partition. Since the problem is NP-hard, such algorithms might take exponential time in general, but may be practically usable in certain cases. Algorithms developed for multiway number partitioning include:

The pseudopolynomial time number partitioning takes $O(nm)$ memory, where $m$ is the largest number in the input.
The Complete Greedy Algorithm (CGA) considers all partitions by constructing a binary tree. Each level in the tree corresponds to an input number, where the root corresponds to the largest number, the level below to the next-largest number, etc. Each branch corresponds to a different set in which the current number can be put. Traversing the tree in depth-first order requires only $O(n)$ space, but might take $O(2^{n})$ time. The runtime can be improved by using a greedy heuristic: in each level, develop first the branch in which the current number is put in the set with the smallest sum. This algorithm finds first the solution found by greedy number partitioning, but then proceeds to look for better solutions. Some variations of this idea are fully polynomial-time approximation schemes for the subset-sum problem, and hence for the partition problem as well.^[7]^[8]
The Complete Karmarkar-Karp algorithm (CKK) considers all partitions by constructing a binary tree. Each level corresponds to a pair of numbers. The left branch corresponds to putting them in different subsets (i.e., replacing them by their difference), and the right branch corresponds to putting them in the same subset (i.e., replacing them by their sum). This algorithm finds first the solution found by the largest differencing method, but then proceeds to find better solutions. It runs substantially faster than CGA on random instances. Its advantage is much larger when an equal partition exists, and can be of several orders of magnitude. In practice, problems of arbitrary size can be solved by CKK if the numbers have at most 12 significant digits.^[9] CKK can also run as an anytime algorithm: it finds the KK solution first, and then finds progressively better solutions as time allows (possibly requiring exponential time to reach optimality, for the worst instances).^[1] It requires $O(n)$ space, but in the worst case might take $O(2^{n})$ time.
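The depth-first search described for the Complete Greedy Algorithm can be sketched as follows for the two-way case. This is a simplified illustration, not the implementation from the cited papers: the function name is hypothetical, it returns only the best difference, and the two pruning rules (stop at a perfect partition, and cut a branch once the remaining numbers cannot close the gap) are assumptions added for this sketch.

```python
def complete_greedy(numbers):
    """Return the minimum achievable difference between the two subset sums."""
    items = sorted(numbers, reverse=True)  # largest number at the root
    total = sum(items)
    # suffix[i] = sum of items[i:], used to bound what is still unassigned
    suffix = [0] * (len(items) + 1)
    for i in range(len(items) - 1, -1, -1):
        suffix[i] = suffix[i + 1] + items[i]
    best = total

    def search(i, small, large):
        nonlocal best
        if best == total % 2:
            return  # a perfect partition (difference 0 or 1) was already found
        diff = large - small
        if diff >= suffix[i]:
            # Even giving every remaining number to the smaller set cannot
            # overtake the larger one; this also covers the leaves (suffix == 0).
            best = min(best, diff - suffix[i])
            return
        x = items[i]
        lo, hi = sorted((small + x, large))
        search(i + 1, lo, hi)            # greedy branch first: x joins the smaller set
        search(i + 1, small, large + x)  # then the alternative branch

    search(0, 0, 0)  # recursion depth is at most len(numbers)
    return best


print(complete_greedy([8, 7, 6, 5, 4]))  # 0
print(complete_greedy([2, 5]))           # 3
```

Only the current path is kept in memory, which matches the linear space bound mentioned above, while the number of visited nodes can still be exponential.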

Algorithms developed for subset sum include:

Horowitz and Sahni – runs in time $O(2^{n/2}\cdot (n/2))$, but requires $O(2^{n/2})$ space.
Schroeppel and Shamir – runs in time $O(2^{n/2}\cdot (n/4))$, and requires much less space – $O(2^{n/4})$.
Howgrave-Graham and Joux – runs in time $O(2^{n/3})$, but it is a randomized algorithm that only solves the decision problem (not the optimization problem).
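The time and space trade-off of the first entry comes from splitting the input into two halves and enumerating the subset sums of each. The sketch below illustrates that meet-in-the-middle idea for the optimization version. It is not the authors' exact procedure: it uses sorting and binary search for the matching step, and all names are hypothetical.

```python
from bisect import bisect_left


def all_subset_sums(nums):
    """Return the sums of all 2^len(nums) subsets of nums."""
    sums = [0]
    for x in nums:
        sums += [s + x for s in sums]
    return sums


def min_difference_mitm(numbers):
    """Minimum difference between the two subset sums, by meet in the middle."""
    total = sum(numbers)
    half = len(numbers) // 2
    left = all_subset_sums(numbers[:half])
    # Store doubled sums so every comparison stays in integers.
    right = sorted(2 * r for r in all_subset_sums(numbers[half:]))
    best = total
    for s in left:
        want = total - 2 * s  # ideal doubled sum to take from the right half
        j = bisect_left(right, want)
        for k in (j - 1, j):  # nearest candidates on both sides of `want`
            if 0 <= k < len(right):
                best = min(best, abs(want - right[k]))
    return best


print(min_difference_mitm([8, 7, 6, 5, 4]))  # 0
print(min_difference_mitm([2, 5]))           # 3
```

Each half contributes about 2^(n/2) sums, which is where both the running time and the memory requirement of this family of methods come from.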

Hard instances and phase-transition

Sets with only one partition, or none, tend to be the hardest (most expensive) to solve relative to their input size. When the values are small compared to the size of the set, perfect partitions are more likely. The problem is known to undergo a "phase transition": perfect partitions are likely for some sets and unlikely for others. If m is the number of bits needed to express any number in the set and n is the size of the set, then instances with $m/n<1$ tend to have many solutions and instances with $m/n>1$ tend to have few or no solutions. As n and m get larger, the probability of a perfect partition goes to 1 or 0 respectively. This was originally argued based on empirical evidence by Gent and Walsh,^[10] then using methods from statistical physics by Mertens,^[11]^[12] and later proved by Borgs, Chayes, and Pittel.^[13]

Probabilistic version

A related problem, somewhat similar to the birthday paradox, is that of determining the size of the input set for which there is a probability of one half that a solution exists, under the assumption that each element in the set is randomly selected with uniform distribution between 1 and some given value. The solution to this problem can be counter-intuitive, like the birthday paradox.

Variants and generalizations

Equal-cardinality partition is a variant in which both parts should have an equal number of items, in addition to having an equal sum. This variant is NP-hard too.^[5] (problem SP12) Proof: given a standard Partition instance with some n numbers, construct an Equal-Cardinality-Partition instance by adding n zeros. Clearly, the new instance has an equal-cardinality equal-sum partition iff the original instance has an equal-sum partition. See also Balanced number partitioning.
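The padding argument can be illustrated on a small instance. The helper names below are hypothetical and the check is brute force, so this is only a sketch for tiny inputs; the instance [1, 1, 2] is an example chosen here.

```python
from itertools import combinations


def has_equal_cardinality_partition(numbers):
    """Brute force: is there a split into two parts of equal size and equal sum?"""
    n, total = len(numbers), sum(numbers)
    if n % 2 or total % 2:
        return False
    return any(sum(c) == total // 2 for c in combinations(numbers, n // 2))


def pad_with_zeros(numbers):
    """Reduction from the proof: append n zeros to an instance with n numbers."""
    return list(numbers) + [0] * len(numbers)


# [1, 1, 2] splits into {2} and {1, 1}: equal sums but unequal sizes.
print(has_equal_cardinality_partition([1, 1, 2]))                  # False
print(has_equal_cardinality_partition(pad_with_zeros([1, 1, 2])))  # True
```

After padding, the zeros even out the sizes: {2, 0, 0} and {1, 1, 0} both have three items and sum to 2.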

Distinct partition is a variant in which all input integers are distinct. This variant is NP-hard too.^[citation needed]

Product partition is the problem of partitioning a set of integers into two sets with the same product (rather than the same sum). This problem is strongly NP-hard.^[14]

Kovalyov and Pesch^[15] discuss a generic approach to proving NP-hardness of partition-type problems.

Applications

One application of the partition problem is for manipulation of elections. Suppose there are three candidates (A, B and C). A single candidate should be elected using a voting rule based on scoring, e.g. the veto rule (each voter vetoes a single candidate and the candidate with the fewest vetoes wins). If a coalition wants to ensure that C is elected, they should partition their votes among A and B so as to maximize the smallest number of vetoes each of them gets. If the votes are weighted, then the problem can be reduced to the partition problem, and thus it can be solved efficiently using CKK. The same is true for any other voting rule that is based on scoring.^[16]

Notes

1. Korf 1998.
2. Hayes, Brian (March–April 2002), "The Easiest Hard Problem" (PDF), American Scientist, Sigma Xi, The Scientific Research Society, vol. 90, no. 2, pp. 113–117, JSTOR 27857621
3. Mertens 2006, p. 125.
4. Korf, Richard E. (2009). Multi-Way Number Partitioning (PDF). IJCAI.
5. Garey, Michael; Johnson, David (1979). Computers and Intractability; A Guide to the Theory of NP-Completeness. pp. 96–105. ISBN 978-0-7167-1045-5.
6. Goodrich, Michael. "More NP complete and NP hard problems" (PDF).
7. Hans Kellerer; Ulrich Pferschy; David Pisinger (2004). Knapsack problems. Springer. p. 97. ISBN 9783540402862.
8. Martello, Silvano; Toth, Paolo (1990). "4 Subset-sum problem". Knapsack problems: Algorithms and computer interpretations. Wiley-Interscience. pp. 105–136. ISBN 978-0-471-92420-3. MR 1086874.
9. Korf, Richard E. (1995-08-20). "From approximate to optimal solutions: a case study of number partitioning". Proceedings of the 14th International Joint Conference on Artificial Intelligence. IJCAI'95. Vol. 1. Montreal, Quebec, Canada: Morgan Kaufmann Publishers. pp. 266–272. ISBN 978-1-55860-363-9.
10. Gent & Walsh 1996.
11. Mertens 1998.
12. Mertens 2001, p. 130.
13. Borgs, Chayes & Pittel 2001.
14. Ng, C. T.; Barketau, M. S.; Cheng, T. C. E.; Kovalyov, Mikhail Y. (2010-12-01). "'Product Partition' and related problems of scheduling and systems reliability: Computational complexity and approximation". European Journal of Operational Research. 207 (2): 601–604. doi:10.1016/j.ejor.2010.05.034. ISSN 0377-2217.
15. Kovalyov, Mikhail Y.; Pesch, Erwin (2010-10-28). "A generic approach to proving NP-hardness of partition type problems". Discrete Applied Mathematics. 158 (17): 1908–1912. doi:10.1016/j.dam.2010.08.001. ISSN 0166-218X.
16. Walsh, Toby (2009-07-11). "Where Are the Really Hard Manipulation Problems? The Phase Transition in Manipulating the Veto Rule" (PDF). Written at Pasadena, California, USA. Proceedings of the Twenty-First International Joint Conference on Artificial Intelligence. San Francisco, California, USA: Morgan Kaufmann Publishers Inc. pp. 324–329. Archived (PDF) from the original on 2020-07-10. Retrieved 2021-10-05.

References

Borgs, Christian; Chayes, Jennifer; Pittel, Boris (2001), "Phase transition and finite-size scaling for the integer partitioning problem", Random Structures and Algorithms, 19 (3–4): 247–288, CiteSeerX 10.1.1.89.9577, doi:10.1002/rsa.10004, S2CID 6819493
Gent, Ian; Walsh, Toby (August 1996). "Phase Transitions and Annealed Theories: Number Partitioning as a Case Study". In Wolfgang Wahlster (ed.). Proceedings of 12th European Conference on Artificial Intelligence. ECAI-96. John Wiley and Sons. pp. 170–174. CiteSeerX 10.1.1.2.4475.
Gent, Ian; Walsh, Toby (1998), "Analysis of Heuristics for Number Partitioning", Computational Intelligence, 14 (3): 430–451, CiteSeerX 10.1.1.149.4980, doi:10.1111/0824-7935.00069, S2CID 15344203
Korf, Richard E. (1998), "A complete anytime algorithm for number partitioning", Artificial Intelligence, 106 (2): 181–203, CiteSeerX 10.1.1.90.993, doi:10.1016/S0004-3702(98)00086-1, ISSN 0004-3702
Mertens, Stephan (November 1998), "Phase Transition in the Number Partitioning Problem", Physical Review Letters, 81 (20): 4281–4284, arXiv:cond-mat/9807077, Bibcode:1998PhRvL..81.4281M, doi:10.1103/PhysRevLett.81.4281, S2CID 119541289
Mertens, Stephan (2001), "A physicist's approach to number partitioning", Theoretical Computer Science, 265 (1–2): 79–108, arXiv:cond-mat/0009230, doi:10.1016/S0304-3975(01)00153-0, S2CID 16534837
Mertens, Stephan (2006). "The Easiest Hard Problem: Number Partitioning". In Allon Percus; Gabriel Istrate; Cristopher Moore (eds.). Computational complexity and statistical physics. USA: Oxford University Press. pp. 125–140. arXiv:cond-mat/0310317. Bibcode:2003cond.mat.10317M. ISBN 9780195177374.
Mertens, Stephan (1999), "A complete anytime algorithm for balanced number partitioning", arXiv:cs/9903011

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.
