Subset sum problem

Last updated

The subset sum problem (SSP) is a decision problem in computer science. In its most general formulation, there is a multiset of integers and a target-sum , and the question is to decide whether any subset of the integers sum to precisely . [1] The problem is known to be NP-hard. Moreover, some restricted variants of it are NP-complete too, for example: [1]

Contents

SSP can also be regarded as an optimization problem: find a subset whose sum is at most T, and subject to that, as close as possible to T. It is NP-hard, but there are several algorithms that can solve it reasonably quickly in practice.

SSP is a special case of the knapsack problem and of the multiple subset sum problem.

Computational hardness

The run-time complexity of SSP depends on two parameters:

As both n and L grow large, SSP is NP-hard. The complexity of the best known algorithms is exponential in the smaller of the two parameters n and L. The problem is NP-hard even when all input integers are positive (and the target-sum T is a part of the input). This can be proved by a direct reduction from 3SAT. [2] It can also be proved by reduction from 3-dimensional matching (3DM): [3]

The following variants are also known to be NP-hard:

The analogous counting problem #SSP, which asks to enumerate the number of subsets summing to the target, is #P-complete. [4]

Exponential time algorithms

There are several ways to solve SSP in time exponential in n. [5]

Inclusion–exclusion

The most naïve algorithm would be to cycle through all subsets of n numbers and, for every one of them, check if the subset sums to the right number. The running time is of order , since there are subsets and, to check each subset, we need to sum at most n elements.

The algorithm can be implemented by depth-first search of a binary tree: each level in the tree corresponds to an input number; the left branch corresponds to excluding the number from the set, and the right branch corresponds to including the number (hence the name Inclusion-Exclusion). The memory required is . The run-time can be improved by several heuristics: [5]

Horowitz and Sahni

In 1974, Horowitz and Sahni [6] published a faster exponential-time algorithm, which runs in time , but requires much more space - . The algorithm splits arbitrarily the n elements into two sets of each. For each of these two sets, it stores a list of the sums of all possible subsets of its elements. Each of these two lists is then sorted. Using even the fastest comparison sorting algorithm, Mergesort for this step would take time . However, given a sorted list of sums for elements, the list can be expanded to two sorted lists with the introduction of a ()th element, and these two sorted lists can be merged in time . Thus, each list can be generated in sorted form in time . Given the two sorted lists, the algorithm can check if an element of the first array and an element of the second array sum up to T in time . To do that, the algorithm passes through the first array in decreasing order (starting at the largest element) and the second array in increasing order (starting at the smallest element). Whenever the sum of the current element in the first array and the current element in the second array is more than T, the algorithm moves to the next element in the first array. If it is less than T, the algorithm moves to the next element in the second array. If two elements that sum to T are found, it stops. (The sub-problem for two elements sum is known as two-sum. [7] )

Schroeppel and Shamir

In 1981, Schroeppel and Shamir presented an algorithm [8] based on Horowitz and Sanhi, that requires similar runtime - , much less space - . Rather than generating and storing all subsets of n/2 elements in advance, they partition the elements into 4 sets of n/4 elements each, and generate subsets of n/2 element pairs dynamically using a min heap, which yields the above time and space complexities since this can be done in and space given 4 lists of length k.

Due to space requirements, the HS algorithm is practical for up to about 50 integers, and the SS algorithm is practical for up to 100 integers. [5]

Howgrave-Graham and Joux

In 2010, Howgrave-Graham and Joux [9] presented a probabilistic algorithm that runs faster than all previous ones - in time using space . It solves only the decision problem, cannot prove there is no solution for a given sum, and does not return the subset sum closest to T.

The techniques of Howgrave-Graham and Joux were subsequently extended [10] bringing the time-complexity to .

Pseudo-polynomial time dynamic programming solutions

SSP can be solved in pseudo-polynomial time using dynamic programming. Suppose we have the following sequence of elements in an instance:

We define a state as a pair (i, s) of integers. This state represents the fact that

"there is a nonempty subset of which sums to s."

Each state (i, s) has two next states:

Starting from the initial state (0, 0), it is possible to use any graph search algorithm (e.g. BFS) to search the state (N, T). If the state is found, then by backtracking we can find a subset with a sum of exactly T.

The run-time of this algorithm is at most linear in the number of states. The number of states is at most N times the number of different possible sums. Let A be the sum of the negative values and B the sum of the positive values; the number of different possible sums is at most B-A, so the total runtime is in . For example, if all input values are positive and bounded by some constant C, then B is at most N C, so the time required is .

This solution does not count as polynomial time in complexity theory because is not polynomial in the size of the problem, which is the number of bits used to represent it. This algorithm is polynomial in the values of A and B, which are exponential in their numbers of bits. However, Subset Sum encoded in unary is in P, since then the size of the encoding is linear in B-A. Hence, Subset Sum is only weakly NP-Complete.

For the case that each is positive and bounded by a fixed constant C, in 1999, Pisinger found a linear time algorithm having time complexity (note that this is for the version of the problem where the target sum is not necessarily zero, as otherwise the problem would be trivial). [11] In 2015, Koiliaris and Xu found a deterministic algorithm for the subset sum problem where T is the sum we need to find. [12] In 2017, Bringmann found a randomized time algorithm. [13]

In 2014, Curtis and Sanches found a simple recursion highly scalable in SIMD machines having time and space, where p is the number of processing elements, and is the lowest integer. [14] This is the best theoretical parallel complexity known so far.

A comparison of practical results and the solution of hard instances of the SSP is discussed by Curtis and Sanches. [15]

Polynomial time approximation algorithms

Suppose all inputs are positive. An approximation algorithm to SSP aims to find a subset of S with a sum of at most T and at least r times the optimal sum, where r is a number in (0,1) called the approximation ratio.

Simple 1/2-approximation

The following very simple algorithm has an approximation ratio of 1/2: [16]

When this algorithm terminates, either all inputs are in the subset (which is obviously optimal), or there is an input that does not fit. The first such input is smaller than all previous inputs that are in the subset and the sum of inputs in the subset is more than T/2 otherwise the input also is less than T/2 and it would fit in the set. Such a sum greater than T/2 is obviously more than OPT/2.

Fully-polynomial time approximation scheme

The following algorithm attains, for every , an approximation ratio of . Its run time is polynomial in n and . Recall that n is the number of inputs and T is the upper bound to the subset sum.

initialize a list L to contain one element 0.  for eachi from 1 to ndo     let Ui be a list containing all elements y in L, and all sums xi + y for all y in L.     sort Ui in ascending order     make L empty      let y be the smallest element of Ui     add y to Lfor each element z of Ui in increasing order do// Trim the list by eliminating numbers close to one another// and throw out elements greater than the target sum T.ify +  ε T/n < zTtheny = z             add z to Lreturn the largest element in L.

Note that without the trimming step (the inner "for each" loop), the list L would contain the sums of all subsets of inputs. The trimming step does two things:

These properties together guarantee that the list L contains no more than elements; therefore the run-time is polynomial in .

When the algorithm ends, if the optimal sum is in L, then it is returned and we are done. Otherwise, it must have been removed in a previous trimming step. Each trimming step introduces an additive error of at most , so n steps together introduce an error of at most . Therefore, the returned solution is at least which is at least .

The above algorithm provides an exact solution to SSP in the case that the input numbers are small (and non-negative). If any sum of the numbers can be specified with at most P bits, then solving the problem approximately with is equivalent to solving it exactly. Then, the polynomial time algorithm for approximate subset sum becomes an exact algorithm with running time polynomial in n and (i.e., exponential in P).

Kellerer, Mansini, Pferschy and Speranza [17] and Kellerer, Pferschy and Pisinger [18] present other FPTAS-s for subset sum.

See also

Related Research Articles

The P versus NP problem is a major unsolved problem in theoretical computer science. In informal terms, it asks whether every problem whose solution can be quickly verified can also be quickly solved.

<span class="mw-page-title-main">Knapsack problem</span> Problem in combinatorial optimization

The knapsack problem is the following problem in combinatorial optimization:

<span class="mw-page-title-main">NP (complexity)</span> Complexity class used to classify decision problems

In computational complexity theory, NP is a complexity class used to classify decision problems. NP is the set of decision problems for which the problem instances, where the answer is "yes", have proofs verifiable in polynomial time by a deterministic Turing machine, or alternatively the set of problems that can be solved in polynomial time by a nondeterministic Turing machine.

In computational complexity theory, the complexity class #P (pronounced "sharp P" or, sometimes "number P" or "hash P") is the set of the counting problems associated with the decision problems in the set NP. More formally, #P is the class of function problems of the form "compute f(x)", where f is the number of accepting paths of a nondeterministic Turing machine running in polynomial time. Unlike most well-known complexity classes, it is not a class of decision problems but a class of function problems. The most difficult, representative problems of this class are #P-complete.

<span class="mw-page-title-main">Time complexity</span> Estimate of time taken for running an algorithm

In computer science, the time complexity is the computational complexity that describes the amount of computer time it takes to run an algorithm. Time complexity is commonly estimated by counting the number of elementary operations performed by the algorithm, supposing that each elementary operation takes a fixed amount of time to perform. Thus, the amount of time taken and the number of elementary operations performed by the algorithm are taken to be related by a constant factor.

In computer science and operations research, approximation algorithms are efficient algorithms that find approximate solutions to optimization problems with provable guarantees on the distance of the returned solution to the optimal one. Approximation algorithms naturally arise in the field of theoretical computer science as a consequence of the widely believed P ≠ NP conjecture. Under this conjecture, a wide class of optimization problems cannot be solved exactly in polynomial time. The field of approximation algorithms, therefore, tries to understand how closely it is possible to approximate optimal solutions to such problems in polynomial time. In an overwhelming majority of the cases, the guarantee of such algorithms is a multiplicative one expressed as an approximation ratio or approximation factor i.e., the optimal solution is always guaranteed to be within a (predetermined) multiplicative factor of the returned solution. However, there are also many approximation algorithms that provide an additive guarantee on the quality of the returned solution. A notable example of an approximation algorithm that provides both is the classic approximation algorithm of Lenstra, Shmoys and Tardos for scheduling on unrelated parallel machines.

The set cover problem is a classical question in combinatorics, computer science, operations research, and complexity theory. It is one of Karp's 21 NP-complete problems shown to be NP-complete in 1972.

Set packing is a classical NP-complete problem in computational complexity theory and combinatorics, and was one of Karp's 21 NP-complete problems. Suppose one has a finite set S and a list of subsets of S. Then, the set packing problem asks if some k subsets in the list are pairwise disjoint.

A fully polynomial-time approximation scheme (FPTAS) is an algorithm for finding approximate solutions to function problems, especially optimization problems. An FPTAS takes as input an instance of the problem and a parameter ε > 0. It returns as output a value is at least times the correct value, and at most times the correct value.

In computational complexity theory, a numeric algorithm runs in pseudo-polynomial time if its running time is a polynomial in the numeric value of the input —but not necessarily in the length of the input, which is the case for polynomial time algorithms.

In number theory and computer science, the partition problem, or number partitioning, is the task of deciding whether a given multiset S of positive integers can be partitioned into two subsets S1 and S2 such that the sum of the numbers in S1 equals the sum of the numbers in S2. Although the partition problem is NP-complete, there is a pseudo-polynomial time dynamic programming solution, and there are heuristics that solve the problem in many instances, either optimally or approximately. For this reason, it has been called "the easiest hard problem".

In computer science, a kernelization is a technique for designing efficient algorithms that achieve their efficiency by a preprocessing stage in which inputs to the algorithm are replaced by a smaller input, called a "kernel". The result of solving the problem on the kernel should either be the same as on the original input, or it should be easy to transform the output on the kernel to the desired output for the original problem.

In computational complexity theory, the average-case complexity of an algorithm is the amount of some computational resource used by the algorithm, averaged over all possible inputs. It is frequently contrasted with worst-case complexity which considers the maximal complexity of the algorithm over all possible inputs.

MAXEkSAT is a problem in computational complexity theory that is a maximization version of the Boolean satisfiability problem 3SAT. In MAXEkSAT, each clause has exactly k literals, each with distinct variables, and is in conjunctive normal form. These are called k-CNF formulas. The problem is to determine the maximum number of clauses that can be satisfied by a truth assignment to the variables in the clauses.

The quadratic knapsack problem (QKP), first introduced in 19th century, is an extension of knapsack problem that allows for quadratic terms in the objective function: Given a set of items, each with a weight, a value, and an extra profit that can be earned if two items are selected, determine the number of items to include in a collection without exceeding capacity of the knapsack, so as to maximize the overall profit. Usually, quadratic knapsack problems come with a restriction on the number of copies of each kind of item: either 0, or 1. This special type of QKP forms the 0-1 quadratic knapsack problem, which was first discussed by Gallo et al. The 0-1 quadratic knapsack problem is a variation of knapsack problems, combining the features of unbounded knapsack problem, 0-1 knapsack problem and quadratic knapsack problem.

In computer science, multiway number partitioning is the problem of partitioning a multiset of numbers into a fixed number of subsets, such that the sums of the subsets are as similar as possible. It was first presented by Ronald Graham in 1969 in the context of the identical-machines scheduling problem. The problem is parametrized by a positive integer k, and called k-way number partitioning. The input to the problem is a multiset S of numbers, whose sum is k*T.

The multiple subset sum problem is an optimization problem in computer science and operations research. It is a generalization of the subset sum problem. The input to the problem is a multiset of n integers and a positive integer m representing the number of subsets. The goal is to construct, from the input integers, some m subsets. The problem has several variants:

Identical-machines scheduling is an optimization problem in computer science and operations research. We are given n jobs J1, J2, ..., Jn of varying processing times, which need to be scheduled on m identical machines, such that a certain objective function is optimized, for example, the makespan is minimized.

Unrelated-machines scheduling is an optimization problem in computer science and operations research. It is a variant of optimal job scheduling. We need to schedule n jobs J1, J2, ..., Jn on m different machines, such that a certain objective function is optimized. The time that machine i needs in order to process job j is denoted by pi,j. The term unrelated emphasizes that there is no relation between values of pi,j for different i and j. This is in contrast to two special cases of this problem: uniform-machines scheduling - in which pi,j = pi / sj, and identical-machines scheduling - in which pi,j = pi.

References

  1. 1 2 Kleinberg, Jon; Tardos, Éva (2006). Algorithm Design (2nd ed.). p.  491. ISBN   0-321-37291-3.
  2. Goodrich, Michael. "More NP complete and NP hard problems" (PDF). Archived (PDF) from the original on 2022-10-09.
  3. Garey, Michael R.; Johnson, David S. (1979), Computers and Intractability: A Guide to the Theory of NP-Completeness , W. H. Freeman, ISBN   0-7167-1045-5 , Section 3.1 and problem SP1 in Appendix A.3.1.
  4. Filmus, Yuval (30 January 2016). Answer to: "Is there a known, fast algorithm for counting all subsets that sum to below a certain number?". Theoretical Computer Science Stack Exchange. Note that Filmus' citation in support of the claim (Faliszewski, Piotr; Hemaspaandra, Lane (2009). "The complexity of power-index comparison". Theoretical Computer Science. Elsevier. 410: 101-107. DOI  10.1016/j.tcs.2008.09.034) does not in fact prove the claim, instead directing readers to another citation (Papadimitriou, Christos (1994). Computational Complexity. Addison-Wesley: Reading, MA. Chapter 9. ISBN   0-201-53082-1   via the Internet Archive), which does not explicitly prove the claim either. Papadimitriou's proof that SSP is NP-complete via reduction of 3SAT does, however, generalize to a reduction from #3SAT to #SSP.
  5. 1 2 3 Richard E. Korf, Ethan L. Schreiber, and Michael D. Moffitt (2014). "Optimal Sequential Multi-Way Number Partitioning" (PDF). Archived (PDF) from the original on 2022-10-09.{{cite web}}: CS1 maint: multiple names: authors list (link)
  6. Horowitz, Ellis; Sahni, Sartaj (1974). "Computing partitions with applications to the knapsack problem" (PDF). Journal of the Association for Computing Machinery . 21 (2): 277–292. doi:10.1145/321812.321823. hdl: 1813/5989 . MR   0354006. S2CID   16866858. Archived (PDF) from the original on 2022-10-09.
  7. "The Two-Sum Problem" (PDF). Archived (PDF) from the original on 2022-10-09.
  8. Schroeppel, Richard; Shamir, Adi (1981-08-01). "A T = O(2n/2), S = O(2n/4) algorithm for certain NP-complete problems". SIAM Journal on Computing. 10 (3): 456–464. doi:10.1137/0210033. ISSN   0097-5397.
  9. Howgrave-Graham, Nick; Joux, Antoine (2010). "New Generic Algorithms for Hard Knapsacks". In Gilbert, Henri (ed.). Advances in Cryptology – EUROCRYPT 2010. Lecture Notes in Computer Science. Vol. 6110. Berlin, Heidelberg: Springer. pp. 235–256. doi: 10.1007/978-3-642-13190-5_12 . ISBN   978-3-642-13190-5.
  10. Becker, Anja; Joux, Antoine (2010). "Improved Generic Algorithms for Hard Knapsacks". In Gilbert, Henri (ed.). Advances in Cryptology – EUROCRYPT 2011. Lecture Notes in Computer Science. Vol. 6632. Berlin, Heidelberg: Springer. pp. 364–385. doi: 10.1007/978-3-642-20465-4_21 . ISBN   978-3-642-20465-4.
  11. Pisinger, David (1999). "Linear time algorithms for knapsack problems with bounded weights". Journal of Algorithms. 33 (1): 1–14. doi:10.1006/jagm.1999.1034. MR   1712690.
  12. Koiliaris, Konstantinos; Xu, Chao (2015-07-08). "A Faster Pseudopolynomial Time Algorithm for Subset Sum". arXiv: 1507.02318 [cs.DS].
  13. Bringmann, Karl (2017). "A near-linear pseudopolynomial time algorithm for subset sum". In Klein, Philip N. (ed.). Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2017). SIAM. pp. 1073–1084. arXiv: 1610.04712 . doi:10.1137/1.9781611974782.69.
  14. Curtis, V. V.; Sanches, C. A. A. (January 2016). "An efficient solution to the subset-sum problem on GPU: An efficient solution to the subset-sum problem on GPU". Concurrency and Computation: Practice and Experience. 28 (1): 95–113. doi:10.1002/cpe.3636. S2CID   20927927.
  15. Curtis, V. V.; Sanches, C. A. A. (July 2017). "A low-space algorithm for the subset-sum problem on GPU". Computers & Operations Research. 83: 120–124. doi:10.1016/j.cor.2017.02.006.
  16. Caprara, Alberto; Kellerer, Hans; Pferschy, Ulrich (2000-02-01). "The Multiple Subset Sum Problem". SIAM Journal on Optimization. 11 (2): 308–319. doi:10.1137/S1052623498348481. ISSN   1052-6234.
  17. Kellerer, Hans; Mansini, Renata; Pferschy, Ulrich; Speranza, Maria Grazia (2003-03-01). "An efficient fully polynomial approximation scheme for the Subset-Sum Problem". Journal of Computer and System Sciences. 66 (2): 349–370. doi:10.1016/S0022-0000(03)00006-0. ISSN   0022-0000.
  18. Hans Kellerer; Ulrich Pferschy; David Pisinger (2004). Knapsack problems. Springer. p. 97. ISBN   9783540402862.

Further reading