Pseudopolynomial time number partitioning

Last updated

In computer science, pseudopolynomial time number partitioning is a pseudopolynomial time algorithm for solving the partition problem.

Contents

The problem can be solved using dynamic programming when the size of the set and the size of the sum of the integers in the set are not too big to render the storage requirements infeasible.

Suppose the input to the algorithm is a multiset of cardinality :

S = {x1, ..., xN}

Let K be the sum of all elements in S. That is: K = x1 + ... + xN. We will build an algorithm that determines whether there is a subset of S that sums to . If there is a subset, then:

if K is even, the rest of S also sums to
if K is odd, then the rest of S sums to . This is as good a solution as possible.

e.g.1 S = {1, 2, 3, 5}, K = sum(S) = 11, K/2 = 5, Find a subset from S that is closest to K/2 -> {2, 3} = 5, 11 - 5 * 2 = 1

e.g.2 S = {1, 3, 7}, K = sum(S) = 11, K/2 = 5, Find a subset from S that is closest to K/2 -> {1, 3} = 4, 11 - 4 * 2 = 3

Recurrence relation

We wish to determine if there is a subset of S that sums to . Let:

p(i, j) be True if a subset of { x1, ..., xj } sums to i and False otherwise.

Then p(, N) is True if and only if there is a subset of S that sums to . The goal of our algorithm will be to compute p(, N). In aid of this, we have the following recurrence relation:

p(i, j) is True if either p(i, j − 1) is True or if p(ixj, j − 1) is True
p(i, j) is False otherwise

The reasoning for this is as follows: there is some subset of S that sums to i using numbers

x1, ..., xj

if and only if either of the following is true:

There is a subset of { x1, ..., xj−1 } that sums to i;
there is a subset of { x1, ..., xj−1 } that sums to ixj, since xj + that subset's sum = i.

The pseudo-polynomial algorithm

The algorithm consists of building up a table of size by containing the values of the recurrence. Remember that is the sum of all elements in . Once the entire table is filled in, we return . Below is a depiction of the table . There is a blue arrow from one block to another if the value of the target-block might depend on the value of the source-block. This dependence is a property of the recurrence relation.

Dependencies of table entry (i, j) Partition Problem DP table showing dependencies.png
Dependencies of table entry (i, j)
function can_be_partitioned_equally(S) isinput: A list of integers S.     output: True if S can be partitioned into two subsets that have equal sum.      n ← |S|     K ← sum(S)     P ← empty boolean table of size ( + 1) by (n + 1)initialize top row (P(0,x)) of P to True     initialize leftmost column (P(x, 0)) of P, except for P(0, 0) to False      forifrom 1 toforjfrom 1 to n             x = S[j-1]             if (i-x) >= 0 thenP(i, j) ← P(i, j-1) orP(i-x, j-1)             elseP(i, j) ← P(i, j-1)     returnP(, n)

Example

Below is the table P for the example set used above S = {3, 1, 1, 2, 2, 1}:

Result of example execution of algorithm on the table P Partition Prob DP table example.jpg
Result of example execution of algorithm on the table P

Analysis

This algorithm runs in time O(K/2 N), where N is the number of elements in the input set and K is the sum of elements in the input set.

The algorithm can be extended to the k-way multi-partitioning problem, but then takes O(n(k − 1)mk − 1) memory where m is the largest number in the input, making it impractical even for k = 3 unless the inputs are very small numbers. [1]

This algorithm can be generalized to a solution for the subset sum problem.

Related Research Articles

Binary search algorithm Search algorithm finding the position of a target value within a sorted array

In computer science, binary search, also known as half-interval search, logarithmic search, or binary chop, is a search algorithm that finds the position of a target value within a sorted array. Binary search compares the target value to the middle element of the array. If they are not equal, the half in which the target cannot lie is eliminated and the search continues on the remaining half, again taking the middle element to compare to the target value, and repeating this until the target value is found. If the search ends with the remaining half being empty, the target is not in the array.

Knapsack problem Problem in combinatorial optimization

The knapsack problem is a problem in combinatorial optimization: Given a set of items, each with a weight and a value, determine the number of each item to include in a collection so that the total weight is less than or equal to a given limit and the total value is as large as possible. It derives its name from the problem faced by someone who is constrained by a fixed-size knapsack and must fill it with the most valuable items. The problem often arises in resource allocation where the decision makers have to choose from a set of non-divisible projects or tasks under a fixed budget or time constraint, respectively.

In computer science, the subset sum problem is an important decision problem in complexity theory and cryptography. There are several equivalent formulations of the problem. One of them is: given a set of integers, is there a non-empty subset whose sum is zero? For example, given the set , the answer is yes because the subset sums to zero. The problem is NP-complete, meaning roughly that while it is easy to confirm whether a proposed solution is valid, it may inherently be prohibitively difficult to determine in the first place whether any solution exists.

Floor and ceiling functions Mathematical functions taking a real input and rounding it down or up, respectively

In mathematics and computer science, the floor function is the function that takes as input a real number , and gives as output the greatest integer less than or equal to , denoted or . Similarly, the ceiling function maps to the least integer greater than or equal to , denoted or .

Derangement

In combinatorial mathematics, a derangement is a permutation of the elements of a set, such that no element appears in its original position. In other words, a derangement is a permutation that has no fixed points.

In computer science, the Akra–Bazzi method, or Akra–Bazzi theorem, is used to analyze the asymptotic behavior of the mathematical recurrences that appear in the analysis of divide and conquer algorithms where the sub-problems have substantially different sizes. It is a generalization of the master theorem for divide-and-conquer recurrences, which assumes that the sub-problems have equal size. It is named after mathematicians Mohamad Akra and Louay Bazzi.

Turán graph

The Turán graphT(n,r) is a complete multipartite graph formed by partitioning a set of n vertices into r subsets, with sizes as equal as possible, and connecting two vertices by an edge if and only if they belong to different subsets. The graph will have subsets of size , and subsets of size . That is, it is a complete r-partite graph

Time complexity An estimate of time taken for running an algorithm

In computer science, the time complexity is the computational complexity that describes the amount of time it takes to run an algorithm. Time complexity is commonly estimated by counting the number of elementary operations performed by the algorithm, supposing that each elementary operation takes a fixed amount of time to perform. Thus, the amount of time taken and the number of elementary operations performed by the algorithm are taken to differ by at most a constant factor.

Sperner's theorem, in discrete mathematics, describes the largest possible families of finite sets none of which contain any other sets in the family. It is one of the central results in extremal set theory. It is named after Emanuel Sperner, who published it in 1928.

In the mathematical field of enumerative combinatorics, identities are sometimes established by arguments that rely on singling out one "distinguished element" of a set.

Cutting-plane method optimization technique for solving (mixed) integer linear programs

In mathematical optimization, the cutting-plane method is any of a variety of optimization methods that iteratively refine a feasible set or objective function by means of linear inequalities, termed cuts. Such procedures are commonly used to find integer solutions to mixed integer linear programming (MILP) problems, as well as to solve general, not necessarily differentiable convex optimization problems. The use of cutting planes to solve MILP was introduced by Ralph E. Gomory.

Stirling numbers of the second kind Numbers parameterizing ways to partition a set

In mathematics, particularly in combinatorics, a Stirling number of the second kind is the number of ways to partition a set of n objects into k non-empty subsets and is denoted by or . Stirling numbers of the second kind occur in the field of mathematics called combinatorics and the study of partitions.

Quicksort A divide and conquer sorting algorithm

Quicksort is an efficient sorting algorithm. Developed by British computer scientist Tony Hoare in 1959 and published in 1961, it is still a commonly used algorithm for sorting. When implemented well, it can be about two or three times faster than its main competitors, merge sort and heapsort.

In number theory and computer science, the partition problem, or number partitioning, is the task of deciding whether a given multiset S of positive integers can be partitioned into two subsets S1 and S2 such that the sum of the numbers in S1 equals the sum of the numbers in S2. Although the partition problem is NP-complete, there is a pseudo-polynomial time dynamic programming solution, and there are heuristics that solve the problem in many instances, either optimally or approximately. For this reason, it has been called "the easiest hard problem".

Rand index

The Rand index or Rand measure in statistics, and in particular in data clustering, is a measure of the similarity between two data clusterings. A form of the Rand index may be defined that is adjusted for the chance grouping of elements, this is the adjusted Rand index. From a mathematical standpoint, Rand index is related to the accuracy, but is applicable even when class labels are not used.

In mathematics, a Beatty sequence is the sequence of integers found by taking the floor of the positive multiples of a positive irrational number. Beatty sequences are named after Samuel Beatty, who wrote about them in 1926.

In computer science, a succinct data structure is a data structure which uses an amount of space that is "close" to the information-theoretic lower bound, but still allows for efficient query operations. The concept was originally introduced by Jacobson to encode bit vectors, (unlabeled) trees, and planar graphs. Unlike general lossless data compression algorithms, succinct data structures retain the ability to use them in-place, without decompressing them first. A related notion is that of a compressed data structure, in which the size of the data structure depends upon the particular data being represented.

Dedekind number Combinatorial sequence of numbers

In mathematics, the Dedekind numbers are a rapidly growing sequence of integers named after Richard Dedekind, who defined them in 1897. The Dedekind number M(n) counts the number of monotone boolean functions of n variables. Equivalently, it counts the number of antichains of subsets of an n-element set, the number of elements in a free distributive lattice with n generators, or the number of abstract simplicial complexes with n elements.

In computer science, merge-insertion sort or the Ford–Johnson algorithm is a comparison sorting algorithm published in 1959 by L. R. Ford Jr. and Selmer M. Johnson. It uses fewer comparisons in the worst case than the best previously known algorithms, binary insertion sort and merge sort, and for 20 years it was the sorting algorithm with the fewest known comparisons. Although not of practical significance, it remains of theoretical interest in connection with the problem of sorting with a minimum number of comparisons. The same algorithm may have also been independently discovered by Stanisław Trybuła and Czen Ping.

In computer science, multiway number partitioning is the problem of partitioning a multiset of numbers into a fixed number of subsets, such that the sums of the subsets are as similar as possible. It was first presented by Ronald Graham in 1969 in the contest of the multiprocessor scheduling problem. The problem is parametrized by a positive integer k, and called k-way number partitioning. The input to the problem is a multiset S of numbers, whose sum is k T.

References

  1. Korf, Richard E. (2009). Multi-Way Number Partitioning (PDF). IJCAI.