Multiway number partitioning

Last updated

In computer science, multiway number partitioning is the problem of partitioning a multiset of numbers into a fixed number of subsets, such that the sums of the subsets are as similar as possible. It was first presented by Ronald Graham in 1969 in the context of the identical-machines scheduling problem. [1] :sec.5 The problem is parametrized by a positive integer k, and called k-way number partitioning. [2] The input to the problem is a multiset S of numbers (usually integers), whose sum is k*T.

Contents

The associated decision problem is to decide whether S can be partitioned into k subsets such that the sum of each subset is exactly T. There is also an optimization problem: find a partition of S into k subsets, such that the k sums are "as near as possible". The exact optimization objective can be defined in several ways:

These three objective functions are equivalent when k=2, but they are all different when k≥3. [6] [7]

All these problems are NP-hard, [8] but there are various algorithms that solve it efficiently in many cases.

Some closely-related problems are:

Approximation algorithms

There are various algorithms that obtain a guaranteed approximation of the optimal solution in polynomial time. There are different approximation algorithms for different objectives.

Minimizing the largest sum

The approximation ratio in this context is the largest sum in the solution returned by the algorithm, divided by the largest sum in the optimal solution (the ratio is larger than 1). Most algorithms below were developed for identical-machines scheduling.

Several polynomial-time approximation schemes (PTAS) have been developed:

Maximizing the smallest sum

The approximation ratio in this context is the smallest sum in the solution returned by the algorithm, divided by the smallest sum in the optimal solution (the ratio is less than 1).

Maximizing the sum of products

Jin [13] studies a problem in which the goal is to maximize the sum, over every set i in 1,...,k, of the product of numbers in set i. In a more general variant, each set i may have a weight wi, and the goal is to maximize the weighted sum of products. This problem has an exact solution that runs in time O(n2).

A PTAS for general objective functions

Let Ci (for i between 1 and k) be the sum of subset i in a given partition. Instead of minimizing the objective function max(Ci), one can minimize the objective function max(f(Ci)), where f is any fixed function. Similarly, one can minimize the objective function sum(f(Ci)), or maximize min(f(Ci)), or maximize sum(f(Ci)). Alon, Azar, Woeginger and Yadid [14] presented general PTAS-s (generalizing the PTAS-s of Sanhi, Hochbaum and Shmoys, and Woeginger) for these four problems. Their algorithm works for any f which satisfies the following two conditions:

  1. A strong continuity condition called Condition F*: for every ε>0 there exists δ>0 such that, if |y-x|<δx, then |f(y)-f(x)|<εf(x).
  2. Convexity (for the minimization problems) or concavity (for the maximization problems).

The runtime of their PTAS-s is linear in n (the number of inputs), but exponential in the approximation precision. The PTAS for minimizing sum(f(Ci)) is based on some combinatorial observations:

  1. Let L := the average sum in a single subset (1/k the sum of all inputs). If some input x is at least L, then there is an optimal partition in which one part contains only x. This follows from the convexity of f. Therefore, the input can be pre-processes by assigning each such input to a unique subset. After this preprocessing, one can assume that all inputs are smaller than L.
  2. There is an optimal partition in which all subsets sums are strictly betweel L/2 and 2L (L/2 < Ci < 2L for all i in 1,...,k). Particularly, the partition minimizing the sum of squares Ci2, among all optimal partitions, satisfies these inequalities.

The PTAS uses an input rounding technique. Given the input sequence S = (v1,...,vn) and a positive integer d, the rounded sequence S#(d) is defined as follows:

In S#(d), all inputs are integer multiples of L/d2. Moreover, the above two observations hold for S#(d) too:

  1. Let L# be the average sum in a single subset (1/k the sum of all inputs in S#(d)). By construction, L# is at least L. Since L itself is an integer multiple of L/d2, the rounding-up of inputs smaller than L cannot make them larger than L. Therefore, all inputs in S#(d) are smaller than L, and hence smaller than L#.
  2. There is an optimal partition of S#(d) in which all subset sums are strictly between L#/2 and 2L#. Therefore, all subsets contain at most 2d elements (since all inputs in S#(d) are at least L/d).

Based on these observations, all inputs in S#(d) are of the form hL/d2, where h is an integer in the range . Therefore, the input can be represented as an integer vector , where is the number of hL/d2 inputs in S#(d). Moreover, each subset can be represented as an integer vector , where is the number of hL/d2 inputs in the subset. The subset sum is then . Denote by , the set of vectors with . Since the sum of elements in such a vector is at most 2d, the total number of these elements is smaller than , so .

There are two different ways to find an optimal solution to S#(d). One way uses dynamic programming: its run-time is a polynomial whose exponent depends on d. The other way uses Lenstra's algorithm for integer linear programming.

Dynamic programming solution

Define as the optimal (minimum) value of the objective function sum(f(Ci)), when the input vector is and it has to be partitioned into k subsets, among all partitions in which all subset sums are strictly between L#/2 and 2L#.

It can be solved by the following recurrence relation:

  • - since their objective sum is empty.
  • if - since all inputs must be assigned to a single subset, so its sum is .
  • otherwise - since we do not consider optimal solutions outside this range.
  • for all : we check all options for the k-th subset, and combine it with an optimal partition of the remainder into k-1 subsets.

For each k and n, the recurrence relation requires to check at most vectors. The total number of vectors n to check is at most , where n is the original number of inputs. Therefore, the run-time of the dynamic programming algorithm is . It is linear in n for any fixed d.

Integer linear programming solution

For each vector t in T, introduce a variable xt denoting the number of subsets with this configuration. Minimizing sum(f(Ci)) can be attained by the solving the following ILP:

  • Minimize
  • subject to (the total number of subsets)
  • and (the total vector of inputs)
  • and .

The number of variables is at most , and the number of equations is - both are constants independent of n, k. Therefore, Lenstra's algorithm can be used. Its run-time is exponential in the dimension (), but polynomial in the binary representation of the coefficients, which are in O(log(n)). Constructing the ILP itself takes time O(n).

Converting the solution from the rounded to the original instance

The following lemmas relate the partitions of the rounded instance S#(d) and the original instance S.

  • For every partition of S with sums Ci, there is a partition of S#(d) with sums Ci#, where .
  • For every partition of S#(d) with sums Ci#, there is a partition of S with sums Ci, where , and it can be found in time O(n).

Given a desired approximation precision ε>0, let δ>0 be the constant corresponding to ε/3, whose existence is guaranteed by Condition F*. Let . It is possible to show that converted partition of S has a total cost of at most , so the approximation ratio is 1+ε.

Non-existence of PTAS for some objective functions

In contrast to the above result, if we take f(x) = 2x, or f(x)=(x-1)2, then no PTAS for minimizing sum(f(Ci)) exists unless P=NP. [14] :Sec.4 Note that these f(x) are convex, but they do not satisfy Condition F* above. The proof is by reduction from partition problem.

Exact algorithms

There are exact algorithms, that always find the optimal partition. Since the problem is NP-hard, such algorithms might take exponential time in general, but may be practically usable in certain cases.

Reduction to bin packing

The bin packing problem has many fast solvers. A BP solver can be used to find an optimal number partitioning. [25] The idea is to use binary search to find the optimal makespan. To initialize the binary search, we need a lower bound and an upper bound:

Given a lower and an upper bound, run the BP solver with bin size middle := (lower+upper)/2.

Variants

In the balanced number partitioning problem, there are constraints on the number of items that can be allocated to each subset (these are called cardinality constraints).

Another variant is the multidimensional number partitioning. [26]

Applications

One application of the partition problem is for manipulation of elections. Suppose there are three candidates (A, B and C). A single candidate should be elected using the veto voting rule, i.e., each voter vetoes a single candidate and the candidate with the fewest vetoes wins. If a coalition wants to ensure that C is elected, they should partition their vetoes among A and B so as to maximize the smallest number of vetoes each of them gets. If the votes are weighted, then the problem can be reduced to the partition problem, and thus it can be solved efficiently using CKK. For k=2, the same is true for any other voting rule that is based on scoring. However, for k>2 and other voting rules, some other techniques are required. [3]

Implementations

Related Research Articles

<span class="mw-page-title-main">Knapsack problem</span> Problem in combinatorial optimization

The knapsack problem is the following problem in combinatorial optimization:

The subset sum problem (SSP) is a decision problem in computer science. In its most general formulation, there is a multiset of integers and a target-sum , and the question is to decide whether any subset of the integers sum to precisely . The problem is known to be NP-hard. Moreover, some restricted variants of it are NP-complete too, for example:

The bin packing problem is an optimization problem, in which items of different sizes must be packed into a finite number of bins or containers, each of a fixed given capacity, in a way that minimizes the number of bins used. The problem has many applications, such as filling up containers, loading trucks with weight capacity constraints, creating file backups in media, and technology mapping in FPGA semiconductor chip design.

An integer programming problem is a mathematical optimization or feasibility program in which some or all of the variables are restricted to be integers. In many settings the term refers to integer linear programming (ILP), in which the objective function and the constraints are linear.

Set packing is a classical NP-complete problem in computational complexity theory and combinatorics, and was one of Karp's 21 NP-complete problems. Suppose one has a finite set S and a list of subsets of S. Then, the set packing problem asks if some k subsets in the list are pairwise disjoint.

A fully polynomial-time approximation scheme (FPTAS) is an algorithm for finding approximate solutions to function problems, especially optimization problems. An FPTAS takes as input an instance of the problem and a parameter ε > 0. It returns as output a value which is at least times the correct value, and at most times the correct value.

In number theory and computer science, the partition problem, or number partitioning, is the task of deciding whether a given multiset S of positive integers can be partitioned into two subsets S1 and S2 such that the sum of the numbers in S1 equals the sum of the numbers in S2. Although the partition problem is NP-complete, there is a pseudo-polynomial time dynamic programming solution, and there are heuristics that solve the problem in many instances, either optimally or approximately. For this reason, it has been called "the easiest hard problem".

Uniform machine scheduling is an optimization problem in computer science and operations research. It is a variant of optimal job scheduling. We are given n jobs J1, J2, ..., Jn of varying processing times, which need to be scheduled on m different machines. The goal is to minimize the makespan - the total time required to execute the schedule. The time that machine i needs in order to process job j is denoted by pi,j. In the general case, the times pi,j are unrelated, and any matrix of positive processing times is possible. In the specific variant called uniform machine scheduling, some machines are uniformly faster than others. This means that, for each machine i, there is a speed factor si, and the run-time of job j on machine i is pi,j = pj / si.

Semidefinite programming (SDP) is a subfield of mathematical programming concerned with the optimization of a linear objective function over the intersection of the cone of positive semidefinite matrices with an affine space, i.e., a spectrahedron.

The study of facility location problems (FLP), also known as location analysis, is a branch of operations research and computational geometry concerned with the optimal placement of facilities to minimize transportation costs while considering factors like avoiding placing hazardous materials near housing, and competitors' facilities. The techniques also apply to cluster analysis.

In computer science, streaming algorithms are algorithms for processing data streams in which the input is presented as a sequence of items and can be examined in only a few passes, typically just one. These algorithms are designed to operate with limited memory, generally logarithmic in the size of the stream and/or in the maximum value in the stream, and may also have limited processing time per item.

In computer science, data stream clustering is defined as the clustering of data that arrive continuously such as telephone records, multimedia data, financial transactions etc. Data stream clustering is usually studied as a streaming algorithm and the objective is, given a sequence of points, to construct a good clustering of the stream, using a small amount of memory and time.

In computer science, greedy number partitioning is a class of greedy algorithms for multiway number partitioning. The input to the algorithm is a set S of numbers, and a parameter k. The required output is a partition of S into k subsets, such that the sums in the subsets are as nearly equal as possible. Greedy algorithms process the numbers sequentially, and insert the next number into a bin in which the sum of numbers is currently smallest.

In computer science, the largest differencing method is an algorithm for solving the partition problem and the multiway number partitioning. It is also called the Karmarkar–Karp algorithm after its inventors, Narendra Karmarkar and Richard M. Karp. It is often abbreviated as LDM.

The multiple subset sum problem is an optimization problem in computer science and operations research. It is a generalization of the subset sum problem. The input to the problem is a multiset of n integers and a positive integer m representing the number of subsets. The goal is to construct, from the input integers, some m subsets. The problem has several variants:

Identical-machines scheduling is an optimization problem in computer science and operations research. We are given n jobs J1, J2, ..., Jn of varying processing times, which need to be scheduled on m identical machines, such that a certain objective function is optimized, for example, the makespan is minimized.

Unrelated-machines scheduling is an optimization problem in computer science and operations research. It is a variant of optimal job scheduling. We need to schedule n jobs J1, J2, ..., Jn on m different machines, such that a certain objective function is optimized. The time that machine i needs in order to process job j is denoted by pi,j. The term unrelated emphasizes that there is no relation between values of pi,j for different i and j. This is in contrast to two special cases of this problem: uniform-machines scheduling - in which pi,j = pi / sj, and identical-machines scheduling - in which pi,j = pi.

Longest-processing-time-first (LPT) is a greedy algorithm for job scheduling. The input to the algorithm is a set of jobs, each of which has a specific processing-time. There is also a number m specifying the number of machines that can process the jobs. The LPT algorithm works as follows:

  1. Order the jobs by descending order of their processing-time, such that the job with the longest processing time is first.
  2. Schedule each job in this sequence into a machine in which the current load is smallest.

Balanced number partitioning is a variant of multiway number partitioning in which there are constraints on the number of items allocated to each set. The input to the problem is a set of n items of different sizes, and two integers mk. The output is a partition of the items into m subsets, such that the number of items in each subset is at most k. Subject to this, it is required that the sums of sizes in the m subsets are as similar as possible.

The Karmarkar–Karp (KK) bin packing algorithms are several related approximation algorithm for the bin packing problem. The bin packing problem is a problem of packing items of different sizes into bins of identical capacity, such that the total number of bins is as small as possible. Finding the optimal solution is computationally hard. Karmarkar and Karp devised an algorithm that runs in polynomial time and finds a solution with at most bins, where OPT is the number of bins in the optimal solution. They also devised several other algorithms with slightly different approximation guarantees and run-time bounds.

References

  1. 1 2 Graham, Ron L. (1969-03-01). "Bounds on Multiprocessing Timing Anomalies". SIAM Journal on Applied Mathematics. 17 (2): 416–429. doi:10.1137/0117039. ISSN   0036-1399.
  2. 1 2 Mertens, Stephan (2006), "The Easiest Hard Problem: Number Partitioning", in Allon Percus; Gabriel Istrate; Cristopher Moore (eds.), Computational complexity and statistical physics, Oxford University Press US, p. 125, arXiv: cond-mat/0310317 , Bibcode:2003cond.mat.10317M, ISBN   978-0-19-517737-4
  3. 1 2 Walsh, Toby (2009-07-11). "Where Are the Really Hard Manipulation Problems? The Phase Transition in Manipulating the Veto Rule" (PDF). Written at Pasadena, California, USA. Proceedings of the Twenty-First International Joint Conference on Artificial Intelligence. San Francisco, California, USA: Morgan Kaufmann Publishers Inc. pp. 324–329. Archived (PDF) from the original on 2020-07-10. Retrieved 2021-10-05.
  4. Friesen, D. K.; Deuermeyer, B. L. (1981-02-01). "Analysis of Greedy Solutions for a Replacement Part Sequencing Problem". Mathematics of Operations Research . 6 (1): 74–87. doi:10.1287/moor.6.1.74. ISSN   0364-765X.
  5. Deuermeyer, Bryan L.; Friesen, Donald K.; Langston, Michael A. (1982-06-01). "Scheduling to Maximize the Minimum Processor Finish Time in a Multiprocessor System". SIAM Journal on Algebraic and Discrete Methods. 3 (2): 190–196. doi:10.1137/0603019. ISSN   0196-5212.
  6. Korf, Richard Earl (2010-08-25). "Objective Functions for Multi-Way Number Partitioning". Third Annual Symposium on Combinatorial Search. 1: 71–72. doi: 10.1609/socs.v1i1.18172 . S2CID   45875088.
  7. Walter, Rico (2013-01-01). "Comparing the minimum completion times of two longest-first scheduling-heuristics". Central European Journal of Operations Research. 21 (1): 125–139. doi:10.1007/s10100-011-0217-4. ISSN   1613-9178. S2CID   17222989.
  8. Garey, Michael R.; Johnson, David S. (1979). Computers and Intractability: A Guide to the Theory of NP-Completeness . W. H. Freeman and Company. p.  238. ISBN   978-0716710448.
  9. 1 2 Hochbaum, Dorit S.; Shmoys, David B. (1987-01-01). "Using dual approximation algorithms for scheduling problems theoretical and practical results". Journal of the ACM. 34 (1): 144–162. doi: 10.1145/7531.7535 . ISSN   0004-5411. S2CID   9739129.
  10. 1 2 Sahni, Sartaj K. (1976-01-01). "Algorithms for Scheduling Independent Tasks". Journal of the ACM. 23 (1): 116–127. doi: 10.1145/321921.321934 . ISSN   0004-5411. S2CID   10956951.
  11. 1 2 Woeginger, Gerhard J. (1997-05-01). "A polynomial-time approximation scheme for maximizing the minimum machine completion time". Operations Research Letters . 20 (4): 149–154. doi:10.1016/S0167-6377(96)00055-7. ISSN   0167-6377.
  12. Csirik, János; Kellerer, Hans; Woeginger, Gerhard (1992-06-01). "The exact LPT-bound for maximizing the minimum completion time". Operations Research Letters . 11 (5): 281–287. doi:10.1016/0167-6377(92)90004-M. ISSN   0167-6377.
  13. Jin, Kai (2017). "Optimal Partitioning Which Maximizes the Weighted Sum of Products". In Xiao, Mingyu; Rosamond, Frances (eds.). Frontiers in Algorithmics. Lecture Notes in Computer Science. Vol. 10336. Cham: Springer International Publishing. pp. 127–138. doi:10.1007/978-3-319-59605-1_12. ISBN   978-3-319-59605-1.
  14. 1 2 Alon, Noga; Azar, Yossi; Woeginger, Gerhard J.; Yadid, Tal (1998). "Approximation schemes for scheduling on parallel machines". Journal of Scheduling. 1 (1): 55–66. doi:10.1002/(SICI)1099-1425(199806)1:1<55::AID-JOS2>3.0.CO;2-J. ISSN   1099-1425.
  15. 1 2 Korf, Richard E. (2009). Multi-Way Number Partitioning (PDF). IJCAI.
  16. Korf, Richard E. (1995-08-20). "From approximate to optimal solutions: a case study of number partitioning". Proceedings of the 14th International Joint Conference on Artificial Intelligence - Volume 1. IJCAI'95. Montreal, Quebec, Canada: Morgan Kaufmann Publishers Inc.: 266–272. ISBN   978-1-55860-363-9.
  17. Korf, Richard E. (1998-12-01). "A complete anytime algorithm for number partitioning". Artificial Intelligence. 106 (2): 181–203. doi: 10.1016/S0004-3702(98)00086-1 . ISSN   0004-3702.
  18. Schreiber, Ethan L.; Korf, Richard E.; Moffitt, Michael D. (2018-07-25). "Optimal Multi-Way Number Partitioning". Journal of the ACM. 65 (4): 24:1–24:61. doi: 10.1145/3184400 . ISSN   0004-5411. S2CID   63854223.
  19. Korf, Richard E. (2011-07-16). "A hybrid recursive multi-way number partitioning algorithm". Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence - Volume Volume One. IJCAI'11. Barcelona, Catalonia, Spain: AAAI Press: 591–596. ISBN   978-1-57735-513-7.
  20. Schreiber, Ethan L.; Korf, Richard E. (2013-08-03). "Improved bin completion for optimal bin packing and number partitioning" (PDF). Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence. IJCAI '13. Beijing, China: AAAI Press: 651–658. ISBN   978-1-57735-633-2.
  21. Moffitt, Michael D. (2013-08-03). "Search strategies for optimal multi-way number partitioning" (PDF). Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence. IJCAI '13. Beijing, China: AAAI Press: 623–629. ISBN   978-1-57735-633-2.
  22. Korf, Richard; Schreiber, Ethan (2013-06-02). "Optimally Scheduling Small Numbers of Identical Parallel Machines". Proceedings of the International Conference on Automated Planning and Scheduling. 23: 144–152. doi: 10.1609/icaps.v23i1.13544 . ISSN   2334-0843. S2CID   12458816.
  23. Schreiber, Ethan L.; Korf, Richard E. (2014-07-27). "Cached Iterative Weakening for Optimal Multi-Way Number Partitioning". Proceedings of the AAAI Conference on Artificial Intelligence. AAAI'14. Québec City, Québec, Canada: AAAI Press. 28: 2738–2744. doi: 10.1609/aaai.v28i1.9122 . S2CID   8594071.
  24. Richard E. Korf, Ethan L. Schreiber, and Michael D. Moffitt (2014). "Optimal Sequential Multi-Way Number Partitioning" (PDF).{{cite web}}: CS1 maint: multiple names: authors list (link)
  25. Schreiber, Ethan L.; Korf, Richard E. (2013-08-03). "Improved bin completion for optimal bin packing and number partitioning". Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence. IJCAI '13. Beijing, China: AAAI Press: 651–658. ISBN   978-1-57735-633-2.
  26. Pop, Petrică C.; Matei, Oliviu (2013-11-01). "A memetic algorithm approach for solving the multidimensional multi-way number partitioning problem". Applied Mathematical Modelling. 37 (22): 9191–9202. doi: 10.1016/j.apm.2013.03.075 . ISSN   0307-904X.