Greedy number partitioning

Last updated

In computer science, greedy number partitioning is a class of greedy algorithms for multiway number partitioning. The input to the algorithm is a set S of numbers, and a parameter k. The required output is a partition of S into k subsets, such that the sums in the subsets are as nearly equal as possible. Greedy algorithms process the numbers sequentially, and insert the next number into a bin in which the sum of numbers is currently smallest.

Contents

Approximate algorithms

The simplest greedy partitioning algorithm is called list scheduling . It just processes the inputs in any order they arrive. It always returns a partition in which the largest sum is at most times the optimal (minimum) largest sum. [1] This heuristic can be used as an online algorithm, when the order in which the items arrive cannot be controlled.

An improved greedy algorithm is called LPT scheduling. It processes the inputs by descending order of value, from large to small. Since it needs to pre-order the inputs, it can be used only as an offline algorithm. It guarantees that the largest sum is at most times the optimal (minimum) largest sum, and the smallest sum is at least times the optimal (maximum) smallest sum. See LPT scheduling for more details.

Complete greedy algorithm

The complete greedy algorithm (CGA) is an exact algorithm, i.e., it always finds an optimal solution. It works in the following way. After sorting the numbers in descending order (as in LPT), it constructs a k-ary tree. Each level corresponds to a number, and each of the k branches corresponds to a different set in which the current number can be put. Traversing the tree in depth-first order requires only O(n) space, but might take O(kn) time. The runtime can be improved by using the greedy heuristic: in each level, develop first the branch in which the current number is put in the set with the smallest sum. This algorithm finds the greedy (LPT) solution first, but then proceeds to look for better solutions.

Several additional heuristics can be used to improve the runtime: [2]

Generalizations

In the fair item allocation problem, there are n items and k people, each of which assigns a possibly different value to each item. The goal is to partition the items among the people in as fair way as possible. The natural generalization of the greedy number partitioning algorithm is the envy-graph algorithm. It guarantees that the allocation is envy-free up to at most one item (EF1). Moreover, if the instance is ordered (- all agents rank the items in the same order), then the outcome is EFX, and guarantees to each agent at least of his maximin share. If the items are chores, then a similar algorithm guarantees MMS. [3]

Related Research Articles

Knapsack problem Problem in combinatorial optimization

The knapsack problem is a problem in combinatorial optimization: Given a set of items, each with a weight and a value, determine the number of each item to include in a collection so that the total weight is less than or equal to a given limit and the total value is as large as possible. It derives its name from the problem faced by someone who is constrained by a fixed-size knapsack and must fill it with the most valuable items. The problem often arises in resource allocation where the decision-makers have to choose from a set of non-divisible projects or tasks under a fixed budget or time constraint, respectively.

The subset sum problem (SSP) is a decision problem in computer science. In its most general formulation, there is a multiset of integers and a target-sum , and the question is to decide whether any subset of the integers sum to precisely . The problem is known to be NP. Moreover, some restricted variants of it are NP-complete too, for example:

The bin packing problem is an optimization problem, in which items of different sizes must be packed into a finite number of bins or containers, each of a fixed given capacity, in a way that minimizes the number of bins used. The problem has many applications, such as filling up containers, loading trucks with weight capacity constraints, creating file backups in media and technology mapping in FPGA semiconductor chip design.

Decision tree learning Machine learning algorithm

Decision Tree Learning is supervised learning approach used in statistics, data mining and machine learning. In this formalism, a classification or regression decision tree is used as a predictive model to draw conclusions about a set of observations.

The set cover problem is a classical question in combinatorics, computer science, operations research, and complexity theory. It is one of Karp's 21 NP-complete problems shown to be NP-complete in 1972.

A fully polynomial-time approximation scheme (FPTAS) is an algorithm for finding approximately-optimal solutions to optimization problems. An FPTAS takes as input an instance of the problem and a parameter ε > 0. It returns as output a solution whose value is -

List scheduling is a greedy algorithm for Identical-machines scheduling. The input to this algorithm is a list of jobs that should be executed on a set of m machines. The list is ordered in a fixed order, which can be determined e.g. by the priority of executing the jobs, or by their order of arrival. The algorithm repeatedly executes the following steps until a valid schedule is obtained:

In number theory and computer science, the partition problem, or number partitioning, is the task of deciding whether a given multiset S of positive integers can be partitioned into two subsets S1 and S2 such that the sum of the numbers in S1 equals the sum of the numbers in S2. Although the partition problem is NP-complete, there is a pseudo-polynomial time dynamic programming solution, and there are heuristics that solve the problem in many instances, either optimally or approximately. For this reason, it has been called "the easiest hard problem".

Reservoir sampling is a family of randomized algorithms for choosing a simple random sample, without replacement, of k items from a population of unknown size n in a single pass over the items. The size of the population n is not known to the algorithm and is typically too large for all n items to fit into main memory. The population is revealed to the algorithm over time, and the algorithm cannot look back at previous items. At any point, the current state of the algorithm must permit extraction of a simple random sample without replacement of size k over the part of the population seen so far.

In the theory of cluster analysis, the nearest-neighbor chain algorithm is an algorithm that can speed up several methods for agglomerative hierarchical clustering. These are methods that take a collection of points as input, and create a hierarchy of clusters of points by repeatedly merging pairs of smaller clusters to form larger clusters. The clustering methods that the nearest-neighbor chain algorithm can be used for include Ward's method, complete-linkage clustering, and single-linkage clustering; these all work by repeatedly merging the closest two clusters but use different definitions of the distance between clusters. The cluster distances for which the nearest-neighbor chain algorithm works are called reducible and are characterized by a simple inequality among certain cluster distances.

In computational geometry, a maximum disjoint set (MDS) is a largest set of non-overlapping geometric shapes selected from a given set of candidate shapes.

Maximin share (MMS) is a criterion of fair item allocation. Given a set of items with different values, the 1-out-of-n maximin-share is the maximum value that can be gained by partitioning the items into n parts and taking the part with the minimum value.

In computer science, multiway number partitioning is the problem of partitioning a multiset of numbers into a fixed number of subsets, such that the sums of the subsets are as similar as possible. It was first presented by Ronald Graham in 1969 in the context of the Identical-machines scheduling problem. The problem is parametrized by a positive integer k, and called k-way number partitioning. The input to the problem is a multiset S of numbers, whose sum is k*T.

In computer science, the largest differencing method is an algorithm for solving the partition problem and the multiway number partitioning. It is also called the Karmarkar–Karp algorithm after its inventors, Narendra Karmarkar and Richard M. Karp. It is often abbreviated as LDM.

The multifit algorithm is an algorithm for multiway number partitioning, originally developed for the problem of identical-machines scheduling. It was developed by Coffman, Garey and Johnson. Its novelty comes from the fact that it uses an algorithm for another famous problem - the bin packing problem - as a subroutine.

Egalitarian item allocation, also called max-min item allocation is a fair item allocation problem, in which the fairness criterion follows the egalitarian rule. The goal is to maximize the minimum value of an agent. That is, among all possible allocations, the goal is to find an allocation in which the smallest value of an agent is as large as possible. In case there are two or more allocations with the same smallest value, then the goal is to select, from among these allocations, the one in which the second-smallest value is as large as possible, and so on. Therefore, an egalitarian item allocation is sometimes called a leximin item allocation.

The multiple subset sum problem is an optimization problem in computer science and operations research. It is a generalization of the subset sum problem. The input to the problem is a multiset of n integers and a positive integer m representing the number of subsets. The goal is to construct, from the input integers, some m subsets. The problem has several variants:

Identical-machines scheduling is an optimization problem in computer science and operations research. We are given n jobs J1, J2, ..., Jn of varying processing times, which need to be scheduled on m identical machines, such that a certain objective function is optimized, for example, the makespan is minimized.

Longest-processing-time-first (LPT) is a greedy algorithm for job scheduling. The input to the algorithm is a set of jobs, each of which has a specific processing-time. There is also a number m specifying the number of machines that can process the jobs. The LPT algorithm works as follows:

  1. Order the jobs by descending order of their processing-time, such that the job with the longest processing time is first.
  2. Schedule each job in this sequence into a machine in which the current load is smallest.

Balanced number partitioning is a variant of multiway number partitioning in which there are constraints on the number of items allocated to each set. The input to the problem is a set of n items of different sizes, and two integers mk. The output is a partition of the items into m subsets, such that the number of items in each subset is at most k. Subject to this, it is required that the sums of sizes in the m subsets are as similar as possible.

References

  1. Graham, Ron L. (1969-03-01). "Bounds on Multiprocessing Timing Anomalies". SIAM Journal on Applied Mathematics. 17 (2): 416–429. doi:10.1137/0117039. ISSN   0036-1399.
  2. Korf, Richard E. (August 1995). "From approximate to optimal solutions: A case study of number partitioning". In Mellish, Chris S. (ed.). IJCAI'95: Proceedings of the 14th international joint conference on Artificial intelligence. pp. 266–272. ISBN   978-1-55860-363-9.
  3. Barman, Siddharth; Krishnamurthy, Sanath Kumar (21 April 2020). "Approximation Algorithms for Maximin Fair Division". ACM Transactions on Economics and Computation. 8 (1): 1–28. arXiv: 1703.01851 . doi:10.1145/3381525. S2CID   217191332.

Further reading