Longest-processing-time-first (LPT) is a greedy algorithm for job scheduling. The input to the algorithm is a set of jobs, each of which has a specific processing-time. There is also a number m specifying the number of machines that can process the jobs. The LPT algorithm works as follows:
1. Order the jobs by descending order of their processing-time, such that the job with the longest processing time is first.
2. Schedule each job in this sequence into a machine in which the current load (the total processing-time of jobs scheduled so far) is smallest.
Step 2 of the algorithm is essentially the list-scheduling (LS) algorithm. The difference is that LS loops over the jobs in an arbitrary order, while LPT pre-orders them by descending processing time.
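A minimal Python sketch of this scheme (the function name, the use of a heap, and the presort flag are our own choices, not part of the original formulation):

```python
from heapq import heappush, heappop

def lpt(jobs, m, presort=True):
    """Greedy multiway partition: each job goes to the machine whose
    current load is smallest.  With presort=True this is LPT; with
    presort=False it is plain list scheduling in the given order."""
    if presort:
        jobs = sorted(jobs, reverse=True)       # step 1: longest job first
    heap = [(0, i) for i in range(m)]           # (current load, machine index)
    bundles = [[] for _ in range(m)]
    for job in jobs:
        load, i = heappop(heap)                 # step 2: least-loaded machine
        bundles[i].append(job)
        heappush(heap, (load + job, i))
    return bundles
```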
LPT was first analyzed by Ronald Graham in the 1960s in the context of the identical-machines scheduling problem. [1] Later, it was applied to many other variants of the problem.
LPT can also be described in a more abstract way, as an algorithm for multiway number partitioning. The input is a set S of numbers, and a positive integer m; the output is a partition of S into m subsets. LPT orders the input from largest to smallest, and puts each input in turn into the part with the smallest sum so far.
If the input set is S = {4, 5, 6, 7, 8} and m = 2, then the resulting partition is {8, 5, 4}, {7, 6}. If m = 3, then the resulting 3-way partition is {8}, {7, 4}, {6, 5}.
LPT might not find the optimal partition. For example, in the above instance with m = 2, the optimal partition is {8,7}, {6,5,4}, where both sums are equal to 15. However, its suboptimality is bounded both in the worst case and in the average case; see Performance guarantees below.
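Using the sketch above, the example and its suboptimality can be checked directly:

```python
print(lpt([4, 5, 6, 7, 8], m=2))   # [[8, 5, 4], [7, 6]] -> sums 17 and 13
print(lpt([4, 5, 6, 7, 8], m=3))   # [[8], [7, 4], [6, 5]] -> sums 8, 11, 11
# The optimal 2-way partition [[8, 7], [6, 5, 4]] balances both sums at 15,
# so the LPT maximum 17 exceeds the optimal maximum 15.
```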
The running time of LPT is dominated by the sorting, which takes O(n log n) time, where n is the number of inputs.
LPT is monotone in the sense that, if one of the input numbers increases, the objective function (the largest sum or the smallest sum of a subset in the output) weakly increases. [2] This is in contrast to the Multifit algorithm, which is not monotone.
When used for identical-machines scheduling, LPT attains the following approximation ratios.
In the worst case, the largest sum in the greedy partition is at most 3/2 - 1/(2m) times the optimal (minimum) largest sum. [3] [a]
A more detailed analysis yields a factor of 4/3 - 1/(3m) times the optimal (minimum) largest sum. [1] [4] (For example, when m = 2 this ratio is 7/6 ≈ 1.167.) [b]
The factor 4/3 - 1/(3m) is tight. Suppose there are 2m+1 inputs (where m is even): 2m-1, 2m-1, 2m-2, 2m-2, ..., m+1, m+1, m, m, m. Then the greedy algorithm returns a partition with a maximum of 4m-1, but the optimal partition balances all sums at 3m, so its maximum is 3m. For example, when m = 2 the inputs are 3, 3, 2, 2, 2; the greedy partition is {3, 2, 2}, {3, 2} with maximum 7, while the optimal partition {3, 3}, {2, 2, 2} has maximum 6, and 7/6 = 4/3 - 1/6.
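With the sketch above, the tight instance for m = 2 can be verified:

```python
bundles = lpt([3, 3, 2, 2, 2], m=2)    # 2m-1 = 3 twice, then m = 2 three times
print([sum(b) for b in bundles])       # [7, 5]: greedy maximum 7 = 4m - 1
# The optimal partition [[3, 3], [2, 2, 2]] has maximum 6 = 3m,
# so the ratio is 7/6 = 4/3 - 1/(3m) for m = 2.
```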
An even more detailed analysis takes into account the number of inputs in the max-sum part: if the part with the maximum sum contains k inputs, then the approximation ratio is at most (k+1)/k - 1/(km). [5]
In the worst case, the smallest sum in the returned partition is at least 3/4 times the optimal (maximum) smallest sum. [6]
The proof is by contradiction. We consider a minimal counterexample, that is, a counterexample with the smallest m and the fewest input numbers. Denote the greedy partition by P1,...,Pm, and the optimal partition by Q1,...,Qm. Several structural properties of a minimal counterexample can be derived.
The proof that a minimal counterexample does not exist uses a weighting scheme: each input x is assigned a weight w(x), which depends on its size and on the greedy bundle Pi that contains it. The properties of this weighting scheme, combined with the structural properties of the minimal counterexample, yield the required contradiction.
A more sophisticated analysis shows that the smallest sum in the greedy partition is at least (3m-1)/(4m-2) times the optimal smallest sum (for example, when m = 2 the ratio is 5/6). [7] [8]
The above ratio is tight. [6]
Suppose there are 3m-1 inputs (where m is even). The first 2m inputs are: 2m-1, 2m-1, 2m-2, 2m-2, ..., m, m. The last m-1 inputs are all m. Then the greedy algorithm returns a partition with a minimum of 3m-1, but the optimal partition balances all sums at 4m-2, so its minimum is 4m-2. For example, when m = 4 the inputs are 7, 7, 6, 6, 5, 5, 4, 4, 4, 4, 4; the greedy partition has sums 15, 15, 15, 11, so its minimum is 11 = 3m-1, while the optimal partition {7,7}, {6,4,4}, {6,4,4}, {5,5,4} has all sums equal to 14 = 4m-2.
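The same sketch verifies the tight instance for the minimum-sum guarantee with m = 4:

```python
jobs = [7, 7, 6, 6, 5, 5, 4, 4] + [4] * 3   # first 2m inputs, then m-1 copies of m
bundles = lpt(jobs, m=4)
print(sorted(sum(b) for b in bundles))      # [11, 15, 15, 15]: minimum 11 = 3m - 1
# An optimal partition {7,7}, {6,4,4}, {6,4,4}, {5,5,4} balances all sums at
# 14 = 4m - 2, so the ratio is 11/14 = (3m-1)/(4m-2) for m = 4.
```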
There is a variant of LPT, called Restricted-LPT or RLPT, [9] in which the inputs are partitioned into subsets of size m called ranks (rank 1 contains the largest m inputs, rank 2 the next-largest m inputs, etc.). The inputs in each rank must be assigned to m different bins: rank 1 first, then rank 2, etc. The minimum sum in RLPT is at most the minimum sum in LPT. The approximation ratio of RLPT for maximizing the minimum sum is at most m.
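A sketch of RLPT, assuming the natural within-rank rule that the largest item of each rank goes to the machine with the smallest current load (the source does not fix this rule, so it is our assumption):

```python
def rlpt(jobs, m):
    """Restricted LPT: split the sorted jobs into ranks of size m; the
    items of each rank go to m different machines, largest item to the
    least-loaded machine, second-largest to the second-least-loaded, etc."""
    jobs = sorted(jobs, reverse=True)
    loads = [0] * m
    bundles = [[] for _ in range(m)]
    for r in range(0, len(jobs), m):
        rank = jobs[r:r + m]                       # rank of (at most) m items
        order = sorted(range(m), key=lambda j: loads[j])
        for job, i in zip(rank, order):            # one item per machine
            bundles[i].append(job)
            loads[i] += job
    return bundles
```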
In the average case, if the input numbers are distributed uniformly in [0,1], then LPT is asymptotically optimal: the difference between the largest sum in an LPT schedule and the optimal largest sum converges to zero as the number of inputs grows.
Let Ci (for i between 1 and m) be the sum of subset i in a given partition. Instead of minimizing the objective function max(Ci), one can minimize the objective function max(f(Ci)), where f is any fixed function. Similarly, one can minimize the objective function sum(f(Ci)). Alon, Azar, Woeginger and Yadid [13] prove that, if f satisfies two mild conditions (essentially, convexity together with a strong continuity requirement), then the LPT rule has a finite approximation ratio for minimizing sum(f(Ci)).
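For illustration, an LPT partition can be evaluated under such an objective; the choice f(x) = x² below is only an example:

```python
bundles = lpt([4, 5, 6, 7, 8], m=2)          # LPT partition from the sketch above
cost = sum(sum(b) ** 2 for b in bundles)     # sum(f(C_i)) with f(x) = x^2
print(cost)                                  # 17**2 + 13**2 = 458
```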
An important special case is that the item sizes form a divisible sequence (also called factored). A special case of divisible item sizes occurs in memory allocation in computer systems, where the item sizes are all powers of 2. If the item sizes are divisible, and in addition the largest item size divides the bin size, then LPT always finds a schedule that minimizes the maximum size, [14] : Thm.4 and maximizes the minimum size. [14] : Thm.5
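For instance, with power-of-2 sizes whose largest size divides the balanced load, LPT finds a perfectly balanced (hence optimal) schedule; the concrete numbers below are our own illustration:

```python
bundles = lpt([8, 8, 4, 4, 2, 2, 2, 2], m=2)   # divisible sizes, total 32
print([sum(b) for b in bundles])               # [16, 16]: the maximum is optimal
```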
Besides the simple case of identical-machines scheduling, LPT has been adapted to more general settings.
In uniform-machines scheduling, different machines may have different speeds. The LPT rule assigns each job to the machine on which its completion time will be earliest (that is, LPT may assign a job to a machine with a larger current load, if this machine is so fast that it would finish that job earlier than all other machines). [15]
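A sketch of this rule for uniform machines (the function and parameter names are ours):

```python
def lpt_uniform(jobs, speeds):
    """LPT for uniform machines: each job, in descending size order, is
    assigned to the machine on which it would finish earliest."""
    m = len(speeds)
    loads = [0.0] * m                    # total processing requirement per machine
    bundles = [[] for _ in range(m)]
    for job in sorted(jobs, reverse=True):
        # finish time of this job on machine j: (loads[j] + job) / speeds[j]
        i = min(range(m), key=lambda j: (loads[j] + job) / speeds[j])
        bundles[i].append(job)
        loads[i] += job
    return bundles
```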
In the balanced partition problem, there are constraints on the number of jobs that can be assigned to each machine. A simple constraint is that each machine can process at most c jobs. The LPT rule assigns each job to the machine with the smallest load from among those with fewer than c jobs. This rule is called modified LPT or MLPT.
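A sketch of MLPT (assuming, as the rule requires, that n ≤ m·c so that an eligible machine always exists):

```python
def mlpt(jobs, m, c):
    """Modified LPT: least-loaded machine among those holding fewer than
    c jobs.  Assumes len(jobs) <= m * c."""
    loads = [0] * m
    bundles = [[] for _ in range(m)]
    for job in sorted(jobs, reverse=True):
        eligible = [j for j in range(m) if len(bundles[j]) < c]
        i = min(eligible, key=lambda j: loads[j])
        bundles[i].append(job)
        loads[i] += job
    return bundles
```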
Another constraint is that the number of jobs on each machine must equal the average n/m, rounded either up or down. In an adaptation of LPT called restricted LPT or RLPT, inputs are assigned in pairs - one to each machine (for m = 2 machines). [10] The resulting partition is balanced by design.
In the kernel partitioning problem, there are m pre-specified jobs called kernels, and each kernel must be scheduled to a unique machine. An equivalent problem is scheduling when machines are available at different times: each machine i becomes available at some time ti ≥ 0 (the time ti can be thought of as the length of the kernel job).
A simple heuristic algorithm, called SLPT, [23] assigns each kernel to a different subset, and then runs the LPT algorithm.
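A sketch of SLPT under this reading (one kernel per machine, then plain LPT on the rest):

```python
def slpt(kernels, jobs, m):
    """SLPT: place each of the m kernels on its own machine, then run LPT
    on the remaining jobs on top of those loads.  Equivalently, machine i
    becomes available only at time t_i = kernels[i]."""
    assert len(kernels) == m
    loads = list(kernels)                # initial load = kernel length t_i
    bundles = [[k] for k in kernels]
    for job in sorted(jobs, reverse=True):
        i = min(range(m), key=lambda j: loads[j])
        bundles[i].append(job)
        loads[i] += job
    return bundles
```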
Often, the inputs arrive online, and the size of each input becomes known only when it arrives. In this case it is not possible to sort them in advance. List scheduling is a similar algorithm that takes a list in any order, not necessarily sorted. Its approximation ratio is 2 - 1/m.
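In terms of the earlier sketch, list scheduling is simply LPT without the pre-sort:

```python
def list_schedule(jobs, m):
    # jobs are taken in the given (e.g., arrival) order
    return lpt(jobs, m, presort=False)
```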
A more sophisticated adaptation of LPT to an online setting attains an approximation ratio of 3/2. [27]
The knapsack problem is the following problem in combinatorial optimization: given a set of items, each with a weight and a value, determine which items to include in a collection so that the total weight is at most a given limit and the total value is as large as possible.
The bin packing problem is an optimization problem, in which items of different sizes must be packed into a finite number of bins or containers, each of a fixed given capacity, in a way that minimizes the number of bins used. The problem has many applications, such as filling up containers, loading trucks with weight capacity constraints, creating file backups in media, and technology mapping in FPGA semiconductor chip design.
List scheduling is a greedy algorithm for identical-machines scheduling. The input to this algorithm is a list of jobs that should be executed on a set of m machines. The list is ordered in a fixed order, which can be determined e.g. by the priority of executing the jobs, or by their order of arrival. The algorithm repeatedly executes the following steps until a valid schedule is obtained: take the next job in the list, and schedule it on a machine that can currently process it (for example, the machine with the smallest current load).
In number theory and computer science, the partition problem, or number partitioning, is the task of deciding whether a given multiset S of positive integers can be partitioned into two subsets S1 and S2 such that the sum of the numbers in S1 equals the sum of the numbers in S2. Although the partition problem is NP-complete, there is a pseudo-polynomial time dynamic programming solution, and there are heuristics that solve the problem in many instances, either optimally or approximately. For this reason, it has been called "the easiest hard problem".
In computer science, multiway number partitioning is the problem of partitioning a multiset of numbers into a fixed number of subsets, such that the sums of the subsets are as similar as possible. It was first presented by Ronald Graham in 1969 in the context of the identical-machines scheduling problem. The problem is parametrized by a positive integer k, and called k-way number partitioning. The input to the problem is a multiset S of numbers whose sum is k*T; the goal is to partition S into k subsets whose sums are as close as possible to T.
In computer science, greedy number partitioning is a class of greedy algorithms for multiway number partitioning. The input to the algorithm is a set S of numbers, and a parameter k. The required output is a partition of S into k subsets, such that the sums in the subsets are as nearly equal as possible. Greedy algorithms process the numbers sequentially, and insert the next number into a bin in which the sum of numbers is currently smallest.
In computer science, the largest differencing method is an algorithm for solving the partition problem and multiway number partitioning. It is also called the Karmarkar–Karp algorithm after its inventors, Narendra Karmarkar and Richard M. Karp. It is often abbreviated as LDM.
In the bin covering problem, items of different sizes must be packed into a finite number of bins or containers, each of which must contain at least a certain given total size, in a way that maximizes the number of bins used.
The multifit algorithm is an algorithm for multiway number partitioning, originally developed for the problem of identical-machines scheduling. It was developed by Coffman, Garey and Johnson. Its novelty comes from the fact that it uses an algorithm for another famous problem - the bin packing problem - as a subroutine.
The multiple subset sum problem is an optimization problem in computer science and operations research. It is a generalization of the subset sum problem. The input to the problem is a multiset of n integers and a positive integer m representing the number of subsets. The goal is to construct, from the input integers, some m subsets. The problem has several variants, which differ in the objective - for example, maximizing the total sum of all subsets, or maximizing the smallest subset sum.
Identical-machines scheduling is an optimization problem in computer science and operations research. We are given n jobs J1, J2, ..., Jn of varying processing times, which need to be scheduled on m identical machines, such that a certain objective function is optimized, for example, the makespan is minimized.
Next-fit is an online algorithm for bin packing. Its input is a list of items of different sizes. Its output is a packing - a partition of the items into bins of fixed capacity, such that the sum of sizes of items in each bin is at most the capacity. Ideally, we would like to use as few bins as possible, but minimizing the number of bins is an NP-hard problem. The next-fit algorithm uses the following heuristic: it keeps a single open bin; if the next item fits into this bin, it is placed there; otherwise the bin is closed and a new bin is opened for the item.
First-fit (FF) is an online algorithm for bin packing. Its input is a list of items of different sizes. Its output is a packing - a partition of the items into bins of fixed capacity, such that the sum of sizes of items in each bin is at most the capacity. Ideally, we would like to use as few bins as possible, but minimizing the number of bins is an NP-hard problem. The first-fit algorithm uses the following heuristic: it keeps all bins open and places each item into the first bin that can still accommodate it; a new bin is opened only when the item fits into no existing bin.
Balanced number partitioning is a variant of multiway number partitioning in which there are constraints on the number of items allocated to each set. The input to the problem is a set of n items of different sizes, and two integers m, k. The output is a partition of the items into m subsets, such that the number of items in each subset is at most k. Subject to this, it is required that the sums of sizes in the m subsets are as similar as possible.
The configuration linear program (configuration-LP) is a linear programming technique used for solving combinatorial optimization problems. It was introduced in the context of the cutting stock problem. Later, it has been applied to the bin packing and job scheduling problems. In the configuration-LP, there is a variable for each possible configuration - each possible multiset of items that can fit in a single bin. Usually, the number of configurations is exponential in the problem size, but in some cases it is possible to attain approximate solutions using only a polynomial number of configurations.
The Karmarkar–Karp (KK) bin packing algorithms are several related approximation algorithms for the bin packing problem. The bin packing problem is a problem of packing items of different sizes into bins of identical capacity, such that the total number of bins is as small as possible. Finding the optimal solution is computationally hard. Karmarkar and Karp devised an algorithm that runs in polynomial time and finds a solution with at most OPT + O(log² OPT) bins, where OPT is the number of bins in the optimal solution. They also devised several other algorithms with slightly different approximation guarantees and run-time bounds.