The knapsack problem is the following problem in combinatorial optimization: Given a set of items, each with a weight and a value, determine which items to include in the collection so that the total weight is less than or equal to a given limit and the total value is as large as possible.
It derives its name from the problem faced by someone who is constrained by a fixed-size knapsack and must fill it with the most valuable items. The problem often arises in resource allocation where the decision-makers have to choose from a set of non-divisible projects or tasks under a fixed budget or time constraint, respectively.
The knapsack problem has been studied for more than a century, with early works dating as far back as 1897. [1]
Knapsack problems appear in real-world decision-making processes in a wide variety of fields, such as finding the least wasteful way to cut raw materials, [2] selection of investments and portfolios, [3] selection of assets for asset-backed securitization, [4] and generating keys for the Merkle–Hellman [5] and other knapsack cryptosystems.
One early application of knapsack algorithms was in the construction and scoring of tests in which the test-takers have a choice as to which questions they answer. For small examples, it is a fairly simple process to provide the test-takers with such a choice. For example, if an exam contains 12 questions each worth 10 points, the test-taker need only answer 10 questions to achieve a maximum possible score of 100 points. However, on tests with a heterogeneous distribution of point values, it is more difficult to provide choices. Feuerman and Weiss proposed a system in which students are given a heterogeneous test with a total of 125 possible points. The students are asked to answer all of the questions to the best of their abilities. Of the possible subsets of problems whose total point values add up to 100, a knapsack algorithm would determine which subset gives each student the highest possible score. [6]
A 1999 study of the Stony Brook University Algorithm Repository showed that, out of 75 algorithmic problems related to the field of combinatorial algorithms and algorithm engineering, the knapsack problem was the 19th most popular and the third most needed after suffix trees and the bin packing problem. [7]
The most common problem being solved is the 0-1 knapsack problem, which restricts the number of copies of each kind of item to zero or one. Given a set of items numbered from 1 up to $n$, each with a weight $w_i$ and a value $v_i$, along with a maximum weight capacity $W$,

maximize $\sum_{i=1}^{n} v_i x_i$
subject to $\sum_{i=1}^{n} w_i x_i \le W$ and $x_i \in \{0, 1\}$.

Here $x_i$ represents the number of instances of item $i$ to include in the knapsack. Informally, the problem is to maximize the sum of the values of the items in the knapsack so that the sum of the weights is less than or equal to the knapsack's capacity.
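As a concrete illustration, the following Python sketch checks every 0-1 assignment directly against this definition; the four-item instance and all identifiers are invented for illustration:

from itertools import product

# Illustrative instance (not from the article): weights, values, capacity
w = [1, 3, 4, 5]
v = [1, 4, 5, 7]
W = 7

best_value, best_x = 0, None
for x in product((0, 1), repeat=len(w)):      # every x_i in {0, 1}
    weight = sum(wi * xi for wi, xi in zip(w, x))
    value = sum(vi * xi for vi, xi in zip(v, x))
    if weight <= W and value > best_value:    # feasibility, then maximization
        best_value, best_x = value, x

print(best_value, best_x)  # 9 (0, 1, 1, 0): take items 2 and 3

This brute force examines all $2^n$ assignments; the algorithms discussed below do substantially better.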
The bounded knapsack problem (BKP) removes the restriction that there is only one of each item, but restricts the number of copies $x_i$ of each kind of item to a maximum non-negative integer value $c$:

maximize $\sum_{i=1}^{n} v_i x_i$
subject to $\sum_{i=1}^{n} w_i x_i \le W$ and $x_i \in \{0, 1, 2, \dots, c\}$.
The unbounded knapsack problem (UKP) places no upper bound on the number of copies of each kind of item and can be formulated as above except that the only restriction on $x_i$ is that it is a non-negative integer.
One example of the unbounded knapsack problem is packing books of given weights and values when any number of copies of each book is available.
The knapsack problem is interesting from the perspective of computer science for many reasons: its decision form (can a value of at least k be achieved without exceeding the weight limit W?) is NP-complete, yet it admits a pseudo-polynomial time algorithm based on dynamic programming and a fully polynomial-time approximation scheme, and many instances that arise in practice can nonetheless be solved exactly.
There is a link between the "decision" and "optimization" problems in that if there exists a polynomial algorithm that solves the "decision" problem, then one can find the maximum value for the optimization problem in polynomial time by applying this algorithm iteratively while increasing the value of k. On the other hand, if an algorithm finds the optimal value of the optimization problem in polynomial time, then the decision problem can be solved in polynomial time by comparing the value of the solution output by this algorithm with the value of k. Thus, both versions of the problem are of similar difficulty.
One theme in research literature is to identify what the "hard" instances of the knapsack problem look like, [8] [9] or viewed another way, to identify what properties of instances in practice might make them more amenable than their worst-case NP-complete behaviour suggests. [10] The goal in finding these "hard" instances is for their use in public-key cryptography systems, such as the Merkle–Hellman knapsack cryptosystem. More generally, better understanding of the structure of the space of instances of an optimization problem helps to advance the study of the particular problem and can improve algorithm selection.
Also notable is that the hardness of the knapsack problem depends on the form of the input. If the weights and profits are given as integers, it is weakly NP-complete, while it is strongly NP-complete if the weights and profits are given as rational numbers. [11] However, in the case of rational weights and profits it still admits a fully polynomial-time approximation scheme.
The NP-hardness of the knapsack problem relates to computational models in which the size of integers matters (such as the Turing machine). In contrast, decision trees count each decision as a single step. Dobkin and Lipton [12] show an $\Omega(n^2)$ lower bound on linear decision trees for the knapsack problem, that is, trees where decision nodes test the sign of affine functions. [13] This was generalized to algebraic decision trees by Steele and Yao. [14] If the elements in the problem are real numbers or rationals, the decision-tree lower bound extends to the real random-access machine model with an instruction set that includes addition, subtraction and multiplication of real numbers, as well as comparison and either division or remaindering ("floor"). [15] This model covers more algorithms than the algebraic decision-tree model, as it encompasses algorithms that use indexing into tables. However, in this model all program steps are counted, not just decisions. An upper bound for a decision-tree model was given by Meyer auf der Heide, [16] who showed that for every $n$ there exists an $O(n^4)$-deep linear decision tree that solves the subset-sum problem with $n$ items. Note that this does not imply any upper bound for an algorithm that should solve the problem for any given $n$.
Several algorithms are available to solve knapsack problems, based on the dynamic programming approach, [17] the branch and bound approach [18] or hybridizations of both approaches. [10] [19] [20] [21]
The unbounded knapsack problem (UKP) places no restriction on the number of copies of each kind of item. Besides, here we assume that all weights are strictly positive integers, $w_i > 0$. Define $m[w]$ to be the maximum value that can be attained with total weight less than or equal to $w$; then $m[W]$ is the solution to the problem.
Observe that $m[w]$ has the following properties:

1. $m[0] = 0$ (the sum of zero items, i.e., the summation of the empty set).

2. $m[w] = \max_{w_i \le w} (v_i + m[w - w_i])$, where $v_i$ is the value of the $i$-th kind of item.
The second property needs to be explained in detail. During the running of this method, how do we reach weight $w$? There are only $n$ ways, and the previous weights are $w - w_1, w - w_2, \dots, w - w_n$, where there are $n$ kinds of different items in total (by saying different, we mean that the weight and the value are not completely the same). If we know the value $v_i$ of each item and the related maximum value $m[w - w_i]$ computed previously, we just compare the $n$ candidates $v_i + m[w - w_i]$ to each other, take the maximum, and we are done.
Here the maximum of the empty set is taken to be zero. Tabulating the results from $m[0]$ up through $m[W]$ gives the solution. Since the calculation of each $m[w]$ involves examining at most $n$ items, and there are at most $W$ values of $m[w]$ to calculate, the running time of the dynamic programming solution is $O(nW)$. Dividing $w_1, w_2, \dots, w_n, W$ by their greatest common divisor is a way to improve the running time.
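A direct transcription of this recurrence into Python might look as follows (a sketch; the function name and the use of max with a default are our own choices):

def unbounded_knapsack(w, v, W):
    """Maximum value with total weight <= W and unlimited copies per item.

    w, v: strictly positive integer weights and corresponding values.
    Implements m[x] = max over items with w[i] <= x of v[i] + m[x - w[i]],
    in O(nW) time and O(W) space.
    """
    m = [0] * (W + 1)                     # m[0] = 0: the empty set
    for x in range(1, W + 1):
        m[x] = max((v[i] + m[x - w[i]]
                    for i in range(len(w)) if w[i] <= x),
                   default=0)             # maximum of the empty set is 0
    return m[W]

print(unbounded_knapsack([2, 3], [3, 5], 7))  # 11: two copies of item 1, one of item 2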
Even if P ≠ NP, the $O(nW)$ complexity does not contradict the fact that the knapsack problem is NP-complete, since $W$, unlike $n$, is not polynomial in the length of the input to the problem. The length of the input to the problem is proportional to the number of bits in $W$, $\log W$, not to $W$ itself. However, since this runtime is pseudopolynomial, this makes the (decision version of the) knapsack problem a weakly NP-complete problem.
A similar dynamic programming solution for the 0-1 knapsack problem also runs in pseudo-polynomial time. Assume $w_1, w_2, \dots, w_n, W$ are strictly positive integers. Define $m[i, w]$ to be the maximum value that can be attained with weight less than or equal to $w$ using items up to $i$ (first $i$ items).

We can define $m[i, w]$ recursively as follows: (Definition A)

1. $m[0, w] = 0$
2. $m[i, w] = m[i-1, w]$ if $w_i > w$ (the new item is more than the current weight limit)
3. $m[i, w] = \max(m[i-1, w],\; m[i-1, w - w_i] + v_i)$ if $w_i \le w$.

The solution can then be found by calculating $m[n, W]$. To do this efficiently, we can use a table to store previous computations.
The following is pseudocode for the dynamic program:
// Input:
// Values (stored in array v)
// Weights (stored in array w)
// Number of distinct items (n)
// Knapsack capacity (W)
// NOTE: The array "v" and array "w" are assumed to store all relevant values starting at index 1.

array m[0..n, 0..W];
for j from 0 to W do:
    m[0, j] := 0
for i from 1 to n do:
    m[i, 0] := 0

for i from 1 to n do:
    for j from 1 to W do:
        if w[i] > j then:
            m[i, j] := m[i-1, j]
        else:
            m[i, j] := max(m[i-1, j], m[i-1, j-w[i]] + v[i])
This solution will therefore run in $O(nW)$ time and $O(nW)$ space. (If we only need the value m[n, W], we can modify the code so that the amount of memory required is $O(W)$, storing only the most recent two rows of the array "m".)
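In fact, when only the value m[n, W] is needed, a single row suffices if the inner loop runs downward over the capacities. This Python sketch (identifiers are illustrative) shows the $O(W)$-space variant:

def knapsack_01_value(w, v, W):
    """Optimal 0-1 knapsack value in O(nW) time and O(W) space.

    Iterating j downward ensures m[j - w[i]] still holds the value
    for items 1..i-1, so each item is counted at most once.
    """
    m = [0] * (W + 1)
    for wi, vi in zip(w, v):
        for j in range(W, wi - 1, -1):    # downward over capacities
            m[j] = max(m[j], m[j - wi] + vi)
    return m[W]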
However, if we take it a step or two further, we should know that the method will run in time between $O(nW)$ and $O(2^n)$. From Definition A, we know that there is no need to compute all the weights when the number of items and the items themselves that we chose are fixed. That is to say, the program above computes more than necessary because the weight often changes from 0 to W. From this perspective, we can program this method so that it runs recursively.
// Input:
// Values (stored in array v)
// Weights (stored in array w)
// Number of distinct items (n)
// Knapsack capacity (W)
// NOTE: The array "v" and array "w" are assumed to store all relevant values starting at index 1.

Define value[n, W]
Initialize all value[i, j] = -1

Define m := (i, j)  // Define function m so that it represents the maximum value we can get under the condition: use first i items, total weight limit is j
{
    if i == 0 or j <= 0 then:
        value[i, j] = 0
        return

    if (value[i-1, j] == -1) then:  // m[i-1, j] has not been calculated, we have to call function m
        m(i-1, j)

    if w[i] > j then:  // item cannot fit in the bag
        value[i, j] = value[i-1, j]
    else:
        if (value[i-1, j-w[i]] == -1) then:  // m[i-1, j-w[i]] has not been calculated, we have to call function m
            m(i-1, j-w[i])
        value[i, j] = max(value[i-1, j], value[i-1, j-w[i]] + v[i])
}

Run m(n, W)
For example, suppose there are 10 different items and the weight limit is 67. If you use the above recursive method to compute $m(10, 67)$, only the subproblems that are actually reached are evaluated, excluding calls that produce trivial values such as $m(i, 0)$ or $m(0, j)$.
Besides, we can break the recursion and convert it into a tree of subproblem calls. Then we can cut some leaves and use parallel computing to expedite the running of this method.
To find the actual subset of items, rather than just their total value, we can run this after running the function above:
/**
 * Returns the indices of the items of the optimal knapsack.
 * i: We can include items 1 through i in the knapsack
 * j: maximum weight of the knapsack
 */
function knapsack(i: int, j: int): Set<int> {
    if i == 0 then:
        return {}
    if m[i, j] > m[i-1, j] then:
        return {i} ∪ knapsack(i-1, j - w[i])
    else:
        return knapsack(i-1, j)
}

knapsack(n, W)
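Combining the table computation with this backtracking step, a self-contained Python version (a sketch; 1-based item indices are kept to match the pseudocode) is:

def knapsack_01(w, v, W):
    """Return (optimal value, set of chosen 1-based item indices)."""
    n = len(w)
    # m[i][j]: best value using items 1..i under weight limit j
    m = [[0] * (W + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, W + 1):
            if w[i - 1] > j:
                m[i][j] = m[i - 1][j]
            else:
                m[i][j] = max(m[i - 1][j], m[i - 1][j - w[i - 1]] + v[i - 1])

    # Backtrack: item i was taken exactly when it improved on m[i-1][j]
    chosen, j = set(), W
    for i in range(n, 0, -1):
        if m[i][j] > m[i - 1][j]:
            chosen.add(i)
            j -= w[i - 1]
    return m[n][W], chosen

print(knapsack_01([1, 3, 4, 5], [1, 4, 5, 7], 7))  # (9, {2, 3})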
Another algorithm for 0-1 knapsack, discovered in 1974 [22] and sometimes called "meet-in-the-middle" due to parallels to a similarly named algorithm in cryptography, is exponential in the number of different items but may be preferable to the DP algorithm when $W$ is large compared to $n$. In particular, if the $w_i$ are nonnegative but not integers, we could still use the dynamic programming algorithm by scaling and rounding (i.e. using fixed-point arithmetic), but if the problem requires $d$ fractional digits of precision to arrive at the correct answer, $W$ will need to be scaled by $10^d$, and the DP algorithm will require $O(W 10^d)$ space and $O(n W 10^d)$ time.
algorithm Meet-in-the-middle is
    input: A set of items with weights and values.
    output: The greatest combined value of a subset.

    partition the set {1...n} into two sets A and B of approximately equal size
    compute the weights and values of all subsets of each set
    for each subset of A do
        find the subset of B of greatest value such that the combined weight is less than W
        keep track of the greatest combined value seen so far
The algorithm takes $O(2^{n/2})$ space, and efficient implementations of step 3 (for instance, sorting the subsets of B by weight, discarding subsets of B which weigh more than other subsets of B of greater or equal value, and using binary search to find the best match) result in a runtime of $O(n 2^{n/2})$. As with the meet in the middle attack in cryptography, this improves on the $O(n 2^n)$ runtime of a naive brute force approach (examining all subsets of $\{1, \dots, n\}$), at the cost of using exponential rather than constant space (see also baby-step giant-step). The current state of the art improvement to the meet-in-the-middle algorithm, using insights from Schroeppel and Shamir's algorithm for subset sum, provides as a corollary a randomized algorithm for knapsack which preserves the $O^*(2^{n/2})$ (up to polynomial factors) running time and reduces the space requirements to $O^*(2^{0.249999n})$ (see [23] Corollary 1.4). In contrast, the best known deterministic algorithm runs in $O^*(2^{n/2})$ time with a slightly worse space complexity of $O^*(2^{n/4})$. [24]
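The following Python sketch is our own rendering of the basic algorithm; the running-maximum preprocessing of B corresponds to the "discard dominated subsets" step described above:

from bisect import bisect_right
from itertools import combinations

def subset_sums(items):
    """All (weight, value) pairs over subsets of items: O(2^len(items))."""
    out = []
    for r in range(len(items) + 1):
        for comb in combinations(items, r):
            out.append((sum(w for w, _ in comb), sum(v for _, v in comb)))
    return out

def mitm_knapsack(w, v, W):
    items = list(zip(w, v))
    half = len(items) // 2
    a, b = subset_sums(items[:half]), subset_sums(items[half:])
    # Sort B by weight and keep running-maximum values, so that a
    # binary search by residual capacity yields the best partner.
    b.sort()
    weights_b, best_vals, best = [], [], 0
    for wt, val in b:
        best = max(best, val)
        weights_b.append(wt)
        best_vals.append(best)
    answer = 0
    for wt, val in a:
        if wt > W:
            continue
        k = bisect_right(weights_b, W - wt) - 1
        if k >= 0:
            answer = max(answer, val + best_vals[k])
    return answer

print(mitm_knapsack([1, 3, 4, 5], [1, 4, 5, 7], 7))  # 9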
As for most NP-complete problems, it may be enough to find workable solutions even if they are not optimal. Preferably, however, the approximation comes with a guarantee of the difference between the value of the solution found and the value of the optimal solution.
As with many useful but computationally complex algorithms, there has been substantial research on creating and analyzing algorithms that approximate a solution. The knapsack problem, though NP-hard, is one of a collection of problems that can still be approximated to any specified degree. This means that the problem has a polynomial time approximation scheme. To be exact, the knapsack problem has a fully polynomial time approximation scheme (FPTAS). [25]
George Dantzig proposed a greedy approximation algorithm to solve the unbounded knapsack problem. [26] His version sorts the items in decreasing order of value per unit of weight, $v_i / w_i$. It then proceeds to insert them into the sack, starting with as many copies as possible of the first kind of item until there is no longer space in the sack for more. Provided that there is an unlimited supply of each kind of item, if $m$ is the maximum value of items that fit into the sack, then the greedy algorithm is guaranteed to achieve at least a value of $m/2$.
For the bounded problem, where the supply of each kind of item is limited, the above algorithm may be far from optimal. Nevertheless, a simple modification allows us to solve this case: Assume for simplicity that all items individually fit in the sack ($w_i \le W$ for all $i$). Construct a solution $S_1$ by packing items greedily as long as possible, i.e. $S_1 = \{1, \dots, k\}$ where $k = \max \{k' : \sum_{i=1}^{k'} w_i \le W\}$. Furthermore, construct a second solution $S_2 = \{k+1\}$ containing the first item that did not fit. Since $S_1 \cup S_2$ provides an upper bound for the LP relaxation of the problem, one of the sets must have value at least $m/2$; we thus return whichever of $S_1$ and $S_2$ has the better value to obtain a $1/2$-approximation.
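A Python sketch of this modified greedy (the function name and instance are illustrative; every item is assumed to fit on its own, as above):

def greedy_half_approx(w, v, W):
    """1/2-approximation for 0-1 knapsack; assumes w[i] <= W for all i.

    Packs a prefix of the items in decreasing density v[i]/w[i] (S1),
    compares it with the first item that did not fit (S2), and
    returns the better of the two values.
    """
    order = sorted(range(len(w)), key=lambda i: v[i] / w[i], reverse=True)
    s1_value, weight, s2_value = 0, 0, 0
    for i in order:
        if weight + w[i] <= W:
            weight += w[i]
            s1_value += v[i]
        else:
            s2_value = v[i]       # the first item that does not fit
            break
    return max(s1_value, s2_value)

print(greedy_half_approx([4, 3, 2], [8, 5, 3], 5))  # 8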
It can be shown that the average performance converges to the optimal solution in distribution at the error rate $n^{-1/2}$. [27]
The fully polynomial time approximation scheme (FPTAS) for the knapsack problem takes advantage of the fact that the reason the problem has no known polynomial time solutions is that the profits associated with the items are not restricted. If one rounds off some of the least significant digits of the profit values, then the rounded profits will be bounded by a polynomial in $n$ and $1/\varepsilon$, where $\varepsilon$ is a bound on the correctness of the solution. This restriction then means that an algorithm can find a solution in polynomial time that is correct within a factor of $(1-\varepsilon)$ of the optimal solution. [25]
algorithm FPTAS is
    input: ε ∈ (0,1]
           a list A of n items, specified by their values, v_i, and weights w_i
    output: S', the FPTAS solution

    P := max {v_i : 1 ≤ i ≤ n}  // the highest item value
    K := εP/n

    for i from 1 to n do
        v'_i := ⌊v_i/K⌋
    end for

    return the solution, S', using the v'_i values in the dynamic program outlined above
Theorem: The set $S'$ computed by the algorithm above satisfies $\mathrm{profit}(S') \ge (1-\varepsilon) \cdot \mathrm{profit}(S^*)$, where $S^*$ is an optimal solution.
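For completeness, here is a Python sketch of the scheme. It uses the standard profit-indexed dynamic program (minimum weight needed to reach each scaled profit), which is what makes the running time polynomial in $n$ and $1/\varepsilon$; the identifiers and the returned lower-bound estimate are our own choices:

import math

def knapsack_fptas(w, v, W, eps):
    """(1 - eps)-approximate 0-1 knapsack value via profit rounding.

    Scales profits by K = eps * max(v) / n, then computes
    A[p] = minimum weight needed to reach scaled profit exactly p.
    """
    n = len(v)
    K = eps * max(v) / n
    vs = [math.floor(vi / K) for vi in v]      # v'_i = floor(v_i / K)
    max_p = sum(vs)
    INF = float("inf")
    A = [0] + [INF] * max_p
    for wi, pi in zip(w, vs):
        for p in range(max_p, pi - 1, -1):     # downward: 0-1 semantics
            if A[p - pi] + wi < A[p]:
                A[p] = A[p - pi] + wi
    best_scaled = max(p for p in range(max_p + 1) if A[p] <= W)
    # The chosen set's true value is at least K * best_scaled,
    # which is within a factor (1 - eps) of the optimum.
    return K * best_scaled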
The quantum approximate optimization algorithm (QAOA) can be employed to solve the knapsack problem using quantum computation by minimizing the Hamiltonian of the problem. The knapsack Hamiltonian is constructed via embedding the constraint condition into the cost function of the problem with a penalty term: [28]

$H = -\sum_{i=1}^{n} v_i x_i + P\left(\sum_{i=1}^{n} w_i x_i - W\right)^2$

where $P$ is the penalty constant, which is determined by case-specific fine-tuning.
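Classically, such a penalized cost function is a quadratic unconstrained binary optimization (QUBO) objective. The sketch below is illustrative only: the instance, the brute-force evaluation, and the binary slack bits encoding the unused capacity (a common choice for turning the inequality constraint into an exact equality) are not part of the source:

from itertools import product

# Illustrative instance; P must exceed the total value for the penalty to bind.
w, v, W, P = [2, 3, 4], [3, 4, 6], 6, 50
n_slack = W.bit_length()          # slack bits s_j encode the unused capacity

def energy(x, s):
    load = sum(wi * xi for wi, xi in zip(w, x))
    slack = sum(2 ** j * sj for j, sj in enumerate(s))
    # Penalty vanishes exactly when load + slack == W, i.e. load <= W.
    return -sum(vi * xi for vi, xi in zip(v, x)) + P * (load + slack - W) ** 2

best = min(
    (energy(x, s), x)
    for x in product((0, 1), repeat=len(w))
    for s in product((0, 1), repeat=n_slack)
)
print(best)  # (-9, (1, 0, 1)): the minimum-energy state is the optimal packing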
Solving the unbounded knapsack problem can be made easier by throwing away items which will never be needed. For a given item $i$, suppose we could find a set of items $J$ such that their total weight is less than the weight of $i$, and their total value is greater than the value of $i$. Then $i$ cannot appear in the optimal solution, because we could always improve any potential solution containing $i$ by replacing $i$ with the set $J$. Therefore, we can disregard the $i$-th item altogether. In such cases, $J$ is said to dominate $i$. (Note that this does not apply to bounded knapsack problems, since we may have already used up the items in $J$.)
Finding dominance relations allows us to significantly reduce the size of the search space. There are several different types of dominance relations, [10] which all satisfy an inequality of the form:
$\sum_{j \in J} w_j x_j \le \alpha w_i$ and $\sum_{j \in J} v_j x_j \ge \alpha v_i$ for some $x \in \mathbb{Z}_{\ge 0}^{|J|}$

where $\alpha \in \mathbb{Z}_{>0}$, $J \subsetneq N$ (a proper subset of the set of items $N$) and $i \notin J$. The vector $x$ denotes the number of copies of each member of $J$.
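The simplest case, $\alpha = 1$ with $J$ a single item, can already be checked in quadratic time. The Python sketch below (names are our own) removes items dominated by one other item, keeping only the lowest-indexed copy among exact duplicates:

def prune_dominated(w, v):
    """Indices of items not dominated by any single other item.

    Item j dominates item i when w[j] <= w[i] and v[j] >= v[i]
    (the inequality above with alpha = 1 and J = {j}). Valid for
    the unbounded problem only.
    """
    n = len(w)
    keep = []
    for i in range(n):
        dominated = any(
            j != i and w[j] <= w[i] and v[j] >= v[i]
            and not (w[j] == w[i] and v[j] == v[i] and j > i)
            for j in range(n)
        )
        if not dominated:
            keep.append(i)
    return keep

# Item 1 (weight 4, value 3) is dominated by item 0 (weight 3, value 5).
print(prune_dominated([3, 4, 2], [5, 3, 1]))  # [0, 2]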
There are many variations of the knapsack problem that have arisen from the vast number of applications of the basic problem. The main variations occur by changing the number of some problem parameter such as the number of items, number of objectives, or even the number of knapsacks.
This variation changes the goal of the individual filling the knapsack. Instead of one objective, such as maximizing the monetary profit, the objective could have several dimensions. For example, there could be environmental or social concerns as well as economic goals. Problems frequently addressed include portfolio and transportation logistics optimizations. [29] [30]
As an example, suppose you ran a cruise ship. You have to decide how many famous comedians to hire. This boat can handle no more than one ton of passengers and the entertainers must weigh less than 1000 lbs. Each comedian has a weight, brings in business based on their popularity and asks for a specific salary. In this example, you have multiple objectives. You want, of course, to maximize the popularity of your entertainers while minimizing their salaries. Also, you want to have as many entertainers as possible.
In this variation, the weight of knapsack item $i$ is given by a D-dimensional vector $\overline{w_i} = (w_{i1}, \dots, w_{iD})$ and the knapsack has a D-dimensional capacity vector $(W_1, \dots, W_D)$. The target is to maximize the sum of the values of the items in the knapsack so that the sum of weights in each dimension $d$ does not exceed $W_d$.
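The dynamic programming solution generalizes naturally by indexing the table with one capacity per dimension, at the cost of a table that grows exponentially in D. A Python sketch for D = 2 (the instance and identifiers are illustrative):

def knapsack_2d(items, W1, W2):
    """0-1 knapsack with two weight dimensions, in O(n * W1 * W2).

    items: list of (w1, w2, value) triples; capacities W1, W2.
    """
    dp = [[0] * (W2 + 1) for _ in range(W1 + 1)]
    for w1, w2, vi in items:
        for j1 in range(W1, w1 - 1, -1):      # downward: each item once
            for j2 in range(W2, w2 - 1, -1):
                dp[j1][j2] = max(dp[j1][j2], dp[j1 - w1][j2 - w2] + vi)
    return dp[W1][W2]

print(knapsack_2d([(2, 1, 4), (3, 2, 5), (1, 3, 3)], 4, 3))  # 5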
Multi-dimensional knapsack is computationally harder than knapsack; even for $D = 2$, the problem does not have an EPTAS unless P $=$ NP. [31] However, the algorithm in [32] is shown to solve sparse instances efficiently. An instance of multi-dimensional knapsack is sparse if there is a set $J = \{1, 2, \dots, m\}$ of dimensions, with $m < D$, such that every knapsack item $i$ has nonzero weight in at most one dimension outside $J$. Such instances occur, for example, when scheduling packets in a wireless network with relay nodes. [32] The algorithm from [32] also solves sparse instances of the multiple choice variant, multiple-choice multi-dimensional knapsack.
The IHS (Increasing Height Shelf) algorithm is optimal for 2D knapsack (packing squares into a two-dimensional unit-size square) when there are at most five squares in an optimal packing. [33]
This variation is similar to the bin packing problem. It differs in that a subset of the items can be selected, whereas in the bin packing problem all items have to be packed into certain bins. The concept is that there are multiple knapsacks. This may seem like a trivial change, but it is not equivalent to adding to the capacity of the initial knapsack. This variation is used in many loading and scheduling problems in operations research and has a polynomial-time approximation scheme. [34]
The quadratic knapsack problem maximizes a quadratic objective function subject to binary and linear capacity constraints. [35] The problem was introduced by Gallo, Hammer, and Simeone in 1980; [36] however, the first treatment of the problem dates back to Witzgall in 1975. [37]
The subset sum problem is a special case of the decision and 0-1 problems where for each kind of item, the weight equals the value: $w_i = v_i$. In the field of cryptography, the term knapsack problem is often used to refer specifically to the subset sum problem. The subset sum problem is one of Karp's 21 NP-complete problems. [38]
A generalization of the subset sum problem is the multiple subset-sum problem, in which multiple bins exist with the same capacity. It has been shown that the generalization does not have an FPTAS. [39]
In the geometric knapsack problem, there is a set of rectangles with different values, and a rectangular knapsack. The goal is to pack the largest possible value into the knapsack. [40]
The subset sum problem (SSP) is a decision problem in computer science. In its most general formulation, there is a multiset $S$ of integers and a target-sum $T$, and the question is to decide whether any subset of the integers sums to precisely $T$. The problem is known to be NP-complete. Moreover, some restricted variants of it are NP-complete too, for example the variant in which all inputs are positive integers.
The Merkle–Hellman knapsack cryptosystem was one of the earliest public key cryptosystems. It was published by Ralph Merkle and Martin Hellman in 1978. A polynomial time attack was published by Adi Shamir in 1984. As a result, the cryptosystem is now considered insecure.
The assignment problem is a fundamental combinatorial optimization problem. In its most general form, the problem is as follows: given a number of agents and a number of tasks, where any agent can be assigned to perform any task at a cost that may vary depending on the agent-task pairing, assign at most one agent to each task and at most one task to each agent so that the total cost of the assignment is minimized.
The bin packing problem is an optimization problem, in which items of different sizes must be packed into a finite number of bins or containers, each of a fixed given capacity, in a way that minimizes the number of bins used. The problem has many applications, such as filling up containers, loading trucks with weight capacity constraints, creating file backups in media, splitting a network prefix into multiple subnets, and technology mapping in FPGA semiconductor chip design.
In computer science, parameterized complexity is a branch of computational complexity theory that focuses on classifying computational problems according to their inherent difficulty with respect to multiple parameters of the input or output. The complexity of a problem is then measured as a function of those parameters. This allows the classification of NP-hard problems on a finer scale than in the classical setting, where the complexity of a problem is only measured as a function of the number of bits in the input. This appears to have been first demonstrated in Gurevich, Stockmeyer & Vishkin (1984). The first systematic work on parameterized complexity was done by Downey & Fellows (1999).
A fully polynomial-time approximation scheme (FPTAS) is an algorithm for finding approximate solutions to function problems, especially optimization problems. An FPTAS takes as input an instance of the problem and a parameter ε > 0. It returns as output a value which is at least $1 - \varepsilon$ times the correct value, and at most $1 + \varepsilon$ times the correct value.
In computational complexity theory, a numeric algorithm runs in pseudo-polynomial time if its running time is a polynomial in the numeric value of the input (the largest integer present in the input), but not necessarily in the length of the input (the number of bits required to represent it), which is the case for polynomial time algorithms.
In theoretical computer science, the continuous knapsack problem is an algorithmic problem in combinatorial optimization in which the goal is to fill a container with fractional amounts of different materials chosen to maximize the value of the selected materials. It resembles the classic knapsack problem, in which the items to be placed in the container are indivisible; however, the continuous knapsack problem may be solved in polynomial time whereas the classic knapsack problem is NP-hard. It is a classic example of how a seemingly small change in the formulation of a problem can have a large impact on its computational complexity.
In applied mathematics, the maximum generalized assignment problem is a problem in combinatorial optimization. This problem is a generalization of the assignment problem in which both tasks and agents have a size. Moreover, the size of each task might vary from one agent to the other.
In a graph, a maximum cut is a cut whose size is at least the size of any other cut. That is, it is a partition of the graph's vertices into two complementary sets S and T, such that the number of edges between S and T is as large as possible. Finding such a cut is known as the max-cut problem.
The maximum coverage problem is a classical question in computer science, computational complexity theory, and operations research. It is a problem that is widely taught in approximation algorithms.
In mathematics, a submodular set function is a set function that, informally, describes the relationship between a set of inputs and an output, where adding more of one input has a decreasing additional benefit. This natural diminishing-returns property makes them suitable for many applications, including approximation algorithms, game theory and electrical networks. Recently, submodular functions have also found utility in several real-world problems in machine learning and artificial intelligence, including automatic summarization, multi-document summarization, feature selection, active learning, sensor placement, image collection summarization and many other domains.
In applied mathematics, Graver bases enable iterative solutions of linear and various nonlinear integer programming problems in polynomial time. They were introduced by Jack E. Graver. Their connection to the theory of Gröbner bases was discussed by Bernd Sturmfels. The algorithmic theory of Graver bases and its application to integer programming is described by Shmuel Onn.
The quadratic knapsack problem (QKP) is an extension of the knapsack problem that allows for quadratic terms in the objective function: given a set of items, each with a weight, a value, and an extra profit that can be earned if two items are selected together, determine the number of items to include in a collection without exceeding the capacity of the knapsack, so as to maximize the overall profit. Usually, quadratic knapsack problems come with a restriction on the number of copies of each kind of item: either 0 or 1. This special type of QKP forms the 0-1 quadratic knapsack problem, which was first discussed by Gallo et al. The 0-1 quadratic knapsack problem is a variation of the knapsack problem, combining the features of the 0-1 knapsack problem and the quadratic knapsack problem.
The multiple subset sum problem is an optimization problem in computer science and operations research. It is a generalization of the subset sum problem. The input to the problem is a multiset of n integers and a positive integer m representing the number of subsets. The goal is to construct, from the input integers, some m subsets. The problem has several variants, differing in the objective: for example, maximizing the total sum over all subsets, or maximizing the smallest subset sum.
Unrelated-machines scheduling is an optimization problem in computer science and operations research. It is a variant of optimal job scheduling. We need to schedule n jobs J1, J2, ..., Jn on m different machines, such that a certain objective function is optimized. The time that machine i needs in order to process job j is denoted by pi,j. The term unrelated emphasizes that there is no relation between values of pi,j for different i and j. This is in contrast to two special cases of this problem: uniform-machines scheduling - in which pi,j = pi / sj, and identical-machines scheduling - in which pi,j = pi.
The configuration linear program (configuration-LP) is a linear programming technique used for solving combinatorial optimization problems. It was introduced in the context of the cutting stock problem. Later, it has been applied to the bin packing and job scheduling problems. In the configuration-LP, there is a variable for each possible configuration - each possible multiset of items that can fit in a single bin. Usually, the number of configurations is exponential in the problem size, but in some cases it is possible to attain approximate solutions using only a polynomial number of configurations.
The Karmarkar–Karp (KK) bin packing algorithms are several related approximation algorithms for the bin packing problem. The bin packing problem is a problem of packing items of different sizes into bins of identical capacity, such that the total number of bins is as small as possible. Finding the optimal solution is computationally hard. Karmarkar and Karp devised an algorithm that runs in polynomial time and finds a solution with at most $\mathrm{OPT} + O(\log^2(\mathrm{OPT}))$ bins, where OPT is the number of bins in the optimal solution. They also devised several other algorithms with slightly different approximation guarantees and run-time bounds.
A knapsack auction is an auction in which several identical items are sold, and there are several bidders with different valuations interested in different amounts of items. The goal is to choose a subset of the bidders with a total demand, at most, the number of items and, subject to that, a maximum total value. Finding this set of bidders requires solving an instance of the knapsack problem, which explains the term "knapsack auction".