Approximation algorithm

Last updated

In computer science and operations research, approximation algorithms are efficient algorithms that find approximate solutions to optimization problems (in particular NP-hard problems) with provable guarantees on the distance of the returned solution to the optimal one. [1] Approximation algorithms naturally arise in the field of theoretical computer science as a consequence of the widely believed P ≠ NP conjecture. Under this conjecture, a wide class of optimization problems cannot be solved exactly in polynomial time. The field of approximation algorithms, therefore, tries to understand how closely it is possible to approximate optimal solutions to such problems in polynomial time. In an overwhelming majority of the cases, the guarantee of such algorithms is a multiplicative one expressed as an approximation ratio or approximation factor i.e., the optimal solution is always guaranteed to be within a (predetermined) multiplicative factor of the returned solution. However, there are also many approximation algorithms that provide an additive guarantee on the quality of the returned solution. A notable example of an approximation algorithm that provides both is the classic approximation algorithm of Lenstra, Shmoys and Tardos [2] for scheduling on unrelated parallel machines.

Contents

The design and analysis of approximation algorithms crucially involves a mathematical proof certifying the quality of the returned solutions in the worst case. [1] This distinguishes them from heuristics such as annealing or genetic algorithms, which find reasonably good solutions on some inputs, but provide no clear indication at the outset on when they may succeed or fail.

There is widespread interest in theoretical computer science to better understand the limits to which we can approximate certain famous optimization problems. For example, one of the long-standing open questions in computer science is to determine whether there is an algorithm that outperforms the 2-approximation for the Steiner Forest problem by Agrawal et al. [3] The desire to understand hard optimization problems from the perspective of approximability is motivated by the discovery of surprising mathematical connections and broadly applicable techniques to design algorithms for hard optimization problems. One well-known example of the former is the Goemans–Williamson algorithm for maximum cut, which solves a graph theoretic problem using high dimensional geometry. [4]

Introduction

A simple example of an approximation algorithm is one for the minimum vertex cover problem, where the goal is to choose the smallest set of vertices such that every edge in the input graph contains at least one chosen vertex. One way to find a vertex cover is to repeat the following process: find an uncovered edge, add both its endpoints to the cover, and remove all edges incident to either vertex from the graph. As any vertex cover of the input graph must use a distinct vertex to cover each edge that was considered in the process (since it forms a matching), the vertex cover produced, therefore, is at most twice as large as the optimal one. In other words, this is a constant-factor approximation algorithm with an approximation factor of 2. Under the recent unique games conjecture, this factor is even the best possible one. [5]

NP-hard problems vary greatly in their approximability; some, such as the knapsack problem, can be approximated within a multiplicative factor , for any fixed , and therefore produce solutions arbitrarily close to the optimum (such a family of approximation algorithms is called a polynomial-time approximation scheme or PTAS). Others are impossible to approximate within any constant, or even polynomial, factor unless P = NP, as in the case of the maximum clique problem. Therefore, an important benefit of studying approximation algorithms is a fine-grained classification of the difficulty of various NP-hard problems beyond the one afforded by the theory of NP-completeness. In other words, although NP-complete problems may be equivalent (under polynomial-time reductions) to each other from the perspective of exact solutions, the corresponding optimization problems behave very differently from the perspective of approximate solutions.

Algorithm design techniques

By now there are several established techniques to design approximation algorithms. These include the following ones.

  1. Greedy algorithm
  2. Local search
  3. Enumeration and dynamic programming (which is also often used for parameterized approximations)
  4. Solving a convex programming relaxation to get a fractional solution. Then converting this fractional solution into a feasible solution by some appropriate rounding. The popular relaxations include the following.
  5. Primal-dual methods
  6. Dual fitting
  7. Embedding the problem in some metric and then solving the problem on the metric. This is also known as metric embedding.
  8. Random sampling and the use of randomness in general in conjunction with the methods above.

A posteriori guarantees

While approximation algorithms always provide an a priori worst case guarantee (be it additive or multiplicative), in some cases they also provide an a posteriori guarantee that is often much better. This is often the case for algorithms that work by solving a convex relaxation of the optimization problem on the given input. For example, there is a different approximation algorithm for minimum vertex cover that solves a linear programming relaxation to find a vertex cover that is at most twice the value of the relaxation. Since the value of the relaxation is never larger than the size of the optimal vertex cover, this yields another 2-approximation algorithm. While this is similar to the a priori guarantee of the previous approximation algorithm, the guarantee of the latter can be much better (indeed when the value of the LP relaxation is far from the size of the optimal vertex cover).

Hardness of approximation

Approximation algorithms as a research area is closely related to and informed by inapproximability theory where the non-existence of efficient algorithms with certain approximation ratios is proved (conditioned on widely believed hypotheses such as the P ≠ NP conjecture) by means of reductions. In the case of the metric traveling salesman problem, the best known inapproximability result rules out algorithms with an approximation ratio less than 123/122 ≈ 1.008196 unless P = NP, Karpinski, Lampis, Schmied. [6] Coupled with the knowledge of the existence of Christofides' 1.5 approximation algorithm, this tells us that the threshold of approximability for metric traveling salesman (if it exists) is somewhere between 123/122 and 1.5.

While inapproximability results have been proved since the 1970s, such results were obtained by ad hoc means and no systematic understanding was available at the time. It is only since the 1990 result of Feige, Goldwasser, Lovász, Safra and Szegedy on the inapproximability of Independent Set [7] and the famous PCP theorem, [8] that modern tools for proving inapproximability results were uncovered. The PCP theorem, for example, shows that Johnson's 1974 approximation algorithms for Max SAT, set cover, independent set and coloring all achieve the optimal approximation ratio, assuming P ≠ NP. [9]

Practicality

Not all approximation algorithms are suitable for direct practical applications. Some involve solving non-trivial linear programming/semidefinite relaxations (which may themselves invoke the ellipsoid algorithm), complex data structures, or sophisticated algorithmic techniques, leading to difficult implementation issues or improved running time performance (over exact algorithms) only on impractically large inputs. Implementation and running time issues aside, the guarantees provided by approximation algorithms may themselves not be strong enough to justify their consideration in practice. Despite their inability to be used "out of the box" in practical applications, the ideas and insights behind the design of such algorithms can often be incorporated in other ways in practical algorithms. In this way, the study of even very expensive algorithms is not a completely theoretical pursuit as they can yield valuable insights.

In other cases, even if the initial results are of purely theoretical interest, over time, with an improved understanding, the algorithms may be refined to become more practical. One such example is the initial PTAS for Euclidean TSP by Sanjeev Arora (and independently by Joseph Mitchell) which had a prohibitive running time of for a approximation. [10] Yet, within a year these ideas were incorporated into a near-linear time algorithm for any constant . [11]

Performance guarantees

For some approximation algorithms it is possible to prove certain properties about the approximation of the optimum result. For example, a ρ-approximation algorithmA is defined to be an algorithm for which it has been proven that the value/cost, f(x), of the approximate solution A(x) to an instance x will not be more (or less, depending on the situation) than a factor ρ times the value, OPT, of an optimum solution.

The factor ρ is called the relative performance guarantee. An approximation algorithm has an absolute performance guarantee or bounded errorc, if it has been proven for every instance x that

Similarly, the performance guarantee, R(x,y), of a solution y to an instance x is defined as

where f(y) is the value/cost of the solution y for the instance x. Clearly, the performance guarantee is greater than or equal to 1 and equal to 1 if and only if y is an optimal solution. If an algorithm A guarantees to return solutions with a performance guarantee of at most r(n), then A is said to be an r(n)-approximation algorithm and has an approximation ratio of r(n). Likewise, a problem with an r(n)-approximation algorithm is said to be r(n)-approximable or have an approximation ratio of r(n). [12] [13]

For minimization problems, the two different guarantees provide the same result and that for maximization problems, a relative performance guarantee of ρ is equivalent to a performance guarantee of . In the literature, both definitions are common but it is clear which definition is used since, for maximization problems, as ρ ≤ 1 while r ≥ 1.

The absolute performance guarantee of some approximation algorithm A, where x refers to an instance of a problem, and where is the performance guarantee of A on x (i.e. ρ for problem instance x) is:

That is to say that is the largest bound on the approximation ratio, r, that one sees over all possible instances of the problem. Likewise, the asymptotic performance ratio is:

That is to say that it is the same as the absolute performance ratio, with a lower bound n on the size of problem instances. These two types of ratios are used because there exist algorithms where the difference between these two is significant.

Performance guarantees
r-approx [12] [13] ρ-approxrel. error [13] rel. error [12] norm. rel. error [12] [13] abs. error [12] [13]
max:
min:

Epsilon terms

In the literature, an approximation ratio for a maximization (minimization) problem of c - ϵ (min: c + ϵ) means that the algorithm has an approximation ratio of c ∓ ϵ for arbitrary ϵ > 0 but that the ratio has not (or cannot) be shown for ϵ = 0. An example of this is the optimal inapproximability — inexistence of approximation — ratio of 7 / 8 + ϵ for satisfiable MAX-3SAT instances due to Johan Håstad. [14] As mentioned previously, when c = 1, the problem is said to have a polynomial-time approximation scheme.

An ϵ-term may appear when an approximation algorithm introduces a multiplicative error and a constant error while the minimum optimum of instances of size n goes to infinity as n does. In this case, the approximation ratio is ck / OPT = c ∓ o(1) for some constants c and k. Given arbitrary ϵ > 0, one can choose a large enough N such that the term k / OPT < ϵ for every n ≥ N. For every fixed ϵ, instances of size n < N can be solved by brute force, thereby showing an approximation ratio — existence of approximation algorithms with a guarantee — of c ∓ ϵ for every ϵ > 0.

See also

Citations

  1. 1 2 Bernard., Shmoys, David (2011). The design of approximation algorithms. Cambridge University Press. ISBN   9780521195270. OCLC   671709856.{{cite book}}: CS1 maint: multiple names: authors list (link)
  2. Lenstra, Jan Karel; Shmoys, David B.; Tardos, Éva (1990-01-01). "Approximation algorithms for scheduling unrelated parallel machines". Mathematical Programming. 46 (1–3): 259–271. CiteSeerX   10.1.1.115.708 . doi:10.1007/BF01585745. ISSN   0025-5610. S2CID   206799898.
  3. Agrawal, Ajit; Klein, Philip; Ravi, R. (1991). "When trees collide". Proceedings of the twenty-third annual ACM symposium on Theory of computing - STOC '91. New Orleans, Louisiana, United States: ACM Press. pp. 134–144. doi:10.1145/103418.103437. ISBN   978-0-89791-397-3. S2CID   1245448.
  4. Goemans, Michel X.; Williamson, David P. (November 1995). "Improved Approximation Algorithms for Maximum Cut and Satisfiability Problems Using Semidefinite Programming". J. ACM. 42 (6): 1115–1145. CiteSeerX   10.1.1.34.8500 . doi:10.1145/227683.227684. ISSN   0004-5411. S2CID   15794408.
  5. Khot, Subhash; Regev, Oded (2008-05-01). "Vertex cover might be hard to approximate to within 2−ε". Journal of Computer and System Sciences. Computational Complexity 2003. 74 (3): 335–349. doi: 10.1016/j.jcss.2007.06.019 .
  6. Karpinski, Marek; Lampis, Michael; Schmied, Richard (2015-12-01). "New inapproximability bounds for TSP". Journal of Computer and System Sciences. 81 (8): 1665–1677. arXiv: 1303.6437 . doi:10.1016/j.jcss.2015.06.003.
  7. Feige, Uriel; Goldwasser, Shafi; Lovász, Laszlo; Safra, Shmuel; Szegedy, Mario (March 1996). "Interactive Proofs and the Hardness of Approximating Cliques". J. ACM. 43 (2): 268–292. doi: 10.1145/226643.226652 . ISSN   0004-5411.
  8. Arora, Sanjeev; Safra, Shmuel (January 1998). "Probabilistic Checking of Proofs: A New Characterization of NP". J. ACM. 45 (1): 70–122. doi: 10.1145/273865.273901 . ISSN   0004-5411. S2CID   751563.
  9. Johnson, David S. (1974-12-01). "Approximation algorithms for combinatorial problems". Journal of Computer and System Sciences. 9 (3): 256–278. doi: 10.1016/S0022-0000(74)80044-9 .
  10. Arora, S. (October 1996). "Polynomial time approximation schemes for Euclidean TSP and other geometric problems". Proceedings of 37th Conference on Foundations of Computer Science. pp. 2–11. CiteSeerX   10.1.1.32.3376 . doi:10.1109/SFCS.1996.548458. ISBN   978-0-8186-7594-2. S2CID   1499391.
  11. Arora, S. (October 1997). "Nearly linear time approximation schemes for Euclidean TSP and other geometric problems". Proceedings 38th Annual Symposium on Foundations of Computer Science. pp. 554–563. doi:10.1109/SFCS.1997.646145. ISBN   978-0-8186-8197-4. S2CID   10656723.
  12. 1 2 3 4 5 G. Ausiello; P. Crescenzi; G. Gambosi; V. Kann; A. Marchetti-Spaccamela; M. Protasi (1999). Complexity and Approximation: Combinatorial Optimization Problems and their Approximability Properties.
  13. 1 2 3 4 5 Viggo Kann (1992). On the Approximability of NP-complete Optimization Problems (PDF).
  14. Johan Håstad (1999). "Some Optimal Inapproximability Results". Journal of the ACM. 48 (4): 798–859. CiteSeerX   10.1.1.638.2808 . doi:10.1145/502090.502098. S2CID   5120748.

Related Research Articles

<span class="mw-page-title-main">Travelling salesman problem</span> NP-hard problem in combinatorial optimization

The travelling salesman problem (TSP) asks the following question: "Given a list of cities and the distances between each pair of cities, what is the shortest possible route that visits each city exactly once and returns to the origin city?" It is an NP-hard problem in combinatorial optimization, important in theoretical computer science and operations research.

The subset sum problem (SSP) is a decision problem in computer science. In its most general formulation, there is a multiset of integers and a target-sum , and the question is to decide whether any subset of the integers sum to precisely . The problem is known to be NP-hard. Moreover, some restricted variants of it are NP-complete too, for example:

The bin packing problem is an optimization problem, in which items of different sizes must be packed into a finite number of bins or containers, each of a fixed given capacity, in a way that minimizes the number of bins used. The problem has many applications, such as filling up containers, loading trucks with weight capacity constraints, creating file backups in media, and technology mapping in FPGA semiconductor chip design.

<span class="mw-page-title-main">Combinatorial optimization</span> Subfield of mathematical optimization

Combinatorial optimization is a subfield of mathematical optimization that consists of finding an optimal object from a finite set of objects, where the set of feasible solutions is discrete or can be reduced to a discrete set. Typical combinatorial optimization problems are the travelling salesman problem ("TSP"), the minimum spanning tree problem ("MST"), and the knapsack problem. In many such problems, such as the ones previously mentioned, exhaustive search is not tractable, and so specialized algorithms that quickly rule out large parts of the search space or approximation algorithms must be resorted to instead.

<span class="mw-page-title-main">Steiner tree problem</span> On short connecting networks with added vertices

In combinatorial mathematics, the Steiner tree problem, or minimum Steiner tree problem, named after Jakob Steiner, is an umbrella term for a class of problems in combinatorial optimization. While Steiner tree problems may be formulated in a number of settings, they all require an optimal interconnect for a given set of objects and a predefined objective function. One well-known variant, which is often used synonymously with the term Steiner tree problem, is the Steiner tree problem in graphs. Given an undirected graph with non-negative edge weights and a subset of vertices, usually referred to as terminals, the Steiner tree problem in graphs requires a tree of minimum weight that contains all terminals and minimizes the total weight of its edges. Further well-known variants are the Euclidean Steiner tree problem and the rectilinear minimum Steiner tree problem.

<span class="mw-page-title-main">Vertex cover</span> Subset of a graphs vertices, including at least one endpoint of every edge

In graph theory, a vertex cover of a graph is a set of vertices that includes at least one endpoint of every edge of the graph.

The set cover problem is a classical question in combinatorics, computer science, operations research, and complexity theory. It is one of Karp's 21 NP-complete problems shown to be NP-complete in 1972.

In computational complexity theory, the class APX is the set of NP optimization problems that allow polynomial-time approximation algorithms with approximation ratio bounded by a constant. In simple terms, problems in this class have efficient algorithms that can find an answer within some fixed multiplicative factor of the optimal answer.

In computational complexity theory, the unique games conjecture is a conjecture made by Subhash Khot in 2002. The conjecture postulates that the problem of determining the approximate value of a certain type of game, known as a unique game, has NP-hard computational complexity. It has broad applications in the theory of hardness of approximation. If the unique games conjecture is true and P ≠ NP, then for many important problems it is not only impossible to get an exact solution in polynomial time, but also impossible to get a good polynomial-time approximation. The problems for which such an inapproximability result would hold include constraint satisfaction problems, which crop up in a wide variety of disciplines.

Domination analysis of an approximation algorithm is a way to estimate its performance, introduced by Glover and Punnen in 1997. Unlike the classical approximation ratio analysis, which compares the numerical quality of a calculated solution with that of an optimal solution, domination analysis involves examining the rank of the calculated solution in the sorted order of all possible solutions. In this style of analysis, an algorithm is said to have dominance number or domination numberK, if there exists a subset of K different solutions to the problem among which the algorithm's output is the best. Domination analysis can also be expressed using a domination ratio, which is the fraction of the solution space that is no better than the given solution; this number always lies within the interval [0,1], with larger numbers indicating better solutions. Domination analysis is most commonly applied to problems for which the total number of possible solutions is known and for which exact solution is difficult.

MAX-3SAT is a problem in the computational complexity subfield of computer science. It generalises the Boolean satisfiability problem (SAT) which is a decision problem considered in complexity theory. It is defined as:

In computer science, particularly the study of approximation algorithms, an L-reduction is a transformation of optimization problems which linearly preserves approximability features; it is one type of approximation-preserving reduction. L-reductions in studies of approximability of optimization problems play a similar role to that of polynomial reductions in the studies of computational complexity of decision problems.

Semidefinite programming (SDP) is a subfield of convex optimization concerned with the optimization of a linear objective function over the intersection of the cone of positive semidefinite matrices with an affine space, i.e., a spectrahedron.

<span class="mw-page-title-main">Linear programming relaxation</span>

In mathematics, the relaxation of a (mixed) integer linear program is the problem that arises by removing the integrality constraint of each variable.

<span class="mw-page-title-main">Maximum cut</span> Problem of finding a maximum cut in a graph

For a graph, a maximum cut is a cut whose size is at least the size of any other cut. That is, it is a partition of the graph's vertices into two complementary sets S and T, such that the number of edges between S and T is as large as possible. Finding such a cut is known as the max-cut problem.

In graph theory, the metric k-center problem is a combinatorial optimization problem studied in theoretical computer science. Given n cities with specified distances, one wants to build k warehouses in different cities and minimize the maximum distance of a city to a warehouse. In graph theory, this means finding a set of k vertices for which the largest distance of any point to its closest vertex in the k-set is minimum. The vertices must be in a metric space, providing a complete graph that satisfies the triangle inequality.

In computational complexity theory, a gap reduction is a reduction to a particular type of decision problem, known as a c-gap problem. Such reductions provide information about the hardness of approximating solutions to optimization problems. In short, a gap problem refers to one wherein the objective is to distinguish between cases where the best solution is above one threshold from cases where the best solution is below another threshold, such that the two thresholds have a gap in between. Gap reductions can be used to demonstrate inapproximability results, as if a problem may be approximated to a better factor than the size of gap, then the approximation algorithm can be used to solve the corresponding gap problem.

In computability theory and computational complexity theory, especially the study of approximation algorithms, an approximation-preserving reduction is an algorithm for transforming one optimization problem into another problem, such that the distance of solutions from optimal is preserved to some degree. Approximation-preserving reductions are a subset of more general reductions in complexity theory; the difference is that approximation-preserving reductions usually make statements on approximation problems or optimization problems, as opposed to decision problems.

The vertex k-center problem is a classical NP-hard problem in computer science. It has application in facility location and clustering. Basically, the vertex k-center problem models the following real problem: given a city with facilities, find the best facilities where to build fire stations. Since firemen must attend any emergency as quickly as possible, the distance from the farthest facility to its nearest fire station has to be as small as possible. In other words, the position of the fire stations must be such that every possible fire is attended as quickly as possible.

The small set expansion hypothesis or small set expansion conjecture in computational complexity theory is an unproven computational hardness assumption related to the unique games conjecture. Under the small set expansion hypothesis it is assumed to be computationally infeasible to distinguish between a certain class of expander graphs called "small set expanders" and other graphs that are very far from being small set expanders.

References