NP-completeness

Last updated
The Boolean satisfiability problem (SAT) asks to determine if a propositional formula (example depicted) can be made true by an appropriate assignment ("solution") of truth values to its variables. While it is easy to verify whether a given assignment renders the formula true, no essentially faster method to find a satisfying assignment is known than to try all assignments in succession. Cook and Levin proved that each easy-to-verify problem can be solved as fast as SAT, which is hence NP-complete. 3SAT 17 svg.svg
The Boolean satisfiability problem (SAT) asks to determine if a propositional formula (example depicted) can be made true by an appropriate assignment ("solution") of truth values to its variables. While it is easy to verify whether a given assignment renders the formula true, no essentially faster method to find a satisfying assignment is known than to try all assignments in succession. Cook and Levin proved that each easy-to-verify problem can be solved as fast as SAT, which is hence NP-complete.

In computational complexity theory, a problem is NP-complete when:

Contents

  1. It is a decision problem, meaning that for any input to the problem, the output is either "yes" or "no".
  2. When the answer is "yes", this can be demonstrated through the existence of a short (polynomial length) solution.
  3. The correctness of each solution can be verified quickly (namely, in polynomial time) and a brute-force search algorithm can find a solution by trying all possible solutions.
  4. The problem can be used to simulate every other problem for which we can verify quickly that a solution is correct. In this sense, NP-complete problems are the hardest of the problems to which solutions can be verified quickly. If we could find solutions of some NP-complete problem quickly, we could quickly find the solutions of every other problem to which a given solution can be easily verified.

The name "NP-complete" is short for "nondeterministic polynomial-time complete". In this name, "nondeterministic" refers to nondeterministic Turing machines, a way of mathematically formalizing the idea of a brute-force search algorithm. Polynomial time refers to an amount of time that is considered "quick" for a deterministic algorithm to check a single solution, or for a nondeterministic Turing machine to perform the whole search. "Complete" refers to the property of being able to simulate everything in the same complexity class.

More precisely, each input to the problem should be associated with a set of solutions of polynomial length, the validity of each of which can be tested quickly (in polynomial time), [2] such that the output for any input is "yes" if the solution set is non-empty and "no" if it is empty. The complexity class of problems of this form is called NP, an abbreviation for "nondeterministic polynomial time". A problem is said to be NP-hard if everything in NP can be transformed in polynomial time into it even though it may not be in NP. A problem is NP-complete if it is both in NP and NP-hard. The NP-complete problems represent the hardest problems in NP. If some NP-complete problem has a polynomial time algorithm, all problems in NP do. The set of NP-complete problems is often denoted by NP-C or NPC.

Although a solution to an NP-complete problem can be verified "quickly", there is no known way to find a solution quickly. That is, the time required to solve the problem using any currently known algorithm increases rapidly as the size of the problem grows. As a consequence, determining whether it is possible to solve these problems quickly, called the P versus NP problem, is one of the fundamental unsolved problems in computer science today.

While a method for computing the solutions to NP-complete problems quickly remains undiscovered, computer scientists and programmers still frequently encounter NP-complete problems. NP-complete problems are often addressed by using heuristic methods and approximation algorithms.

Overview

NP-complete problems are in NP, the set of all decision problems whose solutions can be verified in polynomial time; NP may be equivalently defined as the set of decision problems that can be solved in polynomial time on a non-deterministic Turing machine. A problem p in NP is NP-complete if every other problem in NP can be transformed (or reduced) into p in polynomial time.[ citation needed ]

It is not known whether every problem in NP can be quickly solved—this is called the P versus NP problem. But if any NP-complete problem can be solved quickly, then every problem in NP can, because the definition of an NP-complete problem states that every problem in NP must be quickly reducible to every NP-complete problem (that is, it can be reduced in polynomial time). Because of this, it is often said that NP-complete problems are harder or more difficult than NP problems in general.[ citation needed ]

Formal definition

A decision problem is NP-complete if:[ citation needed ]

  1. is in NP, and
  2. Every problem in NP is reducible to in polynomial time. [3]

can be shown to be in NP by demonstrating that a candidate solution to can be verified in polynomial time.

Note that a problem satisfying condition 2 is said to be NP-hard, whether or not it satisfies condition 1. [4]

A consequence of this definition is that if we had a polynomial time algorithm (on a UTM, or any other Turing-equivalent abstract machine) for , we could solve all problems in NP in polynomial time.

Background

Euler diagram for P, NP, NP-complete, and NP-hard sets of problems. The left side is valid under the assumption that P[?]NP, while the right side is valid under the assumption that P=NP (except that the empty language and its complement are never NP-complete, and in general, not every problem in P or NP is NP-complete). P np np-complete np-hard.svg
Euler diagram for P, NP, NP-complete, and NP-hard sets of problems. The left side is valid under the assumption that P≠NP, while the right side is valid under the assumption that P=NP (except that the empty language and its complement are never NP-complete, and in general, not every problem in P or NP is NP-complete).

The concept of NP-completeness was introduced in 1971 (see Cook–Levin theorem), though the term NP-complete was introduced later. At the 1971 STOC conference, there was a fierce debate between the computer scientists about whether NP-complete problems could be solved in polynomial time on a deterministic Turing machine. John Hopcroft brought everyone at the conference to a consensus that the question of whether NP-complete problems are solvable in polynomial time should be put off to be solved at some later date, since nobody had any formal proofs for their claims one way or the other. This is known as "the question of whether P=NP".

Nobody has yet been able to determine conclusively whether NP-complete problems are in fact solvable in polynomial time, making this one of the great unsolved problems of mathematics. The Clay Mathematics Institute is offering a US$1 million reward to anyone who has a formal proof that P=NP or that P≠NP. [5]

The existence of NP-complete problems is not obvious. The Cook–Levin theorem states that the Boolean satisfiability problem is NP-complete, thus establishing that such problems do exist. In 1972, Richard Karp proved that several other problems were also NP-complete (see Karp's 21 NP-complete problems); thus, there is a class of NP-complete problems (besides the Boolean satisfiability problem). Since the original results, thousands of other problems have been shown to be NP-complete by reductions from other problems previously shown to be NP-complete; many of these problems are collected in Garey & Johnson (1979).

NP-complete problems

Some NP-complete problems, indicating the reductions typically used to prove their NP-completeness Relative NPC chart.svg
Some NP-complete problems, indicating the reductions typically used to prove their NP-completeness

The easiest way to prove that some new problem is NP-complete is first to prove that it is in NP, and then to reduce some known NP-complete problem to it. Therefore, it is useful to know a variety of NP-complete problems. The list below contains some well-known problems that are NP-complete when expressed as decision problems.

To the right is a diagram of some of the problems and the reductions typically used to prove their NP-completeness. In this diagram, problems are reduced from bottom to top. Note that this diagram is misleading as a description of the mathematical relationship between these problems, as there exists a polynomial-time reduction between any two NP-complete problems; but it indicates where demonstrating this polynomial-time reduction has been easiest.

There is often only a small difference between a problem in P and an NP-complete problem. For example, the 3-satisfiability problem, a restriction of the Boolean satisfiability problem, remains NP-complete, whereas the slightly more restricted 2-satisfiability problem is in P (specifically, it is NL-complete), but the slightly more general max. 2-sat. problem is again NP-complete. Determining whether a graph can be colored with 2 colors is in P, but with 3 colors is NP-complete, even when restricted to planar graphs. Determining if a graph is a cycle or is bipartite is very easy (in L), but finding a maximum bipartite or a maximum cycle subgraph is NP-complete. A solution of the knapsack problem within any fixed percentage of the optimal solution can be computed in polynomial time, but finding the optimal solution is NP-complete.

Intermediate problems

An interesting example is the graph isomorphism problem, the graph theory problem of determining whether a graph isomorphism exists between two graphs. Two graphs are isomorphic if one can be transformed into the other simply by renaming vertices. Consider these two problems:

The Subgraph Isomorphism problem is NP-complete. The graph isomorphism problem is suspected to be neither in P nor NP-complete, though it is in NP. This is an example of a problem that is thought to be hard, but is not thought to be NP-complete. This class is called NP-Intermediate problems and exists if and only if P≠NP.

Solving NP-complete problems

At present, all known algorithms for NP-complete problems require time that is superpolynomial in the input size. The vertex cover problem has [6] for some and it is unknown whether there are any faster algorithms.

The following techniques can be applied to solve computational problems in general, and they often give rise to substantially faster algorithms:

One example of a heuristic algorithm is a suboptimal greedy coloring algorithm used for graph coloring during the register allocation phase of some compilers, a technique called graph-coloring global register allocation. Each vertex is a variable, edges are drawn between variables which are being used at the same time, and colors indicate the register assigned to each variable. Because most RISC machines have a fairly large number of general-purpose registers, even a heuristic approach is effective for this application.

Completeness under different types of reduction

In the definition of NP-complete given above, the term reduction was used in the technical meaning of a polynomial-time many-one reduction.

Another type of reduction is polynomial-time Turing reduction. A problem is polynomial-time Turing-reducible to a problem if, given a subroutine that solves in polynomial time, one could write a program that calls this subroutine and solves in polynomial time. This contrasts with many-one reducibility, which has the restriction that the program can only call the subroutine once, and the return value of the subroutine must be the return value of the program.

If one defines the analogue to NP-complete with Turing reductions instead of many-one reductions, the resulting set of problems won't be smaller than NP-complete; it is an open question whether it will be any larger.

Another type of reduction that is also often used to define NP-completeness is the logarithmic-space many-one reduction which is a many-one reduction that can be computed with only a logarithmic amount of space. Since every computation that can be done in logarithmic space can also be done in polynomial time it follows that if there is a logarithmic-space many-one reduction then there is also a polynomial-time many-one reduction. This type of reduction is more refined than the more usual polynomial-time many-one reductions and it allows us to distinguish more classes such as P-complete. Whether under these types of reductions the definition of NP-complete changes is still an open problem. All currently known NP-complete problems are NP-complete under log space reductions. All currently known NP-complete problems remain NP-complete even under much weaker reductions such as reductions and reductions. Some NP-Complete problems such as SAT are known to be complete even under polylogarithmic time projections. [7] It is known, however, that AC0 reductions define a strictly smaller class than polynomial-time reductions. [8]

Naming

According to Donald Knuth, the name "NP-complete" was popularized by Alfred Aho, John Hopcroft and Jeffrey Ullman in their celebrated textbook "The Design and Analysis of Computer Algorithms". He reports that they introduced the change in the galley proofs for the book (from "polynomially-complete"), in accordance with the results of a poll he had conducted of the theoretical computer science community. [9] Other suggestions made in the poll [10] included "Herculean", "formidable", Steiglitz's "hard-boiled" in honor of Cook, and Shen Lin's acronym "PET", which stood for "probably exponential time", but depending on which way the P versus NP problem went, could stand for "provably exponential time" or "previously exponential time". [11]

Common misconceptions

The following misconceptions are frequent. [12]

Properties

Viewing a decision problem as a formal language in some fixed encoding, the set NPC of all NP-complete problems is not closed under:

It is not known whether NPC is closed under complementation, since NPC=co-NPC if and only if NP=co-NP, and since NP=co-NP is an open question. [16]

See also

Related Research Articles

In logic and computer science, the Boolean satisfiability problem (sometimes called propositional satisfiability problem and abbreviated SATISFIABILITY, SAT or B-SAT) is the problem of determining if there exists an interpretation that satisfies a given Boolean formula. In other words, it asks whether the variables of a given Boolean formula can be consistently replaced by the values TRUE or FALSE in such a way that the formula evaluates to TRUE. If this is the case, the formula is called satisfiable. On the other hand, if no such assignment exists, the function expressed by the formula is FALSE for all possible variable assignments and the formula is unsatisfiable. For example, the formula "a AND NOT b" is satisfiable because one can find the values a = TRUE and b = FALSE, which make (a AND NOT b) = TRUE. In contrast, "a AND NOT a" is unsatisfiable.

The P versus NP problem is a major unsolved problem in theoretical computer science. Informally, it asks whether every problem whose solution can be quickly verified can also be quickly solved.

In theoretical computer science and mathematics, computational complexity theory focuses on classifying computational problems according to their resource usage, and relating these classes to each other. A computational problem is a task solved by a computer. A computation problem is solvable by mechanical application of mathematical steps, such as an algorithm.

<span class="mw-page-title-main">NP (complexity)</span> Complexity class used to classify decision problems

In computational complexity theory, NP is a complexity class used to classify decision problems. NP is the set of decision problems for which the problem instances, where the answer is "yes", have proofs verifiable in polynomial time by a deterministic Turing machine, or alternatively the set of problems that can be solved in polynomial time by a nondeterministic Turing machine.

In computational complexity theory, the complexity class #P (pronounced "sharp P" or, sometimes "number P" or "hash P") is the set of the counting problems associated with the decision problems in the set NP. More formally, #P is the class of function problems of the form "compute f(x)", where f is the number of accepting paths of a nondeterministic Turing machine running in polynomial time. Unlike most well-known complexity classes, it is not a class of decision problems but a class of function problems. The most difficult, representative problems of this class are #P-complete.

In computational complexity theory, a decision problem is P-complete if it is in P and every problem in P can be reduced to it by an appropriate reduction.

In computational complexity theory, a decision problem is PSPACE-complete if it can be solved using an amount of memory that is polynomial in the input length and if every other problem that can be solved in polynomial space can be transformed to it in polynomial time. The problems that are PSPACE-complete can be thought of as the hardest problems in PSPACE, the class of decision problems solvable in polynomial space, because a solution to any one such problem could easily be used to solve any other problem in PSPACE.

In theoretical computer science, the subgraph isomorphism problem is a computational task in which two graphs G and H are given as input, and one must determine whether G contains a subgraph that is isomorphic to H. Subgraph isomorphism is a generalization of both the maximum clique problem and the problem of testing whether a graph contains a Hamiltonian cycle, and is therefore NP-complete. However certain other cases of subgraph isomorphism may be solved in polynomial time.

<span class="mw-page-title-main">Complexity class</span> Set of problems in computational complexity theory

In computational complexity theory, a complexity class is a set of computational problems "of related resource-based complexity". The two most commonly analyzed resources are time and memory.

In computer science, parameterized complexity is a branch of computational complexity theory that focuses on classifying computational problems according to their inherent difficulty with respect to multiple parameters of the input or output. The complexity of a problem is then measured as a function of those parameters. This allows the classification of NP-hard problems on a finer scale than in the classical setting, where the complexity of a problem is only measured as a function of the number of bits in the input. This appears to have been first demonstrated in Gurevich, Stockmeyer & Vishkin (1984). The first systematic work on parameterized complexity was done by Downey & Fellows (1999).

In computational complexity theory, P, also known as PTIME or DTIME(nO(1)), is a fundamental complexity class. It contains all decision problems that can be solved by a deterministic Turing machine using a polynomial amount of computation time, or polynomial time.

In computational complexity theory, the Cook–Levin theorem, also known as Cook's theorem, states that the Boolean satisfiability problem is NP-complete. That is, it is in NP, and any problem in NP can be reduced in polynomial time by a deterministic Turing machine to the Boolean satisfiability problem.

In computational complexity theory, NL is the complexity class containing decision problems that can be solved by a nondeterministic Turing machine using a logarithmic amount of memory space.

In computational complexity theory, L is the complexity class containing decision problems that can be solved by a deterministic Turing machine using a logarithmic amount of writable memory space. Formally, the Turing machine has two tapes, one of which encodes the input and can only be read, whereas the other tape has logarithmic size but can be read as well as written. Logarithmic space is sufficient to hold a constant number of pointers into the input and a logarithmic number of boolean flags, and many basic logspace algorithms use the memory in this way.

In computational complexity theory, SL is the complexity class of problems log-space reducible to USTCON, which is the problem of determining whether there exists a path between two vertices in an undirected graph, otherwise described as the problem of determining whether two vertices are in the same connected component. This problem is also called the undirected reachability problem. It does not matter whether many-one reducibility or Turing reducibility is used. Although originally described in terms of symmetric Turing machines, that equivalent formulation is very complex, and the reducibility definition is what is used in practice.

The graph isomorphism problem is the computational problem of determining whether two finite graphs are isomorphic.

The Valiant–Vazirani theorem is a theorem in computational complexity theory stating that if there is a polynomial time algorithm for Unambiguous-SAT, then NP = RP. It was proven by Leslie Valiant and Vijay Vazirani in their paper titled NP is as easy as detecting unique solutions published in 1986.

In computational complexity theory, NL-complete is a complexity class containing the languages that are complete for NL, the class of decision problems that can be solved by a nondeterministic Turing machine using a logarithmic amount of memory space. The NL-complete languages are the most "difficult" or "expressive" problems in NL. If a deterministic algorithm exists for solving any one of the NL-complete problems in logarithmic memory space, then NL = L.

In the mathematical field of graph theory, an automorphism of a graph is a form of symmetry in which the graph is mapped onto itself while preserving the edge–vertex connectivity.

In computer science, the Sharp Satisfiability Problem is the problem of counting the number of interpretations that satisfy a given Boolean formula, introduced by Valiant in 1979. In other words, it asks in how many ways the variables of a given Boolean formula can be consistently replaced by the values TRUE or FALSE in such a way that the formula evaluates to TRUE. For example, the formula is satisfiable by three distinct boolean value assignments of the variables, namely, for any of the assignments, , and, we have

References

Citations

  1. For example, simply assigning true to each variable renders the 18th conjunct (and hence the complete formula) false.
  2. Cobham, Alan (1965). "The intrinsic computational difficulty of functions". Proc. Logic, Methodology, and Philosophy of Science II. North Holland.
  3. J. van Leeuwen (1998). Handbook of Theoretical Computer Science. Elsevier. p. 84. ISBN   978-0-262-72014-4.
  4. J. van Leeuwen (1998). Handbook of Theoretical Computer Science. Elsevier. p. 80. ISBN   978-0-262-72014-4.
  5. Kiersz, Andy. "An eminent mathematician claims to have solved one of math's greatest mysteries — and it's one of 6 problems with a $1 million prize". Business Insider. Retrieved 2023-04-24.
  6. Chen, Jianer; Kanj, Iyad A.; Xia, Ge (2010-09-06). "Improved upper bounds for vertex cover". Theoretical Computer Science. 411 (40): 3736–3756. doi: 10.1016/j.tcs.2010.06.026 . ISSN   0304-3975.
  7. Agrawal, M.; Allender, E.; Rudich, Steven (1998). "Reductions in Circuit Complexity: An Isomorphism Theorem and a Gap Theorem". Journal of Computer and System Sciences. 57 (2): 127–143. doi: 10.1006/jcss.1998.1583 . ISSN   1090-2724.
  8. Agrawal, M.; Allender, E.; Impagliazzo, R.; Pitassi, T.; Rudich, Steven (2001). "Reducing the complexity of reductions". Computational Complexity. 10 (2): 117–138. doi:10.1007/s00037-001-8191-1. ISSN   1016-3328. S2CID   29017219.
  9. Don Knuth, Tracy Larrabee, and Paul M. Roberts, Mathematical Writing Archived 2010-08-27 at the Wayback Machine § 25, MAA Notes No. 14, MAA, 1989 (also Stanford Technical Report, 1987).
  10. Knuth, D. F. (1974). "A terminological proposal". SIGACT News. 6 (1): 12–18. doi:10.1145/1811129.1811130. S2CID   45313676.
  11. See the poll, or Archived 2011-06-07 at the Wayback Machine .
  12. Ball, Philip (2000). "DNA computer helps travelling salesman". Nature. doi:10.1038/news000113-10.
  13. Bern (1990); Deĭneko, Klinz & Woeginger (2006); Dorn et al. (2005) ; Lipton & Tarjan (1980).
  14. Hemaspaandra, L. A.; Williams, R. (2012). "SIGACT News Complexity Theory Column 76". ACM SIGACT News. 43 (4): 70. doi:10.1145/2421119.2421135. S2CID   13367514.
  15. Aaronson, Scott (2010). "BQP and the polynomial hierarchy". In Schulman, Leonard J. (ed.). Proceedings of the 42nd ACM Symposium on Theory of Computing, STOC 2010, Cambridge, Massachusetts, USA, 5–8 June 2010. Association for Computing Machinery. pp. 141–150. arXiv: 0910.4698 . doi:10.1145/1806689.1806711. ISBN   978-1-4503-0050-6.
  16. Talbot, John; Welsh, D. J. A. (2006), Complexity and Cryptography: An Introduction, Cambridge University Press, p. 57, ISBN   9780521617710, The question of whether NP and co-NP are equal is probably the second most important open problem in complexity theory, after the P versus NP question.

Sources

Further reading