Subgraph isomorphism problem

Last updated

In theoretical computer science, the subgraph isomorphism problem is a computational task in which two graphs G and H are given as input, and one must determine whether G contains a subgraph that is isomorphic to H. Subgraph isomorphism is a generalization of both the maximum clique problem and the problem of testing whether a graph contains a Hamiltonian cycle, and is therefore NP-complete. [1] However certain other cases of subgraph isomorphism may be solved in polynomial time. [2]

Contents

Sometimes the name subgraph matching is also used for the same problem. This name puts emphasis on finding such a subgraph as opposed to the bare decision problem.

Decision problem and computational complexity

To prove subgraph isomorphism is NP-complete, it must be formulated as a decision problem. The input to the decision problem is a pair of graphs G and H. The answer to the problem is positive if H is isomorphic to a subgraph of G, and negative otherwise.

Formal question:

Let , be graphs. Is there a subgraph such that ? I. e., does there exist a bijection such that ?

The proof of subgraph isomorphism being NP-complete is simple and based on reduction of the clique problem, an NP-complete decision problem in which the input is a single graph G and a number k, and the question is whether G contains a complete subgraph with k vertices. To translate this to a subgraph isomorphism problem, simply let H be the complete graph Kk; then the answer to the subgraph isomorphism problem for G and H is equal to the answer to the clique problem for G and k. Since the clique problem is NP-complete, this polynomial-time many-one reduction shows that subgraph isomorphism is also NP-complete. [3]

An alternative reduction from the Hamiltonian cycle problem translates a graph G which is to be tested for Hamiltonicity into the pair of graphs G and H, where H is a cycle having the same number of vertices as G. Because the Hamiltonian cycle problem is NP-complete even for planar graphs, this shows that subgraph isomorphism remains NP-complete even in the planar case. [4]

Subgraph isomorphism is a generalization of the graph isomorphism problem, which asks whether G is isomorphic to H: the answer to the graph isomorphism problem is true if and only if G and H both have the same numbers of vertices and edges and the subgraph isomorphism problem for G and H is true. However the complexity-theoretic status of graph isomorphism remains an open question.

In the context of the Aanderaa–Karp–Rosenberg conjecture on the query complexity of monotone graph properties, Gröger (1992) showed that any subgraph isomorphism problem has query complexity Ω(n3/2); that is, solving the subgraph isomorphism requires an algorithm to check the presence or absence in the input of Ω(n3/2) different edges in the graph. [5]

Algorithms

Ullmann (1976) describes a recursive backtracking procedure for solving the subgraph isomorphism problem. Although its running time is, in general, exponential, it takes polynomial time for any fixed choice of H (with a polynomial that depends on the choice of H). When G is a planar graph (or more generally a graph of bounded expansion) and H is fixed, the running time of subgraph isomorphism can be reduced to linear time. [2]

Ullmann (2010) is a substantial update to the 1976 subgraph isomorphism algorithm paper.

Cordella (2004) proposed in 2004 another algorithm based on Ullmann's, VF2, which improves the refinement process using different heuristics and uses significantly less memory.

Bonnici & Giugno (2013) proposed a better algorithm, which improves the initial order of the vertices using some heuristics.

The current state of the art solver for moderately-sized, hard instances is the Glasgow Subgraph Solver (McCreesh, Prosser & Trimble (2020)). [6] This solver adopts a constraint programming approach, using bit-parallel data structures and specialized propagation algorithms for performance. It supports most common variations of the problem and is capable of counting or enumerating solutions as well as deciding whether one exists.

For large graphs, state-of-the art algorithms include CFL-Match and Turboiso, and extensions thereupon such as DAF by Han et al. (2019).

Applications

As subgraph isomorphism has been applied in the area of cheminformatics to find similarities between chemical compounds from their structural formula; often in this area the term substructure search is used. [7] A query structure is often defined graphically using a structure editor program; SMILES based database systems typically define queries using SMARTS, a SMILES extension.

The closely related problem of counting the number of isomorphic copies of a graph H in a larger graph G has been applied to pattern discovery in databases, [8] the bioinformatics of protein-protein interaction networks, [9] and in exponential random graph methods for mathematically modeling social networks. [10]

Ohlrich et al. (1993) describe an application of subgraph isomorphism in the computer-aided design of electronic circuits. Subgraph matching is also a substep in graph rewriting (the most runtime-intensive), and thus offered by graph rewrite tools.

The problem is also of interest in artificial intelligence, where it is considered part of an array of pattern matching in graphs problems; an extension of subgraph isomorphism known as graph mining is also of interest in that area. [11]

See also

Notes

  1. The original Cook (1971) paper that proves the Cook–Levin theorem already showed subgraph isomorphism to be NP-complete, using a reduction from 3-SAT involving cliques.
  2. 1 2 Eppstein (1999); Nešetřil & Ossona de Mendez (2012)
  3. Wegener, Ingo (2005), Complexity Theory: Exploring the Limits of Efficient Algorithms, Springer, p. 81, ISBN   9783540210450 .
  4. de la Higuera, Colin; Janodet, Jean-Christophe; Samuel, Émilie; Damiand, Guillaume; Solnon, Christine (2013), "Polynomial algorithms for open plane graph and subgraph isomorphisms" (PDF), Theoretical Computer Science, 498: 76–99, doi: 10.1016/j.tcs.2013.05.026 , MR   3083515, It is known since the mid-70's that the isomorphism problem is solvable in polynomial time for plane graphs. However, it has also been noted that the subisomorphism problem is still N P-complete, in particular because the Hamiltonian cycle problem is NP-complete for planar graphs.
  5. Here Ω invokes Big Omega notation.
  6. For an experimental evaluation, see Solnon (2019).
  7. Ullmann (1976)
  8. Kuramochi & Karypis (2001).
  9. Pržulj, Corneil & Jurisica (2006).
  10. Snijders et al. (2006).
  11. http://www.aaai.org/Papers/Symposia/Fall/2006/FS-06-02/FS06-02-007.pdf; expanded version at https://e-reports-ext.llnl.gov/pdf/332302.pdf

Related Research Articles

<span class="mw-page-title-main">Graph isomorphism</span> Bijection between the vertex set of two graphs

In graph theory, an isomorphism of graphsG and H is a bijection between the vertex sets of G and H

<span class="mw-page-title-main">Clique problem</span> Task of computing complete subgraphs

In computer science, the clique problem is the computational problem of finding cliques in a graph. It has several different formulations depending on which cliques, and what information about the cliques, should be found. Common formulations of the clique problem include finding a maximum clique, finding a maximum weight clique in a weighted graph, listing all maximal cliques, and solving the decision problem of testing whether a graph contains a clique larger than a given size.

<span class="mw-page-title-main">Clique (graph theory)</span> Adjacent subset of an undirected graph

In graph theory, a clique is a subset of vertices of an undirected graph such that every two distinct vertices in the clique are adjacent. That is, a clique of a graph is an induced subgraph of that is complete. Cliques are one of the basic concepts of graph theory and are used in many other mathematical problems and constructions on graphs. Cliques have also been studied in computer science: the task of finding whether there is a clique of a given size in a graph is NP-complete, but despite this hardness result, many algorithms for finding cliques have been studied.

<span class="mw-page-title-main">Independent set (graph theory)</span> Unrelated vertices in graphs

In graph theory, an independent set, stable set, coclique or anticlique is a set of vertices in a graph, no two of which are adjacent. That is, it is a set of vertices such that for every two vertices in , there is no edge connecting the two. Equivalently, each edge in the graph has at most one endpoint in . A set is independent if and only if it is a clique in the graph's complement. The size of an independent set is the number of vertices it contains. Independent sets have also been called "internally stable sets", of which "stable set" is a shortening.

<span class="mw-page-title-main">Cograph</span> Graph formed by complementation and disjoint union

In graph theory, a cograph, or complement-reducible graph, or P4-free graph, is a graph that can be generated from the single-vertex graph K1 by complementation and disjoint union. That is, the family of cographs is the smallest class of graphs that includes K1 and is closed under complementation and disjoint union.

<span class="mw-page-title-main">Graph property</span> Property of graphs that depends only on abstract structure

In graph theory, a graph property or graph invariant is a property of graphs that depends only on the abstract structure, not on graph representations such as particular labellings or drawings of the graph.

In the mathematical discipline of graph theory, a feedback vertex set (FVS) of a graph is a set of vertices whose removal leaves a graph without cycles. Equivalently, each FVS contains at least one vertex of any cycle in the graph. The feedback vertex set number of a graph is the size of a smallest feedback vertex set. The minimum feedback vertex set problem is an NP-complete problem; it was among the first problems shown to be NP-complete. It has wide applications in operating systems, database systems, and VLSI chip design.

The graph isomorphism problem is the computational problem of determining whether two finite graphs are isomorphic.

In graph theory and theoretical computer science, a maximum common induced subgraph of two graphs G and H is a graph that is an induced subgraph of both G and H, and that has as many vertices as possible.

In computational complexity, problems that are in the complexity class NP but are neither in the class P nor NP-complete are called NP-intermediate, and the class of such problems is called NPI. Ladner's theorem, shown in 1975 by Richard E. Ladner, is a result asserting that, if P ≠ NP, then NPI is not empty; that is, NP contains problems that are neither in P nor NP-complete. Since it is also true that if NPI problems exist, then P ≠ NP, it follows that P = NP if and only if NPI is empty.

<span class="mw-page-title-main">Induced subgraph isomorphism problem</span>

In complexity theory and graph theory, induced subgraph isomorphism is an NP-complete decision problem that involves finding a given graph as an induced subgraph of a larger graph.

<span class="mw-page-title-main">Modular product of graphs</span> Binary operation in graph theory

In graph theory, the modular product of graphs G and H is a graph formed by combining G and H that has applications to subgraph isomorphism. It is one of several different kinds of graph products that have been studied, generally using the same vertex set but with different rules for determining which edges to include.

<span class="mw-page-title-main">Clique-width</span> Measure of graph complexity

In graph theory, the clique-width of a graph G is a parameter that describes the structural complexity of the graph; it is closely related to treewidth, but unlike treewidth it can be small for dense graphs. It is defined as the minimum number of labels needed to construct G by means of the following 4 operations :

  1. Creation of a new vertex v with label i (denoted by i(v))
  2. Disjoint union of two labeled graphs G and H (denoted by )
  3. Joining by an edge every vertex labeled i to every vertex labeled j (denoted by η(i,j)), where ij
  4. Renaming label i to label j (denoted by ρ(i,j))

In graph theory, a clique cover or partition into cliques of a given undirected graph is a collection of cliques that cover the whole graph. A minimum clique cover is a clique cover that uses as few cliques as possible. The minimum k for which a clique cover exists is called the clique cover number of the given graph.

<span class="mw-page-title-main">NP-completeness</span> Complexity class

In computational complexity theory, a problem is NP-complete when:

  1. It is a decision problem, meaning that for any input to the problem, the output is either "yes" or "no".
  2. When the answer is "yes", this can be demonstrated through the existence of a short solution.
  3. The correctness of each solution can be verified quickly and a brute-force search algorithm can find a solution by trying all possible solutions.
  4. The problem can be used to simulate every other problem for which we can verify quickly that a solution is correct. In this sense, NP-complete problems are the hardest of the problems to which solutions can be verified quickly. If we could find solutions of some NP-complete problem quickly, we could quickly find the solutions of every other problem to which a given solution can be easily verified.
<span class="mw-page-title-main">Dense subgraph</span> Highly connected subgraph

In graph theory and computer science, a dense subgraph is a subgraph with many edges per vertex. This is formalized as follows: let G = (V, E) be an undirected graph and let S = (VS, ES) be a subgraph of G. Then the density of S is defined to be:

In theoretical computer science, Baker's technique is a method for designing polynomial-time approximation schemes (PTASs) for problems on planar graphs. It is named after Brenda Baker, who announced it in a 1983 conference and published it in the Journal of the ACM in 1994.

In graph theory, a family of graphs is said to have bounded expansion if all of its shallow minors are sparse graphs. Many natural families of sparse graphs have bounded expansion. A closely related but stronger property, polynomial expansion, is equivalent to the existence of separator theorems for these families. Families with these properties have efficient algorithms for problems including the subgraph isomorphism problem and model checking for the first order theory of graphs.

<span class="mw-page-title-main">Matroid parity problem</span> Largest independent set of paired elements

In combinatorial optimization, the matroid parity problem is a problem of finding the largest independent set of paired elements in a matroid. The problem was formulated by Lawler (1976) as a common generalization of graph matching and matroid intersection. It is also known as polymatroid matching, or the matchoid problem.

References