In theoretical computer science, the **subgraph isomorphism** problem is a computational task in which two graphs *G* and *H* are given as input, and one must determine whether *G* contains a subgraph that is isomorphic to *H*. Subgraph isomorphism is a generalization of both the maximum clique problem and the problem of testing whether a graph contains a Hamiltonian cycle, and is therefore NP-complete.^{ [1] } However certain other cases of subgraph isomorphism may be solved in polynomial time.^{ [2] }

Sometimes the name **subgraph matching** is also used for the same problem. This name puts emphasis on finding such a subgraph as opposed to the bare decision problem.

To prove subgraph isomorphism is NP-complete, it must be formulated as a decision problem. The input to the decision problem is a pair of graphs *G* and *H*. The answer to the problem is positive if *H* is isomorphic to a subgraph of *G*, and negative otherwise.

Formal question:

Let , be graphs. Is there a subgraph such that ? I. e., does there exist a bijection such that ?

The proof of subgraph isomorphism being NP-complete is simple and based on reduction of the clique problem, an NP-complete decision problem in which the input is a single graph *G* and a number *k*, and the question is whether *G* contains a complete subgraph with *k* vertices. To translate this to a subgraph isomorphism problem, simply let *H* be the complete graph *K*_{k}; then the answer to the subgraph isomorphism problem for *G* and *H* is equal to the answer to the clique problem for *G* and *k*. Since the clique problem is NP-complete, this polynomial-time many-one reduction shows that subgraph isomorphism is also NP-complete.^{ [3] }

An alternative reduction from the Hamiltonian cycle problem translates a graph *G* which is to be tested for Hamiltonicity into the pair of graphs *G* and *H*, where *H* is a cycle having the same number of vertices as *G*. Because the Hamiltonian cycle problem is NP-complete even for planar graphs, this shows that subgraph isomorphism remains NP-complete even in the planar case.^{ [4] }

Subgraph isomorphism is a generalization of the graph isomorphism problem, which asks whether *G* is isomorphic to *H*: the answer to the graph isomorphism problem is true if and only if *G* and *H* both have the same numbers of vertices and edges and the subgraph isomorphism problem for *G* and *H* is true. However the complexity-theoretic status of graph isomorphism remains an open question.

In the context of the Aanderaa–Karp–Rosenberg conjecture on the query complexity of monotone graph properties, Gröger (1992) showed that any subgraph isomorphism problem has query complexity Ω(*n*^{3/2}); that is, solving the subgraph isomorphism requires an algorithm to check the presence or absence in the input of Ω(*n*^{3/2}) different edges in the graph.^{ [5] }

Ullmann (1976) describes a recursive backtracking procedure for solving the subgraph isomorphism problem. Although its running time is, in general, exponential, it takes polynomial time for any fixed choice of *H* (with a polynomial that depends on the choice of *H*). When *G* is a planar graph (or more generally a graph of bounded expansion) and *H* is fixed, the running time of subgraph isomorphism can be reduced to linear time.^{ [2] }

Ullmann (2010) is a substantial update to the 1976 subgraph isomorphism algorithm paper.

Cordella (2004) proposed in 2004 another algorithm based on Ullmann's, VF2, which improves the refinement process using different heuristics and uses significantly less memory.

Bonnici (2013) proposed a better algorithm, which improves the initial order of the vertices using some heuristics.

For large graphs, state-of-the art algorithms include CFL-Match and Turboiso, and extensions thereupon such as DAF by Han (2019) .

As subgraph isomorphism has been applied in the area of cheminformatics to find similarities between chemical compounds from their structural formula; often in this area the term **substructure search** is used.^{ [6] } A query structure is often defined graphically using a structure editor program; SMILES based database systems typically define queries using SMARTS, a SMILES extension.

The closely related problem of counting the number of isomorphic copies of a graph *H* in a larger graph *G* has been applied to pattern discovery in databases,^{ [7] } the bioinformatics of protein-protein interaction networks,^{ [8] } and in exponential random graph methods for mathematically modeling social networks.^{ [9] }

Ohlrich et al. (1993) describe an application of subgraph isomorphism in the computer-aided design of electronic circuits. Subgraph matching is also a substep in graph rewriting (the most runtime-intensive), and thus offered by graph rewrite tools.

The problem is also of interest in artificial intelligence, where it is considered part of an array of pattern matching in graphs problems; an extension of subgraph isomorphism known as graph mining is also of interest in that area.^{ [10] }

- ↑ The original Cook (1971) paper that proves the Cook–Levin theorem already showed subgraph isomorphism to be NP-complete, using a reduction from 3-SAT involving cliques.
- 1 2 Eppstein (1999); Nešetřil & Ossona de Mendez (2012)
- ↑ Wegener, Ingo (2005),
*Complexity Theory: Exploring the Limits of Efficient Algorithms*, Springer, p. 81, ISBN 9783540210450 . - ↑ de la Higuera, Colin; Janodet, Jean-Christophe; Samuel, Émilie; Damiand, Guillaume; Solnon, Christine (2013), "Polynomial algorithms for open plane graph and subgraph isomorphisms" (PDF),
*Theoretical Computer Science*,**498**: 76–99, doi:10.1016/j.tcs.2013.05.026, MR 3083515,It is known since the mid-70’s that the isomorphism problem is solvable in polynomial time for plane graphs. However, it has also been noted that the subisomorphism problem is still N P-complete, in particular because the Hamiltonian cycle problem is NP-complete for planar graphs.

- ↑ Here Ω invokes Big Omega notation.
- ↑ Ullmann (1976)
- ↑ Kuramochi & Karypis (2001).
- ↑ Pržulj, Corneil & Jurisica (2006).
- ↑ Snijders et al. (2006).
- ↑ http://www.aaai.org/Papers/Symposia/Fall/2006/FS-06-02/FS06-02-007.pdf; expanded version at https://e-reports-ext.llnl.gov/pdf/332302.pdf

In the mathematical field of graph theory the **Hamiltonian path problem** and the **Hamiltonian cycle problem** are problems of determining whether a Hamiltonian path or a Hamiltonian cycle exists in a given graph. Both problems are NP-complete.

In graph theory, an **isomorphism of graphs***G* and *H* is a bijection between the vertex sets of *G* and *H*

In computer science, the **clique problem** is the computational problem of finding cliques in a graph. It has several different formulations depending on which cliques, and what information about the cliques, should be found. Common formulations of the clique problem include finding a maximum clique, finding a maximum weight clique in a weighted graph, listing all maximal cliques, and solving the decision problem of testing whether a graph contains a clique larger than a given size.

In graph theory, two graphs and are **homeomorphic** if there is a graph isomorphism from some **subdivision** of to some **subdivision** of . If the edges of a graph are thought of as lines drawn from one vertex to another, then two graphs are homeomorphic to each other in the graph-theoretic sense precisely if they are homeomorphic in the sense in which the term is used in topology.

In graph theory, **graph coloring** is a special case of graph labeling; it is an assignment of labels traditionally called "colors" to elements of a graph subject to certain constraints. In its simplest form, it is a way of coloring the vertices of a graph such that no two adjacent vertices are of the same color; this is called a **vertex coloring**. Similarly, an edge coloring assigns a color to each edge so that no two adjacent edges are of the same color, and a **face coloring** of a planar graph assigns a color to each face or region so that no two faces that share a boundary have the same color.

In graph theory, an **independent set**, **stable set**, **coclique** or **anticlique** is a set of vertices in a graph, no two of which are adjacent. That is, it is a set of vertices such that for every two vertices in , there is no edge connecting the two. Equivalently, each edge in the graph has at most one endpoint in . The size of an independent set is the number of vertices it contains. Independent sets have also been called internally stable sets.

In the mathematical discipline of graph theory, a **matching** or **independent edge set** in a graph is a set of edges without common vertices. Finding a matching in a bipartite graph can be treated as a network flow problem.

In graph theory, a **cograph**, or **complement-reducible graph**, or ** P_{4}-free graph**, is a graph that can be generated from the single-vertex graph

In graph theory, a **graph property** or **graph invariant** is a property of graphs that depends only on the abstract structure, not on graph representations such as particular labellings or drawings of the graph.

The **graph isomorphism problem** is the computational problem of determining whether two finite graphs are isomorphic.

In graph theory and theoretical computer science, a **maximum common induced subgraph** of two graphs *G* and *H* is a graph that is an induced subgraph of both *G* and *H*, and that has as many vertices as possible.

In complexity theory and graph theory, **induced subgraph isomorphism** is an NP-complete decision problem that involves finding a given graph as an induced subgraph of a larger graph.

In graph theory, the **modular product** of graphs *G* and *H* is a graph formed by combining *G* and *H* that has applications to subgraph isomorphism. It is one of several different kinds of graph products that have been studied, generally using the same vertex set but with different rules for determining which edges to include.

In graph theory, a **clique cover** or **partition into cliques** of a given undirected graph is a partition of the vertices of the graph into cliques, subsets of vertices within which every two vertices are adjacent. A **minimum clique cover** is a clique cover that uses as few cliques as possible. The minimum *k* for which a clique cover exists is called the **clique cover number** of the given graph.

For a graph, a **maximum cut** is a cut whose size is at least the size of any other cut. The problem of finding a maximum cut in a graph is known as the **Max-Cut Problem.**

In computational complexity theory, a problem is **NP-complete** when it can be solved by a restricted class of brute force search algorithms and it can be used to simulate any other problem with a similar algorithm. More precisely, each input to the problem should be associated with a set of solutions of polynomial length, whose validity can be tested quickly, such that the output for any input is "yes" if the solution set is non-empty and "no" if it is empty. The complexity class of problems of this form is called NP, an abbreviation for "nondeterministic polynomial time". A problem is said to be NP-hard if everything in NP can be transformed in polynomial time into it, and a problem is NP-complete if it is both in NP and NP-hard. The NP-complete problems represent the hardest problems in NP. If any NP-complete problem has a polynomial time algorithm, all problems in NP do. The set of NP-complete problems is often denoted by **NP-C** or **NPC**.

In computer science the notion of highly connected subgraphs appears frequently. This notion can be formalized as follows. Let be an undirected graph and let be a subgraph of . Then the *density* of is defined to be .

In theoretical computer science, **Baker's technique** is a method for designing polynomial-time approximation schemes (PTASs) for problems on planar graphs. It is named after Brenda Baker, who announced it in a 1983 conference and published it in the *Journal of the ACM* in 1994.

In combinatorial optimization, the **matroid parity problem** is a problem of finding the largest independent set of paired elements in a matroid. The problem was formulated by Lawler (1976) as a common generalization of graph matching and matroid intersection. It is also known as polymatroid matching, or the **matchoid problem**.

- Cook, S. A. (1971), "The complexity of theorem-proving procedures",
*Proc. 3rd ACM Symposium on Theory of Computing*, pp. 151–158, doi:10.1145/800157.805047 . - Eppstein, David (1999), "Subgraph isomorphism in planar graphs and related problems" (PDF),
*Journal of Graph Algorithms and Applications*,**3**(3): 1–27, arXiv: cs.DS/9911003 , doi:10.7155/jgaa.00014 . - Garey, Michael R.; Johnson, David S. (1979),
*Computers and Intractability: A Guide to the Theory of NP-Completeness*, W.H. Freeman, ISBN 978-0-7167-1045-5 . A1.4: GT48, pg.202. - Gröger, Hans Dietmar (1992), "On the randomized complexity of monotone graph properties" (PDF),
*Acta Cybernetica*,**10**(3): 119–127. - Han, Myoungji; Kim, Hyunjoon; Gu, Geonmo; Park, Kunsoo; Han, Wookshin (2019), "Efficient Subgraph Matching: Harmonizing Dynamic Programming, Adaptive Matching Order, and Failing Set Together",
*SIGMOD*, doi:10.1145/3299869.3319880 - Kuramochi, Michihiro; Karypis, George (2001), "Frequent subgraph discovery",
*1st IEEE International Conference on Data Mining*, p. 313, CiteSeerX 10.1.1.22.4992 , doi:10.1109/ICDM.2001.989534, ISBN 978-0-7695-1119-1 . - Ohlrich, Miles; Ebeling, Carl; Ginting, Eka; Sather, Lisa (1993), "SubGemini: identifying subcircuits using a fast subgraph isomorphism algorithm",
*Proceedings of the 30th international Design Automation Conference*, pp. 31–37, doi:10.1145/157485.164556, ISBN 978-0-89791-577-9 . - Nešetřil, Jaroslav; Ossona de Mendez, Patrice (2012), "18.3 The subgraph isomorphism problem and Boolean queries",
*Sparsity: Graphs, Structures, and Algorithms*, Algorithms and Combinatorics,**28**, Springer, pp. 400–401, doi:10.1007/978-3-642-27875-4, ISBN 978-3-642-27874-7, MR 2920058 . - Pržulj, N.; Corneil, D. G.; Jurisica, I. (2006), "Efficient estimation of graphlet frequency distributions in protein–protein interaction networks",
*Bioinformatics*,**22**(8): 974–980, doi: 10.1093/bioinformatics/btl030 , PMID 16452112 . - Snijders, T. A. B.; Pattison, P. E.; Robins, G.; Handcock, M. S. (2006), "New specifications for exponential random graph models",
*Sociological Methodology*,**36**(1): 99–153, CiteSeerX 10.1.1.62.7975 , doi:10.1111/j.1467-9531.2006.00176.x . - Ullmann, Julian R. (1976), "An algorithm for subgraph isomorphism",
*Journal of the ACM*,**23**(1): 31–42, doi:10.1145/321921.321925 . - Jamil, Hasan (2011), "Computing Subgraph Isomorphic Queries using Structural Unification and Minimum Graph Structures",
*26th ACM Symposium on Applied Computing*, pp. 1058–1063. - Ullmann, Julian R. (2010), "Bit-vector algorithms for binary constraint satisfaction and subgraph isomorphism",
*Journal of Experimental Algorithmics*,**15**: 1.1, CiteSeerX 10.1.1.681.8766 , doi:10.1145/1671970.1921702 . - Cordella, Luigi P. (2004), "A (sub) graph isomorphism algorithm for matching large graphs",
*IEEE Transactions on Pattern Analysis and Machine Intelligence*,**26**(10): 1367–1372, CiteSeerX 10.1.1.101.5342 , doi:10.1109/tpami.2004.75, PMID 15641723 - Bonnici, V.; Giugno, R. (2013), "A subgraph isomorphism algorithm and its application to biochemical data",
*BMC Bioinformatics*, 14(Suppl7) (13): S13, doi:10.1186/1471-2105-14-s7-s13, PMC 3633016 , PMID 23815292

This page is based on this Wikipedia article

Text is available under the CC BY-SA 4.0 license; additional terms may apply.

Images, videos and audio are available under their respective licenses.

Text is available under the CC BY-SA 4.0 license; additional terms may apply.

Images, videos and audio are available under their respective licenses.