Maximum common induced subgraph

Last updated

In graph theory and theoretical computer science, a maximum common induced subgraph of two graphs G and H is a graph that is an induced subgraph of both G and H, and that has as many vertices as possible.

Finding this graph is NP-hard. In the associated decision problem, the input is two graphs G and H and a number k. The problem is to decide whether G and H have a common induced subgraph with at least k vertices. This problem is NP-complete. [1] It is a generalization of the induced subgraph isomorphism problem, which arises when k equals the number of vertices in the smaller of G and H, so that this entire graph must appear as an induced subgraph of the other graph.

Based on hardness of approximation results for the maximum independent set problem, the maximum common induced subgraph problem is also hard to approximate. [2] This implies that, unless P = NP, there is no approximation algorithm that, in polynomial time on -vertex graphs, always finds a solution within a factor of of optimal, for any . [3]

One possible solution for this problem is to build a modular product graph of G and H. In this graph, the largest clique corresponds to a maximum common induced subgraph of G and H. Therefore, algorithms for finding maximum cliques can be used to find the maximum common induced subgraph. [4] Moreover, a modified maximum-clique algorithm can be used to find a maximum common connected subgraph. [5]

The McSplit algorithm (along with its McSplit↓ variant) is a forward checking algorithm that does not use the clique encoding, but uses a compact data structure to keep track of the vertices in graph H to which each vertex in graph G may be mapped. Both versions of the McSplit algorithm outperform the clique encoding for many graph classes. [6] A more efficient implementation of McSplit is McSplitDAL+PR, which combines a Reinforcement Learning agent with some heuristic scores computed with the PageRank algorithm. [7]

Maximum common induced subgraph algorithms have a long tradition in cheminformatics and pharmacophore mapping. [8]

See also

Related Research Articles

<span class="mw-page-title-main">Graph theory</span> Area of discrete mathematics

In mathematics, graph theory is the study of graphs, which are mathematical structures used to model pairwise relations between objects. A graph in this context is made up of vertices which are connected by edges. A distinction is made between undirected graphs, where edges link two vertices symmetrically, and directed graphs, where edges link two vertices asymmetrically. Graphs are one of the principal objects of study in discrete mathematics.

<span class="mw-page-title-main">Graph isomorphism</span> Bijection between the vertex set of two graphs

In graph theory, an isomorphism of graphsG and H is a bijection between the vertex sets of G and H

<span class="mw-page-title-main">Clique problem</span> Task of computing complete subgraphs

In computer science, the clique problem is the computational problem of finding cliques in a graph. It has several different formulations depending on which cliques, and what information about the cliques, should be found. Common formulations of the clique problem include finding a maximum clique, finding a maximum weight clique in a weighted graph, listing all maximal cliques, and solving the decision problem of testing whether a graph contains a clique larger than a given size.

This is a glossary of graph theory. Graph theory is the study of graphs, systems of nodes or vertices connected in pairs by lines or edges.

In theoretical computer science, the subgraph isomorphism problem is a computational task in which two graphs G and H are given as input, and one must determine whether G contains a subgraph that is isomorphic to H. Subgraph isomorphism is a generalization of both the maximum clique problem and the problem of testing whether a graph contains a Hamiltonian cycle, and is therefore NP-complete. However certain other cases of subgraph isomorphism may be solved in polynomial time.

<span class="mw-page-title-main">Clique (graph theory)</span> Subset of the vertices of a node-link graph that are all adjacent to each other

In the mathematical area of graph theory, a clique is a subset of vertices of an undirected graph such that every two distinct vertices in the clique are adjacent. That is, a clique of a graph is an induced subgraph of that is complete. Cliques are one of the basic concepts of graph theory and are used in many other mathematical problems and constructions on graphs. Cliques have also been studied in computer science: the task of finding whether there is a clique of a given size in a graph is NP-complete, but despite this hardness result, many algorithms for finding cliques have been studied.

<span class="mw-page-title-main">Independent set (graph theory)</span> Unrelated vertices in graphs

In graph theory, an independent set, stable set, coclique or anticlique is a set of vertices in a graph, no two of which are adjacent. That is, it is a set of vertices such that for every two vertices in , there is no edge connecting the two. Equivalently, each edge in the graph has at most one endpoint in . A set is independent if and only if it is a clique in the graph's complement. The size of an independent set is the number of vertices it contains. Independent sets have also been called "internally stable sets", of which "stable set" is a shortening.

<span class="mw-page-title-main">Perfect graph</span> Graph with tight clique-coloring relation

In graph theory, a perfect graph is a graph in which the chromatic number equals the size of the maximum clique, both in the graph itself and in every induced subgraph. In all graphs, the chromatic number is greater than or equal to the size of the maximum clique, but they can be far apart. A graph is perfect when these numbers are equal, and remain equal after the deletion of arbitrary subsets of vertices.

In the mathematical discipline of graph theory, the line graph of an undirected graph G is another graph L(G) that represents the adjacencies between edges of G. L(G) is constructed in the following way: for each edge in G, make a vertex in L(G); for every two edges in G that have a vertex in common, make an edge between their corresponding vertices in L(G).

<span class="mw-page-title-main">Chordal graph</span> Graph where all long cycles have a chord

In the mathematical area of graph theory, a chordal graph is one in which all cycles of four or more vertices have a chord, which is an edge that is not part of the cycle but connects two vertices of the cycle. Equivalently, every induced cycle in the graph should have exactly three vertices. The chordal graphs may also be characterized as the graphs that have perfect elimination orderings, as the graphs in which each minimal separator is a clique, and as the intersection graphs of subtrees of a tree. They are sometimes also called rigid circuit graphs or triangulated graphs.

<span class="mw-page-title-main">Cograph</span> Graph formed by complementation and disjoint union

In graph theory, a cograph, or complement-reducible graph, or P4-free graph, is a graph that can be generated from the single-vertex graph K1 by complementation and disjoint union. That is, the family of cographs is the smallest class of graphs that includes K1 and is closed under complementation and disjoint union.

In the mathematical field of graph theory, an induced subgraph of a graph is another graph, formed from a subset of the vertices of the graph and all of the edges connecting pairs of vertices in that subset.

In the mathematical discipline of graph theory, a feedback vertex set (FVS) of a graph is a set of vertices whose removal leaves a graph without cycles. Equivalently, each FVS contains at least one vertex of any cycle in the graph. The feedback vertex set number of a graph is the size of a smallest feedback vertex set. The minimum feedback vertex set problem is an NP-complete problem; it was among the first problems shown to be NP-complete. It has wide applications in operating systems, database systems, and VLSI chip design.

The graph isomorphism problem is the computational problem of determining whether two finite graphs are isomorphic.

<span class="mw-page-title-main">Induced subgraph isomorphism problem</span>

In complexity theory and graph theory, induced subgraph isomorphism is an NP-complete decision problem that involves finding a given graph as an induced subgraph of a larger graph.

<span class="mw-page-title-main">Modular product of graphs</span> Binary operation in graph theory

In graph theory, the modular product of graphs G and H is a graph formed by combining G and H that has applications to subgraph isomorphism. It is one of several different kinds of graph products that have been studied, generally using the same vertex set but with different rules for determining which edges to include.

In graph theory, a clique cover or partition into cliques of a given undirected graph is a partition of the vertices into cliques, subsets of vertices within which every two vertices are adjacent. A minimum clique cover is a clique cover that uses as few cliques as possible. The minimum k for which a clique cover exists is called the clique cover number of the given graph.

<span class="mw-page-title-main">Distance-hereditary graph</span> Graph whose induced subgraphs preserve distance

In graph theory, a branch of discrete mathematics, a distance-hereditary graph is a graph in which the distances in any connected induced subgraph are the same as they are in the original graph. Thus, any induced subgraph inherits the distances of the larger graph.

<span class="mw-page-title-main">Dense subgraph</span> Highly connected subgraph

In graph theory and computer science, a dense subgraph is a subgraph with many edges per vertex. This is formalized as follows: let G = (V, E) be an undirected graph and let S = (VS, ES) be a subgraph of G. Then the density of S is defined to be:

<span class="mw-page-title-main">Matroid parity problem</span> Largest independent set of paired elements

In combinatorial optimization, the matroid parity problem is a problem of finding the largest independent set of paired elements in a matroid. The problem was formulated by Lawler (1976) as a common generalization of graph matching and matroid intersection. It is also known as polymatroid matching, or the matchoid problem.

References

  1. Michael R. Garey and David S. Johnson (1979), Computers and Intractability: A Guide to the Theory of NP-Completeness , W.H. Freeman, ISBN   0-7167-1045-5 A1.4: GT48, pg.202.
  2. Kann, Viggo (1992), "On the approximability of the maximum common subgraph problem", STACS 92: 9th Annual Symposium on Theoretical Aspects of Computer Science Cachan, France, February 13–15, 1992, Proceedings, Lecture Notes in Computer Science, vol. 577, Springer Science $\mathplus$ Business Media, pp. 375–388, doi:10.1007/3-540-55210-3_198, ISBN   978-3-540-55210-9 .
  3. Zuckerman, D. (2006), "Linear degree extractors and the inapproximability of max clique and chromatic number", Proc. 38th ACM Symp. Theory of Computing , pp. 681–690, doi:10.1145/1132516.1132612, ISBN   1-59593-134-1, S2CID   5713815, ECCC   TR05-100 .
  4. Barrow, H.; Burstall, R. (1976), "Subgraph isomorphism, matching relational structures and maximal cliques", Information Processing Letters, 4 (4): 83–84, doi:10.1016/0020-0190(76)90049-1 .
  5. McCreesh, Ciaran; Ndiaye, Samba Ndojh; Prosser, Patrick; Solnon, Christine (2016), "Clique and Constraint Models for Maximum Common (Connected) Subgraph Problems" (PDF), Principles and Practice of Constraint Programming - 22nd International Conference, CP 2016, Toulouse, France, September 5-9, 2016, Proceedings, Lecture Notes in Computer Science, vol. 9892, Springer International Publishing, pp. 350–368, doi:10.1007/978-3-319-44953-1_23, ISBN   978-3-319-44952-4, S2CID   215812381
  6. McCreesh, Ciaran; Prosser, Patrick; Trimble, James (2017), "A Partitioning Algorithm for Maximum Common Subgraph Problems", Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, {IJCAI} 2017, Melbourne, Australia, August 19-25, 2017, ijcai.org, pp. 712–719, doi: 10.24963/ijcai.2017/99 , ISBN   9780999241103
  7. Calabrese, Andrea; Cardone, Lorenzo; Licata, Salvatore; Porro, Marco; Quer, Stefano (2023). A Web Scraping Algorithm to Improve the Computation of the Maximum Common Subgraph. SCITEPRESS - Science and Technology Publications. pp. 197–206. doi: 10.5220/0012130800003538 . ISBN   978-989-758-665-1.
  8. Raymond, John W.; Willett, Peter (2002), "Maximum common subgraph isomorphism algorithms for the matching of chemical structures" (PDF), Journal of Computer-Aided Molecular Design, 16 (7): 521–533, Bibcode:2002JCAMD..16..521R, doi:10.1023/A:1021271615909, PMID   12510884, S2CID   5202419 .