Maximum common induced subgraph

Last updated

In graph theory and theoretical computer science, a maximum common induced subgraph of two graphs G and H is a graph that is an induced subgraph of both G and H, and that has as many vertices as possible.

Contents

Finding this graph is NP-hard. In the associated decision problem, the input is two graphs G and H and a number k. The problem is to decide whether G and H have a common induced subgraph with at least k vertices. This problem is NP-complete. [1] It is a generalization of the induced subgraph isomorphism problem, which arises when k equals the number of vertices in the smaller of G and H, so that this entire graph must appear as an induced subgraph of the other graph.

Based on hardness of approximation results for the maximum independent set problem, the maximum common induced subgraph problem is also hard to approximate. [2] This implies that, unless P = NP, there is no approximation algorithm that, in polynomial time on -vertex graphs, always finds a solution within a factor of of optimal, for any . [3]

One possible solution for this problem is to build a modular product graph of G and H. In this graph, the largest clique corresponds to a maximum common induced subgraph of G and H. Therefore, algorithms for finding maximum cliques can be used to find the maximum common induced subgraph. [4] Moreover, a modified maximum-clique algorithm can be used to find a maximum common connected subgraph. [5]

The McSplit algorithm (along with its McSplit↓ variant) is a forward checking algorithm that does not use the clique encoding, but uses a compact data structure to keep track of the vertices in graph H to which each vertex in graph G may be mapped. Both versions of the McSplit algorithm outperform the clique encoding for many graph classes. [6] A more efficient implementation of McSplit is McSplitDAL+PR, which combines a Reinforcement Learning agent with some heuristic scores computed with the PageRank algorithm. [7]

Applications

Maximum common induced subgraph algorithms form the basis for both graph differencing and graph alignment. Graph differencing identifies and highlights differences between two graphs by pinpointing changes, additions, or deletions. Graph alignment involves finding correspondences between the vertices and edges of two graphs to identify similar structures.

Maximum common induced subgraph algorithms have a long tradition in bioinformatics, cheminformatics, [8] [9] pharmacophore mapping, [10] pattern recognition, [11] computer vision, code analysis, compilers, and model checking.

The problem is also particularly useful in software engineering and model-based systems engineering, where software code and engineering models (e.g., Simulink, UML diagrams) are represented as graph data structures. Graph differencing can be used to detect changes between different versions of software code and models for change auditing, debugging, version control and collaborative team development.

See also

Related Research Articles

<span class="mw-page-title-main">Graph theory</span> Area of discrete mathematics

In mathematics, graph theory is the study of graphs, which are mathematical structures used to model pairwise relations between objects. A graph in this context is made up of vertices which are connected by edges. A distinction is made between undirected graphs, where edges link two vertices symmetrically, and directed graphs, where edges link two vertices asymmetrically. Graphs are one of the principal objects of study in discrete mathematics.

<span class="mw-page-title-main">Bipartite graph</span> Graph divided into two independent sets

In the mathematical field of graph theory, a bipartite graph is a graph whose vertices can be divided into two disjoint and independent sets and , that is, every edge connects a vertex in to one in . Vertex sets and are usually called the parts of the graph. Equivalently, a bipartite graph is a graph that does not contain any odd-length cycles.

<span class="mw-page-title-main">Clique problem</span> Task of computing complete subgraphs

In computer science, the clique problem is the computational problem of finding cliques in a graph. It has several different formulations depending on which cliques, and what information about the cliques, should be found. Common formulations of the clique problem include finding a maximum clique, finding a maximum weight clique in a weighted graph, listing all maximal cliques, and solving the decision problem of testing whether a graph contains a clique larger than a given size.

This is a glossary of graph theory. Graph theory is the study of graphs, systems of nodes or vertices connected in pairs by lines or edges.

In theoretical computer science, the subgraph isomorphism problem is a computational task in which two graphs G and H are given as input, and one must determine whether G contains a subgraph that is isomorphic to H. Subgraph isomorphism is a generalization of both the maximum clique problem and the problem of testing whether a graph contains a Hamiltonian cycle, and is therefore NP-complete. However certain other cases of subgraph isomorphism may be solved in polynomial time.

<span class="mw-page-title-main">Clique (graph theory)</span> Adjacent subset of an undirected graph

In the mathematical area of graph theory, a clique is a subset of vertices of an undirected graph such that every two distinct vertices in the clique are adjacent. That is, a clique of a graph is an induced subgraph of that is complete. Cliques are one of the basic concepts of graph theory and are used in many other mathematical problems and constructions on graphs. Cliques have also been studied in computer science: the task of finding whether there is a clique of a given size in a graph is NP-complete, but despite this hardness result, many algorithms for finding cliques have been studied.

<span class="mw-page-title-main">Independent set (graph theory)</span> Unrelated vertices in graphs

In graph theory, an independent set, stable set, coclique or anticlique is a set of vertices in a graph, no two of which are adjacent. That is, it is a set of vertices such that for every two vertices in , there is no edge connecting the two. Equivalently, each edge in the graph has at most one endpoint in . A set is independent if and only if it is a clique in the graph's complement. The size of an independent set is the number of vertices it contains. Independent sets have also been called "internally stable sets", of which "stable set" is a shortening.

<span class="mw-page-title-main">Perfect graph</span> Graph with tight clique-coloring relation

In graph theory, a perfect graph is a graph in which the chromatic number equals the size of the maximum clique, both in the graph itself and in every induced subgraph. In all graphs, the chromatic number is greater than or equal to the size of the maximum clique, but they can be far apart. A graph is perfect when these numbers are equal, and remain equal after the deletion of arbitrary subsets of vertices.

In the mathematical discipline of graph theory, the line graph of an undirected graph G is another graph L(G) that represents the adjacencies between edges of G. L(G) is constructed in the following way: for each edge in G, make a vertex in L(G); for every two edges in G that have a vertex in common, make an edge between their corresponding vertices in L(G).

<span class="mw-page-title-main">Chordal graph</span> Graph where all long cycles have a chord

In the mathematical area of graph theory, a chordal graph is one in which all cycles of four or more vertices have a chord, which is an edge that is not part of the cycle but connects two vertices of the cycle. Equivalently, every induced cycle in the graph should have exactly three vertices. The chordal graphs may also be characterized as the graphs that have perfect elimination orderings, as the graphs in which each minimal separator is a clique, and as the intersection graphs of subtrees of a tree. They are sometimes also called rigid circuit graphs or triangulated graphs: a chordal completion of a graph is typically called a triangulation of that graph.

<span class="mw-page-title-main">Cograph</span> Graph formed by complementation and disjoint union

In graph theory, a cograph, or complement-reducible graph, or P4-free graph, is a graph that can be generated from the single-vertex graph K1 by complementation and disjoint union. That is, the family of cographs is the smallest class of graphs that includes K1 and is closed under complementation and disjoint union.

In the mathematical field of graph theory, an induced subgraph of a graph is another graph, formed from a subset of the vertices of the graph and all of the edges, from the original graph, connecting pairs of vertices in that subset.

The graph isomorphism problem is the computational problem of determining whether two finite graphs are isomorphic.

<span class="mw-page-title-main">Induced subgraph isomorphism problem</span>

In complexity theory and graph theory, induced subgraph isomorphism is an NP-complete decision problem that involves finding a given graph as an induced subgraph of a larger graph.

<span class="mw-page-title-main">Rado graph</span> Infinite graph containing all countable graphs

In the mathematical field of graph theory, the Rado graph, Erdős–Rényi graph, or random graph is a countably infinite graph that can be constructed by choosing independently at random for each pair of its vertices whether to connect the vertices by an edge. The names of this graph honor Richard Rado, Paul Erdős, and Alfréd Rényi, mathematicians who studied it in the early 1960s; it appears even earlier in the work of Wilhelm Ackermann. The Rado graph can also be constructed non-randomly, by symmetrizing the membership relation of the hereditarily finite sets, by applying the BIT predicate to the binary representations of the natural numbers, or as an infinite Paley graph that has edges connecting pairs of prime numbers congruent to 1 mod 4 that are quadratic residues modulo each other.

<span class="mw-page-title-main">Claw-free graph</span> Graph without four-vertex star subgraphs

In graph theory, an area of mathematics, a claw-free graph is a graph that does not have a claw as an induced subgraph.

<span class="mw-page-title-main">Modular product of graphs</span> Binary operation in graph theory

In graph theory, the modular product of graphs G and H is a graph formed by combining G and H that has applications to subgraph isomorphism. It is one of several different kinds of graph products that have been studied, generally using the same vertex set but with different rules for determining which edges to include.

In graph theory, a clique cover or partition into cliques of a given undirected graph is a collection of cliques that cover the whole graph. A minimum clique cover is a clique cover that uses as few cliques as possible. The minimum k for which a clique cover exists is called the clique cover number of the given graph.

<span class="mw-page-title-main">Matroid parity problem</span> Largest independent set of paired elements

In combinatorial optimization, the matroid parity problem is a problem of finding the largest independent set of paired elements in a matroid. The problem was formulated by Lawler (1976) as a common generalization of graph matching and matroid intersection. It is also known as polymatroid matching, or the matchoid problem.

References

  1. Michael R. Garey and David S. Johnson (1979), Computers and Intractability: A Guide to the Theory of NP-Completeness , W.H. Freeman, ISBN   0-7167-1045-5 A1.4: GT48, pg.202.
  2. Kann, Viggo (1992), "On the approximability of the maximum common subgraph problem", STACS 92: 9th Annual Symposium on Theoretical Aspects of Computer Science Cachan, France, February 13–15, 1992, Proceedings, Lecture Notes in Computer Science, vol. 577, Springer Science $\mathplus$ Business Media, pp. 375–388, doi:10.1007/3-540-55210-3_198, ISBN   978-3-540-55210-9 .
  3. Zuckerman, D. (2006), "Linear degree extractors and the inapproximability of max clique and chromatic number", Proc. 38th ACM Symp. Theory of Computing , pp. 681–690, doi:10.1145/1132516.1132612, ISBN   1-59593-134-1, S2CID   5713815, ECCC   TR05-100 .
  4. Barrow, H.; Burstall, R. (1976), "Subgraph isomorphism, matching relational structures and maximal cliques", Information Processing Letters, 4 (4): 83–84, doi:10.1016/0020-0190(76)90049-1 .
  5. McCreesh, Ciaran; Ndiaye, Samba Ndojh; Prosser, Patrick; Solnon, Christine (2016), "Clique and Constraint Models for Maximum Common (Connected) Subgraph Problems" (PDF), Principles and Practice of Constraint Programming - 22nd International Conference, CP 2016, Toulouse, France, September 5-9, 2016, Proceedings, Lecture Notes in Computer Science, vol. 9892, Springer International Publishing, pp. 350–368, doi:10.1007/978-3-319-44953-1_23, ISBN   978-3-319-44952-4, S2CID   215812381
  6. McCreesh, Ciaran; Prosser, Patrick; Trimble, James (2017), "A Partitioning Algorithm for Maximum Common Subgraph Problems", Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, {IJCAI} 2017, Melbourne, Australia, August 19-25, 2017, ijcai.org, pp. 712–719, doi: 10.24963/ijcai.2017/99 , ISBN   9780999241103
  7. Calabrese, Andrea; Cardone, Lorenzo; Licata, Salvatore; Porro, Marco; Quer, Stefano (2023). A Web Scraping Algorithm to Improve the Computation of the Maximum Common Subgraph. SCITEPRESS - Science and Technology Publications. pp. 197–206. doi: 10.5220/0012130800003538 . ISBN   978-989-758-665-1.
  8. Schietgat, Leander; Ramon, Jan; Bruynooghe, Maurice (2013-12-01). "A polynomial-time maximum common subgraph algorithm for outerplanar graphs and its application to chemoinformatics". Annals of Mathematics and Artificial Intelligence. 69 (4): 343–376. doi:10.1007/s10472-013-9335-0. ISSN   1573-7470.
  9. Ehrlich, Hans‐Christian; Rarey, Matthias (2011). "Maximum common subgraph isomorphism algorithms and their applications in molecular science: a review". WIREs Computational Molecular Science. 1 (1): 68–79. doi:10.1002/wcms.5. ISSN   1759-0876.
  10. Raymond, John W.; Willett, Peter (2002), "Maximum common subgraph isomorphism algorithms for the matching of chemical structures" (PDF), Journal of Computer-Aided Molecular Design, 16 (7): 521–533, Bibcode:2002JCAMD..16..521R, doi:10.1023/A:1021271615909, PMID   12510884, S2CID   5202419 .
  11. Conte, D.; Foggia, P.; Sansone, C.; Vento, M. (2004). "THIRTY YEARS OF GRAPH MATCHING IN PATTERN RECOGNITION". International Journal of Pattern Recognition and Artificial Intelligence. 18 (03): 265–298. doi:10.1142/S0218001404003228. ISSN   0218-0014.