Graph canonization

Last updated

In graph theory, a branch of mathematics, graph canonization is the problem of finding a canonical form of a given graph G. A canonical form is a labeled graph Canon(G) that is isomorphic to G, such that every graph that is isomorphic to G has the same canonical form as G. Thus, from a solution to the graph canonization problem, one could also solve the problem of graph isomorphism: to test whether two graphs G and H are isomorphic, compute their canonical forms Canon(G) and Canon(H), and test whether these two canonical forms are identical.

Contents

The canonical form of a graph is an example of a complete graph invariant: every two isomorphic graphs have the same canonical form, and every two non-isomorphic graphs have different canonical forms. [1] [2] Conversely, every complete invariant of graphs may be used to construct a canonical form. [3] The vertex set of an n-vertex graph may be identified with the integers from 1 to n, and using such an identification a canonical form of a graph may also be described as a permutation of its vertices. Canonical forms of a graph are also called canonical labelings, [4] and graph canonization is also sometimes known as graph canonicalization.

Computational complexity

Unsolved problem in computer science:
Is graph canonization polynomial time equivalent to the graph isomorphism problem?

The graph isomorphism problem is the computational problem of determining whether two finite graphs are isomorphic. Clearly, the graph canonization problem is at least as computationally hard as the graph isomorphism problem. In fact, graph isomorphism is even AC0-reducible to graph canonization. However, it is still an open question whether the two problems are polynomial time equivalent. [2]

In 2019, László Babai announced a quasi-polynomial time algorithm for graph canonization, that is, one with running time for some fixed . [5] While the existence of (deterministic) polynomial algorithms for graph isomorphism is still an open problem in computational complexity theory, in 1977 László Babai reported that with probability at least 1  exp(O(n)), a simple vertex classification algorithm produces a canonical labeling of a graph chosen uniformly at random from the set of all n-vertex graphs after only two refinement steps. Small modifications and an added depth-first search step produce canonical labeling of such uniformly-chosen random graphs in linear expected time. This result sheds some light on the fact why many reported graph isomorphism algorithms behave well in practice. [6] [7] This was an important breakthrough in probabilistic complexity theory which became widely known in its manuscript form and which was still cited as an "unpublished manuscript" long after it was reported at a symposium.

A commonly known canonical form is the lexicographically smallest graph within the isomorphism class, which is the graph of the class with lexicographically smallest adjacency matrix considered as a linear string. However, the computation of the lexicographically smallest graph is NP-hard. [8]

For trees, a concise polynomial canonization algorithm requiring O(n) space is presented by Read (1972). [9] Begin by labeling each vertex with the string 01. Iteratively for each non-leaf x remove the leading 0 and trailing 1 from x's label; then sort x's label along with the labels of all adjacent leaves in lexicographic order. Concatenate these sorted labels, add back a leading 0 and trailing 1, make this the new label of x, and delete the adjacent leaves. If there are two vertices remaining, concatenate their labels in lexicographic order.

Applications

Graph canonization is the essence of many graph isomorphism algorithms. One of the leading tools is Nauty. [10]

A common application of graph canonization is in graphical data mining, in particular in chemical database applications. [11]

A number of identifiers for chemical substances, such as SMILES and InChI use canonicalization steps in their computation, which is essentially the canonicalization of the graph which represents the molecule. [12] [13] [14] These identifiers are designed to provide a standard (and sometimes human-readable) way to encode molecular information and to facilitate the search for such information in databases and on the web.

Related Research Articles

The Hamiltonian path problem is a topic discussed in the fields of complexity theory and graph theory. It decides if a directed or undirected graph, G, contains a Hamiltonian path, a path that visits every vertex in the graph exactly once. The problem may specify the start and end of the path, in which case the starting vertex s and ending vertex t must be identified.

<span class="mw-page-title-main">Interactive proof system</span>

In computational complexity theory, an interactive proof system is an abstract machine that models computation as the exchange of messages between two parties: a prover and a verifier. The parties interact by exchanging messages in order to ascertain whether a given string belongs to a language or not. The prover possesses unlimited computational resources but cannot be trusted, while the verifier has bounded computation power but is assumed to be always honest. Messages are sent between the verifier and prover until the verifier has an answer to the problem and has "convinced" itself that it is correct.

<span class="mw-page-title-main">Graph isomorphism</span> Bijection between the vertex set of two graphs

In graph theory, an isomorphism of graphsG and H is a bijection between the vertex sets of G and H

<span class="mw-page-title-main">Independent set (graph theory)</span> Unrelated vertices in graphs

In graph theory, an independent set, stable set, coclique or anticlique is a set of vertices in a graph, no two of which are adjacent. That is, it is a set of vertices such that for every two vertices in , there is no edge connecting the two. Equivalently, each edge in the graph has at most one endpoint in . A set is independent if and only if it is a clique in the graph's complement. The size of an independent set is the number of vertices it contains. Independent sets have also been called "internally stable sets", of which "stable set" is a shortening.

<span class="mw-page-title-main">Chromatic polynomial</span> Function in algebraic graph theory

The chromatic polynomial is a graph polynomial studied in algebraic graph theory, a branch of mathematics. It counts the number of graph colorings as a function of the number of colors and was originally defined by George David Birkhoff to study the four color problem. It was generalised to the Tutte polynomial by Hassler Whitney and W. T. Tutte, linking it to the Potts model of statistical physics.

<span class="mw-page-title-main">Graph property</span> Property of graphs that depends only on abstract structure

In graph theory, a graph property or graph invariant is a property of graphs that depends only on the abstract structure, not on graph representations such as particular labellings or drawings of the graph.

In the mathematical discipline of graph theory, a feedback vertex set (FVS) of a graph is a set of vertices whose removal leaves a graph without cycles. Equivalently, each FVS contains at least one vertex of any cycle in the graph. The feedback vertex set number of a graph is the size of a smallest feedback vertex set. The minimum feedback vertex set problem is an NP-complete problem; it was among the first problems shown to be NP-complete. It has wide applications in operating systems, database systems, and VLSI chip design.

The graph isomorphism problem is the computational problem of determining whether two finite graphs are isomorphic.

<span class="mw-page-title-main">László Babai</span> Hungarian mathematician and computer scientist

László "Laci" Babai is a Hungarian professor of computer science and mathematics at the University of Chicago. His research focuses on computational complexity theory, algorithms, combinatorics, and finite groups, with an emphasis on the interactions between these fields.

In computational complexity, problems that are in the complexity class NP but are neither in the class P nor NP-complete are called NP-intermediate, and the class of such problems is called NPI. Ladner's theorem, shown in 1975 by Richard E. Ladner, is a result asserting that, if P ≠ NP, then NPI is not empty; that is, NP contains problems that are neither in P nor NP-complete. Since it is also true that if NPI problems exist, then P ≠ NP, it follows that P = NP if and only if NPI is empty.

In computational complexity theory, a gadget is a subunit of a problem instance that simulates the behavior of one of the fundamental units of a different computational problem. Gadgets are typically used to construct reductions from one computational problem to another, as part of proofs of NP-completeness or other types of computational hardness. The component design technique is a method for constructing reductions by using gadgets.

In the mathematical field of graph theory, an automorphism of a graph is a form of symmetry in which the graph is mapped onto itself while preserving the edge–vertex connectivity.

In graph theory, a branch of mathematics, a skew-symmetric graph is a directed graph that is isomorphic to its own transpose graph, the graph formed by reversing all of its edges, under an isomorphism that is an involution without any fixed points. Skew-symmetric graphs are identical to the double covering graphs of bidirected graphs.

In abstract algebra, the group isomorphism problem is the decision problem of determining whether two given finite group presentations refer to isomorphic groups.

<span class="mw-page-title-main">NP-completeness</span> Complexity class

In computational complexity theory, a problem is NP-complete when:

  1. It is a decision problem, meaning that for any input to the problem, the output is either "yes" or "no".
  2. When the answer is "yes", this can be demonstrated through the existence of a short solution.
  3. The correctness of each solution can be verified quickly and a brute-force search algorithm can find a solution by trying all possible solutions.
  4. The problem can be used to simulate every other problem for which we can verify quickly that a solution is correct. In this sense, NP-complete problems are the hardest of the problems to which solutions can be verified quickly. If we could find solutions of some NP-complete problem quickly, we could quickly find the solutions of every other problem to which a given solution can be easily verified.

In the study of graph algorithms, an implicit graph representation is a graph whose vertices or edges are not represented as explicit objects in a computer's memory, but rather are determined algorithmically from some other input, for example a computable function.

Eugene Michael Luks is an American mathematician and computer scientist, a professor emeritus of computer and information science at the University of Oregon. He is known for his research on the graph isomorphism problem and on algorithms for computational group theory.

Anna Lubiw is a computer scientist known for her work in computational geometry and graph theory. She is currently a professor at the University of Waterloo.

In the mathematical fields of graph theory and finite model theory, the logic of graphs deals with formal specifications of graph properties using sentences of mathematical logic. There are several variations in the types of logical operation that can be used in these sentences. The first-order logic of graphs concerns sentences in which the variables and predicates concern individual vertices and edges of a graph, while monadic second-order graph logic allows quantification over sets of vertices or edges. Logics based on least fixed point operators allow more general predicates over tuples of vertices, but these predicates can only be constructed through fixed-point operators, restricting their power.

In graph theory and theoretical computer science, the colour refinement algorithm also known as the naive vertex classification, or the 1-dimensional version of the Weisfeiler-Leman algorithm, is a routine used for testing whether two graphs are isomorphic. While it solves graph isomorphism on almost all graphs, there are graphs such as all regular graphs that cannot be distinguished using colour refinement.

References

  1. Arvind, Vikraman; Das, Bireswar; Köbler, Johannes (2008), "A logspace algorithm for partial 2-tree canonization", Computer Science – Theory and Applications: Third International Computer Science Symposium in Russia, CSR 2008 Moscow, Russia, June 7-12, 2008, Proceedings, Lecture Notes in Comput. Sci., vol. 5010, Springer, Berlin, pp. 40–51, doi:10.1007/978-3-540-79709-8_8, ISBN   978-3-540-79708-1, MR   2475148 .
  2. 1 2 Arvind, V.; Das, Bireswar; Köbler, Johannes (2007), "The space complexity of k-tree isomorphism", Algorithms and Computation: 18th International Symposium, ISAAC 2007, Sendai, Japan, December 17-19, 2007, Proceedings, Lecture Notes in Comput. Sci., vol. 4835, Springer, Berlin, pp. 822–833, doi: 10.1007/978-3-540-77120-3_71 , ISBN   978-3-540-77118-0, MR   2472661 .
  3. Gurevich, Yuri (1997), "From invariants to canonization" (PDF), Bulletin of the European Association for Theoretical Computer Science (63): 115–119, MR   1621595 .
  4. Babai, László; Luks, Eugene (1983), "Canonical labeling of graphs", Proc. 15th ACM Symposium on Theory of Computing , pp. 171–183, doi: 10.1145/800061.808746 , ISBN   0-89791-099-0 .
  5. Babai, László (June 23, 2019), Canonical Form for Graphs in Quasipolynomial Time
  6. Babai, László (1977), On the Isomorphism Problem, unpublished manuscript.
  7. Babai, László; Kucera, L. (1979), "Canonical labeling of graphs in linear average time", Proc. 20th Annual IEEE Symposium on Foundations of Computer Science , pp. 39–46, doi:10.1109/SFCS.1979.8, S2CID   14697933 .
  8. Babai, László; Luks, E. (1983), "Canonical labeling of graphs", Proc. 15th ACM Symposium on Theory of Computing, pp. 171–183
  9. Read, Ronald C. (1972), "The coding of various kinds of unlabeled trees", Graph Theory and Computing, Academic Press, New York, pp. 153–182, MR   0344150 .
  10. McKay, Brendan D.; Piperno, Adolfo (2014), "Journal of Symbolic Computation", Practical graph isomorphism, II, vol. 60, pp. 94–112, arXiv: 1301.1493 , doi: 10.1016/j.jsc.2013.09.003 , ISSN   0747-7171, S2CID   17930927 .
  11. Cook, Diane J.; Holder, Lawrence B. (2007), "6.2.1. Canonical Labeling", Mining Graph Data, John Wiley & Sons, pp. 120–122, ISBN   978-0-470-07303-2 .
  12. Weininger, David; Weininger, Arthur; Weininger, Joseph L. (May 1989). "SMILES. 2. Algorithm for generation of unique SMILES notation". Journal of Chemical Information and Modeling. 29 (2): 97–101. doi:10.1021/ci00062a008. S2CID   6621315.
  13. Kelley, Brian (May 2003). "Graph Canonicalization". Dr. Dobb's Journal.
  14. Scheider, Nadine; Sayle, Roger A.; Landrum, Gregory A. (October 2015). "Get Your Atoms in Order — An Open-Source Implementation of a Novel and Robust Molecular Canonicalization Algorithm". Journal of Chemical Information and Modeling. 55 (10): 2111–2120. doi:10.1021/acs.jcim.5b00543. PMID   26441310.