Graph canonization

Last updated

In graph theory, a branch of mathematics, graph canonization is the problem of finding a canonical form of a given graph G. A canonical form is a labeled graph Canon(G) that is isomorphic to G, such that every graph that is isomorphic to G has the same canonical form as G. Thus, from a solution to the graph canonization problem, one could also solve the problem of graph isomorphism: to test whether two graphs G and H are isomorphic, compute their canonical forms Canon(G) and Canon(H), and test whether these two canonical forms are identical.

Contents

The canonical form of a graph is an example of a complete graph invariant: every two isomorphic graphs have the same canonical form, and every two non-isomorphic graphs have different canonical forms. [1] [2] Conversely, every complete invariant of graphs may be used to construct a canonical form. [3] The vertex set of an n-vertex graph may be identified with the integers from 1 to n, and using such an identification a canonical form of a graph may also be described as a permutation of its vertices. Canonical forms of a graph are also called canonical labelings, [4] and graph canonization is also sometimes known as graph canonicalization.

Computational complexity

Clearly, the graph canonization problem is at least as computationally hard as the graph isomorphism problem. In fact, graph isomorphism is even AC0-reducible to graph canonization. However, it is still an open question whether the two problems are polynomial time equivalent. [2]

While the existence of (deterministic) polynomial algorithms for graph isomorphism is still an open problem in computational complexity theory, in 1977 László Babai reported that with probability at least 1  exp(O(n)), a simple vertex classification algorithm produces a canonical labeling of a graph chosen uniformly at random from the set of all n-vertex graphs after only two refinement steps. Small modifications and an added depth-first search step produce canonical labeling of such uniformly-chosen random graphs in linear expected time. This result sheds some light on the fact why many reported graph isomorphism algorithms behave well in practice. [5] [6] This was an important breakthrough in probabilistic complexity theory which became widely known in its manuscript form and which was still cited as an "unpublished manuscript" long after it was reported at a symposium.

A commonly known canonical form is the lexicographically smallest graph within the isomorphism class, which is the graph of the class with lexicographically smallest adjacency matrix considered as a linear string. However, the computation of the lexicographically smallest graph is NP-hard. [7]

For trees, a concise polynomial canonization algorithm requiring O(n) space is presented by Read (1972). [8] Begin by labeling each vertex with the string 01. Iteratively for each non-leaf x remove the leading 0 and trailing 1 from x's label; then sort x's label along with the labels of all adjacent leaves in lexicographic order. Concatenate these sorted labels, add back a leading 0 and trailing 1, make this the new label of x, and delete the adjacent leaves. If there are two vertices remaining, concatenate their labels in lexicographic order.

Applications

Graph canonization is the essence of many graph isomorphism algorithms. One of the leading tools is Nauty. [9]

A common application of graph canonization is in graphical data mining, in particular in chemical database applications. [10]

A number of identifiers for chemical substances, such as SMILES and InChI use canonicalization steps in their computation, which is essentially the canonicalization of the graph which represents the molecule. [11] [12] [13] These identifiers are designed to provide a standard (and sometimes human-readable) way to encode molecular information and to facilitate the search for such information in databases and on the web.

Related Research Articles

In theoretical computer science and mathematics, computational complexity theory focuses on classifying computational problems according to their resource usage, and relating these classes to each other. A computational problem is a task solved by a computer. A computation problem is solvable by mechanical application of mathematical steps, such as an algorithm.

<span class="mw-page-title-main">Interactive proof system</span>

In computational complexity theory, an interactive proof system is an abstract machine that models computation as the exchange of messages between two parties: a prover and a verifier. The parties interact by exchanging messages in order to ascertain whether a given string belongs to a language or not. The prover possesses unlimited computational resources but cannot be trusted, while the verifier has bounded computation power but is assumed to be always honest. Messages are sent between the verifier and prover until the verifier has an answer to the problem and has "convinced" itself that it is correct.

<span class="mw-page-title-main">Graph isomorphism</span> Bijection between the vertex set of two graphs

In graph theory, an isomorphism of graphsG and H is a bijection between the vertex sets of G and H

<span class="mw-page-title-main">Clique problem</span> Task of computing complete subgraphs

In computer science, the clique problem is the computational problem of finding cliques in a graph. It has several different formulations depending on which cliques, and what information about the cliques, should be found. Common formulations of the clique problem include finding a maximum clique, finding a maximum weight clique in a weighted graph, listing all maximal cliques, and solving the decision problem of testing whether a graph contains a clique larger than a given size.

In graph theory, the Robertson–Seymour theorem states that the undirected graphs, partially ordered by the graph minor relationship, form a well-quasi-ordering. Equivalently, every family of graphs that is closed under minors can be defined by a finite set of forbidden minors, in the same way that Wagner's theorem characterizes the planar graphs as being the graphs that do not have the complete graph K5 or the complete bipartite graph K3,3 as minors.

<span class="mw-page-title-main">Graph coloring</span> Methodic assignment of colors to elements of a graph

In graph theory, graph coloring is a special case of graph labeling; it is an assignment of labels traditionally called "colors" to elements of a graph subject to certain constraints. In its simplest form, it is a way of coloring the vertices of a graph such that no two adjacent vertices are of the same color; this is called a vertex coloring. Similarly, an edge coloring assigns a color to each edge so that no two adjacent edges are of the same color, and a face coloring of a planar graph assigns a color to each face or region so that no two faces that share a boundary have the same color.

In theoretical computer science, the subgraph isomorphism problem is a computational task in which two graphs G and H are given as input, and one must determine whether G contains a subgraph that is isomorphic to H. Subgraph isomorphism is a generalization of both the maximum clique problem and the problem of testing whether a graph contains a Hamiltonian cycle, and is therefore NP-complete. However certain other cases of subgraph isomorphism may be solved in polynomial time.

<span class="mw-page-title-main">Chromatic polynomial</span> Function in algebraic graph theory

The chromatic polynomial is a graph polynomial studied in algebraic graph theory, a branch of mathematics. It counts the number of graph colorings as a function of the number of colors and was originally defined by George David Birkhoff to study the four color problem. It was generalised to the Tutte polynomial by Hassler Whitney and W. T. Tutte, linking it to the Potts model of statistical physics.

<span class="mw-page-title-main">Graph property</span> Property of graphs that depends only on abstract structure

In graph theory, a graph property or graph invariant is a property of graphs that depends only on the abstract structure, not on graph representations such as particular labellings or drawings of the graph.

In the mathematical discipline of graph theory, a feedback vertex set (FVS) of a graph is a set of vertices whose removal leaves a graph without cycles. Equivalently, each FVS contains at least one vertex of any cycle in the graph. The feedback vertex set number of a graph is the size of a smallest feedback vertex set. The minimum feedback vertex set problem is an NP-complete problem; it was among the first problems shown to be NP-complete. It has wide applications in operating systems, database systems, and VLSI chip design.

The graph isomorphism problem is the computational problem of determining whether two finite graphs are isomorphic.

<span class="mw-page-title-main">László Babai</span> Hungarian mathematician and computer scientist

László "Laci" Babai is a Hungarian professor of computer science and mathematics at the University of Chicago. His research focuses on computational complexity theory, algorithms, combinatorics, and finite groups, with an emphasis on the interactions between these fields.

In computational complexity, problems that are in the complexity class NP but are neither in the class P nor NP-complete are called NP-intermediate, and the class of such problems is called NPI. Ladner's theorem, shown in 1975 by Richard E. Ladner, is a result asserting that, if P ≠ NP, then NPI is not empty; that is, NP contains problems that are neither in P nor NP-complete. Since it is also true that if NPI problems exist, then P ≠ NP, it follows that P = NP if and only if NPI is empty.

In computational complexity theory, a gadget is a subunit of a problem instance that simulates the behavior of one of the fundamental units of a different computational problem. Gadgets are typically used to construct reductions from one computational problem to another, as part of proofs of NP-completeness or other types of computational hardness. The component design technique is a method for constructing reductions by using gadgets.

In the mathematical field of graph theory, an automorphism of a graph is a form of symmetry in which the graph is mapped onto itself while preserving the edge–vertex connectivity.

<span class="mw-page-title-main">NP-completeness</span> Complexity class

In computational complexity theory, a problem is NP-complete when:

  1. It is a decision problem, meaning that for any input to the problem, the output is either "yes" or "no".
  2. When the answer is "yes", this can be demonstrated through the existence of a short solution.
  3. The correctness of each solution can be verified quickly and a brute-force search algorithm can find a solution by trying all possible solutions.
  4. The problem can be used to simulate every other problem for which we can verify quickly that a solution is correct. In this sense, NP-complete problems are the hardest of the problems to which solutions can be verified quickly. If we could find solutions of some NP-complete problem quickly, we could quickly find the solutions of every other problem to which a given solution can be easily verified.

In the study of graph algorithms, an implicit graph representation is a graph whose vertices or edges are not represented as explicit objects in a computer's memory, but rather are determined algorithmically from some other input, for example a computable function.

Eugene Michael Luks is an American mathematician and computer scientist, a professor emeritus of computer and information science at the University of Oregon. He is known for his research on the graph isomorphism problem and on algorithms for computational group theory.

Anna Lubiw is a computer scientist known for her work in computational geometry and graph theory. She is currently a professor at the University of Waterloo.

In the mathematical fields of graph theory and finite model theory, the logic of graphs deals with formal specifications of graph properties using sentences of mathematical logic. There are several variations in the types of logical operation that can be used in these sentences. The first-order logic of graphs concerns sentences in which the variables and predicates concern individual vertices and edges of a graph, while monadic second-order graph logic allows quantification over sets of vertices or edges. Logics based on least fixed point operators allow more general predicates over tuples of vertices, but these predicates can only be constructed through fixed-point operators, restricting their power.

References

  1. Arvind, Vikraman; Das, Bireswar; Köbler, Johannes (2008), "A logspace algorithm for partial 2-tree canonization", Computer Science – Theory and Applications: Third International Computer Science Symposium in Russia, CSR 2008 Moscow, Russia, June 7-12, 2008, Proceedings, Lecture Notes in Comput. Sci., vol. 5010, Springer, Berlin, pp. 40–51, doi:10.1007/978-3-540-79709-8_8, MR   2475148 .
  2. 1 2 Arvind, V.; Das, Bireswar; Köbler, Johannes (2007), "The space complexity of k-tree isomorphism", Algorithms and Computation: 18th International Symposium, ISAAC 2007, Sendai, Japan, December 17-19, 2007, Proceedings, Lecture Notes in Comput. Sci., vol. 4835, Springer, Berlin, pp. 822–833, doi: 10.1007/978-3-540-77120-3_71 , MR   2472661 .
  3. Gurevich, Yuri (1997), "From invariants to canonization" (PDF), Bulletin of the European Association for Theoretical Computer Science (63): 115–119, MR   1621595 .
  4. Babai, László; Luks, Eugene (1983), "Canonical labeling of graphs", Proc. 15th ACM Symposium on Theory of Computing , pp. 171–183, doi: 10.1145/800061.808746 .
  5. Babai, László (1977), On the Isomorphism Problem, unpublished manuscript.
  6. Babai, László; Kucera, L. (1979), "Canonical labeling of graphs in linear average time", Proc. 20th Annual IEEE Symposium on Foundations of Computer Science , pp. 39–46, doi:10.1109/SFCS.1979.8, S2CID   14697933 .
  7. Babai, László; Luks, E. (1983), "Canonical labeling of graphs", Proc. 15th ACM Symposium on Theory of Computing, pp. 171–183
  8. Read, Ronald C. (1972), "The coding of various kinds of unlabeled trees", Graph Theory and Computing, Academic Press, New York, pp. 153–182, MR   0344150 .
  9. McKay, Brendan D.; Piperno, Adolfo (2014), "Journal of Symbolic Computation", Practical graph isomorphism, II, vol. 60, pp. 94–112, doi: 10.1016/j.jsc.2013.09.003 , ISSN   0747-7171, S2CID   17930927 .
  10. Cook, Diane J.; Holder, Lawrence B. (2007), "6.2.1. Canonical Labeling", Mining Graph Data, John Wiley & Sons, pp. 120–122, ISBN   978-0-470-07303-2 .
  11. Weininger, David; Weininger, Arthur; Weininger, Joseph L. (May 1989). "SMILES. 2. Algorithm for generation of unique SMILES notation". Journal of Chemical Information and Modeling. 29 (2): 97–101. doi:10.1021/ci00062a008. S2CID   6621315.
  12. Kelley, Brian (May 2003). "Graph Canonicalization". Dr. Dobb's Journal.
  13. Scheider, Nadine; Sayle, Roger A.; Landrum, Gregory A. (October 2015). "Get Your Atoms in Order — An Open-Source Implementation of a Novel and Robust Molecular Canonicalization Algorithm". Journal of Chemical Information and Modeling. 55 (10): 2111–2120. doi:10.1021/acs.jcim.5b00543. PMID   26441310.