De Bruijn graph

Last updated

In graph theory, an n-dimensional De Bruijn graph of m symbols is a directed graph representing overlaps between sequences of symbols. It has mn vertices, consisting of all possible length-n sequences of the given symbols; the same symbol may appear multiple times in a sequence. For a set of m symbols S = {s1, …, sm}, the set of vertices is:

Contents

If one of the vertices can be expressed as another vertex by shifting all its symbols by one place to the left and adding a new symbol at the end of this vertex, then the latter has a directed edge to the former vertex. Thus the set of arcs (that is, directed edges) is

Although De Bruijn graphs are named after Nicolaas Govert de Bruijn, they were invented independently by both de Bruijn [1] and I. J. Good. [2] Much earlier, Camille Flye Sainte-Marie [3] implicitly used their properties.

Properties

The line graph construction of the three smallest binary De Bruijn graphs is depicted below. As can be seen in the illustration, each vertex of the n-dimensional De Bruijn graph corresponds to an edge of the (n – 1)-dimensional De Bruijn graph, and each edge in the n-dimensional De Bruijn graph corresponds to a two-edge path in the (n – 1)-dimensional De Bruijn graph.

DeBruijn-as-line-digraph.svg

Dynamical systems

Binary De Bruijn graphs can be drawn in such a way that they resemble objects from the theory of dynamical systems, such as the Lorenz attractor:

DeBruijn-3-2.svg
Binary De Bruijn graph
Lorenz attractor2.svg
Lorenz attractor

This analogy can be made rigorous: the n-dimensional m-symbol De Bruijn graph is a model of the Bernoulli map

The Bernoulli map (also called the 2x mod 1 map for m = 2) is an ergodic dynamical system, which can be understood to be a single shift of a m-adic number. [5] The trajectories of this dynamical system correspond to walks in the De Bruijn graph, where the correspondence is given by mapping each real x in the interval [0,1) to the vertex corresponding to the first n digits in the base-m representation of x. Equivalently, walks in the De Bruijn graph correspond to trajectories in a one-sided subshift of finite type.

Directed graphs of two B(2,3) de Bruijn sequences and a B(2,4) sequence. In B(2,3), each vertex is visited once, whereas in B(2,4), each edge is traversed once. De Bruijn binary graph.svg
Directed graphs of two B(2,3) de Bruijn sequences and a B(2,4) sequence. In B(2,3), each vertex is visited once, whereas in B(2,4), each edge is traversed once.

Embeddings resembling this one can be used to show that the binary De Bruijn graphs have queue number 2 [6] and that they have book thickness at most 5. [7]

Uses

See also

Related Research Articles

<span class="mw-page-title-main">Dual polyhedron</span> Polyhedron associated with another by swapping vertices for faces

In geometry, every polyhedron is associated with a second dual structure, where the vertices of one correspond to the faces of the other, and the edges between pairs of vertices of one correspond to the edges between pairs of faces of the other. Such dual figures remain combinatorial or abstract polyhedra, but not all can also be constructed as geometric polyhedra. Starting with any given polyhedron, the dual of its dual is the original polyhedron.

In graph theory, an expander graph is a sparse graph that has strong connectivity properties, quantified using vertex, edge or spectral expansion. Expander constructions have spawned research in pure and applied mathematics, with several applications to complexity theory, design of robust computer networks, and the theory of error-correcting codes.

In combinatorics, Ramsey's theorem, in one of its graph-theoretic forms, states that one will find monochromatic cliques in any edge labelling (with colours) of a sufficiently large complete graph. To demonstrate the theorem for two colours (say, blue and red), let r and s be any two positive integers. Ramsey's theorem states that there exists a least positive integer R(r, s) for which every blue-red edge colouring of the complete graph on R(r, s) vertices contains a blue clique on r vertices or a red clique on s vertices. (Here R(r, s) signifies an integer that depends on both r and s.)

<span class="mw-page-title-main">Hypergraph</span> Generalization of graph theory

In mathematics, a hypergraph is a generalization of a graph in which an edge can join any number of vertices. In contrast, in an ordinary graph, an edge connects exactly two vertices.

In graph theory, a branch of mathematics, the (binary) cycle space of an undirected graph is the set of its even-degree subgraphs.

<span class="mw-page-title-main">Eulerian path</span> Trail in a graph that visits each edge once

In graph theory, an Eulerian trail is a trail in a finite graph that visits every edge exactly once. Similarly, an Eulerian circuit or Eulerian cycle is an Eulerian trail that starts and ends on the same vertex. They were first discussed by Leonhard Euler while solving the famous Seven Bridges of Königsberg problem in 1736. The problem can be stated mathematically like this:

<span class="mw-page-title-main">Random graph</span> Graph generated by a random process

In mathematics, random graph is the general term to refer to probability distributions over graphs. Random graphs may be described simply by a probability distribution, or by a random process which generates them. The theory of random graphs lies at the intersection between graph theory and probability theory. From a mathematical perspective, random graphs are used to answer questions about the properties of typical graphs. Its practical applications are found in all areas in which complex networks need to be modeled – many random graph models are thus known, mirroring the diverse types of complex networks encountered in different areas. In a mathematical context, random graph refers almost exclusively to the Erdős–Rényi random graph model. In other contexts, any graph model may be referred to as a random graph.

<span class="mw-page-title-main">Graph coloring</span> Methodic assignment of colors to elements of a graph

In graph theory, graph coloring is a special case of graph labeling; it is an assignment of labels traditionally called "colors" to elements of a graph subject to certain constraints. In its simplest form, it is a way of coloring the vertices of a graph such that no two adjacent vertices are of the same color; this is called a vertex coloring. Similarly, an edge coloring assigns a color to each edge so that no two adjacent edges are of the same color, and a face coloring of a planar graph assigns a color to each face or region so that no two faces that share a boundary have the same color.

<span class="mw-page-title-main">Clique (graph theory)</span> Adjacent subset of an undirected graph

In the mathematical area of graph theory, a clique is a subset of vertices of an undirected graph such that every two distinct vertices in the clique are adjacent. That is, a clique of a graph is an induced subgraph of that is complete. Cliques are one of the basic concepts of graph theory and are used in many other mathematical problems and constructions on graphs. Cliques have also been studied in computer science: the task of finding whether there is a clique of a given size in a graph is NP-complete, but despite this hardness result, many algorithms for finding cliques have been studied.

In mathematics, in the areas of order theory and combinatorics, Dilworth's theorem characterizes the width of any finite partially ordered set in terms of a partition of the order into a minimum number of chains. It is named for the mathematician Robert P. Dilworth (1950).

<span class="mw-page-title-main">Loop-erased random walk</span> Model for a random simple path

In mathematics, loop-erased random walk is a model for a random simple path with important applications in combinatorics, physics and quantum field theory. It is intimately connected to the uniform spanning tree, a model for a random tree. See also random walk for more general treatment of this topic.

In the mathematical field of graph theory, Kirchhoff's theorem or Kirchhoff's matrix tree theorem named after Gustav Kirchhoff is a theorem about the number of spanning trees in a graph, showing that this number can be computed in polynomial time from the determinant of a submatrix of the Laplacian matrix of the graph; specifically, the number is equal to any cofactor of the Laplacian matrix. Kirchhoff's theorem is a generalization of Cayley's formula which provides the number of spanning trees in a complete graph.

<span class="mw-page-title-main">Degree (graph theory)</span> Number of edges touching a vertex in a graph

In graph theory, the degree of a vertex of a graph is the number of edges that are incident to the vertex; in a multigraph, a loop contributes 2 to a vertex's degree, for the two ends of the edge. The degree of a vertex is denoted or . The maximum degree of a graph , denoted by , and the minimum degree of a graph, denoted by , are the maximum and minimum of its vertices' degrees. In the multigraph shown on the right, the maximum degree is 5 and the minimum degree is 0.

de Bruijn sequence Cycle through all length-k sequences

In combinatorial mathematics, a de Bruijn sequence of order n on a size-k alphabet A is a cyclic sequence in which every possible length-n string on A occurs exactly once as a substring (i.e., as a contiguous subsequence). Such a sequence is denoted by B(k, n) and has length kn, which is also the number of distinct strings of length n on A. Each of these distinct strings, when taken as a substring of B(k, n), must start at a different position, because substrings starting at the same position are not distinct. Therefore, B(k, n) must have at leastkn symbols. And since B(k, n) has exactlykn symbols, de Bruijn sequences are optimally short with respect to the property of containing every string of length n at least once.

In graph theory, the metric dimension of a graph G is the minimum cardinality of a subset S of vertices such that all other vertices are uniquely determined by their distances to the vertices in S. Finding the metric dimension of a graph is an NP-hard problem; the decision version, determining whether the metric dimension is less than a given value, is NP-complete.

<span class="mw-page-title-main">Convex polytope</span> Convex hull of a finite set of points in a Euclidean space

A convex polytope is a special case of a polytope, having the additional property that it is also a convex set contained in the -dimensional Euclidean space . Most texts use the term "polytope" for a bounded convex polytope, and the word "polyhedron" for the more general, possibly unbounded object. Others allow polytopes to be unbounded. The terms "bounded/unbounded convex polytope" will be used below whenever the boundedness is critical to the discussed issue. Yet other texts identify a convex polytope with its boundary.

In graph theory, reachability refers to the ability to get from one vertex to another within a graph. A vertex can reach a vertex if there exists a sequence of adjacent vertices which starts with and ends with .

<i>k</i>-vertex-connected graph Graph which remains connected when k or fewer nodes removed

In graph theory, a connected graph G is said to be k-vertex-connected if it has more than k vertices and remains connected whenever fewer than k vertices are removed.

<span class="mw-page-title-main">Cycle basis</span> Cycles in a graph that generate all cycles

In graph theory, a branch of mathematics, a cycle basis of an undirected graph is a set of simple cycles that forms a basis of the cycle space of the graph. That is, it is a minimal set of cycles that allows every even-degree subgraph to be expressed as a symmetric difference of basis cycles.

In mathematics, a queen's graph is an undirected graph that represents all legal moves of the queen—a chess piece—on a chessboard. In the graph, each vertex represents a square on a chessboard, and each edge is a legal move the queen can make, that is, a horizontal, vertical or diagonal move by any number of squares. If the chessboard has dimensions , then the induced graph is called the queen's graph.

References

  1. de Bruijn, N. G. (1946). "A combinatorial problem". Indagationes Mathematicae . 49: 758–764. MR   0018142.
  2. Good, I. J. (1946). "Normal recurring decimals". Journal of the London Mathematical Society . 21 (3): 167–169. doi:10.1112/jlms/s1-21.3.167.
  3. Flye Sainte-Marie, C. (1894). "Question 48". L'Intermédiaire des Mathématiciens . 1: 107–110.
  4. Zhang, Fu Ji; Lin, Guo Ning (1987). "On the de Bruijn–Good graphs". Acta Mathematica Sinica . 30 (2): 195–205.
  5. Leroux, Philippe (2005). "Coassociative grammar, periodic orbits, and quantum random walk over ". International Journal of Mathematics and Mathematical Sciences (24): 3979–3996. arXiv: quant-ph/0209100 . doi: 10.1155/IJMMS.2005.3979 . MR   2204471.
  6. Heath, Lenwood S.; Rosenberg, Arnold L. (1992). "Laying out graphs using queues". SIAM Journal on Computing . 21 (5): 927–958. doi:10.1137/0221055. MR   1181408.
  7. Obrenić, Bojana (1993). "Embedding de Bruijn and shuffle-exchange graphs in five pages". SIAM Journal on Discrete Mathematics . 6 (4): 642–654. doi:10.1137/0406049. MR   1241401.
  8. Pevzner, Pavel A.; Tang, Haixu; Waterman, Michael S. (2001). "An Eulerian path approach to DNA fragment assembly". Proceedings of the National Academy of Sciences . 98 (17): 9748–9753. Bibcode:2001PNAS...98.9748P. doi: 10.1073/pnas.171285098 . PMC   55524 . PMID   11504945.
  9. Pevzner, Pavel A.; Tang, Haixu (2001). "Fragment Assembly with Double-Barreled Data". Bioinformatics . 17 (Suppl 1): S225–S233. doi: 10.1093/bioinformatics/17.suppl_1.S225 . PMID   11473013.
  10. Zerbino, Daniel R.; Birney, Ewan (2008). "Velvet: algorithms for de novo short read assembly using de Bruijn graphs". Genome Research . 18 (5): 821–829. doi:10.1101/gr.074492.107. PMC   2336801 . PMID   18349386.
  11. Chikhi, R.; Limasset, A.; Jackman, S.; Simpson, J.; Medvedev, P. (2014). "On the representation of de Bruijn graphs". Journal of Computational Biology . 22 (5): 336–52. arXiv: 1401.5383 . doi:10.1089/cmb.2014.0160. PMID   25629448. S2CID   9502231.
  12. Iqbal, Zamin; Caccamo, Mario; Turner, Isaac; Flicek, Paul; McVean, Gil (2012). "De novo assembly and genotyping of variants using colored de Bruijn graphs". Nature Genetics . 44 (2): 226–32. doi:10.1038/ng.1028. PMC   3272472 . PMID   22231483.