De Bruijn graph

Last updated

In graph theory, an n-dimensional De Bruijn graph of m symbols is a directed graph representing overlaps between sequences of symbols. It has mn vertices, consisting of all possible length-n sequences of the given symbols; the same symbol may appear multiple times in a sequence. For a set of m symbols S = {s1, …, sm}, the set of vertices is:

Contents

If one of the vertices can be expressed as another vertex by shifting all its symbols by one place to the left and adding a new symbol at the end of this vertex, then the latter has a directed edge to the former vertex. Thus the set of arcs (that is, directed edges) is

Although De Bruijn graphs are named after Nicolaas Govert de Bruijn, they were invented independently by both de Bruijn [1] and I. J. Good. [2] Much earlier, Camille Flye Sainte-Marie [3] implicitly used their properties.

Properties

The line graph construction of the three smallest binary De Bruijn graphs is depicted below. As can be seen in the illustration, each vertex of the n-dimensional De Bruijn graph corresponds to an edge of the (n – 1)-dimensional De Bruijn graph, and each edge in the n-dimensional De Bruijn graph corresponds to a two-edge path in the (n – 1)-dimensional De Bruijn graph.

DeBruijn-as-line-digraph.svg

Dynamical systems

Binary De Bruijn graphs can be drawn in such a way that they resemble objects from the theory of dynamical systems, such as the Lorenz attractor:

DeBruijn-3-2.svg
Binary De Bruijn graph
Lorenz attractor2.svg
Lorenz attractor

This analogy can be made rigorous: the n-dimensional m-symbol De Bruijn graph is a model of the Bernoulli map

The Bernoulli map (also called the 2x mod 1 map for m = 2) is an ergodic dynamical system, which can be understood to be a single shift of a m-adic number. [5] The trajectories of this dynamical system correspond to walks in the De Bruijn graph, where the correspondence is given by mapping each real x in the interval [0,1) to the vertex corresponding to the first n digits in the base-m representation of x. Equivalently, walks in the De Bruijn graph correspond to trajectories in a one-sided subshift of finite type.

Directed graphs of two B(2,3) de Bruijn sequences and a B(2,4) sequence. In B(2,3), each vertex is visited once, whereas in B(2,4), each edge is traversed once. De Bruijn binary graph.svg
Directed graphs of two B(2,3) de Bruijn sequences and a B(2,4) sequence. In B(2,3), each vertex is visited once, whereas in B(2,4), each edge is traversed once.

Embeddings resembling this one can be used to show that the binary De Bruijn graphs have queue number 2 [6] and that they have book thickness at most 5. [7]

Uses

See also

Related Research Articles

<span class="mw-page-title-main">Dual polyhedron</span> Polyhedron associated with another by swapping vertices for faces

In geometry, every polyhedron is associated with a second dual structure, where the vertices of one correspond to the faces of the other, and the edges between pairs of vertices of one correspond to the edges between pairs of faces of the other. Such dual figures remain combinatorial or abstract polyhedra, but not all can also be constructed as geometric polyhedra. Starting with any given polyhedron, the dual of its dual is the original polyhedron.

In graph theory, an expander graph is a sparse graph that has strong connectivity properties, quantified using vertex, edge or spectral expansion. Expander constructions have spawned research in pure and applied mathematics, with several applications to complexity theory, design of robust computer networks, and the theory of error-correcting codes.

In combinatorics, Ramsey's theorem, in one of its graph-theoretic forms, states that one will find monochromatic cliques in any edge labelling (with colours) of a sufficiently large complete graph. To demonstrate the theorem for two colours (say, blue and red), let r and s be any two positive integers. Ramsey's theorem states that there exists a least positive integer R(r, s) for which every blue-red edge colouring of the complete graph on R(r, s) vertices contains a blue clique on r vertices or a red clique on s vertices. (Here R(r, s) signifies an integer that depends on both r and s.)

<span class="mw-page-title-main">Hypergraph</span> Generalization of graph theory

In mathematics, a hypergraph is a generalization of a graph in which an edge can join any number of vertices. In contrast, in an ordinary graph, an edge connects exactly two vertices.

In graph theory, a branch of mathematics, the (binary) cycle space of an undirected graph is the set of its even-degree subgraphs.

<span class="mw-page-title-main">Eulerian path</span> Trail in a graph that visits each edge once

In graph theory, an Eulerian trail is a trail in a finite graph that visits every edge exactly once. Similarly, an Eulerian circuit or Eulerian cycle is an Eulerian trail that starts and ends on the same vertex. They were first discussed by Leonhard Euler while solving the famous Seven Bridges of Königsberg problem in 1736. The problem can be stated mathematically like this:

<span class="mw-page-title-main">Turán graph</span> Balanced complete multipartite graph

The Turán graph, denoted by , is a complete multipartite graph; it is formed by partitioning a set of vertices into subsets, with sizes as equal as possible, and then connecting two vertices by an edge if and only if they belong to different subsets. Where and are the quotient and remainder of dividing by , the graph is of the form , and the number of edges is

<span class="mw-page-title-main">Graph coloring</span> Methodic assignment of colors to elements of a graph

In graph theory, graph coloring is a special case of graph labeling; it is an assignment of labels traditionally called "colors" to elements of a graph subject to certain constraints. In its simplest form, it is a way of coloring the vertices of a graph such that no two adjacent vertices are of the same color; this is called a vertex coloring. Similarly, an edge coloring assigns a color to each edge so that no two adjacent edges are of the same color, and a face coloring of a planar graph assigns a color to each face or region so that no two faces that share a boundary have the same color.

<span class="mw-page-title-main">Clique (graph theory)</span> Adjacent subset of an undirected graph

In the mathematical area of graph theory, a clique is a subset of vertices of an undirected graph such that every two distinct vertices in the clique are adjacent. That is, a clique of a graph is an induced subgraph of that is complete. Cliques are one of the basic concepts of graph theory and are used in many other mathematical problems and constructions on graphs. Cliques have also been studied in computer science: the task of finding whether there is a clique of a given size in a graph is NP-complete, but despite this hardness result, many algorithms for finding cliques have been studied.

In the mathematical theory of matroids, a graphic matroid is a matroid whose independent sets are the forests in a given finite undirected graph. The dual matroids of graphic matroids are called co-graphic matroids or bond matroids. A matroid that is both graphic and co-graphic is sometimes called a planar matroid ; these are exactly the graphic matroids formed from planar graphs.

In mathematics, in the areas of order theory and combinatorics, Dilworth's theorem characterizes the width of any finite partially ordered set in terms of a partition of the order into a minimum number of chains. It is named for the mathematician Robert P. Dilworth.

<span class="mw-page-title-main">Loop-erased random walk</span> Model for a random simple path

In mathematics, loop-erased random walk is a model for a random simple path with important applications in combinatorics, physics and quantum field theory. It is intimately connected to the uniform spanning tree, a model for a random tree. See also random walk for more general treatment of this topic.

<span class="mw-page-title-main">Tournament (graph theory)</span> Directed graph where each vertex pair has one arc

A tournament is a directed graph (digraph) obtained by assigning a direction for each edge in an undirected complete graph. That is, it is an orientation of a complete graph, or equivalently a directed graph in which every pair of distinct vertices is connected by a directed edge with any one of the two possible orientations.

de Bruijn sequence Cycle through all length-k sequences

In combinatorial mathematics, a de Bruijn sequence of order n on a size-k alphabet A is a cyclic sequence in which every possible length-n string on A occurs exactly once as a substring (i.e., as a contiguous subsequence). Such a sequence is denoted by B(k, n) and has length kn, which is also the number of distinct strings of length n on A. Each of these distinct strings, when taken as a substring of B(k, n), must start at a different position, because substrings starting at the same position are not distinct. Therefore, B(k, n) must have at leastkn symbols. And since B(k, n) has exactlykn symbols, de Bruijn sequences are optimally short with respect to the property of containing every string of length n at least once.

<span class="mw-page-title-main">Convex polytope</span> Convex hull of a finite set of points in a Euclidean space

A convex polytope is a special case of a polytope, having the additional property that it is also a convex set contained in the -dimensional Euclidean space . Most texts use the term "polytope" for a bounded convex polytope, and the word "polyhedron" for the more general, possibly unbounded object. Others allow polytopes to be unbounded. The terms "bounded/unbounded convex polytope" will be used below whenever the boundedness is critical to the discussed issue. Yet other texts identify a convex polytope with its boundary.

In graph theory, reachability refers to the ability to get from one vertex to another within a graph. A vertex can reach a vertex if there exists a sequence of adjacent vertices which starts with and ends with .

The Zarankiewicz problem, an unsolved problem in mathematics, asks for the largest possible number of edges in a bipartite graph that has a given number of vertices and has no complete bipartite subgraphs of a given size. It belongs to the field of extremal graph theory, a branch of combinatorics, and is named after the Polish mathematician Kazimierz Zarankiewicz, who proposed several special cases of the problem in 1951.

<span class="mw-page-title-main">Rook's graph</span> Graph of chess rook moves

In graph theory, a rook's graph is an undirected graph that represents all legal moves of the rook chess piece on a chessboard. Each vertex of a rook's graph represents a square on a chessboard, and there is an edge between any two squares sharing a row (rank) or column (file), the squares that a rook can move between. These graphs can be constructed for chessboards of any rectangular shape. Although rook's graphs have only minor significance in chess lore, they are more important in the abstract mathematics of graphs through their alternative constructions: rook's graphs are the Cartesian product of two complete graphs, and are the line graphs of complete bipartite graphs. The square rook's graphs constitute the two-dimensional Hamming graphs.

<span class="mw-page-title-main">Trace diagram</span> Graphical means of performing computations in linear algebra

In mathematics, trace diagrams are a graphical means of performing computations in linear and multilinear algebra. They can be represented as graphs in which some edges are labeled by matrices. The simplest trace diagrams represent the trace and determinant of a matrix. Several results in linear algebra, such as Cramer's Rule and the Cayley–Hamilton theorem, have simple diagrammatic proofs. They are closely related to Penrose's graphical notation.

In mathematics, a queen's graph is an undirected graph that represents all legal moves of the queen—a chess piece—on a chessboard. In the graph, each vertex represents a square on a chessboard, and each edge is a legal move the queen can make, that is, a horizontal, vertical or diagonal move by any number of squares. If the chessboard has dimensions , then the induced graph is called the queen's graph.

References

  1. de Bruijn, N. G. (1946). "A combinatorial problem". Indagationes Mathematicae . 49: 758–764. MR   0018142.
  2. Good, I. J. (1946). "Normal recurring decimals". Journal of the London Mathematical Society . 21 (3): 167–169. doi:10.1112/jlms/s1-21.3.167.
  3. Flye Sainte-Marie, C. (1894). "Question 48". L'Intermédiaire des Mathématiciens . 1: 107–110.
  4. Zhang, Fu Ji; Lin, Guo Ning (1987). "On the de Bruijn–Good graphs". Acta Mathematica Sinica . 30 (2): 195–205.
  5. Leroux, Philippe (2005). "Coassociative grammar, periodic orbits, and quantum random walk over ". International Journal of Mathematics and Mathematical Sciences. 2005 (24): 3979–3996. arXiv: quant-ph/0209100 . doi: 10.1155/IJMMS.2005.3979 . MR   2204471.
  6. Heath, Lenwood S.; Rosenberg, Arnold L. (1992). "Laying out graphs using queues". SIAM Journal on Computing . 21 (5): 927–958. doi:10.1137/0221055. MR   1181408.
  7. Obrenić, Bojana (1993). "Embedding de Bruijn and shuffle-exchange graphs in five pages". SIAM Journal on Discrete Mathematics . 6 (4): 642–654. doi:10.1137/0406049. MR   1241401.
  8. Pevzner, Pavel A.; Tang, Haixu; Waterman, Michael S. (2001). "An Eulerian path approach to DNA fragment assembly". Proceedings of the National Academy of Sciences . 98 (17): 9748–9753. Bibcode:2001PNAS...98.9748P. doi: 10.1073/pnas.171285098 . PMC   55524 . PMID   11504945.
  9. Pevzner, Pavel A.; Tang, Haixu (2001). "Fragment Assembly with Double-Barreled Data". Bioinformatics . 17 (Suppl 1): S225–S233. doi: 10.1093/bioinformatics/17.suppl_1.S225 . PMID   11473013.
  10. Zerbino, Daniel R.; Birney, Ewan (2008). "Velvet: algorithms for de novo short read assembly using de Bruijn graphs". Genome Research . 18 (5): 821–829. doi:10.1101/gr.074492.107. PMC   2336801 . PMID   18349386.
  11. Chikhi, R.; Limasset, A.; Jackman, S.; Simpson, J.; Medvedev, P. (2014). "On the representation of de Bruijn graphs". Journal of Computational Biology . 22 (5): 336–52. arXiv: 1401.5383 . doi:10.1089/cmb.2014.0160. PMID   25629448. S2CID   9502231.
  12. Iqbal, Zamin; Caccamo, Mario; Turner, Isaac; Flicek, Paul; McVean, Gil (2012). "De novo assembly and genotyping of variants using colored de Bruijn graphs". Nature Genetics . 44 (2): 226–32. doi:10.1038/ng.1028. PMC   3272472 . PMID   22231483.