De Bruijn sequence

Last updated
The de Bruijn sequence for alphabet size k = 2 and substring length n = 2. In general there are many sequences for a particular n and k but in this example it is unique, up to cycling. De Bruijn sequence.svg
The de Bruijn sequence for alphabet size k = 2 and substring length n = 2. In general there are many sequences for a particular n and k but in this example it is unique, up to cycling.

In combinatorial mathematics, a de Bruijn sequence of order n on a size-k alphabet A is a cyclic sequence in which every possible length-n string on A occurs exactly once as a substring (i.e., as a contiguous subsequence). Such a sequence is denoted by B(k, n) and has length kn, which is also the number of distinct strings of length n on A. Each of these distinct strings, when taken as a substring of B(k, n), must start at a different position, because substrings starting at the same position are not distinct. Therefore, B(k, n) must have at leastkn symbols. And since B(k, n) has exactlykn symbols, de Bruijn sequences are optimally short with respect to the property of containing every string of length n at least once.

Contents

The number of distinct de Bruijn sequences B(k, n) is

The sequences are named after the Dutch mathematician Nicolaas Govert de Bruijn, who wrote about them in 1946. [1] As he later wrote, [2] the existence of de Bruijn sequences for each order together with the above properties were first proved, for the case of alphabets with two elements, by CamilleFlye Sainte-Marie ( 1894 ). The generalization to larger alphabets is due to Tatyanavan Aardenne-Ehrenfest andde Bruijn ( 1951 ). Automata for recognizing these sequences are denoted as de Bruijn automata.

In most applications, A = {0,1}.

History

The earliest known example of a de Bruijn sequence comes from Sanskrit prosody where, since the work of Pingala, each possible three-syllable pattern of long and short syllables is given a name, such as 'y' for short–long–long and 'm' for long–long–long. To remember these names, the mnemonic yamātārājabhānasalagām is used, in which each three-syllable pattern occurs starting at its name: 'yamātā' has a short–long–long pattern, 'mātārā' has a long–long–long pattern, and so on, until 'salagām' which has a short–short–long pattern. This mnemonic, equivalent to a de Bruijn sequence on binary 3-tuples, is of unknown antiquity, but is at least as old as Charles Philip Brown's 1869 book on Sanskrit prosody that mentions it and considers it "an ancient line, written by Pāṇini". [3]

In 1894, A. de Rivière raised the question in an issue of the French problem journal L'Intermédiaire des Mathématiciens , of the existence of a circular arrangement of zeroes and ones of size that contains all binary sequences of length . The problem was solved (in the affirmative), along with the count of distinct solutions, by Camille Flye Sainte-Marie in the same year. [2] This was largely forgotten, and Martin (1934) proved the existence of such cycles for general alphabet size in place of 2, with an algorithm for constructing them. Finally, when in 1944 Kees Posthumus conjectured the count for binary sequences, de Bruijn proved the conjecture in 1946, through which the problem became well-known. [2]

Karl Popper independently describes these objects in his The Logic of Scientific Discovery (1934), calling them "shortest random-like sequences". [4]

Examples

Construction

A de Bruijn graph. Every four-digit sequence occurs exactly once if one traverses every edge exactly once and returns to one's starting point (a Eulerian cycle). Every three-digit sequence occurs exactly once if one visits every vertex exactly once (a Hamiltonian path). De bruijn graph-for binary sequence of order 4.svg
A de Bruijn graph. Every four-digit sequence occurs exactly once if one traverses every edge exactly once and returns to one's starting point (a Eulerian cycle). Every three-digit sequence occurs exactly once if one visits every vertex exactly once (a Hamiltonian path).

The de Bruijn sequences can be constructed by taking a Hamiltonian path of an n-dimensional de Bruijn graph over k symbols (or equivalently, an Eulerian cycle of an (n  1)-dimensional de Bruijn graph). [5]

An alternative construction involves concatenating together, in lexicographic order, all the Lyndon words whose length divides n. [6]

An inverse Burrows–Wheeler transform can be used to generate the required Lyndon words in lexicographic order. [7]

de Bruijn sequences can also be constructed using shift registers [8] or via finite fields. [9]

Example using de Bruijn graph

Directed graphs of two B(2,3) de Bruijn sequences and a B(2,4) sequence. In B(2,3), each vertex is visited once, whereas in B(2,4), each edge is traversed once. De Bruijn binary graph.svg
Directed graphs of two B(2,3) de Bruijn sequences and a B(2,4) sequence. In B(2,3), each vertex is visited once, whereas in B(2,4), each edge is traversed once.

Goal: to construct a B(2, 4) de Bruijn sequence of length 24 = 16 using Eulerian (n  1 = 4  1 = 3) 3-D de Bruijn graph cycle.

Each edge in this 3-dimensional de Bruijn graph corresponds to a sequence of four digits: the three digits that label the vertex that the edge is leaving followed by the one that labels the edge. If one traverses the edge labeled 1 from 000, one arrives at 001, thereby indicating the presence of the subsequence 0001 in the de Bruijn sequence. To traverse each edge exactly once is to use each of the 16 four-digit sequences exactly once.

For example, suppose we follow the following Eulerian path through these vertices:

000, 000, 001, 011, 111, 111, 110, 101, 011,
110, 100, 001, 010, 101, 010, 100, 000.

These are the output sequences of length k:

0 0 0 0
_ 0 0 0 1
_ _ 0 0 1 1

This corresponds to the following de Bruijn sequence:

0 0 0 0 1 1 1 1 0 1 1 0 0 1 0 1

The eight vertices appear in the sequence in the following way:

      {0  0  0  0} 1  1  1  1  0  1  1  0  0  1  0  1        0 {0  0  0  1} 1  1  1  0  1  1  0  0  1  0  1        0  0 {0  0  1  1} 1  1  0  1  1  0  0  1  0  1        0  0  0 {0  1  1  1} 1  0  1  1  0  0  1  0  1        0  0  0  0 {1  1  1  1} 0  1  1  0  0  1  0  1        0  0  0  0  1 {1  1  1  0} 1  1  0  0  1  0  1        0  0  0  0  1  1 {1  1  0  1} 1  0  0  1  0  1        0  0  0  0  1  1  1 {1  0  1  1} 0  0  1  0  1        0  0  0  0  1  1  1  1 {0  1  1  0} 0  1  0  1        0  0  0  0  1  1  1  1  0 {1  1  0  0} 1  0  1        0  0  0  0  1  1  1  1  0  1 {1  0  0  1} 0  1        0  0  0  0  1  1  1  1  0  1  1 {0  0  1  0} 1        0  0  0  0  1  1  1  1  0  1  1  0 {0  1  0  1}        0} 0  0  0  1  1  1  1  0  1  1  0  0 {1  0  1 ...    ... 0  0} 0  0  1  1  1  1  0  1  1  0  0  1 {0  1 ...    ... 0  0  0} 0  1  1  1  1  0  1  1  0  0  1  0 {1 ...

...and then we return to the starting point. Each of the eight 3-digit sequences (corresponding to the eight vertices) appears exactly twice, and each of the sixteen 4-digit sequences (corresponding to the 16 edges) appears exactly once.

Example using inverse Burrows—Wheeler transform

Mathematically, an inverse Burrows—Wheeler transform on a word w generates a multi-set of equivalence classes consisting of strings and their rotations. [7] These equivalence classes of strings each contain a Lyndon word as a unique minimum element, so the inverse Burrows—Wheeler transform can be considered to generate a set of Lyndon words. It can be shown that if we perform the inverse Burrows—Wheeler transform on a word w consisting of the size-k alphabet repeated kn−1 times (so that it will produce a word the same length as the desired de Bruijn sequence), then the result will be the set of all Lyndon words whose length divides n. It follows that arranging these Lyndon words in lexicographic order will yield a de Bruijn sequence B(k,n), and that this will be the first de Bruijn sequence in lexicographic order. The following method can be used to perform the inverse Burrows—Wheeler transform, using its standard permutation:

  1. Sort the characters in the string w, yielding a new string w
  2. Position the string w above the string w, and map each letter's position in w to its position in w while preserving order. This process defines the Standard Permutation.
  3. Write this permutation in cycle notation with the smallest position in each cycle first, and the cycles sorted in increasing order.
  4. For each cycle, replace each number with the corresponding letter from string w in that position.
  5. Each cycle has now become a Lyndon word, and they are arranged in lexicographic order, so dropping the parentheses yields the first de Bruijn sequence.

For example, to construct the smallest B(2,4) de Bruijn sequence of length 24 = 16, repeat the alphabet (ab) 8 times yielding w=abababababababab. Sort the characters in w, yielding w=aaaaaaaabbbbbbbb. Position w above w as shown, and map each element in w to the corresponding element in w by drawing a line. Number the columns as shown so we can read the cycles of the permutation:

BurrowsWheeler- standard permutation.svg

Starting from the left, the Standard Permutation notation cycles are: (1) (2 3 5 9) (4 7 13 10) (6 11) (8 15 14 12) (16). (Standard Permutation)

Then, replacing each number by the corresponding letter in w from that column yields: (a)(aaab)(aabb)(ab)(abbb)(b).

These are all of the Lyndon words whose length divides 4, in lexicographic order, so dropping the parentheses gives B(2,4) = aaaabaabbababbbb.

Algorithm

The following Python code calculates a de Bruijn sequence, given k and n, based on an algorithm from Frank Ruskey's Combinatorial Generation. [10]

fromtypingimportIterable,Union,Anydefde_bruijn(k:Union[Iterable[Any],int],n:int)->str:"""de Bruijn sequence for alphabet k    and subsequences of length n.    """# Two kinds of alphabet input: an integer expands# to a list of integers as the alphabet..ifisinstance(k,int):alphabet=list(map(str,range(k)))else:# While any sort of list becomes used as it isalphabet=kk=len(k)a=[0]*k*nsequence=[]defdb(t,p):ift>n:ifn%p==0:sequence.extend(a[1:p+1])else:a[t]=a[t-p]db(t+1,p)forjinrange(a[t-p]+1,k):a[t]=jdb(t+1,t)db(1,1)return"".join(alphabet[i]foriinsequence)print(de_bruijn(2,3))print(de_bruijn("abcd",2))

which prints

00010111 aabacadbbcbdccdd

Note that these sequences are understood to "wrap around" in a cycle. For example, the first sequence contains 110 and 100 in this fashion.

Uses

de Bruijn cycles are of general use in neuroscience and psychology experiments that examine the effect of stimulus order upon neural systems, [11] and can be specially crafted for use with functional magnetic resonance imaging. [12]

Angle detection

The symbols of a de Bruijn sequence written around a circular object (such as a wheel of a robot) can be used to identify its angle by examining the n consecutive symbols facing a fixed point. This angle-encoding problem is known as the "rotating drum problem". [13] Gray codes can be used as similar rotary positional encoding mechanisms, a method commonly found in rotary encoders.

Finding least- or most-significant set bit in a word

A de Bruijn sequence can be used to quickly find the index of the least significant set bit ("right-most 1") or the most significant set bit ("left-most 1") in a word using bitwise operations and multiplication. [14] The following example uses a de Bruijn sequence to determine the index of the least significant set bit (equivalent to counting the number of trailing '0' bits) in a 32 bit unsigned integer:

uint8_tlowestBitIndex(uint32_tv){staticconstuint8_tBitPositionLookup[32]=// hash table{0,1,28,2,29,14,24,3,30,22,20,15,25,17,4,8,31,27,13,23,21,19,16,7,26,12,18,6,11,5,10,9};returnBitPositionLookup[((uint32_t)((v&-v)*0x077CB531U))>>27];}

The lowestBitIndex() function returns the index of the least-significant set bit in v, or zero if v has no set bits. The constant 0x077CB531U in the expression is the B(2, 5) sequence 00000111011111001011010100110001 (spaces added for clarity). The operation (v & -v) zeros all bits except the least-significant bit set, resulting in a new value which is a power of 2. This power of 2 is multiplied (arithmetic modulo 232) by the de Bruijn sequence, thus producing a 32-bit product in which the bit sequence of the 5 MSBs is unique for each power of 2. The 5 MSBs are shifted into the LSB positions to produce a hash code in the range [0, 31], which is then used as an index into hash table BitPositionLookup. The selected hash table value is the bit index of the least significant set bit in v.

The following example determines the index of the most significant bit set in a 32 bit unsigned integer:

uint32_tkeepHighestBit(uint32_tn){n|=(n>>1);n|=(n>>2);n|=(n>>4);n|=(n>>8);n|=(n>>16);returnn-(n>>1);}uint8_thighestBitIndex(uint32_tv){staticconstuint8_tBitPositionLookup[32]={// hash table0,1,16,2,29,17,3,22,30,20,18,11,13,4,7,23,31,15,28,21,19,10,12,6,14,27,9,5,26,8,25,24,};returnBitPositionLookup[(keepHighestBit(v)*0x06EB14F9U)>>27];}

In the above example an alternative de Bruijn sequence (0x06EB14F9U) is used, with corresponding reordering of array values. The choice of this particular de Bruijn sequence is arbitrary, but the hash table values must be ordered to match the chosen de Bruijn sequence. The keepHighestBit() function zeros all bits except the most-significant set bit, resulting in a value which is a power of 2, which is then processed as in the previous example.

Brute-force attacks on locks

One possible B (10, 4) sequence. The 2530 substrings are read from top to bottom then left to right, and their digits are concatenated. To get the string to brute-force a combination lock, the last three digits in brackets (000) are appended. The 10003-digit string is hence "0 0001 0002 0003 0004 0005 0006 0007 0008 0009 0011 ... 79 7988 7989 7998 7999 8 8889 8899 89 8999 9 000" (spaces added for readability). De Bruijn sequence 10 4.svg
One possible B(10, 4) sequence. The 2530 substrings are read from top to bottom then left to right, and their digits are concatenated. To get the string to brute-force a combination lock, the last three digits in brackets (000) are appended. The 10003-digit string is hence "0 0001 0002 0003 0004 0005 0006 0007 0008 0009 0011 79 7988 7989 7998 7999 8 8889 8899 89 8999 9 000" (spaces added for readability).

A de Bruijn sequence can be used to shorten a brute-force attack on a PIN-like code lock that does not have an "enter" key and accepts the last n digits entered. For example, a digital door lock with a 4-digit code (each digit having 10 possibilities, from 0 to 9) would have B(10, 4) solutions, with length 10000. Therefore, only at most 10000 + 3 = 10003 (as the solutions are cyclic) presses are needed to open the lock, whereas trying all codes separately would require 4 × 10000 = 40000 presses.

Simplified principle of the Anoto digital pen.
The camera identifies a 6x6 matrix of dots, each displaced from the blue grid (not printed) in one of 4 directions.
The combinations of relative displacements of a 6-bit de Bruijn sequence between the columns, and between the rows gives its absolute position on the digital paper. Anoto paper principle.svg
Simplified principle of the Anoto digital pen.
The camera identifies a 6×6 matrix of dots, each displaced from the blue grid (not printed) in one of 4 directions.
The combinations of relative displacements of a 6-bit de Bruijn sequence between the columns, and between the rows gives its absolute position on the digital paper.

f-fold de Bruijn sequences

An f-fold n-ary de Bruijn sequence is an extension of the notion n-ary de Bruijn sequence, such that the sequence of the length contains every possible subsequence of the length n exactly f times. For example, for the cyclic sequences 11100010 and 11101000 are two-fold binary de Bruijn sequences. The number of two-fold de Bruijn sequences, for is , the other known numbers [16] are , , and .

de Bruijn torus

STL model of de Bruijn torus (16,32;3,3)2 with 1s as panels and 0s as holes in the mesh - with consistent orientation, every 3x3 matrix appears exactly once De bruijn torus 3x3.stl
STL model of de Bruijn torus (16,32;3,3)2 with 1s as panels and 0s as holes in the mesh with consistent orientation, every 3×3 matrix appears exactly once

A de Bruijn torus is a toroidal array with the property that every k-ary m-by-n matrix occurs exactly once.

Such a pattern can be used for two-dimensional positional encoding in a fashion analogous to that described above for rotary encoding. Position can be determined by examining the m-by-n matrix directly adjacent to the sensor, and calculating its position on the de Bruijn torus.

de Bruijn decoding

Computing the position of a particular unique tuple or matrix in a de Bruijn sequence or torus is known as the de Bruijn decoding problem. Efficient decoding algorithms exist for special, recursively constructed sequences [17] and extend to the two-dimensional case. [18] de Bruijn decoding is of interest, e.g., in cases where large sequences or tori are used for positional encoding.

See also

Notes

  1. de Bruijn (1946).
  2. 1 2 3 de Bruijn (1975).
  3. Brown (1869); Stein (1963); Kak (2000); Knuth (2006); Hall (2008).
  4. Popper (2002).
  5. Klein (2013).
  6. According to Berstel & Perrin (2007), the sequence generated in this way was first described (with a different generation method) by Martin (1934), and the connection between it and Lyndon words was observed by Fredricksen & Maiorana (1978).
  7. 1 2 Higgins (2012).
  8. Goresky & Klapper (2012).
  9. Ralston (1982), pp. 136–139.
  10. "De Bruijn sequences". Sage. Retrieved 2023-03-07.
  11. Aguirre, Mattar & Magis-Weinberg (2011).
  12. "de Bruijn cycle generator".
  13. van Lint & Wilson (2001).
  14. Anderson (1997–2009); Busch (2009)
  15. "de Bruijn (DeBruijn) sequence (K=10, n=3)".
  16. Osipov (2016).
  17. Tuliani (2001).
  18. Hurlbert & Isaak (1993).

Related Research Articles

Blowfish is a symmetric-key block cipher, designed in 1993 by Bruce Schneier and included in many cipher suites and encryption products. Blowfish provides a good encryption rate in software, and no effective cryptanalysis of it has been found to date. However, the Advanced Encryption Standard (AES) now receives more attention, and Schneier recommends Twofish for modern applications.

<span class="mw-page-title-main">Hash function</span> Mapping arbitrary data to fixed-size values

A hash function is any function that can be used to map data of arbitrary size to fixed-size values, though there are some hash functions that support variable length output. The values returned by a hash function are called hash values, hash codes, hash digests, digests, or simply hashes. The values are usually used to index a fixed-size table called a hash table. Use of a hash function to index a hash table is called hashing or scatter storage addressing.

<span class="mw-page-title-main">Trie</span> K-ary search tree data structure

In computer science, a trie, also called digital tree or prefix tree, is a type of k-ary search tree, a tree data structure used for locating specific keys from within a set. These keys are most often strings, with links between nodes defined not by the entire key, but by individual characters. In order to access a key, the trie is traversed depth-first, following the links between nodes, which represent each character in the key.

<span class="mw-page-title-main">Permutation</span> Mathematical version of an order change

In mathematics, a permutation of a set is, loosely speaking, an arrangement of its members into a sequence or linear order, or if the set is already ordered, a rearrangement of its elements. The word "permutation" also refers to the act or process of changing the linear order of an ordered set.

The reflected binary code (RBC), also known as reflected binary (RB) or Gray code after Frank Gray, is an ordering of the binary numeral system such that two successive values differ in only one bit.

In mathematics, a real number is said to be simply normal in an integer base b if its infinite sequence of digits is distributed uniformly in the sense that each of the b digit values has the same natural density 1/b. A number is said to be normal in base b if, for every positive integer n, all possible strings n digits long have density bn.

<span class="mw-page-title-main">Serpent (cipher)</span>

Serpent is a symmetric key block cipher that was a finalist in the Advanced Encryption Standard (AES) contest, in which it ranked second to Rijndael. Serpent was designed by Ross Anderson, Eli Biham, and Lars Knudsen.

In mathematics, the lexicographic or lexicographical order is a generalization of the alphabetical order of the dictionaries to sequences of ordered symbols or, more generally, of elements of a totally ordered set.

In combinatorics, the factorial number system, also called factoradic, is a mixed radix numeral system adapted to numbering permutations. It is also called factorial base, although factorials do not function as base, but as place value of digits. By converting a number less than n! to factorial representation, one obtains a sequence of n digits that can be converted to a permutation of n elements in a straightforward way, either using them as Lehmer code or as inversion table representation; in the former case the resulting map from integers to permutations of n elements lists them in lexicographical order. General mixed radix systems were studied by Georg Cantor.

<span class="mw-page-title-main">Circular shift</span> Circular shifts: Mathematical concept and applications in software development

In combinatorial mathematics, a circular shift is the operation of rearranging the entries in a tuple, either by moving the final entry to the first position, while shifting all other entries to the next position, or by performing the inverse operation. A circular shift is a special kind of cyclic permutation, which in turn is a special kind of permutation. Formally, a circular shift is a permutation σ of the n entries in the tuple such that either

In cryptography, a message authentication code based on universal hashing, or UMAC, is a type of message authentication code (MAC) calculated choosing a hash function from a class of hash functions according to some secret (random) process and applying it to the message. The resulting digest or fingerprint is then encrypted to hide the identity of the hash function used. As with any MAC, it may be used to simultaneously verify both the data integrity and the authenticity of a message. In contrast to traditional MACs, which are serializable, UMAC can be executed in parallel. Thus as machines continue to offer more parallel processing capabilities, the speed of implementing UMAC will increase.

Bijective numeration is any numeral system in which every non-negative integer can be represented in exactly one way using a finite string of digits. The name refers to the bijection that exists in this case between the set of non-negative integers and the set of finite strings using a finite set of symbols.

In mathematics, in the areas of combinatorics and computer science, a Lyndon word is a nonempty string that is strictly smaller in lexicographic order than all of its rotations. Lyndon words are named after mathematician Roger Lyndon, who investigated them in 1954, calling them standard lexicographic sequences. Anatoly Shirshov introduced Lyndon words in 1953 calling them regular words. Lyndon words are a special case of Hall words; almost all properties of Lyndon words are shared by Hall words.

<span class="mw-page-title-main">Necklace (combinatorics)</span>

In combinatorics, a k-ary necklace of length n is an equivalence class of n-character strings over an alphabet of size k, taking all rotations as equivalent. It represents a structure with n circularly connected beads which have k available colors.

<span class="mw-page-title-main">Hypercube graph</span> Graphs formed by a hypercubes edges and vertices

In graph theory, the hypercube graphQn is the graph formed from the vertices and edges of an n-dimensional hypercube. For instance, the cube graph Q3 is the graph formed by the 8 vertices and 12 edges of a three-dimensional cube. Qn has 2n vertices, 2n – 1n edges, and is a regular graph with n edges touching each vertex.

SHA-3 is the latest member of the Secure Hash Algorithm family of standards, released by NIST on August 5, 2015. Although part of the same series of standards, SHA-3 is internally different from the MD5-like structure of SHA-1 and SHA-2.

Combinatorics on words is a fairly new field of mathematics, branching from combinatorics, which focuses on the study of words and formal languages. The subject looks at letters or symbols, and the sequences they form. Combinatorics on words affects various areas of mathematical study, including algebra and computer science. There have been a wide range of contributions to the field. Some of the first work was on square-free words by Axel Thue in the early 1900s. He and colleagues observed patterns within words and tried to explain them. As time went on, combinatorics on words became useful in the study of algorithms and coding. It led to developments in abstract algebra and answering open questions.

MurmurHash is a non-cryptographic hash function suitable for general hash-based lookup. It was created by Austin Appleby in 2008 and is currently hosted on GitHub along with its test suite named 'SMHasher'. It also exists in a number of variants, all of which have been released into the public domain. The name comes from two basic operations, multiply (MU) and rotate (R), used in its inner loop.

<span class="mw-page-title-main">Bit-reversal permutation</span> Permutation that reverses binary numbers

In applied mathematics, a bit-reversal permutation is a permutation of a sequence of items, where is a power of two. It is defined by indexing the elements of the sequence by the numbers from to , representing each of these numbers by its binary representation, and mapping each item to the item whose representation has the same bits in the reversed order.

<span class="mw-page-title-main">Moser–de Bruijn sequence</span> Number, sum of distinct powers of 4

In number theory, the Moser–de Bruijn sequence is an integer sequence named after Leo Moser and Nicolaas Govert de Bruijn, consisting of the sums of distinct powers of 4. Equivalently, they are the numbers whose binary representations are nonzero only in even positions.

References