Disjunctive sequence

Last updated

A disjunctive sequence is an infinite sequence of characters drawn from a finite alphabet, in which every finite string appears as a substring. For instance, the binary Champernowne sequence

Contents

formed by concatenating all binary strings in shortlex order, clearly contains all the binary strings and so is disjunctive. (The spaces above are not significant and are present solely to make clear the boundaries between strings). The complexity function of a disjunctive sequence S over an alphabet of size k is pS(n) = kn. [1]

Any normal sequence (a sequence in which each string of equal length appears with equal frequency) is disjunctive, but the converse is not true. For example, letting 0n denote the string of length n consisting of all 0s, consider the sequence

obtained by splicing exponentially long strings of 0s into the shortlex ordering of all binary strings. Most of this sequence consists of long runs of 0s, and so it is not normal, but it is still disjunctive.

A disjunctive sequence is recurrent but never uniformly recurrent/almost periodic.

Examples

The following result [2] [3] can be used to generate a variety of disjunctive sequences:

If a1, a2, a3, ..., is a strictly increasing infinite sequence of positive integers such that lim n → ∞ (an+1 / an) = 1,
then for any positive integer m and any integer base b ≥ 2, there is an an whose expression in base b starts with the expression of m in base b.
(Consequently, the infinite sequence obtained by concatenating the base-b expressions for a1, a2, a3, ..., is disjunctive over the alphabet {0, 1, ..., b-1}.)

Two simple cases illustrate this result:

E.g., using base-ten expressions, the sequences
123456789101112... (k = 1, positive natural numbers),
1491625364964... (k = 2, squares),
182764125216343... (k = 3, cubes),
etc.,
are disjunctive on {0,1,2,3,4,5,6,7,8,9}.
E.g., the sequences
23571113171923... (using base ten),
10111011111011110110001 ... (using base two),
etc.,

are disjunctive on the respective digit sets.

Another result [4] that provides a variety of disjunctive sequences is as follows:

If an = floor (f(n)), where f is any non-constant polynomial with real coefficients such that f(x) > 0 for all x > 0,
then the concatenation a1a2a3... (with the an expressed in base b) is a normal sequence in base b, and is therefore disjunctive on {0, 1, ..., b-1}.

E.g., using base-ten expressions, the sequences

818429218031851879211521610... (with f(x) = 2x3 - 5x2 + 11x )
591215182124273034... (with f(x) = π x + e)

are disjunctive on {0,1,2,3,4,5,6,7,8,9}.

Rich numbers

A rich number or disjunctive number is a real number whose expansion with respect to some base b is a disjunctive sequence over the alphabet {0,...,b−1}. Every normal number in base b is disjunctive but not conversely. The real number x is rich in base b if and only if the set { x bn mod 1} is dense in the unit interval. [5]

A number that is disjunctive to every base is called absolutely disjunctive or is said to be a lexicon. Every string in every alphabet occurs within a lexicon. A set is called "comeager" or "residual" if it contains the intersection of a countable family of open dense sets. The set of absolutely disjunctive reals is residual. [6] It is conjectured that every real irrational algebraic number is absolutely disjunctive. [7]

Notes

  1. Bugeaud (2012) p.91
  2. Calude, C.; Priese, L.; Staiger, L. (1997), Disjunctive sequences: An overview, University of Auckland, New Zealand, pp. 1–35, CiteSeerX   10.1.1.34.1370
  3. Istrate, G.; Păun, Gh. (1994), "Some combinatorial properties of self-reading sequences", Discrete Applied Mathematics, 55: 83–86, doi:10.1016/0166-218X(94)90037-X, Zbl   0941.68656
  4. Nakai, Yoshinobu; Shiokawa, Iekata (1992), "Discrepancy estimates for a class of normal numbers" (PDF), Acta Arithmetica, LXII.3 (3): 271–284, doi: 10.4064/aa-62-3-271-284
  5. Bugeaud (2012) p.92
  6. Calude & Zamfirescu (1999)
  7. Adamczewski & Bugeaud (2010) p.414

Related Research Articles

In the computer science subfield of algorithmic information theory, a Chaitin constant or halting probability is a real number that, informally speaking, represents the probability that a randomly constructed program will halt. These numbers are formed from a construction due to Gregory Chaitin.

<span class="mw-page-title-main">Decimal</span> Number in base-10 numeral system

The decimal numeral system is the standard system for denoting integer and non-integer numbers. It is the extension to non-integer numbers of the Hindu–Arabic numeral system. The way of denoting numbers in the decimal system is often referred to as decimal notation.

<span class="mw-page-title-main">Monoid</span> Algebraic structure with an associative operation and an identity element

In abstract algebra, a branch of mathematics, a monoid is a set equipped with an associative binary operation and an identity element. For example, the nonnegative integers with addition form a monoid, the identity element being 0.

In theoretical computer science and formal language theory, a regular language is a formal language that can be defined by a regular expression, in the strict sense in theoretical computer science.

<span class="mw-page-title-main">String (computer science)</span> Sequence of characters, data type

In computer programming, a string is traditionally a sequence of characters, either as a literal constant or as some kind of variable. The latter may allow its elements to be mutated and the length changed, or it may be fixed. A string is generally considered as a data type and is often implemented as an array data structure of bytes that stores a sequence of elements, typically characters, using some character encoding. String may also denote more general arrays or other sequence data types and structures.

<span class="mw-page-title-main">Sequence</span> Finite or infinite ordered list of elements

In mathematics, a sequence is an enumerated collection of objects in which repetitions are allowed and order matters. Like a set, it contains members. The number of elements is called the length of the sequence. Unlike a set, the same elements can appear multiple times at different positions in a sequence, and unlike a set, the order does matter. Formally, a sequence can be defined as a function from natural numbers to the elements at each position. The notion of a sequence can be generalized to an indexed family, defined as a function from an arbitrary index set.

<span class="mw-page-title-main">Hamming distance</span> Number of bits that differ between two strings

In information theory, the Hamming distance between two strings or vectors of equal length is the number of positions at which the corresponding symbols are different. In other words, it measures the minimum number of substitutions required to change one string into the other, or equivalently, the minimum number of errors that could have transformed one string into the other. In a more general context, the Hamming distance is one of several string metrics for measuring the edit distance between two sequences. It is named after the American mathematician Richard Hamming.

2 (two) is a number, numeral and digit. It is the natural number following 1 and preceding 3. It is the smallest and only even prime number.

In mathematics, a real number is said to be simply normal in an integer base b if its infinite sequence of digits is distributed uniformly in the sense that each of the b digit values has the same natural density 1/b. A number is said to be normal in base b if, for every positive integer n, all possible strings n digits long have density bn.

In mathematics, the Thue–Morse sequence or Prouhet–Thue–Morse sequence or parity sequence is the binary sequence obtained by starting with 0 and successively appending the Boolean complement of the sequence obtained thus far. The first few steps of this procedure yield the strings 0 then 01, 0110, 01101001, 0110100110010110, and so on, which are prefixes of the Thue–Morse sequence. The full sequence begins:

In abstract algebra, the free monoid on a set is the monoid whose elements are all the finite sequences of zero or more elements from that set, with string concatenation as the monoid operation and with the unique sequence of zero elements, often called the empty string and denoted by ε or λ, as the identity element. The free monoid on a set A is usually denoted A. The free semigroup on A is the subsemigroup of A containing all elements except the empty string. It is usually denoted A+.

In mathematics, the Champernowne constantC10 is a transcendental real constant whose decimal expansion has important properties. It is named after economist and mathematician D. G. Champernowne, who published it as an undergraduate in 1933. The number is defined by concatenating the base 10 representations of the positive integers:

In mathematics, the Littlewood conjecture is an open problem in Diophantine approximation, proposed by John Edensor Littlewood around 1930. It states that for any two real numbers α and β,

<span class="mw-page-title-main">Minkowski's question-mark function</span> Function with unusual fractal properties

In mathematics, Minkowski's question-mark function, denoted ?(x), is a function with unusual fractal properties, defined by Hermann Minkowski in 1904. It maps quadratic irrational numbers to rational numbers on the unit interval, via an expression relating the continued fraction expansions of the quadratics to the binary expansions of the rationals, given by Arnaud Denjoy in 1938. It also maps rational numbers to dyadic rationals, as can be seen by a recursive definition closely related to the Stern–Brocot tree.

<span class="mw-page-title-main">Sturmian word</span> Kind of infinitely long sequence of characters

In mathematics, a Sturmian word, named after Jacques Charles François Sturm, is a certain kind of infinitely long sequence of characters. Such a sequence can be generated by considering a game of English billiards on a square table. The struck ball will successively hit the vertical and horizontal edges labelled 0 and 1 generating a sequence of letters. This sequence is a Sturmian word.

<span class="mw-page-title-main">Fibonacci word</span> Binary sequence from Fibonacci recurrence

A Fibonacci word is a specific sequence of binary digits. The Fibonacci word is formed by repeated concatenation in the same way that the Fibonacci numbers are formed by repeated addition.

Intuitively, an algorithmically random sequence is a sequence of binary digits that appears random to any algorithm running on a universal Turing machine. The notion can be applied analogously to sequences on any finite alphabet. Random sequences are key objects of study in algorithmic information theory.

In number theory, the Erdős–Kac theorem, named after Paul Erdős and Mark Kac, and also known as the fundamental theorem of probabilistic number theory, states that if ω(n) is the number of distinct prime factors of n, then, loosely speaking, the probability distribution of

In mathematics and theoretical computer science, an automatic sequence (also called a k-automatic sequence or a k-recognizable sequence when one wants to indicate that the base of the numerals used is k) is an infinite sequence of terms characterized by a finite automaton. The n-th term of an automatic sequence a(n) is a mapping of the final state reached in a finite automaton accepting the digits of the number n in some fixed base k.

In computer science, the complexity function of a word or string is the function that counts the number of distinct factors of that string. More generally, the complexity function of a formal language counts the number of distinct words of given length.

References