Automatic sequence

Last updated October 10, 2023

In mathematics and theoretical computer science, an automatic sequence (also called a k-automatic sequence or a k-recognizable sequence when one wants to indicate that the base of the numerals used is k) is an infinite sequence of terms characterized by a finite automaton. The n-th term of an automatic sequence a(n) is a mapping of the final state reached in a finite automaton accepting the digits of the number n in some fixed base k.^[1]^[2]

Definition

Automatic sequences may be defined in a number of ways, all of which are equivalent. Four common definitions are as follows.

Automata-theoretic

Let k be a positive integer, and let D = (Q, Σ_k, δ, q₀, Δ, τ) be a deterministic finite automaton with output, where

Q is the finite set of states;
the input alphabet Σ_k consists of the set {0,1,...,k-1} of possible digits in base-k notation;
δ : Q × Σ_k → Q is the transition function;
q₀ ∈ Q is the initial state;
the output alphabet Δ is a finite set; and
τ : Q → Δ is the output function mapping from the set of internal states to the output alphabet.

Extend the transition function δ from acting on single digits to acting on strings of digits by defining the action of δ on a string s consisting of digits s₁s₂...s_t as:

δ(q,s) = δ(δ(q, s₁s₂...s_{t−1}), s_t).

Define a function a from the set of positive integers to the output alphabet Δ as follows:

a(n) = τ(δ(q₀,s(n))),

where s(n) is n written in base k. Then the sequence a = a(1)a(2)a(3)... is a k-automatic sequence.^[1]

An automaton reading the base k digits of s(n) starting with the most significant digit is said to be direct reading, while an automaton starting with the least significant digit is reverse reading.^[4] The above definition holds whether s(n) is direct or reverse reading.^[5]
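The definition above can be sketched in Python. This is a minimal illustration, not a reference implementation: the helper names (`base_k_digits`, `dfao_term`) and the dictionary encoding of δ and τ are assumptions made here, and the two-state automaton is the Thue–Morse one described under Examples. The sketch treats s(0) as the empty string.

```python
def base_k_digits(n, k):
    """Digits of n in base k, most significant first; 0 gives the empty list."""
    digits = []
    while n > 0:
        n, d = divmod(n, k)
        digits.append(d)
    return digits[::-1]


def dfao_term(n, k, delta, q0, tau, reverse=False):
    """Return tau(delta(q0, s(n))) for a DFAO stored in plain dictionaries.

    delta maps (state, digit) -> state and tau maps state -> output letter.
    With reverse=True the digits are fed least significant digit first.
    """
    digits = base_k_digits(n, k)
    if reverse:
        digits = digits[::-1]
    q = q0
    for d in digits:
        q = delta[(q, d)]
    return tau[q]


# Hypothetical encoding of the two-state Thue-Morse automaton:
# q0 = even number of ones read so far, q1 = odd number of ones.
TM_DELTA = {("q0", 0): "q0", ("q0", 1): "q1",
            ("q1", 0): "q1", ("q1", 1): "q0"}
TM_TAU = {"q0": 0, "q1": 1}


def thue_morse(n, reverse=False):
    return dfao_term(n, 2, TM_DELTA, "q0", TM_TAU, reverse)
```

For this particular automaton the same transition table works in both reading directions, because the parity of the number of ones does not depend on the order of the digits. In general, a direct-reading automaton and a reverse-reading automaton for the same sequence are different machines.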

Substitution

Let $\varphi$ be a k-uniform morphism of a free monoid $\Sigma ^{*}$ and let $\tau$ be a coding (that is, a $1$ -uniform morphism), as in the automata-theoretic case. If $w$ is a fixed point of $\varphi$ —that is, if $w=\varphi (w)$ —then $s=\tau (w)$ is a k-automatic sequence.^[6] Conversely, every k-automatic sequence is obtainable in this way.^[4] This result is due to Cobham, and it is referred to in the literature as Cobham's little theorem.^[2]^[7]
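As an illustration of a fixed point followed by a non-trivial coding, the sketch below generates the Rudin–Shapiro sequence (see Examples) from a 2-uniform morphism on four letters. The morphism a → ab, b → ac, c → db, d → dc and the coding a, b ↦ 1, c, d ↦ −1 are a standard construction supplied here as an assumption; they do not appear in this article, and the function names are hypothetical.

```python
def fixed_point_prefix(phi, start, length):
    """Prefix of the fixed point obtained by iterating phi on `start`.

    phi maps each letter to a word; phi[start] must begin with `start`
    so that each iterate is a prefix of the next one.
    """
    if not phi[start].startswith(start):
        raise ValueError("phi[start] must begin with start")
    word = start
    while len(word) < length:
        word = "".join(phi[c] for c in word)
    return word[:length]


def automatic_prefix(phi, tau, start, length):
    """Apply the coding tau letter by letter to a prefix of the fixed point."""
    return [tau[c] for c in fixed_point_prefix(phi, start, length)]


# Assumed morphism and coding for the Rudin-Shapiro sequence with values +1/-1.
RS_PHI = {"a": "ab", "b": "ac", "c": "db", "d": "dc"}
RS_TAU = {"a": 1, "b": 1, "c": -1, "d": -1}

rudin_shapiro_prefix = automatic_prefix(RS_PHI, RS_TAU, "a", 16)
```

The four letters play the role of automaton states (parity of the number of occurrences of 11 seen so far, together with the last digit read), which is how the substitution and automata-theoretic definitions correspond.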

k-kernel

Let k ≥ 2. The k-kernel of the sequence s(n) is the set of subsequences

K_{k}(s)=\{s(k^{e}n+r):e\geq 0{\text{ and }}0\leq r\leq k^{e}-1\}.

In most cases, the k-kernel of a sequence is infinite. However, if the k-kernel is finite, then the sequence s(n) is k-automatic, and the converse is also true. This is due to Eilenberg.^[8]^[9]^[10]

It follows that a k-automatic sequence is necessarily a sequence on a finite alphabet.
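The kernel can be explored numerically by comparing finite prefixes of the subsequences s(kᵉn + r). The following Python sketch does this under stated assumptions: it only inspects the first `n_terms` terms for exponents e up to `max_e`, so it can suggest, but never prove, that a kernel is finite. The function names and the perfect-square indicator used for contrast are illustrative choices, not part of the article.

```python
from math import isqrt


def kernel_prefixes(seq, k, max_e, n_terms):
    """Distinct length-n_terms prefixes of the subsequences n -> seq(k**e * n + r)
    for 0 <= e <= max_e and 0 <= r < k**e."""
    prefixes = set()
    for e in range(max_e + 1):
        step = k ** e
        for r in range(step):
            prefixes.add(tuple(seq(step * n + r) for n in range(n_terms)))
    return prefixes


def thue_morse(n):
    """Parity of the number of ones in the binary expansion of n."""
    return bin(n).count("1") % 2


def is_square(n):
    """Indicator of perfect squares, used only as a contrasting example."""
    return int(isqrt(n) ** 2 == n)


tm_count = len(kernel_prefixes(thue_morse, 2, 6, 32))
sq_count = len(kernel_prefixes(is_square, 2, 3, 32))
```

For the Thue–Morse sequence only two distinct prefixes appear (the sequence and its complement), while for the indicator of squares new prefixes keep appearing as `max_e` grows.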

Formal power series

Let u(n) be a sequence over an alphabet Σ and suppose that there is an injective function β from Σ to the finite field F_q, where q = pⁿ for some prime p. The associated formal power series is

\sum _{i\geq 0}\beta (u(i))X^{i}.

Then the sequence u is q-automatic if and only if this formal power series is algebraic over F_q(X). This result is due to Christol, and it is referred to in the literature as Christol's theorem.^[11]
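As a concrete check of the algebraic side, the Thue–Morse series T(X) over F₂ satisfies (1 + X)³T² + (1 + X)²T + X = 0; this equation is a standard fact quoted here as an assumption, since it is not stated in this article. The Python sketch below verifies it on truncated power series with coefficients reduced modulo 2. The helper names are hypothetical.

```python
def mul_mod2(a, b, n_terms):
    """Product of two power series over F_2, truncated below X**n_terms.

    Series are lists of 0/1 coefficients, lowest degree first.
    """
    out = [0] * n_terms
    for i, ai in enumerate(a):
        if not ai:
            continue
        for j, bj in enumerate(b):
            if i + j >= n_terms:
                break
            out[i + j] ^= bj
    return out


def christol_residual(n_terms):
    """Coefficients of (1+X)^3 T^2 + (1+X)^2 T + X modulo X**n_terms, n_terms >= 2."""
    t = [bin(n).count("1") % 2 for n in range(n_terms)]
    one_plus_x = [1, 1] + [0] * (n_terms - 2)
    p2 = mul_mod2(one_plus_x, one_plus_x, n_terms)   # (1 + X)^2
    p3 = mul_mod2(p2, one_plus_x, n_terms)           # (1 + X)^3
    t2 = mul_mod2(t, t, n_terms)                     # T^2
    lhs = [x ^ y for x, y in zip(mul_mod2(p3, t2, n_terms),
                                 mul_mod2(p2, t, n_terms))]
    lhs[1] ^= 1                                      # add the term X
    return lhs
```

A residual that vanishes on every computed coefficient is consistent with T being algebraic over F₂(X); it is a finite check, not a proof.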

History

Automatic sequences were introduced by Büchi in 1960,^[12] although his paper took a more logico-theoretic approach to the matter and did not use the terminology found in this article. The notion of automatic sequences was further studied by Cobham in 1972, who called these sequences "uniform tag sequences".^[7]

The term "automatic sequence" first appeared in a paper of Deshouillers.^[13]

Examples

The following sequences are automatic:

Thue–Morse sequence

The Thue–Morse sequence t(n) (OEIS: A010060) is the fixed point of the morphism 0 → 01, 1 → 10. Since the n-th term of the Thue–Morse sequence counts the number of ones modulo 2 in the base-2 representation of n, it is generated by a two-state deterministic finite automaton with output, where being in state q₀ indicates there are an even number of ones in the representation of n and being in state q₁ indicates there are an odd number of ones. Hence, the Thue–Morse sequence is 2-automatic.

Period-doubling sequence

The n-th term of the period-doubling sequence d(n) (OEIS: A096268) is determined by the parity of the exponent of the highest power of 2 dividing n. It is also the fixed point of the morphism 0 → 01, 1 → 00.^[14] Starting with the initial word w = 0 and iterating the 2-uniform morphism φ, where φ(0) = 01 and φ(1) = 00, each iterate is a prefix of the next, so the period-doubling sequence is the fixed point of φ and is therefore 2-automatic.
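The two descriptions of the period-doubling sequence can be compared directly. The sketch below assumes that the terms are indexed from n = 1, so that the first letter of the fixed point corresponds to n = 1; the function names are illustrative.

```python
def period_doubling_prefix(length):
    """First `length` letters of the fixed point of 0 -> 01, 1 -> 00."""
    phi = {"0": "01", "1": "00"}
    word = "0"
    while len(word) < length:
        word = "".join(phi[c] for c in word)
    return [int(c) for c in word[:length]]


def valuation_parity(n):
    """Parity of the exponent of the highest power of 2 dividing n (n >= 1)."""
    exponent = (n & -n).bit_length() - 1  # n & -n isolates the lowest set bit
    return exponent % 2


prefix = period_doubling_prefix(1024)
agree = all(prefix[n - 1] == valuation_parity(n) for n in range(1, 1025))
```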

Rudin–Shapiro sequence

The n-th term of the Rudin–Shapiro sequence r(n) (OEIS: A020985) is determined by the parity of the number of (possibly overlapping) occurrences of the block 11 in the base-2 representation of n. The elements of the 2-kernel of the Rudin–Shapiro sequence^[15] satisfy the relations

{\begin{aligned}r(2n)&=r(n),\\r(4n+1)&=r(n),\\r(8n+7)&=r(2n+1),\\r(16n+3)&=r(8n+3),\\r(16n+11)&=r(4n+3).\end{aligned}}

Since the 2-kernel consists only of r(n), r(2n + 1), r(4n + 3), and r(8n + 3), it is finite and thus the Rudin–Shapiro sequence is 2-automatic.
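The five relations can be checked numerically. The sketch below assumes the common convention r(n) = (−1)^e, where e is the number of possibly overlapping occurrences of 11 in the binary expansion of n; the function names are illustrative, and a finite check of this kind does not replace the proof.

```python
def rudin_shapiro(n):
    """+1 or -1 according to the parity of the number of occurrences of 11."""
    # Bit i of n & (n >> 1) is set exactly when bits i and i+1 of n are both 1.
    return -1 if bin(n & (n >> 1)).count("1") % 2 else 1


def kernel_relations_hold(limit):
    """Check the five 2-kernel relations for all 0 <= n < limit."""
    r = rudin_shapiro
    return all(
        r(2 * n) == r(n)
        and r(4 * n + 1) == r(n)
        and r(8 * n + 7) == r(2 * n + 1)
        and r(16 * n + 3) == r(8 * n + 3)
        and r(16 * n + 11) == r(4 * n + 3)
        for n in range(limit)
    )
```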

Other sequences

Both the Baum–Sweet sequence ^[16] ( OEIS: A086747 ) and the regular paperfolding sequence ^[17]^[18]^[19] ( OEIS: A014577 ) are automatic. In addition, the general paperfolding sequence with a periodic sequence of folds is also automatic.^[20]

Properties

Automatic sequences exhibit a number of interesting properties. A non-exhaustive list of these properties is presented below.

Every automatic sequence is a morphic word.^[21]
For k ≥ 2 and r ≥ 1, a sequence is k-automatic if and only if it is k^r-automatic. This result is due to Eilenberg.^[22]
For h and k multiplicatively independent, a sequence is both h-automatic and k-automatic if and only if it is ultimately periodic.^[23] This result is due to Cobham and is known as Cobham's theorem,^[24] with a multidimensional generalisation due to Semenov.^[25]^[26]
If u(n) is a k-automatic sequence over an alphabet Σ and f is a uniform morphism from Σ^∗ to Δ^∗ for another alphabet Δ, then f(u) is a k-automatic sequence over Δ.^[27]
If u(n) is a k-automatic sequence, then the sequences u(kⁿ) and u(kⁿ − 1) are ultimately periodic.^[28] Conversely, if u(n) is an ultimately periodic sequence, then the sequence v defined by v(kⁿ) = u(n) and otherwise zero is k-automatic.^[29]

Proving and disproving automaticity

Given a candidate sequence $s=(s_{n})_{n\geq 0}$ , it is usually easier to disprove its automaticity than to prove it. By the k-kernel characterization of k-automatic sequences, it suffices to produce infinitely many distinct elements in the k-kernel $K_{k}(s)$ to show that $s$ is not k-automatic. Heuristically, one might try to prove automaticity by checking the agreement of terms in the k-kernel, but this can occasionally lead to wrong guesses. For example, let

t=011010011\dots

be the Thue–Morse word. Let $s$ be the word given by concatenating successive terms in the sequence of run-lengths of $t$ . Then $s$ begins

s=12112221\dots .

It is known that $s$ is the fixed point $h^{\omega }(1)$ of the morphism

h(1)=121,h(2)=12221.

The word $s$ is not 2-automatic, but certain elements of its 2-kernel agree for many terms. For example, $s_{16n+1}=s_{64n+1}$ for $0\leq n\leq 1864134$ but not for $n=1864135$.^[30]
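The relationship between the run-lengths of the Thue–Morse word and the morphism h can be checked on a prefix. The following Python sketch is illustrative only (the function names are assumptions); it does not attempt to reproduce the comparison of kernel elements up to n = 1864135, which needs far more terms.

```python
from itertools import groupby


def thue_morse_prefix(length):
    """First `length` terms of the Thue-Morse word."""
    return [bin(n).count("1") % 2 for n in range(length)]


def run_lengths(seq):
    """Lengths of maximal runs of equal letters, dropping the final run,
    which may be cut short by the end of the prefix."""
    runs = [len(list(group)) for _, group in groupby(seq)]
    return runs[:-1]


def h_fixed_point_prefix(length):
    """First `length` letters of the fixed point of h(1) = 121, h(2) = 12221."""
    h = {"1": "121", "2": "12221"}
    word = "1"
    while len(word) < length:
        word = "".join(h[c] for c in word)
    return [int(c) for c in word[:length]]


runs = run_lengths(thue_morse_prefix(4096))
```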

Given a sequence that is conjectured to be automatic, there are a few useful approaches to proving it actually is. One approach is to directly construct a deterministic automaton with output that gives the sequence. Let $(s_{n})_{n\geq 0}$ be written in the alphabet $\Delta$ , and let $(n)_{k}$ denote the base- $k$ expansion of $n$ . Then the sequence $s=(s_{n})_{n\geq 0}$ is $k$ -automatic if and only if each of the fibres

I_{k}(s,d):=\{(n)_{k}\mid s_{n}=d\}

is a regular language.^[31] The pumping lemma for regular languages is often useful when examining the fibres, although on its own it can only establish that a fibre is not regular.
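For the Thue–Morse sequence the fibre of the letter 1 has a simple description: it consists of the binary expansions, without leading zeros, that contain an odd number of ones. The Python sketch below writes this fibre as a regular expression and compares it with the sequence; the pattern and the function name are illustrative assumptions, not taken from the cited source.

```python
import re

# A leading 1 followed by any string with an even number of ones:
# each repetition contributes either a single 0 or a block 1 0...0 1.
ODD_NUMBER_OF_ONES = re.compile(r"1(0|10*1)*")


def in_fibre_of_one(n):
    """True if the binary expansion of n >= 1 lies in the fibre I_2(t, 1)."""
    return ODD_NUMBER_OF_ONES.fullmatch(format(n, "b")) is not None


matches = all(
    in_fibre_of_one(n) == (bin(n).count("1") % 2 == 1)
    for n in range(1, 2048)
)
```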

If $s_{k}(n)$ denotes the sum of the digits in the base- $k$ expansion of $n$ and $p(X)$ is a polynomial with non-negative integer coefficients, and if $k\geq 2$ , $m\geq 1$ are integers, then the sequence

(s_{k}(p(n)){\pmod {m}})_{n\geq 0}

is $k$ -automatic if and only if $\deg p\leq 1$ or $m\mid k-1$ .^[32]

1-automatic sequences

k-automatic sequences are normally only defined for k ≥ 2.^[1] The concept can be extended to k = 1 by defining a 1-automatic sequence to be a sequence whose n-th term depends on the unary notation for n; that is, (1)ⁿ. Since a finite state automaton must eventually return to a previously visited state, all 1-automatic sequences are ultimately periodic.
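The argument can be made concrete: a unary automaton has a single transition per state, so the states visited from the initial state form a tail followed by a cycle. The sketch below computes the lengths of both. The dictionary encoding and the example automaton are assumptions made for illustration.

```python
def unary_preperiod_and_period(step, q0):
    """Return (preperiod, period) of the state sequence q0, step[q0], ...

    `step` maps each state to its successor on the single input letter 1.
    """
    first_seen = {}
    q, n = q0, 0
    while q not in first_seen:
        first_seen[q] = n
        q = step[q]
        n += 1
    preperiod = first_seen[q]
    return preperiod, n - preperiod


# Hypothetical four-state unary automaton: a -> b -> c -> d -> b -> ...
EXAMPLE_STEP = {"a": "b", "b": "c", "c": "d", "d": "b"}
preperiod, period = unary_preperiod_and_period(EXAMPLE_STEP, "a")
```

Whatever output function is applied to the states, the resulting sequence repeats with this period after the preperiod, although its minimal period may be a proper divisor.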

Generalizations

Automatic sequences are robust against variations to either the definition or the input sequence. For instance, as noted in the automata-theoretic definition, a given sequence remains automatic under both direct and reverse reading of the input sequence. A sequence also remains automatic when an alternate set of digits is used or when the base is negated; that is, when the input sequence is represented in base −k instead of in base k.^[33] However, in contrast to using an alternate set of digits, a change of base may affect the automaticity of a sequence.

The domain of an automatic sequence can be extended from the natural numbers to the integers via two-sided automatic sequences. This stems from the fact that, given k ≥ 2, every integer can be represented uniquely in the form $\sum _{0\leq i\leq r}a_{i}(-k)^{i},$ where $a_{i}\in \{0,\dots ,k-1\}$ . Then a two-sided infinite sequence $(a(n))_{n\in \mathbb {Z}}$ is (−k)-automatic if and only if its subsequences $(a(n))_{n\geq 0}$ and $(a(-n))_{n\geq 0}$ are k-automatic.^[34]
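The representation in base −k can be computed by repeated division, adjusting each remainder into the range 0 to k − 1. The following Python sketch shows one way to do this; the function names are illustrative, and returning the single digit 0 for the integer 0 is a convention chosen here.

```python
def to_negabase(n, k):
    """Digits of the integer n in base -k (k >= 2), least significant first."""
    if n == 0:
        return [0]
    digits = []
    while n != 0:
        n, r = divmod(n, -k)   # Python gives -k < r <= 0 for a negative divisor
        if r < 0:
            r += k             # move the remainder into 0..k-1 ...
            n += 1             # ... and compensate in the quotient
        digits.append(r)
    return digits


def from_negabase(digits, k):
    """Inverse of to_negabase: evaluate the sum of a_i * (-k)**i."""
    return sum(d * (-k) ** i for i, d in enumerate(digits))
```

For example, `to_negabase(2, 2)` gives the digits `[0, 1, 1]`, since 2 = 0 − 2·1 + 4·1. Negative integers need no sign, which is what allows a single automaton to read expansions of all integers.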

The alphabet of a k-automatic sequence can be extended from finite size to infinite size via k-regular sequences.^[35] The k-regular sequences can be characterized as those sequences whose k-kernel is finitely-generated. Every bounded k-regular sequence is automatic.^[36]

Logical approach

For many 2-automatic sequences $s=(s_{n})_{n\geq 0}$ , the map $n\mapsto s_{n}$ has the property that the first-order theory ${\text{FO}}(\mathbb {N} ,+,0,1,n\mapsto s_{n})$ is decidable. Since many non-trivial properties of automatic sequences can be written in first-order logic, it is possible to prove these properties mechanically by executing the decision procedure.^[37]

For example, the following properties of the Thue–Morse word can all be verified mechanically in this way:

The Thue–Morse word is overlap-free, i.e., it does not contain a word of the form $cxcxc$ where $c$ is a single letter and $x$ is a possibly empty word.
A non-empty word $x$ is bordered if there is a non-empty word $w$ and a possibly empty word $y$ with $x=wyw$ . The Thue–Morse word contains a bordered factor for each length greater than 1.^[38]
There is an unbordered factor of length $n$ in the Thue–Morse word if and only if $(n)_{2}\notin 1(01^{*}0)^{*}10^{*}1$ where $(n)_{2}$ denotes the binary representation of $n$ .^[39]
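For comparison with the mechanical proofs, the first property can at least be tested by brute force on a finite prefix. The sketch below is not the decision procedure described above and does not use Walnut; it only searches a prefix for a factor of the form cxcxc, and the function name is an assumption.

```python
def has_overlap(word):
    """True if `word` contains a factor c x c x c (c a letter, x possibly empty).

    Such a factor has length 2p + 1 and period p = len(c x), which means
    p + 1 consecutive positions i with word[i] == word[i + p].
    """
    n = len(word)
    for p in range(1, n // 2 + 1):
        run = 0  # current number of consecutive matches at distance p
        for i in range(n - p):
            if word[i] == word[i + p]:
                run += 1
                if run > p:
                    return True
            else:
                run = 0
    return False


thue_morse_word = "".join(str(bin(n).count("1") % 2) for n in range(512))
prefix_is_overlap_free = not has_overlap(thue_morse_word)
```

A finite search of this kind can refute a property but cannot establish it for the whole infinite word, which is the gap the decision procedure closes.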

The software Walnut,^[40]^[41] developed by Hamoon Mousavi, implements a decision procedure for deciding many properties of certain automatic words, such as the Thue–Morse word. This implementation is a consequence of the above work on the logical approach to automatic sequences.

Notes

1. Allouche & Shallit (2003) p. 152
2. Berstel et al (2009) p. 78
3. Allouche & Shallit (2003) p. 168
4. Pytheas Fogg (2002) p. 13
5. Pytheas Fogg (2002) p. 15
6. Allouche & Shallit (2003) p. 175
7. Cobham (1972)
8. Allouche & Shallit (2003) p. 185
9. Lothaire (2005) p. 527
10. Berstel & Reutenauer (2011) p. 91
11. Christol, G. (1979). "Ensembles presque périodiques k-reconnaissables". Theoret. Comput. Sci. 9: 141–145. doi:10.1016/0304-3975(79)90011-2.
12. Büchi, J. R. (1990). "Weak Second-Order Arithmetic and Finite Automata". The Collected Works of J. Richard Büchi. pp. 66–92. doi:10.1007/978-1-4613-8928-6_22. ISBN 978-1-4613-8930-9.
13. Deshouillers, J.-M. (1979–1980). "La répartition modulo 1 des puissances de rationnels dans l'anneau des séries formelles sur un corps fini". Séminaire de Théorie des Nombres de Bordeaux: 5.01–5.22.
14. Allouche & Shallit (2003) p. 176
15. Allouche & Shallit (2003) p. 186
16. Allouche & Shallit (2003) p. 156
17. Berstel & Reutenauer (2011) p. 92
18. Allouche & Shallit (2003) p. 155
19. Lothaire (2005) p. 526
20. Allouche & Shallit (2003) p. 183
21. Lothaire (2005) p. 524
22. Eilenberg, Samuel (1974). Automata, languages, and machines. Vol. A. Orlando: Academic Press. ISBN 978-0-122-34001-7.
23. Allouche & Shallit (2003) pp. 345–350
24. Cobham, A. (1969). "On the base-dependence of sets of numbers recognizable by finite automata". Math. Systems Theory. 3 (2): 186–192. doi:10.1007/BF01746527. S2CID 19792434.
25. Semenov, A. L. (1977). "Presburgerness of predicates regular in two number systems". Sibirsk. Mat. Zh. (in Russian). 18: 403–418.
26. Point, F.; Bruyère, V. (1997). "On the Cobham-Semenov theorem". Theory of Computing Systems. 30 (2): 197–220. doi:10.1007/BF02679449. S2CID 31270341.
27. Lothaire (2005) p. 532
28. Lothaire (2005) p. 529
29. Berstel & Reutenauer (2011) p. 103
30. Allouche, G.; Allouche, J.-P.; Shallit, J. (2006). "Kolam indiens, dessins sur le sable aux îles Vanuatu, courbe de Sierpinski et morphismes de monoïde". Annales de l'Institut Fourier. 56 (7): 2126. doi:10.5802/aif.2235.
31. Allouche & Shallit (2003) p. 160
32. Allouche & Shallit (2003) p. 197
33. Allouche & Shallit (2003) p. 157
34. Allouche & Shallit (2003) p. 162
35. Allouche, J.-P.; Shallit, J. (1992). "The ring of k-regular sequences". Theoret. Comput. Sci. 98 (2): 163–197. doi:10.1016/0304-3975(92)90001-v.
36. Shallit, Jeffrey. "The Logical Approach to Automatic Sequences, Part 1: Automatic Sequences and k-Regular Sequences" (PDF). Retrieved April 1, 2020.
37. Shallit, J. "The Logical Approach to Automatic Sequences: Part 1" (PDF). Retrieved April 1, 2020.
38. Shallit, J. "The Logical Approach to Automatic Sequences: Part 3" (PDF). Retrieved April 1, 2020.
39. Shallit, J. "The Logical Approach to Automatic Sequences: Part 3" (PDF). Retrieved April 1, 2020.
40. Shallit, J. "Walnut Software". Retrieved April 1, 2020.
41. Mousavi, H. (2016). "Automatic Theorem Proving in Walnut". arXiv:1603.06017 [cs.FL].

References

Allouche, Jean-Paul; Shallit, Jeffrey (2003). Automatic Sequences: Theory, Applications, Generalizations. Cambridge University Press. ISBN 978-0-521-82332-6. Zbl 1086.11015.
Berstel, Jean; Lauve, Aaron; Reutenauer, Christophe; Saliola, Franco V. (2009). Combinatorics on words. Christoffel words and repetitions in words. CRM Monograph Series. Vol. 27. Providence, RI: American Mathematical Society. ISBN 978-0-8218-4480-9. Zbl 1161.68043.
Berstel, Jean; Reutenauer, Christophe (2011). Noncommutative rational series with applications. Encyclopedia of Mathematics and Its Applications. Vol. 137. Cambridge: Cambridge University Press. ISBN 978-0-521-19022-0. Zbl 1250.68007.
Cobham, Alan (1972). "Uniform tag sequences". Mathematical Systems Theory. 6 (1–2): 164–192. doi:10.1007/BF01706087. S2CID 28356747.
Lothaire, M. (2005). Applied combinatorics on words. Encyclopedia of Mathematics and Its Applications. Vol. 105. A collective work by Jean Berstel, Dominique Perrin, Maxime Crochemore, Eric Laporte, Mehryar Mohri, Nadia Pisanti, Marie-France Sagot, Gesine Reinert, Sophie Schbath, Michael Waterman, Philippe Jacquet, Wojciech Szpankowski, Dominique Poulalhon, Gilles Schaeffer, Roman Kolpakov, Gregory Koucherov, Jean-Paul Allouche and Valérie Berthé. Cambridge: Cambridge University Press. ISBN 978-0-521-84802-2. Zbl 1133.68067.
Pytheas Fogg, N. (2002). Substitutions in dynamics, arithmetics and combinatorics. Lecture Notes in Mathematics. Vol. 1794. Editors Berthé, Valérie; Ferenczi, Sébastien; Mauduit, Christian; Siegel, A. Berlin: Springer-Verlag. ISBN 978-3-540-44141-0. Zbl 1014.11015.