Morphic word

Last updated December 21, 2024

In mathematics and computer science, a morphic word or substitutive word is an infinite sequence of symbols which is constructed from a particular class of endomorphism of a free monoid.

Definition

Let f be an endomorphism of the free monoid A^∗ on an alphabet A with the property that there is a letter a such that f(a) = as for a non-empty string s: we say that f is prolongable at a. The word

asf(s)f(f(s))\cdots f^{(n)}(s)\cdots \

is a pure morphic or pure substitutive word. Note that it is the limit of the sequence a, f(a), f(f(a)), f(f(f(a))), ... It is clearly a fixed point of the endomorphism f: the unique such sequence beginning with the letter a.^[2]^[3] In general, a morphic word is the image of a pure morphic word under a coding, that is, a morphism that maps letter to letter.^[1]

If a morphic word is constructed as the fixed point of a prolongable k-uniform morphism on A^∗ then the word is k-automatic. The n-th term in such a sequence can be produced by a finite-state automaton reading the digits of n in base k.^[1]

Examples

The Thue–Morse sequence is generated over {0,1} by the 2-uniform endomorphism 0 → 01, 1 → 10.^[4]^[5]
The Fibonacci word is generated over {a,b} by the endomorphism a → ab, b → a.^[1]^[4]
The tribonacci word is generated over {a,b,c} by the endomorphism a → ab, b → ac, c → a.^[5]
The Rudin–Shapiro sequence is obtained from the fixed point of the 2-uniform morphism a → ab, b → ac, c → db, d → dc followed by the coding a,b → 0, c,d → 1.^[5]
The regular paperfolding sequence is obtained from the fixed point of the 2-uniform morphism a → ab, b → cb, c → ad, d → cd followed by the coding a,b → 0, c,d → 1.^[6]

D0L system

A D0L system (deterministic context-free Lindenmayer system) is given by a word w of the free monoid A^∗ on an alphabet A together with a morphism σ prolongable at w. The system generates the infinite D0L word ω = lim_n→∞ σⁿ(w). Purely morphic words are D0L words but not conversely. However, if ω = uν is an infinite D0L word with an initial segment u of length |u| ≥ |w|, then zν is a purely morphic word, where z is a letter not in A.^[7]

Related Research Articles

In mathematics, the Thue–Morse or Prouhet–Thue–Morse sequence is the binary sequence that can be obtained by starting with 0 and successively appending the Boolean complement of the sequence obtained thus far. It is sometimes called the fair share sequence because of its applications to fair division or parity sequence. The first few steps of this procedure yield the strings 0, 01, 0110, 01101001, 0110100110010110, and so on, which are the prefixes of the Thue–Morse sequence. The full sequence begins:

In abstract algebra, the free monoid on a set is the monoid whose elements are all the finite sequences of zero or more elements from that set, with string concatenation as the monoid operation and with the unique sequence of zero elements, often called the empty string and denoted by ε or λ, as the identity element. The free monoid on a set A is usually denoted A^∗. The free semigroup on A is the subsemigroup of A^∗ containing all elements except the empty string. It is usually denoted A⁺.

In mathematics, a Sturmian word, named after Jacques Charles François Sturm, is a certain kind of infinitely long sequence of characters. Such a sequence can be generated by considering a game of English billiards on a square table. The struck ball will successively hit the vertical and horizontal edges labelled 0 and 1 generating a sequence of letters. This sequence is a Sturmian word.

A Fibonacci word is a specific sequence of binary digits. The Fibonacci word is formed by repeated concatenation in the same way that the Fibonacci numbers are formed by repeated addition.

In combinatorics, a square-free word is a word that does not contain any squares. A square is a word of the form $XX$ , where $X$ is not empty. Thus, a square-free word can also be defined as a word that avoids the pattern $XX$ .

In mathematics and theoretical computer science, an automatic sequence (also called a k-automatic sequence or a k-recognizable sequence when one wants to indicate that the base of the numerals used is k) is an infinite sequence of terms characterized by a finite automaton. The n-th term of an automatic sequence a(n) is a mapping of the final state reached in a finite automaton accepting the digits of the number n in some fixed base k.

In computer science, a trace is an equivalence class of strings, wherein certain letters in the string are allowed to commute, but others are not. Traces generalize the concept of strings by relaxing the requirement for all the letters to have a definite order, instead allowing for indefinite orderings in which certain reshufflings could take place. In an opposite way, traces generalize the concept of sets with multiplicities by allowing for specifying some incomplete ordering of the letters rather than requiring complete equivalence under all reorderings. The trace monoid or free partially commutative monoid is a monoid of traces.

Combinatorics on words is a fairly new field of mathematics, branching from combinatorics, which focuses on the study of words and formal languages. The subject looks at letters or symbols, and the sequences they form. Combinatorics on words affects various areas of mathematical study, including algebra and computer science. There have been a wide range of contributions to the field. Some of the first work was on square-free words by Axel Thue in the early 1900s. He and colleagues observed patterns within words and tried to explain them. As time went on, combinatorics on words became useful in the study of algorithms and coding. It led to developments in abstract algebra and answering open questions.

In mathematics, the plactic monoid is the monoid of all words in the alphabet of positive integers modulo Knuth equivalence. Its elements can be identified with semistandard Young tableaux. It was discovered by Donald Knuth, using an operation given by Craige Schensted in his study of the longest increasing subsequence of a permutation.

In computer science, the complexity function of a word or string is the function that counts the number of distinct factors of that string. More generally, the complexity function of a formal language counts the number of distinct words of given length.

$<span class="mw-page-title-main">Rauzy fractal</span>$

In mathematics, the Rauzy fractal is a fractal set associated with the Tribonacci substitution

In mathematics, a shuffle algebra is a Hopf algebra with a basis corresponding to words on some set, whose product is given by the shuffle productX ⧢ Y of two words X, Y: the sum of all ways of interlacing them. The interlacing is given by the riffle shuffle permutation.

In mathematics, a factorisation of a free monoid is a sequence of subsets of words with the property that every word in the free monoid can be written as a concatenation of elements drawn from the subsets. The Chen–Fox–Lyndon theorem states that the Lyndon words furnish a factorisation. The Schützenberger theorem relates the definition in terms of a multiplicative property to an additive property.

In mathematics, a compact semigroup is a semigroup in which the sets of solutions to equations can be described by finite sets of equations. The term "compact" here does not refer to any topology on the semigroup.

In mathematics, a sesquipower or Zimin word is a string over an alphabet with identical prefix and suffix. Sesquipowers are unavoidable patterns, in the sense that all sufficiently long strings contain one.

In mathematics, a recurrent word or sequence is an infinite word over a finite alphabet in which every factor occurs infinitely many times. An infinite word is recurrent if and only if it is a sesquipower.

In mathematics and computer science, the critical exponent of a finite or infinite sequence of symbols over a finite alphabet describes the largest number of times a contiguous subsequence can be repeated. For example, the critical exponent of "Mississippi" is 7/3, as it contains the string "ississi", which is of length 7 and period 3.

<span class="mw-page-title-main">Gesine Reinert</span> Mathematics professor

Gesine Reinert is a German statistician who is University Professor in Statistics at the University of Oxford. She is a Fellow of Keble College, Oxford, a Fellow of the Alan Turing Institute, and a Fellow of the Institute of Mathematical Statistics. Her research concerns the probability theory and statistics of biological sequences and biological networks.

Valérie Berthé is a French mathematician who works as a director of research for the Centre national de la recherche scientifique (CNRS) at the Institut de Recherche en Informatique Fondamentale (IRIF), a joint project between CNRS and Paris Diderot University. Her research involves symbolic dynamics, combinatorics on words, discrete geometry, numeral systems, tessellations, and fractals.

Cobham's theorem is a theorem in combinatorics on words that has important connections with number theory, notably transcendental numbers, and automata theory. Informally, the theorem gives the condition for the members of a set S of natural numbers written in bases b₁ and base b₂ to be recognised by finite automata. Specifically, consider bases b₁ and b₂ such that they are not powers of the same integer. Cobham's theorem states that S written in bases b₁ and b₂ is recognised by finite automata if and only if S differs by a finite set from a finite union of arithmetic progressions. The theorem was proved by Alan Cobham in 1969 and has since given rise to many extensions and generalisations.

References

1 2 3 4 Lothaire (2005) p.524
↑ Lothaire (2011) p. 10
↑ Honkala (2010) p.505
1 2 Lothaire (2011) p. 11
1 2 3 Lothaire (2005) p.525
↑ Lothaire (2005) p.526
↑ Honkala (2010) p.506

Allouche, Jean-Paul; Shallit, Jeffrey (2003). Automatic Sequences: Theory, Applications, Generalizations. Cambridge University Press. ISBN 978-0-521-82332-6. Zbl 1086.11015.
Honkala, Juha (2010). "The equality problem for purely substitutive words". In Berthé, Valérie; Rigo, Michel (eds.). Combinatorics, automata, and number theory. Encyclopedia of Mathematics and its Applications. Vol. 135. Cambridge: Cambridge University Press. pp. 505–529. ISBN 978-0-521-51597-9. Zbl 1216.68209.
Lothaire, M. (2005). Applied combinatorics on words . Encyclopedia of Mathematics and Its Applications. Vol. 105. A collective work by Jean Berstel, Dominique Perrin, Maxime Crochemore, Eric Laporte, Mehryar Mohri, Nadia Pisanti, Marie-France Sagot, Gesine Reinert, Sophie Schbath, Michael Waterman, Philippe Jacquet, Wojciech Szpankowski, Dominique Poulalhon, Gilles Schaeffer, Roman Kolpakov, Gregory Koucherov, Jean-Paul Allouche and Valérie Berthé. Cambridge: Cambridge University Press. ISBN 0-521-84802-4. Zbl 1133.68067.
Lothaire, M. (2011). Algebraic combinatorics on words. Encyclopedia of Mathematics and Its Applications. Vol. 90. With preface by Jean Berstel and Dominique Perrin (Reprint of the 2002 hardback ed.). Cambridge University Press. ISBN 978-0-521-18071-9. Zbl 1221.68183.

Morphic word

Contents

Definition

Examples

D0L system

See also

Related Research Articles

References

Further reading