Sesquipower

Last updated

In mathematics, a sesquipower or Zimin word is a string over an alphabet with identical prefix and suffix. Sesquipowers are unavoidable patterns, in the sense that all sufficiently long strings contain[ clarify ] one.

String (computer science) sequence of characters, data type

In computer programming, a string is traditionally a sequence of characters, either as a literal constant or as some kind of variable. The latter may allow its elements to be mutated and the length changed, or it may be fixed. A string is generally considered as a data type and is often implemented as an array data structure of bytes that stores a sequence of elements, typically characters, using some character encoding. String may also denote more general arrays or other sequence data types and structures.

In mathematics and theoretical computer science, an unavoidable pattern is a pattern of symbols that must occur in any sufficiently long string over an alphabet. An avoidable pattern is one for which there are infinitely many words no part of which match the pattern.

Contents

Formal definition

Formally, let A be an alphabet and A be the free monoid of finite strings over A. Every non-empty word w in A+ is a sesquipower of order 1. If u is a sesquipower of order n then any word w = uvu is a sesquipower of order n + 1. [1] The degree of a non-empty word w is the largest integer d such that w is a sesquipower of order d. [2]

In abstract algebra, the free monoid on a set is the monoid whose elements are all the finite sequences of zero or more elements from that set, with string concatenation as the monoid operation and with the unique sequence of zero elements, often called the empty string and denoted by ε or λ, as the identity element. The free monoid on a set A is usually denoted A. The free semigroup on A is the subsemigroup of A containing all elements except the empty string. It is usually denoted A+.

If in addition v is required to be of length 1 and not to occur in u, the definition of an ABACABA pattern is obtained. The latter are thus special cases of sesquipowers.

The ABACABA pattern is a recursive fractal pattern that shows up in many places in the real world. Patterns often show a DABACABA type subset.

Bi-ideal sequence

A bi-ideal sequence is a sequence of words fi where f1 is in A+ and

for some gi in A and i  1. The degree of a word w is thus the length of the longest bi-ideal sequence ending in w. [2]

Unavoidable patterns

For a finite alphabet A on k letters, there is an integer M depending on k and n, such that any word of length M has a factor which is a sesquipower of order at least n. We express this by saying that the sesquipowers are unavoidable patterns. [3] [4]

Sesquipowers in infinite sequences

Given an infinite bi-ideal sequence, we note that each fi is a prefix of fi+1 and so the fi converge to an infinite sequence

We define an infinite word to be a sesquipower if it is the limit of an infinite bi-ideal sequence. [5] An infinite word is a sesquipower if and only if it is a recurrent word, [5] [6] that is, every factor occurs infinitely often. [7]

In mathematics, a recurrent word or sequence is an infinite word over a finite alphabet in which every factor occurs infinitely often. An infinite word is recurrent if and only if it is a sesquipower.

Fix a finite alphabet A and assume a total order on the letters. For given integers p and n, every sufficiently long word in A has either a factor which is a p-power or a factor which is an n-sesquipower; in the latter case the factor has an n-factorisation into Lyndon words. [6]

Related Research Articles

In theoretical computer science and formal language theory, a regular language is a formal language that can be expressed using a regular expression, in the strict sense of the latter notion used in theoretical computer science.

In mathematics, the Thue–Morse sequence, or Prouhet–Thue–Morse sequence, is the binary sequence obtained by starting with 0 and successively appending the Boolean complement of the sequence obtained thus far. The first few steps of this procedure yield the strings 0 then 01, 0110, 01101001, 0110100110010110, and so on, which are prefixes of the Thue–Morse sequence. The full sequence begins:

In mathematics, the lexicographic or lexicographical order is a generalization of the way words are alphabetically ordered based on the alphabetical order of their component letters. This generalization consists primarily in defining a total order over the sequences of elements of a finite totally ordered set, often called an alphabet.

Sturmian word

In mathematics, a Sturmian word, named after Jacques Charles François Sturm, is a certain kind of infinitely long sequence of characters. Such a sequence can be generated by considering a game of English billiards on a square table. The struck ball will successively hit the vertical and horizontal edges labelled 0 and 1 generating a sequence of letters. This sequence is a Sturmian word.

Fibonacci word infinite binary sequence generated by the Fibonacci recurrence with concatenation in place of addition

A Fibonacci word is a specific sequence of binary digits. The Fibonacci word is formed by repeated concatenation in the same way that the Fibonacci numbers are formed by repeated addition.

In mathematics, in the areas of combinatorics and computer science, a Lyndon word is a nonempty string that is strictly smaller in lexicographic order than all of its rotations. Lyndon words are named after mathematician Roger Lyndon, who investigated them in 1954, calling them standard lexicographic sequences. Anatoly Shirshov introduced Lyndon words in 1953 calling them regular words.

In combinatorics, a squarefree word is a word that does not contain any squares. A square is a word of the form , where is not empty. Thus, a squarefree word can also be defined as a word that avoids the pattern .

A disjunctive sequence is an infinite sequence in which every finite string appears as a substring. For instance, the binary Champernowne sequence

In mathematics and theoretical computer science, an automatic sequence is an infinite sequence of terms characterized by a finite automaton. The n-th term of an automatic sequence a(n) is a mapping of the final state reached in a finite automaton accepting the digits of the number n in some fixed base k.

Combinatorics on words is a fairly new field of mathematics, branching from combinatorics, which focuses on the study of words and formal languages. The subject looks at letters or symbols, and the sequences they form. Combinatorics on words affects various areas of mathematical study, including algebra and computer science. There have been a wide range of contributions to the field. Some of the first work was on square-free words by Thue in the early 1900s. He and colleagues observed patterns within words and tried to explain them. As time went on, combinatorics on words became useful in the study of algorithms and coding. It led to developments in abstract algebra and answering open questions.

In mathematics, the plactic monoid is the monoid of all words in the alphabet of positive integers modulo Knuth equivalence. Its elements can be identified with semistandard Young tableaux. It was discovered by Donald Knuth (1970), using an operation given by Craige Schensted (1961) in his study of the longest increasing subsequence of a permutation.

In computer science, the complexity function of a string, a finite or infinite sequence of letters from some alphabet, is the function that counts the number of distinct factors from that string. More generally, the complexity function of a language, a set of finite words over an alphabet, counts the number of distinct words of given length.

In mathematics, a shuffle algebra is a Hopf algebra with a basis corresponding to words on some set, whose product is given by the shuffle productXY of two words X, Y: the sum of all ways of interlacing them. The interlacing is given by the riffle shuffle permutation.

In mathematics, a factorisation of a free monoid is a sequence of subsets of words with the property that every word in the free monoid can be written as a concatenation of elements drawn from the subsets. The Chen–Fox–Lyndon theorem states that the Lyndon words furnish a factorisation. The Schützenberger theorem relates the definition in terms of a multiplicative property to an additive property.

In mathematics and computer science, a morphic word or substitutive word is an infinite sequence of symbols which is constructed from a particular class of endomorphism of a free monoid.

In mathematics, a compact semigroup is a semigroup in which the sets of solutions to equations can be described by finite sets of equations. The term "compact" here does not refer to any topology on the semigroup.

In mathematics and computer science, the critical exponent of a finite or infinite sequence of symbols over a finite alphabet describes the largest number of times a contiguous subsequence can be repeated. For example, the critical exponent of "Mississippi" is 7/3, as it contains the string "ississi", which is of length 7 and period 3.

References

  1. Lothaire (2011) p. 135
  2. 1 2 Lothaire (2011) p. 136
  3. Lothaire (2011) p. 137
  4. Berstel et al (2009) p.132
  5. 1 2 Lothiare (2011) p. 141
  6. 1 2 Berstel et al (2009) p.133
  7. Lothaire (2011) p. 30