Critical exponent of a word

Last updated

In mathematics and computer science, the critical exponent of a finite or infinite sequence of symbols over a finite alphabet describes the largest number of times a contiguous subsequence can be repeated. For example, the critical exponent of "Mississippi" is 7/3, as it contains the string "ississi", which is of length 7 and period 3.

Contents

If w is an infinite word over the alphabet A and x is a finite word over A, then x is said to occur in w with exponent α, for positive real α, if there is a factor y of w with y = xax0 where x0 is a prefix of x, a is the integer part of α, and the length |y| = α |x|: we say that y is an α-power. The word w is α-power-free if it contains no factors which are β-powers for any β ≥ α. [1]

The critical exponent for w is the supremum of the α for which w has α-powers, [2] or equivalently the infimum of the α for which w is α-power-free. [3]

Definition

If is a word (possibly infinite), then the critical exponent of is defined to be

where . [4]

Examples

Properties

Repetition threshold

The repetition threshold of an alphabet A of n letters is the minimum critical exponent of infinite words over A: clearly this value RT(n) depends only on n. For n=2, any binary word of length four has a factor of exponent 2, and since the critical exponent of the Thue–Morse sequence is 2, the repetition threshold for binary alphabets is RT(2) = 2. It is known that RT(3) = 7/4, RT(4) = 7/5 and that for n≥5 we have RT(n) ≥ n/(n-1). It is conjectured that the latter is the true value, and this has been established for 5 ≤ n ≤ 14 and for n ≥ 33. [2] [5]

See also

Notes

  1. Krieger (2006) p.281
  2. 1 2 Berstel et al (2009) p.126
  3. 1 2 3 4 5 Krieger (2006) p.280
  4. Krieger (2006) p.282
  5. 1 2 Allouche & Shallit (2003) p. 37
  6. Krieger & Shallit (2007).

Related Research Articles

In theoretical computer science and formal language theory, a regular language is a formal language that can be defined by a regular expression, in the strict sense in theoretical computer science.

In mathematics, the Thue–Morse sequence, or Prouhet–Thue–Morse sequence, is the binary sequence obtained by starting with 0 and successively appending the Boolean complement of the sequence obtained thus far. The first few steps of this procedure yield the strings 0 then 01, 0110, 01101001, 0110100110010110, and so on, which are prefixes of the Thue–Morse sequence. The full sequence begins:

In abstract algebra, the free monoid on a set is the monoid whose elements are all the finite sequences of zero or more elements from that set, with string concatenation as the monoid operation and with the unique sequence of zero elements, often called the empty string and denoted by ε or λ, as the identity element. The free monoid on a set A is usually denoted A. The free semigroup on A is the subsemigroup of A containing all elements except the empty string. It is usually denoted A+.

In mathematics, the Prouhet–Thue–Morse constant, named for Eugène Prouhet, Axel Thue, and Marston Morse, is the number—denoted by τ—whose binary expansion 0.01101001100101101001011001101001... is given by the Thue–Morse sequence. That is,

Sturmian word Kind of infinitely long sequence of characters

In mathematics, a Sturmian word, named after Jacques Charles François Sturm, is a certain kind of infinitely long sequence of characters. Such a sequence can be generated by considering a game of English billiards on a square table. The struck ball will successively hit the vertical and horizontal edges labelled 0 and 1 generating a sequence of letters. This sequence is a Sturmian word.

In combinatorics, a squarefree word is a word that does not contain any squares. A square is a word of the form XX, where X is not empty. Thus, a squarefree word can also be defined as a word that avoids the pattern XX.

In mathematics and theoretical computer science, an automatic sequence is an infinite sequence of terms characterized by a finite automaton. The n-th term of an automatic sequence a(n) is a mapping of the final state reached in a finite automaton accepting the digits of the number n in some fixed base k.

In mathematics the Baum–Sweet sequence is an infinite automatic sequence of 0s and 1s defined by the rule:

In mathematics, the Rudin–Shapiro sequence, also known as the Golay–Rudin–Shapiro sequence, is an infinite 2-automatic sequence named after Marcel Golay, Walter Rudin, and Harold S. Shapiro, who independently investigated its properties.

Combinatorics on words is a fairly new field of mathematics, branching from combinatorics, which focuses on the study of words and formal languages. The subject looks at letters or symbols, and the sequences they form. Combinatorics on words affects various areas of mathematical study, including algebra and computer science. There have been a wide range of contributions to the field. Some of the first work was on square-free words by Axel Thue in the early 1900s. He and colleagues observed patterns within words and tried to explain them. As time went on, combinatorics on words became useful in the study of algorithms and coding. It led to developments in abstract algebra and answering open questions.

In computer science, the complexity function of a word or string is the function that counts the number of distinct factors of that string. More generally, the complexity function of a formal language counts the number of distinct words of given length.

In mathematics and computer science, a morphic word or substitutive word is an infinite sequence of symbols which is constructed from a particular class of endomorphism of a free monoid.

In mathematics, a sesquipower or Zimin word is a string over an alphabet with identical prefix and suffix. Sesquipowers are unavoidable patterns, in the sense that all sufficiently long strings contain one.

In mathematics and theoretical computer science, a pattern is an unavoidable pattern if it is unavoidable on any finite alphabet.

In mathematics, a recurrent word or sequence is an infinite word over a finite alphabet in which every factor occurs infinitely many times. An infinite word is recurrent if and only if it is a sesquipower.

In mathematics, Ostrowski numeration, named after Alexander Ostrowski, is either of two related numeration systems based on continued fractions: a non-standard positional numeral system for integers and a non-integer representation of real numbers.

In formal language theory, the Chomsky–Schützenberger enumeration theorem is a theorem derived by Noam Chomsky and Marcel-Paul Schützenberger about the number of words of a given length generated by an unambiguous context-free grammar. The theorem provides an unexpected link between the theory of formal languages and abstract algebra.

In mathematics and theoretical computer science, a k-regular sequence is a sequence satisfying linear recurrence equations that reflect the base-k representations of the integers. The class of k-regular sequences generalizes the class of k-automatic sequences to alphabets of infinite size.

Dejean's theorem is a statement about repetitions in infinite strings of symbols. It belongs to the field of combinatorics on words; it was conjectured in 1972 by Françoise Dejean and proven in 2009 by Currie and Rampersad and, independently, by Rao.

Cobham's theorem is a theorem in combinatorics on words that has important connections with number theory, notably transcendental numbers, and automata theory. Informally, the theorem gives the condition for the members of a set S of natural numbers written in bases b1 and base b2 to be recognised by finite automata. Specifically, consider bases b1 and b2 such that they are not powers of the same integer. Cobham's theorem states that S written in bases b1 and b2 is recognised by finite automata if and only if S is a finite union of arithmetic progressions. The theorem was proved by Alan Cobham in 1969 and has since given rise to many extensions and generalisations.

References