Algorithmic Combinatorics on Partial Words

Algorithmic Combinatorics on Partial Words is a book in the area of combinatorics on words, and more specifically on partial words. It was written by Francine Blanchet-Sadri, and published in 2008 by Chapman & Hall/CRC in their Discrete Mathematics and its Applications book series.

Topics

A partial word is a string whose characters may either belong to a given alphabet or be a wildcard character. Such a word can represent a set of strings over the alphabet without wildcards, by allowing each wildcard character to be replaced by any single character of the alphabet, independently of the replacements of the other wildcard characters. Two partial words are compatible when they agree on their non-wildcard characters, or equivalently when there is a string that they both match; one partial word x contains another partial word y if they are compatible and the non-wildcard positions of x contain those of y; equivalently, the strings matched by x are a subset of those matched by y. [1]
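The compatibility and containment relations described above can be sketched in code. The following is a minimal illustration, not taken from the book: it assumes partial words are represented as ordinary Python strings, with a `?` character standing in for the wildcard (the choice of wildcard symbol is arbitrary).

```python
WILDCARD = "?"  # assumed wildcard symbol, chosen for this sketch

def compatible(x: str, y: str) -> bool:
    """Two partial words of equal length are compatible when they agree
    at every position where both have a non-wildcard character."""
    return len(x) == len(y) and all(
        a == b or a == WILDCARD or b == WILDCARD for a, b in zip(x, y)
    )

def contains(x: str, y: str) -> bool:
    """x contains y when they are compatible and every non-wildcard
    position of y is also a non-wildcard position of x; equivalently,
    the strings matched by x form a subset of those matched by y."""
    return compatible(x, y) and all(
        a != WILDCARD or b == WILDCARD for a, b in zip(x, y)
    )
```

For example, `"ab?"` and `"a?b"` are compatible (both match the string `"abb"`), while `"aab"` contains `"a?b"` because it is compatible with it and defined at every position where `"a?b"` is.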

The book has 12 chapters, [2] which can be grouped into five larger parts. The first part consists of two introductory chapters defining partial words, compatibility and containment, and related concepts. The second part generalizes to partial words some standard results on repetitions in strings, and the third part studies the problem of characterizing and recognizing primitive partial words, the partial words that are not themselves repetitions of shorter words. Part four concerns codes defined from sets of partial words, in the sense that no two distinct concatenations of partial words from the set can be compatible with each other. A final part includes three chapters on advanced topics, including the construction of repetitions of given numbers of copies of partial words that are compatible with each other, the enumeration of the possible patterns of repetitions of partial words, and sets of partial words with the property that every infinite string contains a substring matching the set. [1] Each chapter includes a set of exercises, and the end of the book provides hints to some of these exercises. [2]

Audience and reception

Although Algorithmic Combinatorics on Partial Words is primarily aimed at the graduate level, reviewer Miklós Bóna writes that it is for the most part "remarkably easy to read" and suggests that it could also be read by advanced undergraduates. However, Bóna criticizes the book as being too focused on the combinatorics on words as an end in itself, with no discussion of how to translate mathematical structures of other types into partial words so that the methods of this book can be applied to them. Because of this lack of generality and application, he suggests that the audience for the book is likely to consist only of other researchers specializing in this area. [1] Similarly, although Patrice Séébold notes that this area can be motivated by applications to gene comparison, he criticizes the book as being largely a catalog of its author's own research results in partial words, without the broader thematic overview or identification of the fundamental topics and theorems that one would expect of a textbook, and suggests that a textbook that accomplishes these goals is still waiting to be written. [3]

However, reviewer Jan Kratochvíl is more positive, calling this "the first reference book on the theory of partial words", praising its pacing from introductory material to more advanced topics, and writing that it well supports its underlying thesis that many of the main results in the combinatorics of words without wildcards can be extended to partial words. He summarizes it as "an excellent textbook as well as a reference book for interested researchers". [2]

References

  1. Bóna, Miklós (September 2009), "Review of Algorithmic Combinatorics on Partial Words" (PDF), ACM SIGACT News, 40 (3): 39–41, doi:10.1145/1620491.1620497
  2. Kratochvíl, Jan (June 2011), "Review of Algorithmic Combinatorics on Partial Words", EMS Reviews, European Mathematical Society
  3. Séébold, Patrice (2009), "Review of Algorithmic Combinatorics on Partial Words", MathSciNet, MR 2384993