Algorithmic Combinatorics on Partial Words

Last updated

Algorithmic Combinatorics on Partial Words is a book in the area of combinatorics on words, and more specifically on partial words. It was written by Francine Blanchet-Sadri, and published in 2008 by Chapman & Hall/CRC in their Discrete Mathematics and its Applications book series.

Contents

Algorithmic Combinatorics on Partial Words
AuthorFrancine Blanchet-Sadri
LanguageEnglish
PublisherChapman & Hall/CRC
Publication date
2008

Topics

A partial word is a string whose characters may either belong to a given alphabet or be a wildcard character. Such a word can represent a set of strings over the alphabet without wildcards, by allowing each wildcard character to be replaced by any single character of the alphabet, independently of the replacements of the other wildcard characters. Two partial words are compatible when they agree on their non-wildcard characters, or equivalently when there is a string that they both match; one partial word contains another partial word if they are compatible and the non-wildcard positions of contain those of ; equivalently, the strings matched by are a subset of those matched by . [1]

The book has 12 chapters, [2] which can be grouped into five larger parts. The first part consists of two introductory chapters defining partial words, compatibility and containment, and related concepts. The second part generalizes to partial words some standard results on repetitions in strings, and the third part studies the problem of characterizing and recognizing primitive partial words, the partial words that have no repetition. Part four concerns codes defined from sets of partial words, in the sense that no two distinct concatenations of partial words from the set can be compatible with each other. A final part includes three chapters on advanced topics including the construction of repetitions of given numbers of copies of partial words that are compatible with each other, enumeration of the possible patterns of repetitions of partial words, and sets of partial words with the property that every infinite string contains a substring matching the set. [1] Each chapter includes a set of exercises, and the end of the book provides hints to some of these exercises. [2]

Audience and reception

Although Algorithmic Combinatorics on Partial Words is primarily aimed at the graduate level, reviewer Miklós Bóna writes that it is for the most part "remarkably easy to read" and suggests that it could also be read by advanced undergraduates. However, Bóna criticizes the book as being too focused on the combinatorics on words as an end in itself, with no discussion of how to translate mathematical structures of other types into partial words so that the methods of this book can be applied to them. Because of this lack of generality and application, he suggests that the audience for the book is likely to consist only of other researchers specializing in this area. [1] Similarly, although Patrice Séébold notes that this area can be motivated by applications to gene comparison, he criticizes the book as being largely a catalog of its author's own research results in partial words, without the broader thematic overview or identification of the fundamental topics and theorems that one would expect of a textbook, and suggests that a textbook that accomplishes these goals is still waiting to be written. [3]

However, reviewer Jan Kratochvíl is more positive, calling this "the first reference book on the theory of partial words", praising its pacing from introductory material to more advanced topics, and writing that it well supports its underlying thesis that many of the main results in the combinatorics of words without wildcards can be extended to partial words. He summarizes it as "an excellent textbook as well as a reference book for interested researchers". [2]

Related Research Articles

<span class="mw-page-title-main">Formal language</span> Sequence of words formed by specific rules

In logic, mathematics, computer science, and linguistics, a formal language consists of words whose letters are taken from an alphabet and are well-formed according to a specific set of rules called a formal grammar.

In mathematical logic and computer science, the Kleene star is a unary operation, either on sets of strings or on sets of symbols or characters. In mathematics, it is more commonly known as the free monoid construction. The application of the Kleene star to a set is written as . It is widely used for regular expressions, which is the context in which it was introduced by Stephen Kleene to characterize certain automata, where it means "zero or more repetitions".

  1. If is a set of strings, then is defined as the smallest superset of that contains the empty string and is closed under the string concatenation operation.
  2. If is a set of symbols or characters, then is the set of all strings over symbols in , including the empty string .

In theoretical computer science and formal language theory, a regular language is a formal language that can be defined by a regular expression, in the strict sense in theoretical computer science.

In abstract algebra, the free monoid on a set is the monoid whose elements are all the finite sequences of zero or more elements from that set, with string concatenation as the monoid operation and with the unique sequence of zero elements, often called the empty string and denoted by ε or λ, as the identity element. The free monoid on a set A is usually denoted A. The free semigroup on A is the subsemigroup of A containing all elements except the empty string. It is usually denoted A+.

In mathematics, the lexicographic or lexicographical order is a generalization of the alphabetical order of the dictionaries to sequences of ordered symbols or, more generally, of elements of a totally ordered set.

<span class="mw-page-title-main">Antimatroid</span> Mathematical system of orderings or sets

In mathematics, an antimatroid is a formal system that describes processes in which a set is built up by including elements one at a time, and in which an element, once available for inclusion, remains available until it is included. Antimatroids are commonly axiomatized in two equivalent ways, either as a set system modeling the possible states of such a process, or as a formal language modeling the different sequences in which elements may be included. Dilworth (1940) was the first to study antimatroids, using yet another axiomatization based on lattice theory, and they have been frequently rediscovered in other contexts.

A finite-state transducer (FST) is a finite-state machine with two memory tapes, following the terminology for Turing machines: an input tape and an output tape. This contrasts with an ordinary finite-state automaton, which has a single tape. An FST is a type of finite-state automaton (FSA) that maps between two sets of symbols. An FST is more general than an FSA. An FSA defines a formal language by defining a set of accepted strings, while an FST defines a relation between sets of strings.

In theoretical computer science and mathematical logic a string rewriting system (SRS), historically called a semi-Thue system, is a rewriting system over strings from a alphabet. Given a binary relation between fixed strings over the alphabet, called rewrite rules, denoted by , an SRS extends the rewriting relation to all strings in which the left- and right-hand side of the rules appear as substrings, that is , where , , , and are strings.

In mathematics, in the areas of combinatorics and computer science, a Lyndon word is a nonempty string that is strictly smaller in lexicographic order than all of its rotations. Lyndon words are named after mathematician Roger Lyndon, who investigated them in 1954, calling them standard lexicographic sequences. Anatoly Shirshov introduced Lyndon words in 1953 calling them regular words. Lyndon words are a special case of Hall words; almost all properties of Lyndon words are shared by Hall words.

In combinatorics, a square-free word is a word that does not contain any squares. A square is a word of the form XX, where X is not empty. Thus, a square-free word can also be defined as a word that avoids the pattern XX.

In formal language theory, an alphabet, sometimes called a vocabulary, is a non-empty set of indivisible symbols/characters/glyphs, typically thought of as representing letters, characters, digits, phonemes, or even words. Alphabets in this technical sense of a set are used in a diverse range of fields including logic, mathematics, computer science, and linguistics. An alphabet may have any cardinality ("size") and, depending on its purpose, may be finite, countable, or even uncountable.

In computer science and the study of combinatorics on words, a partial word is a string that may contain a number of "do not know" or "do not care" symbols i.e. placeholders in the string where the symbol value is not known or not specified. More formally, a partial word is a partial function where is some finite alphabet. If u(k) is not defined for some then the unknown element at place k in the string is called a "hole". In regular expressions (following the POSIX standard) a hole is represented by the metacharacter ".". For example, aab.ab.b is a partial word of length 8 over the alphabet A ={a,b} in which the fourth and seventh characters are holes.

A schema is a template in computer science used in the field of genetic algorithms that identifies a subset of strings with similarities at certain string positions. Schemata are a special case of cylinder sets, forming a basis for a product topology on strings. In other words, schemata can be used to generate a topology on a space of strings.

Combinatorics on words is a fairly new field of mathematics, branching from combinatorics, which focuses on the study of words and formal languages. The subject looks at letters or symbols, and the sequences they form. Combinatorics on words affects various areas of mathematical study, including algebra and computer science. There have been a wide range of contributions to the field. Some of the first work was on square-free words by Axel Thue in the early 1900s. He and colleagues observed patterns within words and tried to explain them. As time went on, combinatorics on words became useful in the study of algorithms and coding. It led to developments in abstract algebra and answering open questions.

In mathematics, a factorisation of a free monoid is a sequence of subsets of words with the property that every word in the free monoid can be written as a concatenation of elements drawn from the subsets. The Chen–Fox–Lyndon theorem states that the Lyndon words furnish a factorisation. The Schützenberger theorem relates the definition in terms of a multiplicative property to an additive property.

In mathematics and theoretical computer science, a pattern is an unavoidable pattern if it is unavoidable on any finite alphabet.

In mathematics and computer science, the critical exponent of a finite or infinite sequence of symbols over a finite alphabet describes the largest number of times a contiguous subsequence can be repeated. For example, the critical exponent of "Mississippi" is 7/3, as it contains the string "ississi", which is of length 7 and period 3.

Dejean's theorem is a statement about repetitions in infinite strings of symbols. It belongs to the field of combinatorics on words; it was conjectured in 1972 by Françoise Dejean and proven in 2009 by Currie and Rampersad and, independently, by Rao.

In the mathematical study of combinatorics on words, a parameter word is a string over a given alphabet having some number of wildcard characters. The set of strings matching a given parameter word is called a parameter set or combinatorial cube. Parameter words can be composed, to produce smaller subcubes of a given combinatorial cube. They have applications in Ramsey theory and in computer science in the detection of duplicate code.

In mathematics, the Graham–Rothschild theorem is a theorem that applies Ramsey theory to combinatorics on words and combinatorial cubes. It is named after Ronald Graham and Bruce Lee Rothschild, who published its proof in 1971. Through the work of Graham, Rothschild, and Klaus Leeb in 1972, it became part of the foundations of structural Ramsey theory. A special case of the Graham–Rothschild theorem motivates the definition of Graham's number, a number that was popularized by Martin Gardner in Scientific American and listed in the Guinness Book of World Records as the largest number ever appearing in a mathematical proof.

References

  1. 1 2 3 Bóna, Miklós (September 2009), "Review of Algorithmic Combinatorics on Partial Words" (PDF), ACM SIGACT News , 40 (3): 39–41, doi:10.1145/1620491.1620497
  2. 1 2 3 Kratochvíl, Jan (June 2011), "Review of Algorithmic Combinatorics on Partial Words", EMS Reviews, European Mathematical Society
  3. Séébold, Patrice (2009), "Review of Algorithmic Combinatorics on Partial Words", MathSciNet , MR   2384993