Parametric word

Last updated

In the mathematical study of combinatorics on words, a parametric word is a string over a given alphabet having some number of wildcard characters. [1] The set of strings matching a given parametric word is called a combinatorial cube. Parametric words can be composed, to produce smaller subcubes of a given combinatorial cube. They have applications in Ramsey theory and in computer science in the detection of duplicate code.

Contents

Definitions and notation

Formally, a -parameter word of length , over a given alphabet , is a sequence of characters, some of which may be drawn from and the others of which are distinct wildcard characters . Each wildcard character is required to appear at least once, but may appear multiple times, and the wildcard characters must appear in the order given by their indexes: the first wildcard character in the word must be , the next one that is different from must be , etc. As a special case, a word over the given alphabet, without any wildcard characters, is said to be a 0-parameter word. For 1-parameter words, the subscripts may be omitted, as there is no ambiguity between different wildcard characters. The set of all -parameter words over , of length , is denoted . [1]

A -parameter word represents a set of strings (0-parameter words), obtained by substituting a symbol of for each wildcard character. This set of strings is called a parameter set of combinatorial cube, and is called its dimension. A one-dimensional combinatorial cube may be called a combinatorial line. [1]

In a combinatorial cube, each copy of a particular wildcard character must have the same replacement. A generalization of parametric words allows different copies of the same wildcard character to be replaced by different characters from the alphabet, in a controlled way. If is an alphabet and is a group with an action on , then a -labeled parametric word is a -parameter word together with an assignment of a group element to each wildcard character in the word. The first occurrence of each wildcard character must be assigned the identity element of the group. Then, the strings represented by a labeled parametric word are obtained by choosing a character of for each wildcard character, and substituting the result of combining that character with the group element labeling each copy of that character. The set of all -labeled-parameter words over , of length , is denoted . [1]

Example

In the game of tic-tac-toe, the cells of the game board can be given two integer coordinates from the alphabet . Concatenating these two coordinates produces a string representing each cell, one of the nine strings or . There are seven one-parameter words of length two over this alphabet, the words and . The corresponding combinatorial lines form seven of the eight lines of three cells in a row of the tic-tac-toe board; for instance, the one-parameter word corresponds to the combinatorial line , and the one-parameter word corresponds to the combinatorial line . [2]

However, one of the eight winning lines of the tic-tac-toe game is missing from this set of combinatorial lines: the antidiagonal line . It is possible to obtain this line as a combinatorial line (without including any other combinations of cells that would be invalid for tic-tac-toe) by using a group with two elements, and an action in which the non-identity element swaps the alphabet letters and while leaving the element in place. There are eight labeled one-parameter words of length two for this action, seven of which are obtained from the unlabeled one-parameter words by using the identity label for all wildcards. These seven have the same combinatorial lines as before. The eighth labeled word consists of the word labeled by the identity element for its first and the reversing non-identity element for the second ; its combinatorial line is the final winning line of the tic-tac-toe board, . [2]

Composition

For three given integer parameters , it is possible to combine two parametric words, and , to produce another parametric word . To do so, simply replace each copy of the th wildcard symbol in by the th character in . This will necessarily produce a word of length that uses each of the wildcard symbols in at least once, in ascending order, so it produces a valid -parameter word of length . This notion of composition can also be extended to composition of labeled parametric words (both using the same alphabet and group action), by applying the group action to the non-wildcard substituted characters and composing the group labels for the wildcard substituted characters. A subset of a combinatorial cube is a smaller combinatorial cube if it can be obtained by a composition in this way. [1]

Combinatorial enumeration

The number of parametric words in for an alphabet of size is an -Stirling number of the second kind . These numbers count the number of partitions of the integers in the range into non-empty subsets such that the first integers belong to distinct subsets. Partitions of this type can be placed into a bijective equivalence with the parametric words, by creating a word with a character for each of the integers in the range , setting this character value to be either an integer in belonging to the same subset of the partition, or a wildcard character for each subset of the partition that does not contain an integer in . The -Stirling numbers obey a simple recurrence relation by which they may easily be calculated. [3]

Applications

In Ramsey theory, parametric words and combinatorial cubes may be used to formulate the Graham–Rothschild theorem, according to which, for every finite alphabet and group action, and every combination of integer values , , and , there exists a sufficiently large number such that if each -dimensional combinatorial cube over strings of length is assigned one of colors, then there exists a -dimensional combinatorial cube all of whose -dimensional subcubes have the same color. This result is a key foundation for structural Ramsey theory, and is used to define Graham's number, an enormous number used to estimate the value of for a certain combination of values. [1]

In computer science, in the problem of searching for duplicate code, the source code for a given routine or module may be transformed into a parametric word by converting it into a sequence of tokens, and for each variable or subroutine name, replacing each copy of the same name with the same wildcard character. If code is duplicated, the resulting parametric words will remain equal even if some of the variables or subroutines have been renamed. More sophisticated searching algorithms can find long duplicate code sections that form substrings of larger source code repositories, by allowing the wildcard characters to be substituted for each other. [4]

An important special case of parametric words, well-studied in the combinatorics of words, is given by the partial words. These are strings with wildcard characters that may be substituted independently of each other, without requiring that some of the substituted characters be equal or controlled by a group action. In the language of parametric words, a partial word may be described as a parametric word in which each wildcard symbol appears exactly once. However, because there is no repetition of wildcard symbols, partial words may be written more simply by omitting the subscripts on the wildcard symbols. [5]

Related Research Articles

Binomial coefficient family of positive integers that occur as coefficients in the binomial theorem

In mathematics, the binomial coefficients are the positive integers that occur as coefficients in the binomial theorem. Commonly, a binomial coefficient is indexed by a pair of integers nk ≥ 0 and is written It is the coefficient of the xk term in the polynomial expansion of the binomial power (1 + x)n, and it is given by the formula

In elementary algebra, the binomial theorem describes the algebraic expansion of powers of a binomial. According to the theorem, it is possible to expand the polynomial (x + y)n into a sum involving terms of the form axbyc, where the exponents b and c are nonnegative integers with b + c = n, and the coefficient a of each term is a specific positive integer depending on n and b. For example,

In mathematics, a combination is a selection of items from a collection, such that the order of selection does not matter. For example, given three fruits, say an apple, an orange and a pear, there are three combinations of two that can be drawn from this set: an apple and a pear; an apple and an orange; or a pear and an orange. More formally, a k-combination of a set S is a subset of k distinct elements of S. If the set has n elements, the number of k-combinations is equal to the binomial coefficient

Steiner system A type of block design, specifically a t-design with λ = 1 and t ≥ 2.

In combinatorial mathematics, a Steiner system is a type of block design, specifically a t-design with λ = 1 and t ≥ 2.

Pascals triangle triangular array of the binomial coefficients in mathematics

In mathematics, Pascal's triangle is a triangular array of the binomial coefficients. In much of the Western world, it is named after the French mathematician Blaise Pascal, although other mathematicians studied it centuries before him in India, Persia (Iran), China, Germany, and Italy.

In mathematics, the lexicographic or lexicographical order is a generalization of the way words are alphabetically ordered based on the alphabetical order of their component letters. This generalization consists primarily in defining a total order on the sequences of elements of a finite totally ordered set, often called an alphabet.

In mathematics, the Hales–Jewett theorem is a fundamental combinatorial result of Ramsey theory named after Alfred W. Hales and Robert I. Jewett, concerning the degree to which high-dimensional objects must necessarily exhibit some combinatorial structure; it is impossible for such objects to be "completely random".

In combinatorics, bijective proof is a proof technique that finds a bijective function f : AB between two finite sets A and B, or a size-preserving bijective function between two combinatorial classes, thus proving that they have the same number of elements, |A| = |B|. One place the technique is useful is where we wish to know the size of A, but can find no direct way of counting its elements. By establishing a bijection from A to some B solves the problem if B is more easily countable. Another useful feature of the technique is that the nature of the bijection itself often provides powerful insights into each or both of the sets.

In mathematics, and in particular in combinatorics, the combinatorial number system of degree k, also referred to as combinadics, is a correspondence between natural numbers N and k-combinations, represented as strictly decreasing sequences ck > ... > c2 > c1 ≥ 0. Since the latter are strings of numbers, one can view this as a kind of numeral system for representing N, although the main utility is representing a k-combination by N rather than the other way around. Distinct numbers correspond to distinct k-combinations, and produce them in lexicographic order; moreover the numbers less than correspond to all k-combinations of { 0, 1, ..., n − 1}. The correspondence does not depend on the size n of the set that the k-combinations are taken from, so it can be interpreted as a map from N to the k-combinations taken from N; in this view the correspondence is a bijection.

In mathematics, the Gaussian binomial coefficients are q-analogs of the binomial coefficients. The Gaussian binomial coefficient, written as or , is a polynomial in q with integer coefficients, whose value when q is set to a prime power counts the number of subspaces of dimension k in a vector space of dimension n over a finite field with q elements.

De Bruijn sequence circular sequence of symbols that contains each possible length-k contiguous subsequence exactly once

In combinatorial mathematics, a de Bruijn sequence of order n on a size-k alphabet A is a cyclic sequence in which every possible length-n string on A occurs exactly once as a substring. Such a sequence is denoted by B(k, n) and has length kn, which is also the number of distinct strings of length n on A. Each of these distinct strings, when taken as a substring of B(k, n), must start at a different position, because substrings starting at the same position are not distinct. Therefore, B(k, n) must have at leastkn symbols. And since B(k, n) has exactlykn symbols, De Bruijn sequences are optimally short with respect to the property of containing every string of length n exactly once.

In mathematics, Pascal's rule is a combinatorial identity about binomial coefficients. It states that for positive natural numbers n and k,

A Post canonical system, as created by Emil Post, is a string-manipulation system that starts with finitely-many strings and repeatedly transforms them by applying a finite set j of specified rules of a certain form, thus generating a formal language. Today they are mainly of historical relevance because every Post canonical system can be reduced to a string rewriting system, which is a simpler formulation. Both formalisms are Turing complete.

In the context of combinatorial mathematics, stars and bars is a graphical aid for deriving certain combinatorial theorems. It was popularized by William Feller in his classic book on probability. It can be used to solve many simple counting problems, such as how many ways there are to put n indistinguishable balls into k distinguishable bins.

In computer science and the study of combinatorics on words, a partial word is a string that may contain a number of "do not know" or "do not care" symbols i.e. placeholders in the string where the symbol value is not known or not specified. More formally, a partial word is a partial function where is some finite alphabet. If u(k) is not defined for some then the unknown element at place k in the string is called a "hole". In regular expressions a hole is represented by the metacharacter ".". For example, aab.ab.b is a partial word of length 8 over the alphabet A ={a,b} in which the fourth and seventh characters are holes.

Combinatorics on words is a fairly new field of mathematics, branching from combinatorics, which focuses on the study of words and formal languages. The subject looks at letters or symbols, and the sequences they form. Combinatorics on words affects various areas of mathematical study, including algebra and computer science. There have been a wide range of contributions to the field. Some of the first work was on square-free words by Axel Thue in the early 1900s. He and colleagues observed patterns within words and tried to explain them. As time went on, combinatorics on words became useful in the study of algorithms and coding. It led to developments in abstract algebra and answering open questions.

In mathematics, a sesquipower or Zimin word is a string over an alphabet with identical prefix and suffix. Sesquipowers are unavoidable patterns, in the sense that all sufficiently long strings contain one.

A strong positional game is a kind of positional game. Like most positional games, it is described by its set of positions and its family of winning-sets. It is played by two players, called First and Second, who alternately take previously-untaken positions.

Locally linear graph

In graph theory, a locally linear graph is an undirected graph in which the neighborhood of every vertex is an induced matching. That is, for every vertex , every neighbor of should be adjacent to exactly one other neighbor of . Equivalently, every edge of such a graph belongs to a unique triangle . Locally linear graphs have also been called locally matched graphs.

In mathematics, structural Ramsey theory is a categorical generalisation of Ramsey theory, rooted in the idea that many important results of Ramsey theory have "similar" logical structure. The key observation is noting that these Ramsey-type theorems can be expressed as the assertion that a certain category has the Ramsey property.

References

  1. 1 2 3 4 5 6 Prömel, Hans Jürgen (2002), "Large numbers, Knuth's arrow notation, and Ramsey theory", Synthese , 133 (1–2): 87–105, doi:10.1023/A:1020879709125, JSTOR   20117296, MR   1950045
  2. 1 2 Prömel, Hans Jürgen (2013), "Hales–Jewett's Theorem", Ramsey Theory for Discrete Structures, Springer International Publishing, pp. 41–51, doi:10.1007/978-3-319-01315-2_4
  3. Broder, Andrei Z. (1984), "The -Stirling numbers", Discrete Mathematics , 49 (3): 241–259, doi:10.1016/0012-365X(84)90161-4, MR   0743795
  4. Baker, Brenda S. (1997), "Parameterized duplication in strings: algorithms and an application to software maintenance", SIAM Journal on Computing , 26 (5): 1343–1362, doi:10.1137/S0097539793246707, MR   1471985
  5. Blanchet-Sadri, Francine (2008), Algorithmic Combinatorics on Partial Words, Discrete Mathematics and its Applications, Boca Raton, Florida: Chapman & Hall/CRC, ISBN   978-1-4200-6092-8, MR   2384993