Dyck language

Last updated
Lattice of the 14 Dyck words of length 8 - [ and ] interpreted as up and down Dyck lattice D4.svg
Lattice of the 14 Dyck words of length 8 - [ and ] interpreted as up and down

In the theory of formal languages of computer science, mathematics, and linguistics, a Dyck word is a balanced string of brackets. The set of Dyck words forms a Dyck language. The simplest, D1, uses just two matching brackets, e.g. ( and ).

Contents

Dyck words and language are named after the mathematician Walther von Dyck. They have applications in the parsing of expressions that must have a correctly nested sequence of brackets, such as arithmetic or algebraic expressions.

Formal definition

Let be the alphabet consisting of the symbols [ and ]. Let denote its Kleene closure. The Dyck language is defined as:

Context-free grammar

It may be helpful to define the Dyck language via a context-free grammar in some situations. The Dyck language is generated by the context-free grammar with a single non-terminal S, and the production:

Sε| "[" S "]" S

That is, S is either the empty string (ε) or is "[", an element of the Dyck language, the matching "]", and an element of the Dyck language.

An alternative context-free grammar for the Dyck language is given by the production:

S → ("[" S "]")*

That is, S is zero or more occurrences of the combination of "[", an element of the Dyck language, and a matching "]", where multiple elements of the Dyck language on the right side of the production are free to differ from each other.

Alternative definition

In yet other contexts it may instead be helpful to define the Dyck language by splitting into equivalence classes, as follows. For any element of length , we define partial functions and by

is with "" inserted into the th position
is with "" deleted from the th position

with the understanding that is undefined for and is undefined if . We define an equivalence relation on as follows: for elements we have if and only if there exists a sequence of zero or more applications of the and functions starting with and ending with . That the sequence of zero operations is allowed accounts for the reflexivity of . Symmetry follows from the observation that any finite sequence of applications of to a string can be undone with a finite sequence of applications of . Transitivity is clear from the definition.

The equivalence relation partitions the language into equivalence classes. If we take to denote the empty string, then the language corresponding to the equivalence class is called the Dyck language.

Properties

Examples

We can define an equivalence relation on the Dyck language . For we have if and only if , i.e. and have the same length. This relation partitions the Dyck language: . We have where . Note that is empty for odd .

Having introduced the Dyck words of length , we can introduce a relationship on them. For every we define a relation on ; for we have if and only if can be reached from by a series of proper swaps. A proper swap in a word swaps an occurrence of '][' with '[]'. For each the relation makes into a partially ordered set. The relation is reflexive because an empty sequence of proper swaps takes to . Transitivity follows because we can extend a sequence of proper swaps that takes to by concatenating it with a sequence of proper swaps that takes to forming a sequence that takes into . To see that is also antisymmetric we introduce an auxiliary function defined as a sum over all prefixes of :

The following table illustrates that is strictly monotonic with respect to proper swaps.

Strict monotonicity of
partial sums of
][
[]
partial sums of
Difference of partial sums0200

Hence so when there is a proper swap that takes into . Now if we assume that both and , then there are non-empty sequences of proper swaps such is taken into and vice versa. But then which is nonsensical. Therefore, whenever both and are in , we have , hence is antisymmetric.

The partial ordered set is shown in the illustration accompanying the introduction if we interpret a [ as going up and ] as going down.

Generalizations

There exist variants of the Dyck language with multiple delimiters, e.g., D2 on the alphabet "(", ")", "[", and "]". The words of such a language are the ones which are well-parenthesized for all delimiters, i.e., one can read the word from left to right, push every opening delimiter on the stack, and whenever we reach a closing delimiter then we must be able to pop the matching opening delimiter from the top of the stack. (The counting algorithm above does not generalise).

See also

Notes

  1. Kambites, Communications in Algebra Volume 37 Issue 1 (2009) 193-208
  2. Barrington and Corbett, Information Processing Letters 32 (1989) 251-256

Related Research Articles

In algebra, a homomorphism is a structure-preserving map between two algebraic structures of the same type. The word homomorphism comes from the Ancient Greek language: ὁμός meaning "same" and μορφή meaning "form" or "shape". However, the word was apparently introduced to mathematics due to a (mis)translation of German ähnlich meaning "similar" to ὁμός meaning "same". The term "homomorphism" appeared as early as 1892, when it was attributed to the German mathematician Felix Klein (1849–1925).

<span class="mw-page-title-main">Random variable</span> Variable representing a random phenomenon

A random variable is a mathematical formalization of a quantity or object which depends on random events. The term 'random variable' can be misleading as its mathematical definition is not actually random nor a variable, but rather it is a function from possible outcomes in a sample space to a measurable space, often to the real numbers.

In theoretical computer science and formal language theory, a regular language is a formal language that can be defined by a regular expression, in the strict sense in theoretical computer science.

In the mathematical field of real analysis, the monotone convergence theorem is any of a number of related theorems proving the convergence of monotonic sequences that are also bounded. Informally, the theorems state that if a sequence is increasing and bounded above by a supremum, then the sequence will converge to the supremum; in the same way, if a sequence is decreasing and is bounded below by an infimum, it will converge to the infimum.

In mathematics, the idea of a free object is one of the basic concepts of abstract algebra. Informally, a free object over a set A can be thought of as being a "generic" algebraic structure over A: the only equations that hold between elements of the free object are those that follow from the defining axioms of the algebraic structure. Examples include free groups, tensor algebras, or free lattices.

In mathematics, computer science, and logic, rewriting covers a wide range of methods of replacing subterms of a formula with other terms. Such methods may be achieved by rewriting systems. In their most basic form, they consist of a set of objects, plus relations on how to transform those objects.

In abstract algebra, the free monoid on a set is the monoid whose elements are all the finite sequences of zero or more elements from that set, with string concatenation as the monoid operation and with the unique sequence of zero elements, often called the empty string and denoted by ε or λ, as the identity element. The free monoid on a set A is usually denoted A. The free semigroup on A is the subsemigroup of A containing all elements except the empty string. It is usually denoted A+.

In automata theory, a finite-state machine is called a deterministic finite automaton (DFA), if

In algebraic geometry, divisors are a generalization of codimension-1 subvarieties of algebraic varieties. Two different generalizations are in common use, Cartier divisors and Weil divisors. Both are derived from the notion of divisibility in the integers and algebraic number fields.

Independence-friendly logic is an extension of classical first-order logic (FOL) by means of slashed quantifiers of the form and , where is a finite set of variables. The intended reading of is "there is a which is functionally independent from the variables in ". IF logic allows one to express more general patterns of dependence between variables than those which are implicit in first-order logic. This greater level of generality leads to an actual increase in expressive power; the set of IF sentences can characterize the same classes of structures as existential second-order logic.

In theoretical computer science and mathematical logic a string rewriting system (SRS), historically called a semi-Thue system, is a rewriting system over strings from a alphabet. Given a binary relation between fixed strings over the alphabet, called rewrite rules, denoted by , an SRS extends the rewriting relation to all strings in which the left- and right-hand side of the rules appear as substrings, that is , where , , , and are strings.

In formal language theory, an alphabet, sometimes called a vocabulary, is a non-empty set of indivisible symbols/glyphs, typically thought of as representing letters, characters, digits, phonemes, or even words. Alphabets in this technical sense of a set are used in a diverse range of fields including logic, mathematics, computer science, and linguistics. An alphabet may have any cardinality ("size") and, depending on its purpose, may be finite, countable, or even uncountable.

In mathematics, more specifically measure theory, there are various notions of the convergence of measures. For an intuitive general sense of what is meant by convergence of measures, consider a sequence of measures μn on a space, sharing a common collection of measurable sets. Such a sequence might represent an attempt to construct 'better and better' approximations to a desired measure μ that is difficult to obtain directly. The meaning of 'better and better' is subject to all the usual caveats for taking limits; for any error tolerance ε > 0 we require there be N sufficiently large for nN to ensure the 'difference' between μn and μ is smaller than ε. Various notions of convergence specify precisely what the word 'difference' should mean in that description; these notions are not equivalent to one another, and vary in strength.

In mathematics, Hochschild homology (and cohomology) is a homology theory for associative algebras over rings. There is also a theory for Hochschild homology of certain functors. Hochschild cohomology was introduced by Gerhard Hochschild (1945) for algebras over a field, and extended to algebras over more general rings by Henri Cartan and Samuel Eilenberg (1956).

In computer science, a trace is a set of strings, wherein certain letters in the string are allowed to commute, but others are not. It generalizes the concept of a string, by not forcing the letters to always be in a fixed order, but allowing certain reshufflings to take place. Traces were introduced by Pierre Cartier and Dominique Foata in 1969 to give a combinatorial proof of MacMahon's master theorem. Traces are used in theories of concurrent computation, where commuting letters stand for portions of a job that can execute independently of one another, while non-commuting letters stand for locks, synchronization points or thread joins.

In computer science, in the area of formal language theory, frequent use is made of a variety of string functions; however, the notation used is different from that used for computer programming, and some commonly used functions in the theoretical realm are rarely used when programming. This article defines some of these basic terms.

In mathematics and computer science, a history monoid is a way of representing the histories of concurrently running computer processes as a collection of strings, each string representing the individual history of a process. The history monoid provides a set of synchronization primitives for providing rendezvous points between a set of independently executing processes or threads.

In mathematics, low-rank approximation is a minimization problem, in which the cost function measures the fit between a given matrix and an approximating matrix, subject to a constraint that the approximating matrix has reduced rank. The problem is used for mathematical modeling and data compression. The rank constraint is related to a constraint on the complexity of a model that fits the data. In applications, often there are other constraints on the approximating matrix apart from the rank constraint, e.g., non-negativity and Hankel structure.

Generalized relative entropy is a measure of dissimilarity between two quantum states. It is a "one-shot" analogue of quantum relative entropy and shares many properties of the latter quantity.

In mathematics, the injective tensor product of two topological vector spaces (TVSs) was introduced by Alexander Grothendieck and was used by him to define nuclear spaces. An injective tensor product is in general not necessarily complete, so its completion is called the completed injective tensor products. Injective tensor products have applications outside of nuclear spaces. In particular, as described below, up to TVS-isomorphism, many TVSs that are defined for real or complex valued functions, for instance, the Schwartz space or the space of continuously differentiable functions, can be immediately extended to functions valued in a Hausdorff locally convex TVS without any need to extend definitions from real/complex-valued functions to -valued functions.

References