Dyck language

Last updated January 12, 2024

In the theory of formal languages of computer science, mathematics, and linguistics, a Dyck word is a balanced string of brackets. The set of Dyck words forms a Dyck language. The simplest, D1, uses just two matching brackets, e.g. ( and ).

Formal definition

Let $\Sigma =\{[,]\}$ be the alphabet consisting of the symbols [ and ]. Let $\Sigma ^{*}$ denote its Kleene closure. The Dyck language is defined as:

\{u\in \Sigma ^{*}\vert {\text{ all prefixes of }}u{\text{ contain no more ]'s than ['s}}{\text{ and the number of ['s in }}u{\text{ equals the number of ]'s}}\}.
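The two conditions in this definition translate directly into a single left-to-right scan. The following is a minimal sketch, not a canonical implementation; the function name `is_dyck` and the choice to reject symbols outside the alphabet with an exception are assumptions made for illustration.

```python
def is_dyck(word: str) -> bool:
    """Return True if `word` over the alphabet {'[', ']'} is a Dyck word."""
    depth = 0  # number of ['s minus number of ]'s in the prefix read so far
    for symbol in word:
        if symbol == "[":
            depth += 1
        elif symbol == "]":
            depth -= 1
            if depth < 0:  # some prefix contains more ]'s than ['s
                return False
        else:
            raise ValueError(f"symbol {symbol!r} is not in the alphabet")
    return depth == 0  # the whole word has as many ['s as ]'s
```

For instance, `is_dyck("[[][]]")` holds, while `is_dyck("][")` fails the prefix condition and `is_dyck("[[]")` fails the counting condition.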

Context-free grammar

It may be helpful to define the Dyck language via a context-free grammar in some situations. The Dyck language is generated by the context-free grammar with a single non-terminal $S$ , and the production:

S \to ε | "[" S "]" S

That is, S is either the empty string ( $ε$ ) or is "[", an element of the Dyck language, the matching "]", and an element of the Dyck language.
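The production can also be read as a recipe for enumerating Dyck words. The sketch below, under the assumption that words are grouped by their number of bracket pairs, applies the two alternatives of the production recursively; the name `dyck_words` is illustrative and not part of the source.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def dyck_words(n: int) -> tuple[str, ...]:
    """All Dyck words with n bracket pairs, following S -> ε | "[" S "]" S."""
    if n == 0:
        return ("",)  # alternative S -> ε
    words = []
    # Alternative S -> "[" S "]" S: the outer pair uses one bracket pair,
    # and the remaining n - 1 pairs are split between the two inner S's.
    for inner in range(n):
        for first in dyck_words(inner):
            for second in dyck_words(n - 1 - inner):
                words.append("[" + first + "]" + second)
    return tuple(words)
```

Because each non-empty Dyck word decomposes in exactly one way as "[", a Dyck word, the matching "]", and a Dyck word, the enumeration produces every word once.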

An alternative context-free grammar for the Dyck language is given by the production:

S \to ("[" S "]") *

That is, S is zero or more occurrences of the combination of "[", an element of the Dyck language, and a matching "]", where multiple elements of the Dyck language on the right side of the production are free to differ from each other.

Alternative definition

In yet other contexts it may instead be helpful to define the Dyck language by splitting $\Sigma ^{*}$ into equivalence classes, as follows. For any element $u\in \Sigma ^{*}$ of length $|u|$ , we define partial functions $\operatorname {insert}$ and $\operatorname {delete}$ by

$\operatorname {insert} (u,j)$ is $u$ with "[]" inserted into the $j$th position

$\operatorname {delete} (u,j)$ is $u$ with "[]" deleted from the $j$th position

with the understanding that $\operatorname {insert} (u,j)$ is undefined for $j>|u|$ and $\operatorname {delete} (u,j)$ is undefined if $j>|u|-2$ . We define an equivalence relation $R$ on $\Sigma ^{*}$ as follows: for elements $a,b\in \Sigma ^{*}$ we have $(a,b)\in R$ if and only if there exists a sequence of zero or more applications of the $\operatorname {insert}$ and $\operatorname {delete}$ functions starting with $a$ and ending with $b$ . That the sequence of zero operations is allowed accounts for the reflexivity of $R$ . Symmetry follows from the observation that each application of $\operatorname {insert}$ can be undone by an application of $\operatorname {delete}$ and vice versa, so any finite sequence of operations taking $a$ to $b$ can be reversed into one taking $b$ to $a$ . Transitivity is clear from the definition.

The equivalence relation partitions the language $\Sigma ^{*}$ into equivalence classes. If we take $\epsilon$ to denote the empty string, then the language corresponding to the equivalence class $\operatorname {Cl} (\epsilon )$ is called the Dyck language.
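One way to decide membership in $\operatorname {Cl} (\epsilon )$ in practice is to delete occurrences of "[]" until none remain. The sketch below assumes the standard fact, not proved in this article, that the result of this reduction does not depend on the order of deletions, so two words are related by $R$ exactly when they reduce to the same string. The function names are illustrative.

```python
def reduce_word(word: str) -> str:
    """Delete occurrences of "[]" until none remain.

    The result always consists of some ]'s followed by some ['s.
    """
    stack = []
    for symbol in word:
        if symbol == "]" and stack and stack[-1] == "[":
            stack.pop()  # this ']' and the '[' before it form a deletable "[]"
        else:
            stack.append(symbol)
    return "".join(stack)


def equivalent(a: str, b: str) -> bool:
    """Test (a, b) in R by comparing reduced forms (see the assumption above)."""
    return reduce_word(a) == reduce_word(b)


def in_dyck_class(word: str) -> bool:
    """Test membership in Cl(ε), i.e. whether `word` reduces to the empty string."""
    return reduce_word(word) == ""
```

For example, `reduce_word("][[]")` gives `"]["`, so that word is equivalent to "][" but does not belong to the Dyck language.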

Properties

The Dyck language is closed under the operation of concatenation.
By treating $\Sigma ^{*}$ as an algebraic monoid under concatenation we see that the monoid structure transfers onto the quotient $\Sigma ^{*}/R$ , resulting in the syntactic monoid of the Dyck language. The class $\operatorname {Cl} (\epsilon )$ will be denoted $1$ .
The syntactic monoid of the Dyck language is not commutative: if $u=\operatorname {Cl} ([)$ and $v=\operatorname {Cl} (])$ then $uv=\operatorname {Cl} ([])=1\neq \operatorname {Cl} (][)=vu$ .
With the notation above, $uv=1$ but neither $u$ nor $v$ is invertible in $\Sigma ^{*}/R$ .
The syntactic monoid of the Dyck language is isomorphic to the bicyclic semigroup by virtue of the properties of $\operatorname {Cl} ([)$ and $\operatorname {Cl} (])$ described above.
By the Chomsky–Schützenberger representation theorem, any context-free language is a homomorphic image of the intersection of some regular language with a Dyck language on one or more kinds of bracket pairs.^[1]
The Dyck language with two distinct types of brackets can be recognized in the complexity class $TC^{0}$ .^[2]
The number of distinct Dyck words with exactly $n$ pairs of parentheses and $k$ innermost pairs (viz. the substring $[\ ]$ ) is the Narayana number $\operatorname {N} (n,k)$ .
The number of distinct Dyck words with exactly $n$ pairs of parentheses is the $n$ -th Catalan number $C_{n}$ . Notice that the Dyck language of words with $n$ parentheses pairs is equal to the union, over all possible $k$ , of the Dyck languages of words of $n$ parentheses pairs with $k$ innermost pairs, as defined in the previous point. Since this union is disjoint and, for $n\geq 1$ , $k$ ranges from 1 to $n$ , we obtain the following equality:

C_{n}=\sum _{k=1}^{n}\operatorname {N} (n,k)
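The identity can be checked by brute force for small $n$ . The sketch below enumerates all bracket strings of length $2n$ , keeps the Dyck words, and groups them by their number of innermost pairs. The closed forms used for `narayana` and `catalan` are the standard ones, $\operatorname {N} (n,k)={\tfrac {1}{n}}{\tbinom {n}{k}}{\tbinom {n}{k-1}}$ and $C_{n}={\tfrac {1}{n+1}}{\tbinom {2n}{n}}$ , which are not stated in this article; all function names are illustrative.

```python
from collections import Counter
from itertools import product
from math import comb


def is_dyck(word: str) -> bool:
    """Prefix-counting test for words over {'[', ']'}."""
    depth = 0
    for symbol in word:
        depth += 1 if symbol == "[" else -1
        if depth < 0:
            return False
    return depth == 0


def innermost_counts(n: int) -> Counter:
    """Map k to the number of Dyck words with n pairs and k substrings "[]"."""
    counts = Counter()
    for symbols in product("[]", repeat=2 * n):
        word = "".join(symbols)
        if is_dyck(word):
            counts[word.count("[]")] += 1
    return counts


def narayana(n: int, k: int) -> int:
    """Narayana number N(n, k) for 1 <= k <= n (standard closed form)."""
    return comb(n, k) * comb(n, k - 1) // n


def catalan(n: int) -> int:
    """Catalan number C_n (standard closed form)."""
    return comb(2 * n, n) // (n + 1)
```

For $n=3$ the five Dyck words split as $1+3+1$ according to whether they contain one, two, or three innermost pairs.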

Examples

We can define an equivalence relation $L$ on the Dyck language ${\mathcal {D}}$ . For $u,v\in {\mathcal {D}}$ we have $(u,v)\in L$ if and only if $|u|=|v|$ , i.e. $u$ and $v$ have the same length. This relation partitions the Dyck language: ${\mathcal {D}}/L=\{{\mathcal {D}}_{0},{\mathcal {D}}_{2},{\mathcal {D}}_{4},\ldots \}$ . We have ${\mathcal {D}}={\mathcal {D}}_{0}\cup {\mathcal {D}}_{2}\cup {\mathcal {D}}_{4}\cup \ldots =\bigcup _{n=0}^{\infty }{\mathcal {D}}_{n}$ where ${\mathcal {D}}_{n}=\{u\in {\mathcal {D}}\mid |u|=n\}$ . Note that ${\mathcal {D}}_{n}$ is empty for odd $n$ , so only the even-indexed sets are classes of the partition.

Having introduced the Dyck words of length $n$ , we can introduce a relation on them. For every $n\in \mathbb {N}$ we define a relation $S_{n}$ on ${\mathcal {D}}_{n}$ ; for $u,v\in {\mathcal {D}}_{n}$ we have $(u,v)\in S_{n}$ if and only if $v$ can be reached from $u$ by a series of proper swaps. A proper swap in a word $u\in {\mathcal {D}}_{n}$ replaces an occurrence of '][' with '[]'. For each $n\in \mathbb {N}$ the relation $S_{n}$ makes ${\mathcal {D}}_{n}$ into a partially ordered set. The relation $S_{n}$ is reflexive because an empty sequence of proper swaps takes $u$ to $u$ . Transitivity follows because we can extend a sequence of proper swaps that takes $u$ to $v$ by concatenating it with a sequence of proper swaps that takes $v$ to $w$ , forming a sequence that takes $u$ to $w$ . To see that $S_{n}$ is also antisymmetric we introduce an auxiliary function $\sigma _{n}:{\mathcal {D}}_{n}\rightarrow \mathbb {N}$ defined as a sum over all prefixes $v$ of $u$ :

\sigma _{n}(u)=\sum _{vw=u}{\Big (}({\text{count of ['s in }}v)-({\text{count of ]'s in }}v){\Big )}

The following table illustrates that $\sigma _{n}$ is strictly monotonic with respect to proper swaps.

Strict monotonicity of $\sigma _{n}$
partial sums of $\sigma _{n}(u)$	$P$	$P-1$	$P$	$Q$
$u$	$\ldots$	]	[	$\ldots$
$u'$	$\ldots$	[	]	$\ldots$
partial sums of $\sigma _{n}(u')$	$P$	$P+1$	$P$	$Q$
Difference of partial sums	0	2	0	0

Hence $\sigma _{n}(u')-\sigma _{n}(u)=2>0$ so $\sigma _{n}(u)<\sigma _{n}(u')$ when there is a proper swap that takes $u$ into $u'$ . Now if we assume that both $(u,v),(v,u)\in S_{n}$ and $u\neq v$ , then there are non-empty sequences of proper swaps such that $u$ is taken into $v$ and vice versa. But then $\sigma _{n}(u)<\sigma _{n}(v)<\sigma _{n}(u)$ , which is a contradiction. Therefore, whenever both $(u,v)$ and $(v,u)$ are in $S_{n}$ , we have $u=v$ , hence $S_{n}$ is antisymmetric.
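The monotonicity argument can be tried out on small words. The following sketch computes $\sigma _{n}$ as a running sum of prefix heights and lists the results of all single proper swaps; the names `sigma` and `proper_swaps` are illustrative, and the code assumes its input contains only the symbols [ and ].

```python
def sigma(word: str) -> int:
    """Sum, over all prefixes of `word`, of (count of ['s) - (count of ]'s)."""
    total = 0
    height = 0  # value of the summand for the current prefix
    for symbol in word:
        height += 1 if symbol == "[" else -1
        total += height
    return total  # the empty prefix contributes 0


def proper_swaps(word: str) -> list[str]:
    """All words obtained from `word` by replacing one "][" with "[]"."""
    return [
        word[:i] + "[]" + word[i + 2:]
        for i in range(len(word) - 1)
        if word[i:i + 2] == "]["
    ]
```

For example, the word "[][]" has $\sigma$ value 2 and admits a single proper swap, which yields "[[]]" with $\sigma$ value 4.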

The partially ordered set ${\mathcal {D}}_{8}$ is shown in the illustration accompanying the introduction if we interpret a [ as going up and ] as going down.

Generalizations

There exist variants of the Dyck language with multiple delimiters, e.g., D2 on the alphabet "(", ")", "[", and "]". The words of such a language are the ones which are well-parenthesized for all delimiters, i.e., one can read the word from left to right, push every opening delimiter onto a stack, and whenever a closing delimiter is reached, the matching opening delimiter must be on top of the stack so that it can be popped; at the end of the word the stack must be empty. (The prefix-counting criterion of the formal definition does not generalise: for example, "([)]" has balanced counts for each type of delimiter but is not well-parenthesized.)
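The stack procedure just described can be sketched as follows for D2. The table `PAIRS` and the function name `is_dyck2` are assumptions made for illustration; other delimiter sets would only change the table.

```python
PAIRS = {")": "(", "]": "["}  # closing delimiter -> matching opening delimiter
OPENERS = set(PAIRS.values())


def is_dyck2(word: str) -> bool:
    """Return True if `word` is well-parenthesized for both kinds of delimiter."""
    stack = []
    for symbol in word:
        if symbol in OPENERS:
            stack.append(symbol)  # push every opening delimiter
        elif symbol in PAIRS:
            # A closing delimiter must match the opening delimiter on top.
            if not stack or stack.pop() != PAIRS[symbol]:
                return False
        else:
            raise ValueError(f"symbol {symbol!r} is not in the alphabet")
    return not stack  # every opening delimiter must have been closed
```

The word "([)]" is rejected by this check even though each kind of delimiter is balanced when counted on its own, which is why a stack rather than a pair of counters is needed.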

Notes

[1] Kambites, Communications in Algebra Volume 37 Issue 1 (2009) 193-208
[2] Barrington and Corbett, Information Processing Letters 32 (1989) 251-256

References

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.
