Dyck language

In the theory of formal languages of computer science, mathematics, and linguistics, a Dyck word is a balanced string of brackets. The set of Dyck words forms a Dyck language. The simplest, Dyck-1, uses just two matching brackets, e.g. ( and ).

Dyck words and language are named after the mathematician Walther von Dyck. They have applications in the parsing of expressions that must have a correctly nested sequence of brackets, such as arithmetic or algebraic expressions.

Formal definition

Let Σ = { [, ] } be the alphabet consisting of the symbols [ and ], and let Σ* denote its Kleene closure. The Dyck language is defined as:

{ u ∈ Σ* : every prefix of u contains at least as many ['s as ]'s, and u contains equally many ['s and ]'s }
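
For illustration, here is a minimal Python sketch of this membership test (the function name is_dyck1 is our own, not standard terminology):

    def is_dyck1(word):
        """Test membership in Dyck-1 over the alphabet {'[', ']'}.

        A word is a Dyck word iff no prefix contains more ]'s than ['s
        and the whole word contains equally many ['s and ]'s.
        """
        depth = 0
        for symbol in word:
            depth += 1 if symbol == "[" else -1
            if depth < 0:        # a prefix with more ]'s than ['s
                return False
        return depth == 0        # equal counts overall

    assert is_dyck1("[[]][]")    # balanced, accepted
    assert not is_dyck1("][")    # bad prefix, rejected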

Context-free grammar

In some situations it may be helpful to define the Dyck language via a context-free grammar. The Dyck language is generated by the context-free grammar with a single non-terminal S and the production:

Sε| "[" S "]" S

That is, S is either the empty string (ε) or is "[", an element of the Dyck language, the matching "]", and an element of the Dyck language.
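
This decomposition also gives a way to enumerate Dyck words, since every non-empty Dyck word factors as "[" a "]" b for shorter Dyck words a and b. A Python sketch under that reading (names are our own):

    from functools import lru_cache

    @lru_cache(maxsize=None)
    def dyck_words(pairs):
        """All Dyck words with exactly `pairs` bracket pairs, built from
        the production S -> epsilon | "[" S "]" S: every non-empty word
        is "[" a "]" b for shorter Dyck words a and b."""
        if pairs == 0:
            return ("",)                      # S -> epsilon
        words = []
        for inner in range(pairs):            # pairs enclosed by the first "["
            for a in dyck_words(inner):
                for b in dyck_words(pairs - 1 - inner):
                    words.append("[" + a + "]" + b)
        return tuple(words)

    print(dyck_words(2))    # ('[][]', '[[]]')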

An alternative context-free grammar for the Dyck language is given by the production:

S → ("[" S "]")*

That is, S is zero or more occurrences of the combination of "[", an element of the Dyck language, and a matching "]", where multiple elements of the Dyck language on the right side of the production are free to differ from each other.

Alternative definition

In yet other contexts it may instead be helpful to define the Dyck language by splitting Σ* into equivalence classes, as follows. For any element u ∈ Σ* of length |u|, we define partial functions insert : Σ* × ℕ → Σ* and delete : Σ* × ℕ → Σ* by

insert(u, j) is u with "[]" inserted into the jth position
delete(u, j) is u with "[]" deleted from the jth position

with the understanding that insert(u, j) is undefined for j > |u| and delete(u, j) is undefined if j > |u| − 2. We define an equivalence relation R on Σ* as follows: for elements a, b ∈ Σ* we have (a, b) ∈ R if and only if there exists a sequence of zero or more applications of the insert and delete functions starting with a and ending with b. That the sequence of zero operations is allowed accounts for the reflexivity of R. Symmetry follows from the observation that any finite sequence of applications of insert to a string can be undone with a finite sequence of applications of delete. Transitivity is clear from the definition.

The equivalence relation partitions the language Σ* into equivalence classes. If we take ε to denote the empty string, then the language corresponding to the equivalence class Cl(ε) is called the Dyck language.
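
Under this definition, a word lies in Cl(ε) exactly when repeatedly deleting adjacent "[]" pairs reduces it to the empty string, which gives yet another membership test. A minimal Python sketch (the helper name is ours):

    def reduces_to_empty(word):
        """Membership test via the delete operation: a word lies in
        Cl(epsilon), i.e. is a Dyck word, iff repeatedly deleting an
        adjacent "[]" pair eventually leaves the empty string."""
        while "[]" in word:
            word = word.replace("[]", "", 1)   # one application of delete
        return word == ""

    assert reduces_to_empty("[[][]]")
    assert not reduces_to_empty("][")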

Generalizations

Typed Dyck language

There exist variants of the Dyck language with multiple delimiters, e.g., Dyck-2 on the alphabet "(", ")", "[", and "]". The words of such a language are those which are well-parenthesized for all delimiters, i.e., one can read the word from left to right, push every opening delimiter onto the stack, and whenever a closing delimiter is reached it must be possible to pop the matching opening delimiter from the top of the stack. (The counting algorithm above does not generalise, since counts alone cannot detect improperly crossed delimiters such as "([)]".)

For example, the following is a valid sentence in Dyck-3:

( [ [ ] { } ] ( ) { ( ) } ) [ ]
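
A minimal Python sketch of the stack-based check described above; the bracket pairs chosen here are just one possible alphabet:

    def is_typed_dyck(word, pairs):
        """Stack check: push each opening delimiter; every closing
        delimiter must match the opener on top of the stack."""
        stack = []
        for symbol in word:
            if symbol in pairs:                # opening delimiter: push
                stack.append(symbol)
            elif not stack or pairs[stack.pop()] != symbol:
                return False                   # mismatched or extra closer
        return not stack                       # no unclosed openers left

    DYCK3 = {"(": ")", "[": "]", "{": "}"}
    assert is_typed_dyck("([[]{}](){()})[]", DYCK3)   # the example above
    assert not is_typed_dyck("([)]", DYCK3)           # crossed delimiters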

Finite depth

A Dyck language sentence can be pictured as a descent and ascent through the levels of nested brackets. As one reads along a Dyck sentence, each opening bracket increases the nesting depth by 1, and each closing bracket decreases it by 1. The depth of a sentence is the maximal depth reached within the sentence.

For example, we can annotate the following sentence with the depth at each step:

0 ( 1 [ 2 [ 3 ] 2 { 3 } 2 ] 1 ( 2 ) 1 { 2 ( 3 ) 2 } 1 ) 0 [ 1 ] 0

and the entire sentence has depth 3.

We define Dyck-(k, m) as the language with k types of brackets and maximal nesting depth m. This has applications in the formal theory of recurrent neural networks.[1]
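
Since the stack height at each step is exactly the nesting depth, the same left-to-right scan recognises Dyck-(k, m). A Python sketch, with names of our own choosing:

    def in_dyck_k_m(word, pairs, max_depth):
        """Membership test for Dyck-(k, m): `pairs` supplies the k bracket
        types and `max_depth` is the bound m on the nesting depth, which
        equals the stack height during the scan."""
        stack = []
        for symbol in word:
            if symbol in pairs:
                stack.append(symbol)
                if len(stack) > max_depth:     # nesting deeper than m
                    return False
            elif not stack or pairs[stack.pop()] != symbol:
                return False
        return not stack

    DYCK3 = {"(": ")", "[": "]", "{": "}"}
    example = "([[]{}](){()})[]"               # depth 3, as annotated above
    assert in_dyck_k_m(example, DYCK3, 3)      # in Dyck-(3, 3)
    assert not in_dyck_k_m(example, DYCK3, 2)  # but not in Dyck-(3, 2)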

Properties

Examples

[Figure: Lattice of the 14 Dyck words of length 8, with [ and ] interpreted as up and down.]

We can define an equivalence relation L on the Dyck language D. For u, v ∈ D we have (u, v) ∈ L if and only if |u| = |v|, i.e. u and v have the same length. This relation partitions the Dyck language into classes D_n = { u ∈ D : |u| = n }, so that D = D_0 ∪ D_2 ∪ D_4 ∪ ⋯. Note that D_n is empty for odd n.
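
The count of 14 in the illustration's caption can be verified by brute force, enumerating all strings of length 8 over { [, ] } and keeping the balanced ones. A throwaway Python sketch:

    from itertools import product

    def D(n):
        """All Dyck words of length n over {'[', ']'} (empty for odd n)."""
        def balanced(w):
            depth = 0
            for c in w:
                depth += 1 if c == "[" else -1
                if depth < 0:
                    return False
            return depth == 0
        return ["".join(w) for w in product("[]", repeat=n) if balanced(w)]

    assert len(D(8)) == 14    # the 14 Dyck words of length 8
    assert D(3) == []         # D_n is empty for odd n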

Having introduced the Dyck words of length n, we can introduce a relationship on them. For every n ∈ ℕ we define a relation S_n on D_n: for u, v ∈ D_n we have (u, v) ∈ S_n if and only if v can be reached from u by a series of proper swaps. A proper swap in a word u ∈ D_n swaps an occurrence of '][' with '[]'. For each n the relation S_n makes D_n into a partially ordered set. The relation S_n is reflexive because an empty sequence of proper swaps takes u to u. Transitivity follows because we can extend a sequence of proper swaps that takes u to v by concatenating it with a sequence of proper swaps that takes v to w, forming a sequence that takes u into w. To see that S_n is also antisymmetric we introduce an auxiliary function σ_n : D_n → ℤ, defined as a sum over all prefixes v of u, where |v|_[ and |v|_] count the occurrences of [ and ] in v:

σ_n(u) = Σ_{vw = u} ( |v|_[ − |v|_] )

The following table illustrates that σ_n is strictly monotonic with respect to proper swaps: a proper swap at positions j and j+1 raises the partial sum at position j by 2 and leaves every other partial sum unchanged.

Strict monotonicity of σ_n

partial sums of σ_n(u′)      ⋯   σ_{j−1} + 1   σ_{j+1}   ⋯
u′                           ⋯   [             ]         ⋯
u                            ⋯   ]             [         ⋯
partial sums of σ_n(u)       ⋯   σ_{j−1} − 1   σ_{j+1}   ⋯
Difference of partial sums   0   2             0         0

Hence σ_n(u′) > σ_n(u) when there is a proper swap that takes u into u′. Now if we assume that both (u, v) ∈ S_n and (v, u) ∈ S_n with u ≠ v, then there are non-empty sequences of proper swaps such that u is taken into v and vice versa. But then σ_n(u) < σ_n(v) < σ_n(u), which is nonsensical. Therefore, whenever both (u, v) and (v, u) are in S_n, we have u = v, hence S_n is antisymmetric.

The partially ordered set (D_8, S_8) is shown in the illustration above if we interpret a [ as going up and a ] as going down.
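
Both the auxiliary function σ_n and the effect of a proper swap are easy to check mechanically; a Python sketch with our own naming:

    def sigma(u):
        """sigma_n(u): the sum over all prefixes v of u of
        (number of ['s in v) - (number of ]'s in v)."""
        total, depth = 0, 0
        for c in u:
            depth += 1 if c == "[" else -1
            total += depth
        return total

    def proper_swap(u, j):
        """Replace the occurrence of '][' starting at index j with '[]'."""
        assert u[j:j + 2] == "]["
        return u[:j] + "[]" + u[j + 2:]

    u = "[][]"
    v = proper_swap(u, 1)              # v == "[[]]"
    assert sigma(v) == sigma(u) + 2    # a proper swap raises sigma by exactly 2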

Notes

  1. Hewitt, John; Hahn, Michael; Ganguli, Surya; Liang, Percy; Manning, Christopher D. (2020). "RNNs can generate bounded hierarchical languages with optimal memory". doi:10.48550/arXiv.2010.07515. Retrieved 2024-09-28.
  2. Kambites, Communications in Algebra, 37(1) (2009), 193–208.
  3. Barrington and Corbett, Information Processing Letters, 32 (1989), 251–256.
