Embedded pushdown automaton

Last updated November 02, 2021

An embedded pushdown automaton or EPDA is a computational model for parsing languages generated by tree-adjoining grammars (TAGs). It is similar to the context-free grammar-parsing pushdown automaton, but instead of using a plain stack to store symbols, it has a stack of iterated stacks that store symbols, giving TAGs a generative capacity between context-free and context-sensitive grammars, or a subset of mildly context-sensitive grammars. Embedded pushdown automata should not be confused with nested stack automata which have more computational power.^{[ citation needed ]}

History and applications

EPDAs were first described by K. Vijay-Shanker in his 1988 doctoral thesis.^[1] They have since been applied to more complete descriptions of classes of mildly context-sensitive grammars and have had important roles in refining the Chomsky hierarchy. Various subgrammars, such as the linear indexed grammar, can thus be defined.^[2]

While natural languages have traditionally been analyzed using context-free grammars (see transformational-generative grammar and computational linguistics), this model does not work well for languages with crossed dependencies, such as Dutch, situations for which an EPDA is well suited. A detailed linguistic analysis is available in Joshi, Schabes (1997).^[3]

Theory

An EPDA is a finite state machine with a set of stacks that can be themselves accessed through the embedded stack. Each stack contains elements of the stack alphabet $\,\Gamma$ , and so we define an element of a stack by $\,\sigma _{i}\in \Gamma ^{*}$ , where the star is the Kleene closure of the alphabet.

Each stack can then be defined in terms of its elements, so we denote the $\,j$ th stack in the automaton using a double-dagger symbol: $\,\Upsilon _{j}=\ddagger \sigma _{j}=\{\sigma _{j,k},\sigma _{j,k-1},\ldots ,\sigma _{j,1}\}$ ,^{[ clarification needed ]} where $\,\sigma _{j,k}$ would be the next accessible symbol in the stack. The embedded stack of $\,m$ stacks can thus be denoted by $\,\{\Upsilon _{j}\}=\{\ddagger \sigma _{m},\ddagger \sigma _{m-1},\ldots ,\ddagger \sigma _{1}\}\in (\ddagger \Gamma ^{+})^{*}$ .^{[ clarification needed ]}

We define an EPDA by the septuple (7-tuple)

\,M=(Q,\Sigma ,\Gamma ,\delta ,q_{0},Q_{\textrm {F}},\sigma _{0})

where

$\,Q$ is a finite set of states;
$\,\Sigma$ is the finite set of the input alphabet;
$\,\Gamma$ is the finite stack alphabet;
$\,q_{0}\in Q$ is the start state;
$\,Q_{\textrm {F}}\subseteq Q$ is the set of final states;
$\,\sigma _{0}\in \Gamma$ is the initial stack symbol
$\,\delta :Q\times \Sigma \times \Gamma \rightarrow S$ is the transition function, where $\,S$ are finite subsets of $\,Q\times (\ddagger \Gamma ^{+})^{*}\times \Gamma ^{*}\times (\ddagger \Gamma ^{+})^{*}$ .

Thus the transition function takes a state, the next symbol of the input string, and the top symbol of the current stack and generates the next state, the stacks to be pushed and popped onto the embedded stack, the pushing and popping of the current stack, and the stacks to be considered the current stacks in the next transition. More conceptually, the embedded stack is pushed and popped, the current stack is optionally pushed back onto the embedded stack, and any other stacks one would like are pushed on top of that, with the last stack being the one read from in the next iteration. Therefore, stacks can be pushed both above and below the current stack.

A given configuration is defined by

\,C(M)=\{q,\Upsilon _{m}\ldots \Upsilon _{1},x_{1},x_{2}\}\in Q\times (\ddagger \Gamma ^{+})^{*}\times \Sigma ^{*}\times \Sigma ^{*}

where $\,q$ is the current state, the $\,\Upsilon$ s are the stacks in the embedded stack, with $\,\Upsilon _{m}$ the current stack, and for an input string $\,x=x_{1}x_{2}\in \Sigma ^{*}$ , $\,x_{1}$ is the portion of the string already processed by the machine and $\,x_{2}$ is the portion to be processed, with its head being the current symbol read. Note that the empty string $\,\epsilon \in \Sigma$ is implicitly defined as a terminating symbol, where if the machine is at a final state when the empty string is read, the entire input string is accepted, and if not it is rejected. Such accepted strings are elements of the language

\,L(M)=\left\{x|\{q_{0},\Upsilon _{0},\epsilon ,x\}\rightarrow _{M}^{*}\{q_{\textrm {F}},\Upsilon _{m}\ldots \Upsilon _{1},x,\epsilon \}\right\}

where $\,q_{\textrm {F}}\in Q_{\textrm {F}}$ and $\,\rightarrow _{M}^{*}$ defines the transition function applied over as many times as necessary to parse the string.

An informal description of EPDA can also be found in Joshi, Schabes (1997),^[3] Sect.7, p. 23-25.

k-order EPDA and the Weir hierarchy

A more precisely defined hierarchy of languages that correspond to the mildly context-sensitive class was defined by David J. Weir.^[4] Based on the work of Nabil A. Khabbaz,^[5]^[6] Weir's Control Language Hierarchy is a containment hierarchy of countable set of language classes^{[ clarify ]} where the Level-1 is defined as context-free, and Level-2 is the class of tree-adjoining and the other three grammars.

Following are some of the properties of Level-k languages in the hierarchy:

Level-k languages are properly contained in the Level-(k + 1) language class
Level-k languages can be parsed in $O(n^{3\cdot 2^{k-1}})$ time
Level-k contains the language $\{a_{1}^{n}\dotso a_{2^{k}}^{n}|n\geq 0\}$ , but not $\{a_{1}^{n}\dotso a_{2^{k+1}}^{n}|n\geq 0\}$
Level-k contains the language $\{w^{2^{k-1}}|w\in \{a,b\}^{*}\}$ , but not $\{w^{2^{k-1}+1}|w\in \{a,b\}^{*}\}$

Those properties correspond well (at least for small k > 1) to the conditions of mildly context-sensitive languages imposed by Joshi, and as k gets bigger, the language class becomes, in a sense, less mildly context-sensitive.

Related Research Articles

In formal language theory, a context-free grammar (CFG) is a formal grammar whose production rules are of the form

In formal language theory, a context-free language (CFL) is a language generated by a context-free grammar (CFG).

In the theory of computation, a branch of theoretical computer science, a pushdown automaton (PDA) is a type of automaton that employs a stack.

In mathematical physics and mathematics, the Pauli matrices are a set of three $2 \times 2$ complex matrices which are Hermitian and unitary. Usually indicated by the Greek letter sigma, they are occasionally denoted by tau when used in connection with isospin symmetries.

In computer science, an LL parser is a top-down parser for a restricted context-free language. It parses the input from Left to right, performing Leftmost derivation of the sentence.

Automata theory is the study of abstract machines and automata, as well as the computational problems that can be solved using them. It is a theory in theoretical computer science. The word automata comes from the Greek word αὐτόματος, which means "self-acting, self-willed, self-moving". An automaton is an abstract self-propelled computing device which follows a predetermined sequence of operations automatically. An automaton with a finite number of states is called a Finite Automaton (FA) or Finite-State Machine (FSM).

Categorial grammar is a family of formalisms in natural language syntax which share the central assumption that syntactic constituents combine as functions and arguments. Categorial grammar posits a close relationship between the syntax and semantic composition, since it typically treats syntactic categories as corresponding to semantic types. Categorial grammars were developed in the 1930s by Kazimierz Ajdukiewicz, Yehoshua Bar-Hillel, and Joachim Lambek. It saw a surge of interest in the 1970s following the work of Richard Montague, whose Montague grammar assumed a similar view of syntax. It continues to be a major paradigm, particularly within formal semantics.

In probability theory and statistics, the generalized extreme value (GEV) distribution is a family of continuous probability distributions developed within extreme value theory to combine the Gumbel, Fréchet and Weibull families also known as type I, II and III extreme value distributions. By the extreme value theorem the GEV distribution is the only possible limit distribution of properly normalized maxima of a sequence of independent and identically distributed random variables. Note that a limit distribution needs to exist, which requires regularity conditions on the tail of the distribution. Despite this, the GEV distribution is often used as an approximation to model the maxima of long (finite) sequences of random variables.

Conjunctive grammars are a class of formal grammars studied in formal language theory. They extend the basic type of grammars, the context-free grammars, with a conjunction operation. Besides explicit conjunction, conjunctive grammars allow implicit disjunction represented by multiple rules for a single nonterminal symbol, which is the only logical connective expressible in context-free grammars. Conjunction can be used, in particular, to specify intersection of languages. A further extension of conjunctive grammars known as Boolean grammars additionally allows explicit negation.

In automata theory, a deterministic pushdown automaton is a variation of the pushdown automaton. The class of deterministic pushdown automata accepts the deterministic context-free languages, a proper subset of context-free languages.

The Nakagami distribution or the Nakagami-m distribution is a probability distribution related to the gamma distribution. The family of Nakagami distributions has two parameters: a shape parameter $and a second parameter controlling spread .$

Indexed grammars are a generalization of context-free grammars in that nonterminals are equipped with lists of flags, or index symbols. The language produced by an indexed grammar is called an indexed language.

A queue machine or queue automaton is a finite state machine with the ability to store and retrieve data from an infinite-memory queue. It is a model of computation equivalent to a Turing machine, and therefore it can process the same class of formal languages.

A read-only Turing machine or two-way deterministic finite-state automaton (2DFA) is class of models of computability that behave like a standard Turing machine and can move in both directions across input, except cannot write to its input tape. The machine in its bare form is equivalent to a deterministic finite automaton in computational power, and therefore can only parse a regular language.

An abstract family of acceptors (AFA) is a grouping of generalized acceptors. Informally, an acceptor is a device with a finite state control, a finite number of input symbols, and an internal store with a read and write function. Each acceptor has a start state and a set of accepting states. The device reads a sequence of symbols, transitioning from state to state for each input symbol. If the device ends in an accepting state, the device is said to accept the sequence of symbols. A family of acceptors is a set of acceptors with the same type of internal store. The study of AFA is part of AFL theory.

In formal language theory, a grammar describes how to form strings from a language's alphabet that are valid according to the language's syntax. A grammar does not describe the meaning of the strings or what can be done with them in whatever context—only their form. A formal grammar is defined as a set of production rules for strings in a formal language.

A Multitrack Turing machine is a specific type of multi-tape Turing machine.

In computer science, more specifically in automata and formal language theory, nested words are a concept proposed by Alur and Madhusudan as a joint generalization of words, as traditionally used for modelling linearly ordered structures, and of ordered unranked trees, as traditionally used for modelling hierarchical structures. Finite-state acceptors for nested words, so-called nested word automata, then give a more expressive generalization of finite automata on words. The linear encodings of languages accepted by finite nested word automata gives the class of visibly pushdown languages. The latter language class lies properly between the regular languages and the deterministic context-free languages. Since their introduction in 2004, these concepts have triggered much research in that area.

In the geometry of numbers, the Klein polyhedron, named after Felix Klein, is used to generalize the concept of continued fractions to higher dimensions.

A tree stack automaton is a formalism considered in automata theory. It is a finite state automaton with the additional ability to manipulate a tree-shaped stack. It is an automaton with storage whose storage roughly resembles the configurations of a thread automaton. A restricted class of tree stack automata recognises exactly the languages generated by multiple context-free grammars.

References

↑ Vijay-Shanker, K. (January 1988). "A Study of Tree-Adjoining Grammars". Ph.D. Thesis. University of Pennsylvania.
↑ Weir, David J. (1994). "Linear Iterated Pushdowns" (PDF). Computational Intelligence. 10 (4): 431–439. doi:10.1111/j.1467-8640.1994.tb00007.x . Retrieved 2012-10-20.
1 2 Joshi, Aravind K.; Yves Schabes (1997). "Tree-Adjoining Grammars" (PDF). Handbook of Formal Languages. Springer. 3: 69–124. doi:10.1007/978-3-642-59126-6_2. ISBN 978-3-642-63859-6 . Retrieved 2014-02-07.
↑ Weir, D. J. (1992), "A geometric hierarchy beyond context-free languages", Theoretical Computer Science, 104 (2): 235–261, doi: 10.1016/0304-3975(92)90124-X .
↑ Nabil Anton Khabbaz (1972). Generalized context-free languages (Ph.D.). University of Iowa.
↑ Nabil Anton Khabbaz (1974). "A geometric hierarchy of languages". J. Comput. Syst. Sci. 8 (2): 142–157. doi: 10.1016/s0022-0000(74)80052-8 .

Embedded pushdown automaton

Contents

History and applications

Theory

k-order EPDA and the Weir hierarchy

See also

Related Research Articles

References

Further reading