Nested word

Last updated November 14, 2024

In computer science, more specifically in automata and formal language theory, nested words are a concept proposed by Alur and Madhusudan as a joint generalization of words, as traditionally used for modelling linearly ordered structures, and of ordered unranked trees, as traditionally used for modelling hierarchical structures. Finite-state acceptors for nested words, so-called nested word automata, then give a more expressive generalization of finite automata on words. The linear encodings of languages accepted by finite nested word automata give the class of visibly pushdown languages. The latter language class lies properly between the regular languages and the deterministic context-free languages. Since their introduction in 2004, these concepts have triggered much research in that area.^[1]

Formal definition

To define nested words, first define matching relations. For a nonnegative integer $\ell$ , the notation $[\ell ]$ denotes the set $\{1,2,\ldots ,\ell -1,\ell \}$ , with the special case $[0]=\emptyset$ .

A matching relation ↝ of length $\ell \geq 0$ is a subset of $\{-\infty ,1,2,\ldots ,\ell -1,\ell \}\times \{1,2,\ldots ,\ell -1,\ell ,\infty \}$ such that:

all nesting edges are forward, that is, if $i ↝ j$ then $i < j$ ;
nesting edges never have a finite position in common, that is, for $-\infty < i < \infty$ , there is at most one position h such that $h ↝ i$ , and there is at most one position j such that i ↝ j; and
nesting edges never cross, that is, there are no $i < i' \leq j < j'$ such that both $i ↝ j$ and $i' ↝ j'$ .

A position i is referred to as

a call position, if i ↝ j for some j,
a pending call if i ↝ ∞,
a return position, if h ↝ i for some h,
a pending return if −∞ ↝ i, and
an internal position in all remaining cases.

A nested word of length $\ell$ over an alphabet Σ is a pair (w,↝), where w is a word, or string, of length $\ell$ over Σ and ↝ is a matching relation of length $\ell$ .
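The definitions above can be checked mechanically. The following is a minimal sketch, assuming a matching relation is represented as a set of pairs in which `-math.inf` and `math.inf` stand for −∞ and ∞; the function names `is_matching_relation` and `classify_positions` are illustrative and not taken from the literature.

```python
import math

NEG_INF, POS_INF = -math.inf, math.inf  # stand-ins for -infinity and +infinity


def is_matching_relation(edges, length):
    """Return True if `edges` satisfies the three conditions for the given length."""
    edges = set(edges)
    for i, j in edges:
        # Sources lie in {-inf, 1..length}, targets in {1..length, +inf}.
        if not (i == NEG_INF or (isinstance(i, int) and 1 <= i <= length)):
            return False
        if not (j == POS_INF or (isinstance(j, int) and 1 <= j <= length)):
            return False
        # Condition 1: every nesting edge points forward.
        if not i < j:
            return False
    # Condition 2: a finite position is the source of at most one edge
    # and the target of at most one edge.
    sources = [i for i, _ in edges if i != NEG_INF]
    targets = [j for _, j in edges if j != POS_INF]
    if len(sources) != len(set(sources)) or len(targets) != len(set(targets)):
        return False
    # Condition 3: no two edges cross, i.e. no i < i' <= j < j'.
    for i, j in edges:
        for i2, j2 in edges:
            if i < i2 <= j < j2:
                return False
    return True


def classify_positions(edges, length):
    """Label each position 1..length according to the terminology above."""
    kinds = {p: "internal" for p in range(1, length + 1)}
    for i, j in edges:
        if i != NEG_INF:
            kinds[i] = "pending call" if j == POS_INF else "call"
        if j != POS_INF:
            kinds[j] = "pending return" if i == NEG_INF else "return"
    return kinds
```

The crossing check is quadratic in the number of edges, which keeps the sketch close to the wording of the definition; a stack-based linear scan would be the usual choice for long inputs.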

Encoding nested words into ordinary words

Nested words over the alphabet $\Sigma =\{a_{1},a_{2},\ldots ,a_{n}\}$ can be encoded into "ordinary" words over the tagged alphabet ${\hat {\Sigma }}$ , in which each symbol a from Σ has three tagged counterparts: the symbol ⟨a for encoding a call position in a nested word labelled with a, the symbol a⟩ for encoding a return position labelled with a, and finally the symbol a itself for representing an internal position labelled with a. More precisely, let φ be the function mapping nested words over Σ to words over ${\hat {\Sigma }}$ such that each nested word ( $w_{1}w_{2}\cdots w_{\ell }$ ,↝) is mapped to the word $x_{1}x_{2}\cdots x_{\ell }$ , where the letter $x_{i}$ equals ⟨a, a, or a⟩ if $w_{i}=a$ and i is a (possibly pending) call position, an internal position, or a (possibly pending) return position, respectively.

Example

For illustration, let $n = (w,↝)$ be the nested word over a ternary alphabet with $w = abaabccca$ and matching relation $↝ = \{(-\infty,1),(2,\infty),(3,4),(5,7),(8,\infty)\}$ . Then its encoding as a word reads $φ(n) = a⟩\,⟨b\,⟨a\,a⟩\,⟨b\,c\,c⟩\,⟨c\,a$ .
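The encoding φ can be sketched in a few lines of Python. This assumes the same pair-based representation of the matching relation, with `math.inf` standing for ∞, and represents each tagged symbol as a short string; the function name `encode` is illustrative.

```python
import math


def encode(word, edges):
    """Map a nested word (word, edges) to its list of tagged symbols."""
    # Call positions are the finite sources of edges, return positions the finite targets.
    calls = {i for i, _ in edges if i != -math.inf}
    returns = {j for _, j in edges if j != math.inf}
    tagged = []
    for pos, letter in enumerate(word, start=1):  # positions are 1-indexed
        if pos in calls:
            tagged.append("⟨" + letter)   # (possibly pending) call
        elif pos in returns:
            tagged.append(letter + "⟩")   # (possibly pending) return
        else:
            tagged.append(letter)         # internal position
    return tagged


edges = {(-math.inf, 1), (2, math.inf), (3, 4), (5, 7), (8, math.inf)}
print("".join(encode("abaabccca", edges)))  # a⟩⟨b⟨aa⟩⟨bcc⟩⟨ca
```

Returning a list rather than a joined string keeps the tagged symbols distinguishable, since a joined string such as `a⟩⟨b` is only unambiguous because every tag is attached to exactly one letter.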

Automata

Nested word automaton

A nested word automaton has a finite number of states, and operates in almost the same way as a deterministic finite automaton on classical strings: a classical finite automaton reads the input word $w=w_{1}\cdots w_{\ell }$ from left to right, and the state of the automaton after reading the jth letter $w_{j}$ depends on the state in which the automaton was before reading $w_{j}$ .

In a nested word automaton, the position $j$ in the nested word (w,↝) might be a return position; if so, the state after reading $w_{j}$ will not only depend on the linear state in which the automaton was before reading $w_{j}$ , but also on a hierarchical state propagated by the automaton at the time it was in the corresponding call position. In analogy to regular languages of words, a set L of nested words is called regular if it is accepted by some (finite-state) nested word automaton.

Visibly pushdown automaton

Nested word automata are an automaton model accepting nested words. There is an equivalent automaton model operating on (ordinary) words. Namely, the notion of a deterministic visibly pushdown automaton is a restriction of the notion of a deterministic pushdown automaton.

Following Alur and Madhusudan,^[2] a deterministic visibly pushdown automaton is formally defined as a 6-tuple $M=(Q,{\hat {\Sigma }},\Gamma ,\delta ,q_{0},F)$ where

$Q$ is a finite set of states,
${\hat {\Sigma }}$ is the input alphabet, which – in contrast to that of ordinary pushdown automata – is partitioned into three sets $\Sigma _{\text{c}}$ , $\Sigma _{\text{r}}$ , and $\Sigma _{\text{int}}$ . The alphabet $\Sigma _{\text{c}}$ denotes the set of call symbols, $\Sigma _{\text{r}}$ contains the return symbols, and the set $\Sigma _{\text{int}}$ contains the internal symbols,
$\Gamma$ is a finite set which is called the stack alphabet, containing a special symbol $\bot \in \Gamma$ denoting the empty stack,
$\delta =\delta _{\text{c}}\cup \delta _{\text{r}}\cup \delta _{\text{int}}$ is the transition function, which is partitioned into three parts corresponding to call transitions, return transitions, and internal transitions, namely
- $\delta _{\text{c}}\colon Q\times \Sigma _{\text{c}}\to Q\times \Gamma$ , the call transition function
- $\delta _{\text{r}}\colon Q\times \Sigma _{\text{r}}\times \Gamma \to Q$ , the return transition function
- $\delta _{\text{int}}:Q\times \Sigma _{\text{int}}\to Q$ , the internal transition function,
$q_{0}\in \,Q$ is the initial state, and
$F\subseteq Q$ is the set of accepting states.

The notion of computation of a visibly pushdown automaton is a restriction of the one used for pushdown automata. Visibly pushdown automata only add a symbol to the stack when reading a call symbol $a_{\text{c}}\in \Sigma _{\text{c}}$ , they only remove the top element from the stack when reading a return symbol $a_{\text{r}}\in \Sigma _{\text{r}}$ and they do not alter the stack when reading an internal event $a_{\text{i}}\in \Sigma _{\text{int}}$ . A computation ending in an accepting state is an accepting computation.
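A run of a deterministic visibly pushdown automaton can be sketched as follows. The dictionary layout, the symbols `<`, `>` and `x`, and the example automaton are all illustrative assumptions. Two conventions not spelled out above are also assumed: a missing transition means rejection, and a return symbol read on an empty stack inspects ⊥ without removing it.

```python
BOTTOM = "⊥"  # special stack symbol denoting the empty stack


def run_vpa(vpa, word):
    """Return True if the deterministic VPA ends in an accepting state on `word`."""
    state = vpa["q0"]
    stack = [BOTTOM]
    for sym in word:
        if sym in vpa["calls"]:
            step = vpa["delta_c"].get((state, sym))
            if step is None:
                return False
            state, pushed = step
            stack.append(pushed)              # calls push exactly one symbol
        elif sym in vpa["returns"]:
            top = stack[-1]
            nxt = vpa["delta_r"].get((state, sym, top))
            if nxt is None:
                return False
            state = nxt
            if top != BOTTOM:                 # assumed: the bottom marker is never popped
                stack.pop()
        elif sym in vpa["internals"]:
            nxt = vpa["delta_int"].get((state, sym))
            if nxt is None:
                return False
            state = nxt                       # internal symbols leave the stack alone
        else:
            raise ValueError(f"symbol {sym!r} is not in the input alphabet")
    return state in vpa["accepting"]


# Hypothetical example: accepts exactly the well-matched words over < > x.
# State "top" means stack depth 0; "in" means depth >= 1. The symbol "B"
# marks the outermost open call so the automaton knows when depth returns to 0.
WELL_MATCHED = {
    "calls": {"<"}, "returns": {">"}, "internals": {"x"},
    "delta_c": {("top", "<"): ("in", "B"), ("in", "<"): ("in", "A")},
    "delta_r": {("in", ">", "A"): "in", ("in", ">", "B"): "top"},
    "delta_int": {("top", "x"): "top", ("in", "x"): "in"},
    "q0": "top",
    "accepting": {"top"},
}

print(run_vpa(WELL_MATCHED, "<x<>>"))  # True
print(run_vpa(WELL_MATCHED, "<x"))     # False
```

Because acceptance is by final state only, the example records in its state whether the stack is empty; this is why the first call pushes a different symbol from nested calls.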

As a result, a visibly pushdown automaton cannot push to and pop from the stack with the same input symbol. Thus the language $L=\{a^{n}ba^{n}\mid n\in \mathbb {N} \}$ cannot be accepted by a visibly pushdown automaton for any partition of $\Sigma$ , although there are pushdown automata accepting this language.

If a language $L$ over a tagged alphabet ${\hat {\Sigma }}$ is accepted by a deterministic visibly pushdown automaton, then $L$ is called a visibly pushdown language.

Nondeterministic visibly pushdown automata

Nondeterministic visibly pushdown automata are as expressive as deterministic ones. Hence one can transform a nondeterministic visibly pushdown automaton into a deterministic one, but if the nondeterministic automaton had $s$ states, the deterministic one may have up to $2^{s^{2}}$ states.^[3]

Decision problems

Let $|A|$ be the size of the description of an automaton $A$ . Then it is possible to check whether a nested word n of length $\ell$ is accepted by the automaton in time $O(|A|^{3}\ell )$ . In particular, the emptiness problem is solvable in time $O(|A|^{3})$ . If $A$ is fixed, membership is decidable in time $O(\ell )$ and space $O(d)$ in a streaming setting, where $d$ is the depth of n. It is also decidable with space $O(\log(\ell ))$ and time $O(\ell ^{2}\log(\ell ))$ , and by a uniform Boolean circuit of depth $O(\log \ell )$ .^[2]

For two nondeterministic automata A and B, deciding whether the set of words accepted by A is a subset of the set of words accepted by B is EXPTIME-complete. It is also EXPTIME-complete to decide whether there is a word that is not accepted.^[2]

Languages

As the definition of visibly pushdown automata shows, deterministic visibly pushdown automata can be seen as a special case of deterministic pushdown automata; thus the set VPL of visibly pushdown languages over $\,{\hat {\Sigma }}$ forms a subset of the set DCFL of deterministic context-free languages over the set of symbols in $\,{\hat {\Sigma }}$ . In particular, the function that removes the matching relation from nested words transforms regular languages over nested words into context-free languages.

Closure properties

The set of visibly pushdown languages is closed under the following operations:^[3]^[2]

set operations:
- union
- intersection
- complement,

thus giving rise to a Boolean algebra.

For the intersection operation, one can construct a VPA M simulating two given VPAs $M_{1}$ and $M_{2}$ by a simple product construction (Alur & Madhusudan 2004): For $i=1,2$ , assume $M_{i}$ is given as $(Q_{i},\ {\hat {\Sigma }},\ \Gamma _{i},\ \delta _{i},\ s_{i},\ Z_{i},\ F_{i})$ . Then for the automaton M, the set of states is $Q_{1}\times Q_{2}$ , the initial state is $\left(s_{1},s_{2}\right)$ , the set of final states is $F_{1}\times F_{2}$ , the stack alphabet is given by $\Gamma _{1}\times \Gamma _{2}$ , and the initial stack symbol is $(Z_{1},Z_{2})$ .

If $M$ is in state $(p_{1},p_{2})$ on reading a call symbol $\left\langle a\right.$ , then $M$ pushes the stack symbol $(\gamma _{1},\gamma _{2})$ and goes to state $(q_{1},q_{2})$ , where $\gamma _{i}$ is the stack symbol pushed by $M_{i}$ when transitioning from state $p_{i}$ to $q_{i}$ on reading input $\left\langle a\right.$ .

If $M$ is in state $(p_{1},p_{2})$ on reading an internal symbol $a$ , then $M$ goes to state $(q_{1},q_{2})$ , whenever $M_{i}$ transitions from state $p_{i}$ to $q_{i}$ on reading a.

If $M$ is in state $(p_{1},p_{2})$ on reading a return symbol $\left.a\right\rangle$ , then $M$ pops the symbol $(\gamma _{1},\gamma _{2})$ from the stack and goes to state $(q_{1},q_{2})$ , where $\gamma _{i}$ is the stack symbol popped by $M_{i}$ when transitioning from state $p_{i}$ to $q_{i}$ on reading $\left.a\right\rangle$ .

Correctness of the above construction crucially relies on the fact that the push and pop actions of the simulated machines $M_{1}$ and $M_{2}$ are synchronized along the input symbols read. In fact, a similar simulation is no longer possible for deterministic pushdown automata, as the larger class of deterministic context-free languages is no longer closed under intersection.
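The product construction can be sketched in Python for deterministic automata. The dictionary layout, the helper `accepts`, and the two example automata are illustrative assumptions; as a further assumption, a missing transition means rejection and the initial stack symbol is read but never popped.

```python
def product_vpa(m1, m2):
    """Build a VPA over the same partitioned alphabet accepting L(m1) ∩ L(m2)."""
    return {
        "calls": m1["calls"], "returns": m1["returns"],
        # Calls: both components move and the pushed symbols are paired.
        "delta_c": {((p1, p2), a): ((q1, q2), (g1, g2))
                    for (p1, a), (q1, g1) in m1["delta_c"].items()
                    for (p2, b), (q2, g2) in m2["delta_c"].items() if a == b},
        # Returns: the popped pair is split between the two components.
        "delta_r": {((p1, p2), a, (g1, g2)): (q1, q2)
                    for (p1, a, g1), q1 in m1["delta_r"].items()
                    for (p2, b, g2), q2 in m2["delta_r"].items() if a == b},
        # Internal symbols: both components move, the stack is untouched.
        "delta_int": {((p1, p2), a): (q1, q2)
                      for (p1, a), q1 in m1["delta_int"].items()
                      for (p2, b), q2 in m2["delta_int"].items() if a == b},
        "q0": (m1["q0"], m2["q0"]),
        "accepting": {(f1, f2) for f1 in m1["accepting"] for f2 in m2["accepting"]},
        "bottom": (m1["bottom"], m2["bottom"]),
    }


def accepts(m, word):
    """Run a deterministic VPA given in the dictionary layout used above."""
    state, stack = m["q0"], [m["bottom"]]
    for a in word:
        if a in m["calls"]:
            step = m["delta_c"].get((state, a))
            if step is None:
                return False
            state, gamma = step
            stack.append(gamma)
        elif a in m["returns"]:
            top = stack[-1]
            state = m["delta_r"].get((state, a, top))
            if state is None:
                return False
            if top != m["bottom"]:
                stack.pop()
        else:
            state = m["delta_int"].get((state, a))
            if state is None:
                return False
    return state in m["accepting"]


# Hypothetical M1: well-matched words over < > x.
M1 = {"calls": {"<"}, "returns": {">"}, "bottom": "⊥", "q0": "top", "accepting": {"top"},
      "delta_c": {("top", "<"): ("in", "B"), ("in", "<"): ("in", "A")},
      "delta_r": {("in", ">", "A"): "in", ("in", ">", "B"): "top"},
      "delta_int": {("top", "x"): "top", ("in", "x"): "in"}}
# Hypothetical M2: words with an even number of internal symbols x.
M2 = {"calls": {"<"}, "returns": {">"}, "bottom": "⊥", "q0": "even", "accepting": {"even"},
      "delta_c": {(s, "<"): (s, "A") for s in ("even", "odd")},
      "delta_r": {(s, ">", g): s for s in ("even", "odd") for g in ("A", "⊥")},
      "delta_int": {("even", "x"): "odd", ("odd", "x"): "even"}}

M = product_vpa(M1, M2)
print(accepts(M, "<xx>"), accepts(M, "<x>"))  # True False
```

The sketch builds transitions for all state pairs, including unreachable ones. Both component stacks always have the same height because pushes and pops are dictated by the input symbol alone, which is what lets a single stack of pairs stand in for two stacks.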

In contrast to the construction for intersection shown above, the complementation construction for visibly pushdown automata parallels the standard construction^[4] for deterministic pushdown automata.

Moreover, like the class of context-free languages, the class of visibly pushdown languages is closed under prefix closure and reversal, hence also suffix closure.

Relation to other language classes

Alur & Madhusudan (2004) point out that the visibly pushdown languages are more general than the parenthesis languages suggested in McNaughton (1967). As shown by Crespi Reghizzi & Mandrioli (2012), the visibly pushdown languages in turn are strictly contained in the class of languages described by operator-precedence grammars, which were introduced by Floyd (1963) and enjoy the same closure properties and characteristics (see Lonati et al. (2015) for ω languages and logic and automata-based characterizations). In comparison to conjunctive grammars, a generalization of context-free grammars, Okhotin (2011) shows that the linear conjunctive languages form a superclass of the visibly pushdown languages. The table at the end of this article puts the family of visibly pushdown languages in relation to other language families in the Chomsky hierarchy. Rajeev Alur and Parthasarathy Madhusudan^[5]^[6] related a subclass of regular binary tree languages to visibly pushdown languages.

Other models of description

Visibly pushdown grammars

Visibly pushdown languages are exactly the languages that can be described by visibly pushdown grammars.^[2]

Visibly pushdown grammars can be defined as a restriction of context-free grammars. A visibly pushdown grammar G is defined by the 4-tuple:

$G=(V=V^{0}\cup V^{1}\,,\Sigma \,,R\,,S\,)$ where

$V^{0}\,$ and $V^{1}\,$ are disjoint finite sets; each element $v\in V$ is called a non-terminal character or a variable. Each variable represents a different type of phrase or clause in the sentence. Each variable defines a sub-language of the language defined by $G\,$ , and the sub-languages of $V^{0}\,$ are the ones without pending calls or pending returns.
$\Sigma \,$ is a finite set of terminals, disjoint from $V\,$ , which make up the actual content of the sentence. The set of terminals is the alphabet of the language defined by the grammar $G\,$ .
$R\,$ is a finite relation from $V\,$ to $(V\cup \Sigma )^{*}$ such that $\exists \,w\in (V\cup \Sigma )^{*}:(S,w)\in R$ . The members of $R\,$ are called the (rewrite) rules or productions of the grammar. There are three kinds of rewrite rules. For $X,Y\in V,Z\in V^{0}$ , $a\in {\hat {\Sigma }}$ and $b\in {\hat {\Sigma }}$ :
- $X\to \epsilon$
- $X\to aY$ and if $X\in V^{0}$ then $Y\in V^{0}$ and $a\in \Sigma$
- $X\to \langle aZb\rangle Y$ and if $X\in V^{0}$ then $Y\in V^{0}$
$S\in V\,$ is the start variable (or start symbol), used to represent the whole sentence (or program).

Here, the asterisk represents the Kleene star operation and $\epsilon$ is the empty word.
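As a small worked example (an illustration constructed here, not taken from the cited sources), consider a grammar with a single variable $S\in V^{0}$ and two rules, one of the first kind and one of the third kind. It generates well-matched words in which every call ⟨a is closed by a return b⟩, and a sample derivation looks as follows.

```latex
V^{0} = \{S\}, \qquad V^{1} = \emptyset, \qquad
R = \{\, S \to \epsilon, \;\; S \to \langle a\, S\, b\rangle\, S \,\}

S \;\Rightarrow\; \langle a\, S\, b\rangle\, S
  \;\Rightarrow\; \langle a\, \langle a\, S\, b\rangle\, S\, b\rangle\, S
  \;\Rightarrow^{*}\; \langle a\, \langle a\, b\rangle\, b\rangle
```

Since $S\in V^{0}$ appears on the left of the third-kind rule, the side condition requires the trailing variable to lie in $V^{0}$ as well, which holds because it is again $S$. The last step applies $S\to \epsilon$ to each of the three remaining occurrences of $S$.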

Uniform Boolean circuits

The problem whether a word of length $\ell$ is accepted by a given nested word automaton can be solved by uniform Boolean circuits of depth $\mathrm {O} (\log \ell )$ .^[2]

Logical description

Regular languages over nested words are exactly the set of languages described by monadic second-order logic with two unary predicates call and return, linear successor and the matching relation ↝.^[2]

Notes

[1] Google Scholar search results for "nested words" OR "visibly pushdown"
[2] Alur & Madhusudan (2009)
[3] Alur & Madhusudan (2004)
[4] Hopcroft & Ullman (1979, p. 238 f).
[5] Alur, R.; Madhusudan, P. (2004). "Visibly pushdown languages" (PDF). Proceedings of the thirty-sixth annual ACM symposium on Theory of computing - STOC '04. pp. 202–211. doi:10.1145/1007352.1007390. ISBN 978-1581138528. S2CID 7473479. Sect. 4, Theorem 5.
[6] Alur, R.; Madhusudan, P. (2009). "Adding nesting structure to words" (PDF). Journal of the ACM. 56 (3): 1–43. CiteSeerX 10.1.1.145.9971. doi:10.1145/1516512.1516518. S2CID 768006. Sect. 7.

References

Floyd, R. W. (July 1963). "Syntactic Analysis and Operator Precedence". Journal of the ACM. 10 (3): 316–333. doi:10.1145/321172.321179. S2CID 19785090.
McNaughton, R. (1967). "Parenthesis Grammars". Journal of the ACM. 14 (3): 490–500. doi:10.1145/321406.321411. S2CID 10926200.
Alur, R.; Arenas, M.; Barcelo, P.; Etessami, K.; Immerman, N.; Libkin, L. (2008). Grädel, Erich (ed.). "First-Order and Temporal Logics for Nested Words". Logical Methods in Computer Science. 4 (4). arXiv:0811.0537. doi:10.2168/LMCS-4(4:11)2008. S2CID 220091601.
Crespi Reghizzi, Stefano; Mandrioli, Dino (2012). "Operator precedence and the visibly pushdown property". Journal of Computer and System Sciences. 78 (6): 1837–1867. doi:10.1016/j.jcss.2011.12.006.
Lonati, Violetta; Mandrioli, Dino; Panella, Federica; Pradella, Matteo (2015). "Operator Precedence Languages: Their Automata-Theoretic and Logic Characterization". SIAM Journal on Computing. 44 (4): 1026–1088. doi:10.1137/140978818. hdl:2434/352809.
Okhotin, Alexander (2011). "Comparing linear conjunctive languages to subfamilies of the context-free languages". 37th International Conference on Current Trends in Theory and Practice of Computer Science (SOFSEM 2011).
Hopcroft, John E.; Ullman, Jeffrey D. (1979). Introduction to Automata Theory, Languages, and Computation. Addison-Wesley. ISBN 978-0-201-02988-8.
