Tree automaton

Last updated

A tree automaton is a type of state machine. Tree automata deal with tree structures, rather than the strings of more conventional state machines.

Contents

The following article deals with branching tree automata, which correspond to regular languages of trees.

As with classical automata, finite tree automata (FTA) can be either a deterministic automaton or not. According to how the automaton processes the input tree, finite tree automata can be of two types: (a) bottom up, (b) top down. This is an important issue, as although non-deterministic (ND) top-down and ND bottom-up tree automata are equivalent in expressive power, deterministic top-down automata are strictly less powerful than their deterministic bottom-up counterparts, because tree properties specified by deterministic top-down tree automata can only depend on path properties. (Deterministic bottom-up tree automata are as powerful as ND tree automata.)

Definitions

A bottom-up finite tree automaton over F is defined as a tuple (Q, F, Qf, Δ), where Q is a set of states, F is a ranked alphabet (i.e., an alphabet whose symbols have an associated arity), QfQ is a set of final states, and Δ is a set of transition rules of the form f(q1(x1),...,qn(xn)) → q(f(x1,...,xn)), for an n-ary fF, q, qiQ, and xi variables denoting subtrees. That is, members of Δ are rewrite rules from nodes whose childs' roots are states, to nodes whose roots are states. Thus the state of a node is deduced from the states of its children.

For n=0, that is, for a constant symbol f, the above transition rule definition reads f() → q(f()); often the empty parentheses are omitted for convenience: fq(f). Since these transition rules for constant symbols (leaves) do not require a state, no explicitly defined initial states are needed. A bottom-up tree automaton is run on a ground term over F, starting at all its leaves simultaneously and moving upwards, associating a run state from Q with each subterm. The term is accepted if its root is associated to an accepting state from Qf. [1]

A top-down finite tree automaton over F is defined as a tuple (Q, F, Qi, Δ), with two differences with bottom-up tree automata. First, QiQ, the set of its initial states, replaces Qf; second, its transition rules are oriented conversely: q(f(x1,...,xn)) → f(q1(x1),...,qn(xn)), for an n-ary fF, q, qiQ, and xi variables denoting subtrees. That is, members of Δ are here rewrite rules from nodes whose roots are states to nodes whose children's roots are states. A top-down automaton starts in some of its initial states at the root and moves downward along branches of the tree, associating along a run a state with each subterm inductively. A tree is accepted if every branch can be gone through this way. [2]

A tree automaton is called deterministic (abbreviated DFTA) if no two rules from Δ have the same left hand side; otherwise it is called nondeterministic (NFTA). [3] Non-deterministic top-down tree automata have the same expressive power as non-deterministic bottom-up ones; [4] the transition rules are simply reversed, and the final states become the initial states.

In contrast, deterministic top-down tree automata [5] are less powerful than their bottom-up counterparts, because in a deterministic tree automaton no two transition rules have the same left-hand side. For tree automata, transition rules are rewrite rules; and for top-down ones, the left-hand side will be parent nodes. Consequently, a deterministic top-down tree automaton will only be able to test for tree properties that are true in all branches, because the choice of the state to write into each child branch is determined at the parent node, without knowing the child branches contents.

Infinite-tree automata extend top-down automata to infinite trees, and can be used to prove decidability of S2S, the monadic second-order theory with two successors. Finite tree automata (nondeterministic if top-down) suffice for WS2S. [6]

Examples

Bottom-up automaton accepting boolean lists

Employing coloring to distinguish members of F and Q, and using the ranked alphabet F={ false,true,nil,cons(.,.) }, with cons having arity 2 and all other symbols having arity 0, a bottom-up tree automaton accepting the set of all finite lists of boolean values can be defined as (Q, F, Qf, Δ) with Q={ Bool,BList }, Qf={ BList }, and Δ consisting of the rules

falseBool(false)(1),
trueBool(true)(2),
nilBList(nil)(3), and
cons(Bool(x1),BList(x2))BList(cons(x1,x2))    (4).

In this example, the rules can be understood intuitively as assigning to each term its type in a bottom-up manner; e.g. rule (4) can be read as "A term cons(x1,x2) has type BList, provided x1 and x2 has type Bool and BList, respectively". An accepting example run is

cons(false,cons(true,nil))
cons(false,cons(true,BList(nil)))by (3)
cons(false,cons(Bool(true),BList(nil)))by (2)
cons(false,BList(cons(true,nil)))by (4)
cons(Bool(false),BList(cons(true,nil)))by (1)
BList(cons(false,cons(true,nil)))    by (4), accepted.

Cf. the derivation of the same term from a regular tree grammar corresponding to the automaton, shown at Regular tree grammar#Examples.

A rejecting example run is

cons(false,true)
cons(false,Bool(true))by (1)
cons(Bool(false),Bool(true))    by (2), no further rule applicable.

Intuitively, this corresponds to the term cons(false,true) not being well-typed.

Top-down automaton accepting multiples of 3 in binary notation

(A)(B)(C)(D)
String
grammar

rules
String
automaton

transitions
Tree
automaton
transitions
Tree
grammar

rules
0
1
2
3
4
5
6
S0ε
S00S0
S01S1
S10S2
S11S0
S20S1
S21S2
 
δ(S0,0)= S0
δ(S0,1)= S1
δ(S1,0)= S2
δ(S1,1)= S0
δ(S2,0)= S1
δ(S2,1)= S2
S0(nil)nil
S0(0(x))0(S0(x))
S0(1(x))1(S1(x))
S1(0(x))0(S2(x))
S1(1(x))1(S0(x))
S2(0(x))0(S1(x))
S2(1(x))1(S2(x))
S0nil
S00(S0)
S01(S1)
S10(S2)
S11(S0)
S20(S1)
S21(S2)
Deterministic finite (string) automaton accepting multiples of 3 in binary notation DFA example multiplies of 3.svg
Deterministic finite (string) automaton accepting multiples of 3 in binary notation

Using the same colorization as above, this example shows how tree automata generalize ordinary string automata. The finite deterministic string automaton shown in the picture accepts all strings of binary digits that denote a multiple of 3. Using the notions from Deterministic finite automaton#Formal definition, it is defined by:

In the tree automaton setting, the input alphabet is changed such that the symbols 0 and 1 are both unary, and a nullary symbol, say nil is used for tree leaves. For example, the binary string "110" in the string automaton setting corresponds to the term "1(1(0(nil)))" in the tree automaton setting; this way, strings can be generalized to trees, or terms. The top-down finite tree automaton accepting the set of all terms corresponding to multiples of 3 in binary string notation is then defined by:

For example, the tree "1(1(0(nil)))" is accepted by the following tree automaton run:

S0(1(1(0(nil))))
1(S1(1(0(nil))))by 2
1(1(S0(0(nil))))by 4
1(1(0(S0(nil))))by 1
1(1(0(nil)))    by 0

In contrast, the term "1(0(nil))" leads to following non-accepting automaton run:

S0(1(0(nil)))
1(S1(0(nil))))by 2
1(0(S2(nil))))    by 3, no further rule applicable

Since there are no other initial states than S0 to start an automaton run with, the term "1(0(nil))" is not accepted by the tree automaton.

For comparison purposes, the table gives in column (A) and (D) a (right) regular (string) grammar, and a regular tree grammar, respectively, each accepting the same language as its automaton counterpart.

Properties

Recognizability

For a bottom-up automaton, a ground term t (that is, a tree) is accepted if there exists a reduction that starts from t and ends with q(t), where q is a final state. For a top-down automaton, a ground term t is accepted if there exists a reduction that starts from q(t) and ends with t, where q is an initial state.

The tree language L(A) accepted, or recognized, by a tree automaton A is the set of all ground terms accepted by A. A set of ground terms is recognizable if there exists a tree automaton that accepts it.

A linear (that is, arity-preserving) tree homomorphism preserves recognizability. [7]

Completeness and reduction

A non-deterministic finite tree automaton is complete if there is at least one transition rule available for every possible symbol-states combination. A state q is accessible if there exists a ground term t such that there exists a reduction from t to q(t). An NFTA is reduced if all its states are accessible. [8]

Pumping lemma

Every sufficiently large [9] ground term t in a recognizable tree language L can be vertically tripartited [10] such that arbitrary repetition ("pumping") of the middle part keeps the resulting term in L. [11] [12]

For the language of all finite lists of boolean values from the above example, all terms beyond the height limit k=2 can be pumped, since they need to contain an occurrence of cons. For example,

cons(false,cons(true,nil)),
cons(false,cons(false,cons(true,nil))),
cons(false,cons(false,cons(false,cons(true,nil)))), ...

all belong to that language.

Closure

The class of recognizable tree languages is closed under union, under complementation, and under intersection. [13]

Myhill–Nerode theorem

A congruence on the set of all trees over a ranked alphabet F is an equivalence relation such that u1v1 and ... and unvn implies f(u1,...,un) ≡ f(v1,...,vn), for every fF. It is of finite index if its number of equivalence-classes is finite.

For a given tree-language L, a congruence can be defined by uLv if C[u] ∈ LC[v] ∈ L for each context C.

The Myhill–Nerode theorem for tree automata states that the following three statements are equivalent: [14]

  1. L is a recognizable tree language
  2. L is the union of some equivalence classes of a congruence of finite index
  3. the relation ≡L is a congruence of finite index

See also

Notes

  1. Comon et al. 2008, sect. 1.1, p. 20.
  2. Comon et al. 2008, sect. 1.6, p. 38.
  3. Comon et al. 2008, sect. 1.1, p. 23.
  4. Comon et al. 2008, sect. 1.6, theorem 1.6.1, p. 38.
  5. In a strict sense, deterministic top-down automata are not defined by Comon et al. (2008) but they are used there (sect. 1.6, proposition 1.6.2, p. 38). They accept the class of path-closed tree languages (sect. 1.8, exercise 1.6, p. 43-44).
  6. Morawietz, Frank; Cornell, Tom (1997-07-07). "Representing constraints with automata". Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and Eighth Conference of the European Chapter of the Association for Computational Linguistics. ACL '98/EACL '98. USA: Association for Computational Linguistics: 468–475. doi:10.3115/976909.979677.
  7. The notion in Comon et al. (2008 , sect. 1.4, theorem 1.4.3, p. 31-32) of tree homomorphism is more general than that of the article "tree homomorphism".
  8. Comon et al. 2008, sect. 1.1, p. 23-24.
  9. Formally: height (t) > k, with k > 0 depending only on L, not on t
  10. Formally: there is a context C[.], a nontrivial context C’[.], and a ground term u such that t = C[C’[u]]. A "context" C[.] is a tree with one hole (or, correspondingly, a term with one occurrence of one variable). A context is called "trivial" if the tree consists only of the hole node (or, correspondingly, if the term is just the variable). The notation C[t] means the result of inserting the tree t into the hole of C[.] (or, correspondingly, instantiating the variable to t). Comon et al. 2008 , p. 17, gives a formal definition.
  11. Formally: C[Cn[u]] ∈ L for all n ≥ 0. The notation Cn[.] means the result of stacking n copies of C[.] one in another, cf. Comon et al. 2008 , p. 17.
  12. Comon et al. 2008, sect. 1.2, p. 29.
  13. Comon et al. 2008, sect. 1.3, theorem 1.3.1, p. 30.
  14. Comon et al. 2008, sect. 1.5, p .36.

Related Research Articles

<span class="mw-page-title-main">Pushdown automaton</span> Type of automaton

In the theory of computation, a branch of theoretical computer science, a pushdown automaton (PDA) is a type of automaton that employs a stack.

<span class="mw-page-title-main">Automata theory</span> Study of abstract machines and automata

Automata theory is the study of abstract machines and automata, as well as the computational problems that can be solved using them. It is a theory in theoretical computer science with close connections to mathematical logic. The word automata comes from the Greek word αὐτόματος, which means "self-acting, self-willed, self-moving". An automaton is an abstract self-propelled computing device which follows a predetermined sequence of operations automatically. An automaton with a finite number of states is called a Finite Automaton (FA) or Finite-State Machine (FSM). The figure on the right illustrates a finite-state machine, which is a well-known type of automaton. This automaton consists of states and transitions. As the automaton sees a symbol of input, it makes a transition to another state, according to its transition function, which takes the previous state and current input symbol as its arguments.

<span class="mw-page-title-main">State diagram</span> Diagram of behavior of finite state systems

A state diagram is a type of diagram used in computer science and related fields to describe the behavior of systems. State diagrams require that the system described is composed of a finite number of states; sometimes, this is indeed the case, while at other times this is a reasonable abstraction. Many forms of state diagrams exist, which differ slightly and have different semantics.

<span class="mw-page-title-main">Büchi automaton</span>

In computer science and automata theory, a deterministic Büchi automaton is a theoretical machine which either accepts or rejects infinite inputs. Such a machine has a set of states and a transition function, which determines which state the machine should move to from its current state when it reads the next input character. Some states are accepting states and one state is the start state. The machine accepts an input if and only if it will pass through an accepting state infinitely many times as it reads the input.

<span class="mw-page-title-main">Deterministic finite automaton</span> Finite-state machine

In the theory of computation, a branch of theoretical computer science, a deterministic finite automaton (DFA)—also known as deterministic finite acceptor (DFA), deterministic finite-state machine (DFSM), or deterministic finite-state automaton (DFSA)—is a finite-state machine that accepts or rejects a given string of symbols, by running through a state sequence uniquely determined by the string. Deterministic refers to the uniqueness of the computation run. In search of the simplest models to capture finite-state machines, Warren McCulloch and Walter Pitts were among the first researchers to introduce a concept similar to finite automata in 1943.

In automata theory, a finite-state machine is called a deterministic finite automaton (DFA), if

In automata theory, an alternating finite automaton (AFA) is a nondeterministic finite automaton whose transitions are divided into existential and universal transitions. For example, let A be an alternating automaton.

In theoretical computer science and formal language theory, a regular tree grammar is a formal grammar that describes a set of directed trees, or terms. A regular word grammar can be seen as a special kind of regular tree grammar, describing a set of single-path trees.

In automata theory, a Muller automaton is a type of an ω-automaton. The acceptance condition separates a Muller automaton from other ω-automata. The Muller automaton is defined using a Muller acceptance condition, i.e. the set of all states visited infinitely often must be an element of the acceptance set. Both deterministic and non-deterministic Muller automata recognize the ω-regular languages. They are named after David E. Muller, an American mathematician and computer scientist, who invented them in 1963.

In mathematics and computer science, the probabilistic automaton (PA) is a generalization of the nondeterministic finite automaton; it includes the probability of a given transition into the transition function, turning it into a transition matrix. Thus, the probabilistic automaton also generalizes the concepts of a Markov chain and of a subshift of finite type. The languages recognized by probabilistic automata are called stochastic languages; these include the regular languages as a subset. The number of stochastic languages is uncountable.

A tree-walking automaton (TWA) is a type of finite automaton that deals with tree structures rather than strings. The concept was originally proposed by Aho and Ullman.

In computer science, more specifically in automata and formal language theory, nested words are a concept proposed by Alur and Madhusudan as a joint generalization of words, as traditionally used for modelling linearly ordered structures, and of ordered unranked trees, as traditionally used for modelling hierarchical structures. Finite-state acceptors for nested words, so-called nested word automata, then give a more expressive generalization of finite automata on words. The linear encodings of languages accepted by finite nested word automata gives the class of visibly pushdown languages. The latter language class lies properly between the regular languages and the deterministic context-free languages. Since their introduction in 2004, these concepts have triggered much research in that area.

In computer science and mathematical logic, an infinite-tree automaton is a state machine that deals with infinite tree structures. It can be seen as an extension of top-down finite-tree automata to infinite trees or as an extension of infinite-word automata to infinite trees.

In automata theory, a branch of theoretical computer science, an ω-automaton is a variation of a finite automaton that runs on infinite, rather than finite, strings as input. Since ω-automata do not stop, they have a variety of acceptance conditions rather than simply a set of accepting states.

In automata theory, a semi-deterministic Büchi automaton is a special type of Büchi automaton. In such an automaton, the set of states can be partitioned into two subsets: one subset forms a deterministic automaton and also contains all the accepting states.

In automata theory, a generalized Büchi automaton is a variant of a Büchi automaton. The difference with the Büchi automaton is the accepting condition, which is determined by a set of sets of states. A run is accepted by the automaton if it visits at least one state of every set of the accepting condition infinitely often. Generalized Büchi automata are equivalent in expressive power to Büchi automata; a transformation is given here.

In automata theory, a co-Büchi automaton is a variant of Büchi automaton. The only difference is the accepting condition: a Co-Büchi automaton accepts an infinite word if there exists a run, such that all the states occurring infinitely often in the run are in the final state set . In contrast, a Büchi automaton accepts a word if there exists a run, such that at least one state occurring infinitely often in the final state set .

In computer science, Thompson's construction algorithm, also called the McNaughton–Yamada–Thompson algorithm, is a method of transforming a regular expression into an equivalent nondeterministic finite automaton (NFA). This NFA can be used to match strings against the regular expression. This algorithm is credited to Ken Thompson.

In computational learning theory, induction of regular languages refers to the task of learning a formal description of a regular language from a given set of example strings. Although E. Mark Gold has shown that not every regular language can be learned this way, approaches have been investigated for a variety of subclasses. They are sketched in this article. For learning of more general grammars, see Grammar induction.

In theoretical computer science and formal language theory, a tree transducer (TT) is an abstract machine taking as input a tree, and generating output – generally other trees, but models producing words or other structures exist. Roughly speaking, tree transducers extend tree automata in the same way that word transducers extend word automata.

References

Implementations