Unrestricted grammar

In automata theory, the class of unrestricted grammars (also called semi-Thue, type-0, or phrase structure grammars) is the most general class of grammars in the Chomsky hierarchy. No restrictions are made on the productions of an unrestricted grammar, other than each of their left-hand sides being non-empty.[1]:220 This grammar class can generate arbitrary recursively enumerable languages.

Formal definition

An unrestricted grammar is a formal grammar G = (N, Σ, P, S), where

  * N is a finite set of nonterminal symbols,
  * Σ is a finite set of terminal symbols that is disjoint from N,[note 1]
  * P is a finite set of production rules of the form α → β, where α and β are strings over N ∪ Σ and α is not the empty string, and
  * S ∈ N is a specially designated start symbol.

As the name implies, there are no real restrictions on the types of production rules that unrestricted grammars can have.[note 2]
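
For illustration, here is a standard textbook example of an unrestricted grammar (it is not taken from the cited source). It generates the language { a^n b^n c^n : n ≥ 1 }, which is not context-free, and it uses left-hand sides containing more than one symbol:

    S  → aSBC
    S  → aBC
    CB → BC
    aB → ab
    bB → bb
    bC → bc
    cC → cc

The first two rules lay out a string of the form a^n (BC)^n, the rule CB → BC sorts all B's before all C's, and the remaining rules rewrite the nonterminals into terminals from left to right; a derivation can only terminate in a string of the form a^n b^n c^n. This particular grammar is in fact non-contracting (context-sensitive), but it already illustrates the freedom in forming left-hand sides that context-free grammars do not allow.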

Equivalence to Turing machines

The unrestricted grammars characterize the recursively enumerable languages. This is the same as saying that for every unrestricted grammar G there exists some Turing machine capable of recognizing L(G), and vice versa. Given an unrestricted grammar, such a Turing machine is simple enough to construct, as a two-tape nondeterministic Turing machine.[1]:221 The first tape contains the input word to be tested, and the second tape is used by the machine to generate sentential forms from G. The Turing machine then does the following:

  1. Start at the left end of the second tape and repeatedly choose either to move right or to select the current position on the tape.
  2. Nondeterministically choose a production α → β from the productions in P.
  3. If α appears at some position on the second tape, replace α by β at that point, possibly shifting the symbols on the tape left or right depending on the relative lengths of α and β (e.g. if α is longer than β, shift the tape symbols left).
  4. Compare the resulting sentential form on tape 2 to the word on tape 1. If they match, the Turing machine accepts the word. If they don't, the Turing machine goes back to step 1.

It is easy to see that this Turing machine will generate all and only the sentential forms of G on its second tape after the last step is executed an arbitrary number of times; thus the language L(G) must be recursively enumerable.
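
The nondeterministic search above can also be realized deterministically by enumerating sentential forms breadth-first. The following Python sketch illustrates the idea under simplifying assumptions: the grammar is given as plain string-rewriting rules, and the search is cut off by a step bound and a length cap, so a negative answer only means the word was not found within the bound, mirroring the fact that membership is only semi-decidable. The function and variable names are illustrative, not part of the cited construction.

    from collections import deque

    def derives(productions, start, target, max_steps=100000):
        """Breadth-first search over sentential forms of an unrestricted grammar.

        productions: list of (lhs, rhs) pairs of strings, with lhs non-empty.
        Returns True if target is reached within the step bound; False only
        means "not found within the bound" (membership is semi-decidable).
        """
        seen = {start}
        queue = deque([start])
        steps = 0
        # The length cap keeps this demo finite; it is harmless for
        # non-contracting grammars but may cause misses in general.
        cap = 2 * len(target) + 2
        while queue and steps < max_steps:
            form = queue.popleft()
            steps += 1
            if form == target:
                return True
            for lhs, rhs in productions:
                i = form.find(lhs)
                while i != -1:
                    # Apply the rule lhs -> rhs at position i.
                    new_form = form[:i] + rhs + form[i + len(lhs):]
                    if new_form not in seen and len(new_form) <= cap:
                        seen.add(new_form)
                        queue.append(new_form)
                    i = form.find(lhs, i + 1)
        return False

    # The example grammar for { a^n b^n c^n : n >= 1 } from the previous section.
    P = [("S", "aSBC"), ("S", "aBC"), ("CB", "BC"),
         ("aB", "ab"), ("bB", "bb"), ("bC", "bc"), ("cC", "cc")]
    print(derives(P, "S", "aabbcc"))   # True
    print(derives(P, "S", "aabbc"))    # False (within the bound)

No choice of step bound or length cap turns this into a decision procedure for arbitrary unrestricted grammars; the sketch, like the nondeterministic machine it mimics, only witnesses membership when it happens to hold.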

The reverse construction is also possible. Given some Turing machine, it is possible to create an equivalent unrestricted grammar,[1]:222 which even uses only productions with one or more non-terminal symbols on their left-hand sides. Therefore, an arbitrary unrestricted grammar can always be equivalently converted to obey the latter form, by converting it to a Turing machine and back again. Some authors[citation needed] use the latter form as the definition of unrestricted grammar.

Computational properties

The decision problem of whether a given string can be generated by a given unrestricted grammar is equivalent to the problem of whether the string is accepted by the Turing machine equivalent to the grammar. This is the membership problem for Turing machines, which, like the closely related halting problem, is undecidable.

Recursively enumerable languages are closed under Kleene star, concatenation, union, and intersection, but not under set difference; see Recursively enumerable language#Closure properties.
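
Closure under union, for example, can be seen directly at the grammar level. The following is a standard construction, stated here under the normal form mentioned above in which every production has at least one nonterminal on its left-hand side: given such grammars G1 = (N1, Σ, P1, S1) and G2 = (N2, Σ, P2, S2) with N1 and N2 disjoint (always achievable by renaming), the grammar with a fresh start symbol S, nonterminals N1 ∪ N2 ∪ {S}, and productions P1 ∪ P2 ∪ {S → S1, S → S2} generates exactly L(G1) ∪ L(G2); the disjointness and the normal form guarantee that rules of one grammar can never apply to a sentential form derived through the other.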

The equivalence of unrestricted grammars to Turing machines implies the existence of a universal unrestricted grammar, a grammar capable of accepting any other unrestricted grammar's language given a description of the language. For this reason, it is theoretically possible to build a programming language based on unrestricted grammars (e.g. Thue).

Notes

  1. Actually, requiring N and Σ to be disjoint is not strictly necessary, since unrestricted grammars make no real distinction between the two. The designation exists purely so that one knows when to stop generating sentential forms of the grammar; more precisely, the language L(G) recognized by G is restricted to strings of terminal symbols.
  2. While Hopcroft and Ullman (1979) do not mention the cardinalities of N, Σ, and P explicitly, the proof of their Theorem 9.3 (construction of an equivalent Turing machine from a given unrestricted grammar, p. 221, cf. the section Equivalence to Turing machines above) tacitly requires finiteness of P and finite lengths of all strings in the rules of P. Any member of N or Σ that does not occur in P can be omitted without affecting the generated language.

References

  1. Hopcroft, John; Ullman, Jeffrey D. (1979). Introduction to Automata Theory, Languages, and Computation (1st ed.). Addison-Wesley. ISBN 0-201-44124-1.