Production (computer science)

Last updated November 06, 2023

A production or production rule in computer science is a rewrite rule specifying a symbol substitution that can be recursively performed to generate new symbol sequences. A finite set of productions $P$ is the main component in the specification of a formal grammar (specifically a generative grammar). The other components are a finite set $N$ of nonterminal symbols, a finite set (known as an alphabet) $\Sigma$ of terminal symbols that is disjoint from $N$ and a distinguished symbol $S\in N$ that is the start symbol.

In an unrestricted grammar, a production is of the form $u\to v$ , where $u$ and $v$ are arbitrary strings of terminals and nonterminals, and $u$ may not be the empty string. If $v$ is the empty string, this is denoted by the symbol $\epsilon$ , or $\lambda$ (rather than leave the right-hand side blank). So productions are members of the cartesian product

V^{*}NV^{*}\times V^{*}=(V^{*}\setminus \Sigma ^{*})\times V^{*}

,

where $V:=N\cup \Sigma$ is the vocabulary, ${}^{*}$ is the Kleene star operator, $V^{*}NV^{*}$ indicates concatenation, $\cup$ denotes set union, and $\setminus$ denotes set minus or set difference. If we do not allow the start symbol to occur in $v$ (the word on the right side), we have to replace $V^{*}$ by $(V\setminus \{S\})^{*}$ on the right side of the cartesian product symbol.^[1]

The other types of formal grammar in the Chomsky hierarchy impose additional restrictions on what constitutes a production. Notably in a context-free grammar, the left-hand side of a production must be a single nonterminal symbol. So productions are of the form:

N\to (N\cup \Sigma )^{*}

Grammar generation

To generate a string in the language, one begins with a string consisting of only a single start symbol, and then successively applies the rules (any number of times, in any order) to rewrite this string. This stops when we obtain a string containing only terminals. The language consists of all the strings that can be generated in this manner. Any particular sequence of legal choices taken during this rewriting process yields one particular string in the language. If there are multiple different ways of generating this single string, then the grammar is said to be ambiguous.

For example, assume the alphabet consists of $a$ and $b$ , with the start symbol $S$ , and we have the following rules:

1.

S\rightarrow aSb

2.

S\rightarrow ba

then we start with $S$ , and can choose a rule to apply to it. If we choose rule 1, we replace $S$ with $aSb$ and obtain the string $aSb$ . If we choose rule 1 again, we replace $S$ with $aSb$ and obtain the string $aaSbb$ . This process is repeated until we only have symbols from the alphabet (i.e., $a$ and $b$ ). If we now choose rule 2, we replace $S$ with $ba$ and obtain the string $aababb$ , and are done. We can write this series of choices more briefly, using symbols: $S\Rightarrow aSb\Rightarrow aaSbb\Rightarrow aababb$ . The language of the grammar is the set of all the strings that can be generated using this process: $\{ba,abab,aababb,aaababbb,\dotsc \}$ .

Related Research Articles

A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols. Context-sensitive grammars are more general than context-free grammars, in the sense that there are languages that can be described by a CSG but not by a context-free grammar. Context-sensitive grammars are less general than unrestricted grammars. Thus, CSGs are positioned between context-free and unrestricted grammars in the Chomsky hierarchy.

In formal language theory, a context-free grammar (CFG) is a formal grammar whose production rules can be applied to a nonterminal symbol regardless of its context. In particular, in a context-free grammar, each production rule is of the form

In logic, mathematics, computer science, and linguistics, a formal language consists of words whose letters are taken from an alphabet and are well-formed according to a specific set of rules called a formal grammar.

In the theory of computation, a branch of theoretical computer science, a pushdown automaton (PDA) is a type of automaton that employs a stack.

In computer science, an LL parser is a top-down parser for a restricted context-free language. It parses the input from Left to right, performing Leftmost derivation of the sentence.

In mathematics, computer science, and logic, rewriting covers a wide range of methods of replacing subterms of a formula with other terms. Such methods may be achieved by rewriting systems. In their most basic form, they consist of a set of objects, plus relations on how to transform those objects.

In theoretical computer science and formal language theory, a prefix grammar is a type of string rewriting system, consisting of a set of string rewriting rules, and similar to a formal grammar or a semi-Thue system. What is specific about prefix grammars is not the shape of their rules, but the way in which they are applied: only prefixes are rewritten. The prefix grammars describe exactly all regular languages.

Categorial grammar is a family of formalisms in natural language syntax that share the central assumption that syntactic constituents combine as functions and arguments. Categorial grammar posits a close relationship between the syntax and semantic composition, since it typically treats syntactic categories as corresponding to semantic types. Categorial grammars were developed in the 1930s by Kazimierz Ajdukiewicz and in the 1950s by Yehoshua Bar-Hillel and Joachim Lambek. It saw a surge of interest in the 1970s following the work of Richard Montague, whose Montague grammar assumed a similar view of syntax. It continues to be a major paradigm, particularly within formal semantics.

A finite-state transducer (FST) is a finite-state machine with two memory tapes, following the terminology for Turing machines: an input tape and an output tape. This contrasts with an ordinary finite-state automaton, which has a single tape. An FST is a type of finite-state automaton (FSA) that maps between two sets of symbols. An FST is more general than an FSA. An FSA defines a formal language by defining a set of accepted strings, while an FST defines relations between sets of strings.

In the theory of formal languages of computer science, mathematics, and linguistics, a Dyck word is a balanced string of brackets. The set of Dyck words forms a Dyck language. The simplest, D1, use just two matching brackets, e.g..

In theoretical computer science and mathematical logic a string rewriting system (SRS), historically called a semi-Thue system, is a rewriting system over strings from a alphabet. Given a binary relation $between fixed strings over the alphabet, called rewrite rules, denoted by, an SRS extends the rewriting relation to all strings in which the left- and right-hand side of the rules appear as substrings, that is, where,,, and are strings.$

Conjunctive grammars are a class of formal grammars studied in formal language theory. They extend the basic type of grammars, the context-free grammars, with a conjunction operation. Besides explicit conjunction, conjunctive grammars allow implicit disjunction represented by multiple rules for a single nonterminal symbol, which is the only logical connective expressible in context-free grammars. Conjunction can be used, in particular, to specify intersection of languages. A further extension of conjunctive grammars known as Boolean grammars additionally allows explicit negation.

In formal language theory, an alphabet, sometimes called a vocabulary, is a non-empty set of indivisible symbols/glyphs, typically thought of as representing letters, characters, digits, phonemes, or even words. Alphabets in this technical sense of a set are used in a diverse range of fields including logic, mathematics, computer science, and linguistics. An alphabet may have any cardinality ("size") and depending on its purpose maybe be finite, countable, or even uncountable.

In formal languages, terminal and nonterminal symbols are the lexical elements used in specifying the production rules constituting a formal grammar. Terminal symbols are the elementary symbols of the language defined as part of a formal grammar. Nonterminal symbols are replaced by groups of terminal symbols according to the production rules.

Indexed grammars are a generalization of context-free grammars in that nonterminals are equipped with lists of flags, or index symbols. The language produced by an indexed grammar is called an indexed language.

An embedded pushdown automaton or EPDA is a computational model for parsing languages generated by tree-adjoining grammars (TAGs). It is similar to the context-free grammar-parsing pushdown automaton, but instead of using a plain stack to store symbols, it has a stack of iterated stacks that store symbols, giving TAGs a generative capacity between context-free and context-sensitive grammars, or a subset of mildly context-sensitive grammars. Embedded pushdown automata should not be confused with nested stack automata which have more computational power.

A formal grammar describes how to form strings from an alphabet of a formal language that are valid according to the language's syntax. A grammar does not describe the meaning of the strings or what can be done with them in whatever context—only their form. A formal grammar is defined as a set of production rules for such strings in a formal language.

Controlled grammars are a class of grammars that extend, usually, the context-free grammars with additional controls on the derivations of a sentence in the language. A number of different kinds of controlled grammars exist, the four main divisions being Indexed grammars, grammars with prescribed derivation sequences, grammars with contextual conditions on rule application, and grammars with parallelism in rule application. Because indexed grammars are so well established in the field, this article will address only the latter three kinds of controlled grammars.

Parikh's theorem in theoretical computer science says that if one looks only at the number of occurrences of each terminal symbol in a context-free language, without regard to their order, then the language is indistinguishable from a regular language. It is useful for deciding that strings with a given number of terminals are not accepted by a context-free grammar. It was first proved by Rohit Parikh in 1961 and republished in 1966.

In formal language theory, an LL grammar is a context-free grammar that can be parsed by an LL parser, which parses the input from Left to right, and constructs a Leftmost derivation of the sentence. A language that has an LL grammar is known as an LL language. These form subsets of deterministic context-free grammars (DCFGs) and deterministic context-free languages (DCFLs), respectively. One says that a given grammar or language "is an LL grammar/language" or simply "is LL" to indicate that it is in this class.

References

↑ See Klaus Reinhardt: Prioritatszahlerautomaten und die Synchronisation von Halbspursprachen Archived 2018-01-17 at the Wayback Machine ; Fakultät Informatik der Universität Stuttgart; 1994 (German)

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.

[KR1994-1] See Klaus Reinhardt: Prioritatszahlerautomaten und die Synchronisation von Halbspursprachen Archived 2018-01-17 at the Wayback Machine ; Fakultät Informatik der Universität Stuttgart; 1994 (German)

[1]

Production (computer science)

Contents

Grammar generation

See also

Related Research Articles

References