Recursive grammar

Last updated

In computer science, a grammar is informally called a recursive grammar if it contains production rules that are recursive, meaning that expanding a non-terminal according to these rules can eventually lead to a string that includes the same non-terminal again. Otherwise it is called a non-recursive grammar. [1]

Computer science Study of the theoretical foundations of information and computation

Computer science is the study of processes that interact with data and that can be represented as data in the form of programs. It enables the use of algorithms to manipulate, store, and communicate digital information. A computer scientist studies the theory of computation and the practice of designing software systems.

In formal language theory, a grammar is a set of production rules for strings in a formal language. The rules describe how to form strings from the language's alphabet that are valid according to the language's syntax. A grammar does not describe the meaning of the strings or what can be done with them in whatever context—only their form.

Recursion (computer science) method in computer science

Recursion in computer science is a method of solving a problem where the solution depends on solutions to smaller instances of the same problem. The approach can be applied to many types of problems, and recursion is one of the central ideas of computer science.

The power of recursion evidently lies in the possibility of defining an infinite set of objects by a finite statement. In the same manner, an infinite number of computations can be described by a finite recursive program, even if this program contains no explicit repetitions.

For example, a grammar for a context-free language is left recursive if there exists a non-terminal symbol A that can be put through the production rules to produce a string with A (as the leftmost symbol). [2] [3] All types of grammars in the Chomsky hierarchy can be recursive and it is recursion that allows the production of infinite sets of words. [1]

In formal language theory, a context-free language (CFL) is a language generated by a context-free grammar (CFG).

In the formal language theory of computer science, left recursion is a special case of recursion where a string is recognized as part of a language by the fact that it decomposes into a string from that same language and a suffix. For instance, can be recognized as a sum because it can be broken into , also a sum, and , a suitable suffix.

In the formal languages of computer science and linguistics, the Chomsky hierarchy is a containment hierarchy of classes of formal grammars. This hierarchy of grammars was described by Noam Chomsky in 1956.

Properties

A non-recursive grammar can produce only a finite language; and each finite language can be produced by a non-recursive grammar. [1] For example, a straight-line grammar produces just a single word.

A straight-line grammar is a formal grammar that generates exactly one string. Consequently, it does not branch nor loop.

A recursive context-free grammar that contains no useless rules necessarily produces an infinite language. This property forms the basis for an algorithm that can test efficiently whether a context-free grammar produces a finite or infinite language. [4]

In theoretical computer science, in particular in the theory of formal languages, useless rules of a formal grammar are those rules of symbol production that are unreachable or unproductive, that is, that can or need never be applied. Simply, they "can be removed from the grammar without affecting the language produced"

Algorithm An unambiguous specification of how to solve a class of problems

In mathematics and computer science, an algorithm is an unambiguous specification of how to solve a class of problems. Algorithms can perform calculation, data processing, automated reasoning, and other tasks.

Related Research Articles

A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols. Context-sensitive grammars are more general than context-free grammars, in the sense that there are languages that can be described by CSG but not by context-free grammars. Context-sensitive grammars are less general than unrestricted grammars. Thus, CSG are positioned between context-free and unrestricted grammars in the Chomsky hierarchy.

In formal language theory, a context-free grammar (CFG) is a certain type of formal grammar: a set of production rules that describe all possible strings in a given formal language. Production rules are simple replacements. For example, the rule

Formal language set of strings of symbols that may be constrained by rules that are specific to it

In mathematics, computer science, and linguistics, a formal language consists of words whose letters are taken from an alphabet and are well-formed according to a specific set of rules.

Theory of computation subfield of computer science

In theoretical computer science and mathematics, the theory of computation is the branch that deals with how efficiently problems can be solved on a model of computation, using an algorithm. The field is divided into three major branches: automata theory and languages, computability theory, and computational complexity theory, which are linked by the question: "What are the fundamental capabilities and limitations of computers?".

In formal language theory, a context-free grammar is in Greibach normal form (GNF) if the right-hand sides of all production rules start with a terminal symbol, optionally followed by some variables. A non-strict form allows one exception to this format restriction for allowing the empty word to be a member of the described language. The normal form was established by Sheila Greibach and it bears her name.

In computer science, a recursive descent parser is a kind of top-down parser built from a set of mutually recursive procedures where each such procedure implements one of the nonterminals of the grammar. Thus the structure of the resulting program closely mirrors that of the grammar it recognizes.

In computer science, top-down parsing is a parsing strategy where one first looks at the highest level of the parse tree and works down the parse tree by using the rewriting rules of a formal grammar. LL parsers are a type of parser that uses a top-down parsing strategy.

Computability is the ability to solve a problem in an effective manner. It is a key topic of the field of computability theory within mathematical logic and the theory of computation within computer science. The computability of a problem is closely linked to the existence of an algorithm to solve the problem.

Tree-adjoining grammar (TAG) is a grammar formalism defined by Aravind Joshi. Tree-adjoining grammars are somewhat similar to context-free grammars, but the elementary unit of rewriting is the tree rather than the symbol. Whereas context-free grammars have rules for rewriting symbols as strings of other symbols, tree-adjoining grammars have rules for rewriting the nodes of trees as other trees.

In computing, memoization or memoisation is an optimization technique used primarily to speed up computer programs by storing the results of expensive function calls and returning the cached result when the same inputs occur again. Memoization has also been used in other contexts, such as in simple mutually recursive descent parsing. Although related to caching, memoization refers to a specific case of this optimization, distinguishing it from forms of caching such as buffering or page replacement. In the context of some logic programming languages, memoization is also known as tabling; see also lookup table.

Top-Down Parsing Language (TDPL) is a type of analytic formal grammar developed by Alexander Birman in the early 1970s in order to study formally the behavior of a common class of practical top-down parsers that support a limited form of backtracking. Birman originally named his formalism the TMG Schema (TS), after TMG, an early parser generator, but the formalism was later given the name TDPL by Aho and Ullman in their classic anthology The Theory of Parsing, Translation and Compiling.

In computer science, a parsing expression grammar, or PEG, is a type of analytic formal grammar, i.e. it describes a formal language in terms of a set of rules for recognizing strings in the language. The formalism was introduced by Bryan Ford in 2004 and is closely related to the family of top-down parsing languages introduced in the early 1970s. Syntactically, PEGs also look similar to context-free grammars (CFGs), but they have a different interpretation: the choice operator selects the first match in PEG, while it is ambiguous in CFG. This is closer to how string recognition tends to be done in practice, e.g. by a recursive descent parser.

Grammar induction

Grammar induction is the process in machine learning of learning a formal grammar from a set of observations, thus constructing a model which accounts for the characteristics of the observed objects. More generally, grammatical inference is that branch of machine learning where the instance space consists of discrete combinatorial objects such as strings, trees and graphs.

In computer science, terminal and nonterminal symbols are the lexical elements used in specifying the production rules constituting a formal grammar. Terminal symbols are the elementary symbols of the language defined by a formal grammar. Nonterminal symbols are replaced by groups of terminal symbols according to the production rules.

In computer programming, a parser combinator is a higher-order function that accepts several parsers as input and returns a new parser as its output. In this context, a parser is a function accepting strings as input and returning some structure as output, typically a parse tree or a set of indices representing locations in the string where parsing stopped successfully. Parser combinators enable a recursive descent parsing strategy that facilitates modular piecewise construction and testing. This parsing technique is called combinatory parsing.

LL grammar

In formal language theory, an LL grammar is a context-free grammar that can be parsed by an LL parser, which parses the input from Left to right, and constructs a Leftmost derivation of the sentence. A language that has an LL grammar is known as an LL language. These form subsets of deterministic context-free grammars (DCFGs) and deterministic context-free languages (DCFLs), respectively. One says that a given grammar or language "is an LL grammar/language" or simply "is LL" to indicate that it is in this class.

References

  1. 1 2 3 Nederhof, Mark-Jan; Satta, Giorgio (2002), "Parsing Non-recursive Context-free Grammars", Proceedings of the 40th Annual Meeting on Association for Computational Linguistics (ACL '02), Stroudsburg, PA, USA: Association for Computational Linguistics, pp. 112–119, doi:10.3115/1073083.1073104 .
  2. Notes on Formal Language Theory and Parsing, James Power, Department of Computer Science National University of Ireland, Maynooth Maynooth, Co. Kildare, Ireland.
  3. Moore, Robert C. (2000), "Removing Left Recursion from Context-free Grammars", Proceedings of the 1st North American Chapter of the Association for Computational Linguistics Conference (NAACL 2000), Stroudsburg, PA, USA: Association for Computational Linguistics, pp. 249–255.
  4. Fleck, Arthur Charles (2001), Formal Models of Computation: The Ultimate Limits of Computing, AMAST series in computing, 7, World Scientific, Theorem 6.3.1, p. 309, ISBN   9789810245009 .