Kuroda normal form

Last updated

In formal language theory, a noncontracting grammar is in Kuroda normal form if all production rules are of the form: [1]

Contents

ABCD or
ABC or
AB or
Aa

where A, B, C and D are nonterminal symbols and a is a terminal symbol. [1] Some sources omit the AB pattern. [2]

It is named after Sige-Yuki Kuroda, who originally called it a linear bounded grammar, a terminology that was also used by a few other authors thereafter. [3]

Every grammar in Kuroda normal form is noncontracting, and therefore, generates a context-sensitive language. Conversely, every noncontracting grammar that does not generate the empty string can be converted to Kuroda normal form. [2]

A straightforward technique attributed to György Révész transforms a grammar in Kuroda normal form to a context-sensitive grammar: ABCD is replaced by four context-sensitive rules ABAZ, AZWZ, WZWD and WDCD. This proves that every noncontracting grammar generates a context-sensitive language. [1]

There is a similar normal form for unrestricted grammars as well, which at least some authors call "Kuroda normal form" too: [4]

ABCD or
ABC or
Aa or
Aε

where ε is the empty string. Every unrestricted grammar is weakly equivalent to one using only productions of this form. [2]

If the rule AB → CD is eliminated from the above, one obtains context-free grammars in Chomsky Normal Form. [5] The Penttonen normal form (for unrestricted grammars) is a special case where first rule above is ABAD. [4] Similarly, for context-sensitive grammars, the Penttonen normal form, also called the one-sided normal form (following Penttonen's own terminology) is: [1] [2]

ABAD or
ABC or
Aa

For every context-sensitive grammar, there exists a weakly equivalent one-sided normal form. [2]

See also

Related Research Articles

The Chomsky hierarchy in the fields of formal language theory, computer science, and linguistics, is a containment hierarchy of classes of formal grammars.

A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols. Context-sensitive grammars are more general than context-free grammars, in the sense that there are languages that can be described by a CSG but not by a context-free grammar. Context-sensitive grammars are less general than unrestricted grammars. Thus, CSGs are positioned between context-free and unrestricted grammars in the Chomsky hierarchy.

In formal language theory, a context-sensitive language is a language that can be defined by a context-sensitive grammar. Context-sensitive is one of the four types of grammars in the Chomsky hierarchy.

In formal language theory, a context-free language (CFL) is a language generated by a context-free grammar (CFG).

In formal language theory, a context-free grammar, G, is said to be in Chomsky normal form if all of its production rules are of the form:

<span class="mw-page-title-main">Formal language</span> Sequence of words formed by specific rules

In logic, mathematics, computer science, and linguistics, a formal language consists of words whose letters are taken from an alphabet and are well-formed according to a specific set of rules.

In theoretical computer science and formal language theory, a regular language is a formal language that can be defined by a regular expression, in the strict sense in theoretical computer science.

The star height problem in formal language theory is the question whether all regular languages can be expressed using regular expressions of limited star height, i.e. with a limited nesting depth of Kleene stars. Specifically, is a nesting depth of one always sufficient? If not, is there an algorithm to determine how many are required? The problem was raised by Eggan (1963).

In formal language theory, a context-free grammar is in Greibach normal form (GNF) if the right-hand sides of all production rules start with a terminal symbol, optionally followed by some variables. A non-strict form allows one exception to this format restriction for allowing the empty word to be a member of the described language. The normal form was established by Sheila Greibach and it bears her name.

Tree-adjoining grammar (TAG) is a grammar formalism defined by Aravind Joshi. Tree-adjoining grammars are somewhat similar to context-free grammars, but the elementary unit of rewriting is the tree rather than the symbol. Whereas context-free grammars have rules for rewriting symbols as strings of other symbols, tree-adjoining grammars have rules for rewriting the nodes of trees as other trees.

In computer science, a linear bounded automaton is a restricted form of Turing machine.

Indexed languages are a class of formal languages discovered by Alfred Aho; they are described by indexed grammars and can be recognized by nested stack automata.

In formal grammar theory, the deterministic context-free grammars (DCFGs) are a proper subset of the context-free grammars. They are the subset of context-free grammars that can be derived from deterministic pushdown automata, and they generate the deterministic context-free languages. DCFGs are always unambiguous, and are an important subclass of unambiguous CFGs; there are non-deterministic unambiguous CFGs, however.

In computer science, in particular in the field of formal language theory, an abstract family of languages is an abstract mathematical notion generalizing characteristics common to the regular languages, the context-free languages and the recursively enumerable languages, and other families of formal languages studied in the scientific literature.

In formal language theory, a grammar is noncontracting if for all of its production rules, α → β, it holds that |α| ≤ |β|, that is β has at least as many symbols as α. A grammar is essentially noncontracting if there may be one exception, namely, a rule S → ε where S is the start symbol and ε the empty string, and furthermore, S never occurs in the right-hand side of any rule.

In formal language theory, a cone is a set of formal languages that has some desirable closure properties enjoyed by some well-known sets of languages, in particular by the families of regular languages, context-free languages and the recursively enumerable languages. The concept of a cone is a more abstract notion that subsumes all of these families. A similar notion is the faithful cone, having somewhat relaxed conditions. For example, the context-sensitive languages do not form a cone, but still have the required properties to form a faithful cone.

In formal language theory, a grammar describes how to form strings from a language's alphabet that are valid according to the language's syntax. A grammar does not describe the meaning of the strings or what can be done with them in whatever context—only their form. A formal grammar is defined as a set of production rules for such strings in a formal language.

Sige-Yuki Kuroda, also known as S.-Y. Kuroda, was Professor Emeritus and Research Professor of Linguistics at the University of California, San Diego. Although a pioneer in the application of Chomskyan generative syntax to the Japanese language, he is known for the broad range of his work across the language sciences. For instance, in formal language theory, the Kuroda normal form for context-sensitive grammars bears his name.

In mathematics, logic and computer science, a formal language is called recursive if it is a recursive subset of the set of all possible finite sequences over the alphabet of the language. Equivalently, a formal language is recursive if there exists a total Turing machine that, when given a finite sequence of symbols as input, accepts it if it belongs to the language and rejects it otherwise. Recursive languages are also called decidable.

In formal language theory, a growing context-sensitive grammar is a context-sensitive grammar in which the productions increase the length of the sentences being generated. These grammars are thus noncontracting and context-sensitive. A growing context-sensitive language is a context-sensitive language generated by these grammars.

References

  1. 1 2 3 4 Masami Ito; Yūji Kobayashi; Kunitaka Shoji (2010). Automata, Formal Languages and Algebraic Systems: Proceedings of AFLAS 2008, Kyoto, Japan, 20-22 September 2008. World Scientific. p. 182. ISBN   978-981-4317-60-3.
  2. 1 2 3 4 5 Mateescu, Alexandru; Salomaa, Arto (1997). "Chapter 4: Aspects of Classical Language Theory". In Rozenberg, Grzegorz; Salomaa, Arto (eds.). Handbook of Formal Languages. Volume I: Word, language, grammar. Springer-Verlag. p. 190. ISBN   978-3-540-61486-9.
  3. Willem J. M. Levelt (2008). An Introduction to the Theory of Formal Languages and Automata. John Benjamins Publishing. pp. 126–127. ISBN   978-90-272-3250-2.
  4. 1 2 Alexander Meduna (2000). Automata and Languages: Theory and Applications. Springer Science & Business Media. p. 722. ISBN   978-1-85233-074-3.
  5. Alexander Meduna (2000). Automata and Languages: Theory and Applications. Springer Science & Business Media. p. 728. ISBN   978-1-85233-074-3.

Further reading