Equivalence (formal languages)

In formal language theory, weak equivalence of two grammars means that they generate the same set of strings, i.e. that the formal language they generate is the same. In compiler theory the notion is distinguished from strong (or structural) equivalence, which additionally requires that the two parse trees be reasonably similar, in that the same semantic interpretation can be assigned to both. [1]
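In symbols, writing L(G) for the set of strings generated by a grammar G (a standard notational convention, used here for illustration):

$$G_1 \sim_{\mathrm{weak}} G_2 \iff L(G_1) = L(G_2).$$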

Vijay-Shanker and Weir (1994) [2] demonstrate that Linear Indexed Grammars, Combinatory Categorial Grammars, Tree-adjoining Grammars, and Head Grammars are weakly equivalent formalisms, in that they all define the same string languages.

On the other hand, if two grammars generate the same set of derivation trees (or more generally, the same set of abstract syntactic objects), then the two grammars are strongly equivalent. Chomsky (1963) [3] introduces the notion of strong equivalence, and argues that only strong equivalence is relevant when comparing grammar formalisms. Kornai and Pullum (1990) [4] and Miller (1999) [5] offer a refined notion of strong equivalence that allows for isomorphic relationships between the syntactic analyses given by different formalisms. Yoshinaga, Miyao, and Tsujii (2002) [6] offer a proof of the strong equivalence of the LTAG and HPSG formalisms.

Context-free grammar example

Left: one of the parse trees of the string "1+2*3" with the first grammar. Right: the only parse tree of that string with the second grammar.

As an example, consider the following two context-free grammars, [note 1] given in Backus–Naur form:

<expression> ::= <expression> "+" <expression> | <expression> "-" <expression>
               | <expression> "*" <expression> | <expression> "/" <expression>
               | "x" | "y" | "z" | "1" | "2" | "3" | "(" <expression> ")"

<expression> ::= <term> | <expression> "+" <term> | <expression> "-" <term>
<term>       ::= <factor> | <term> "*" <factor> | <term> "/" <factor>
<factor>     ::= "x" | "y" | "z" | "1" | "2" | "3" | "(" <expression> ")"

Both grammars generate the same set of strings, viz. the set of all arithmetical expressions that can be built from the variables "x", "y", "z", the constants "1", "2", "3", the operators "+", "-", "*", "/", and parentheses "(" and ")". However, a concrete syntax tree of the second grammar always reflects the usual order of operations, while a tree from the first grammar need not.

For the example string "1+2*3", the right part of the picture shows its unique parse tree with the second grammar; [note 2] evaluating this tree in postfix order yields the proper value, 7. In contrast, the left part of the picture shows one of the parse trees of that string with the first grammar; evaluating it in postfix order yields 9.
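The contrast can be reproduced in a few lines of code. The following Python sketch (an illustration written for this article, not taken from the cited sources) encodes the two parse trees from the picture as nested tuples and evaluates each in postfix order, i.e. both subtrees before the operator node:

# A parse tree is a leaf (a number) or a tuple (operator, left subtree, right subtree).
tree_grammar1 = ("*", ("+", 1, 2), 3)   # first grammar, left of picture: (1+2)*3
tree_grammar2 = ("+", 1, ("*", 2, 3))   # second grammar, right of picture: 1+(2*3)

def evaluate(tree):
    """Evaluate a parse tree in postfix order: subtrees first, then the operator."""
    if isinstance(tree, tuple):
        op, left, right = tree
        l, r = evaluate(left), evaluate(right)
        return {"+": l + r, "-": l - r, "*": l * r, "/": l / r}[op]
    return tree  # a leaf: a constant

print(evaluate(tree_grammar1))  # 9
print(evaluate(tree_grammar2))  # 7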

Since the second grammar cannot generate a tree corresponding to the left part of the picture, while the first grammar can, the two grammars are not strongly equivalent.
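Weak equivalence, in contrast, can be probed empirically by enumerating the strings each grammar derives up to a bounded length and comparing the two sets; a bounded check of this kind yields evidence for, not a proof of, weak equivalence. A minimal Python sketch of the idea (again illustrative, not from the cited sources; the nonterminal names follow the abbreviations E, T, F of note 2):

# Each grammar maps a nonterminal to its list of productions.
grammar1 = {
    "E": [["E", "+", "E"], ["E", "-", "E"], ["E", "*", "E"], ["E", "/", "E"],
          ["x"], ["y"], ["z"], ["1"], ["2"], ["3"], ["(", "E", ")"]],
}
grammar2 = {
    "E": [["T"], ["E", "+", "T"], ["E", "-", "T"]],
    "T": [["F"], ["T", "*", "F"], ["T", "/", "F"]],
    "F": [["x"], ["y"], ["z"], ["1"], ["2"], ["3"], ["(", "E", ")"]],
}

def language_up_to(grammar, start, max_len):
    """All terminal strings of length <= max_len derivable from start.
    Both grammars are non-contracting (no rule shortens a sentential
    form), so any form longer than max_len can safely be pruned."""
    results, seen, stack = set(), set(), [(start,)]
    while stack:
        form = stack.pop()
        nt = next((i for i, s in enumerate(form) if s in grammar), None)
        if nt is None:                  # only terminals left: a generated string
            results.add("".join(form))
            continue
        for prod in grammar[form[nt]]:  # expand the leftmost nonterminal
            new = form[:nt] + tuple(prod) + form[nt + 1:]
            if len(new) <= max_len and new not in seen:
                seen.add(new)
                stack.append(new)
    return results

# The two string sets agree up to the bound, as weak equivalence predicts:
assert language_up_to(grammar1, "E", 5) == language_up_to(grammar2, "E", 5)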

Generative capacity

In linguistics, the weak generative capacity of a grammar is defined as the set of all strings generated by it, [note 3] while a grammar's strong generative capacity refers to the set of "structural descriptions" [note 4] generated by it. [7] As a consequence, two grammars are considered weakly equivalent if their weak generative capacities coincide; similarly for strong equivalence. The notion of generative capacity was introduced by Noam Chomsky in 1963. [3] [7]
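In the notation used above, writing WGC(G) and SGC(G) for a grammar's weak and strong generative capacity (symbols introduced here only for illustration):

$$\mathrm{WGC}(G) = L(G), \quad G_1 \sim_{\mathrm{weak}} G_2 \iff \mathrm{WGC}(G_1) = \mathrm{WGC}(G_2), \quad G_1 \sim_{\mathrm{strong}} G_2 \iff \mathrm{SGC}(G_1) = \mathrm{SGC}(G_2).$$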

Notes

  1. with the start symbol "<expression>"
  2. using the abbreviation "E", "T", and "F" for <expression>, <term>, and <factor>, respectively
  3. for context-free grammars: see Context-free grammar#Context-free language for a formal definition
  4. for context-free grammars: concrete syntax trees

References

  1. Stefano Crespi Reghizzi (2009). Formal Languages and Compilation. Springer. p. 57. ISBN 978-1-84882-049-4.
  2. Vijay-Shanker, K.; Weir, David J. (1994). "The Equivalence of Four Extensions of Context-Free Grammars". Mathematical Systems Theory. 27 (6): 511–546. doi:10.1007/BF01191624. S2CID 12336597.
  3. Noam Chomsky (1963). "Formal properties of grammar". In R. D. Luce; R. R. Bush; E. Galanter (eds.). Handbook of Mathematical Psychology. Vol. II. New York: Wiley. pp. 323–418.
  4. Kornai, A.; Pullum, G. K. (1990). "The X-bar Theory of Phrase Structure". Language. 66: 24–50.
  5. Miller, Philip H. (1999). Strong Generative Capacity. CSLI Publications.
  6. Yoshinaga, N.; Miyao, Y.; Tsujii, J. (2002). "A formal proof of strong equivalence for a grammar conversion from LTAG to HPSG-style" (PDF). Proceedings of the TAG+6 Workshop: 187–192. Venice, Italy. Archived from the original (PDF) on 2011-08-28. Retrieved 2012-02-05.
  7. Emmon Bach; Philip Miller (2003). "Generative Capacity" (PDF). In William J. Frawley (ed.). International Encyclopedia of Linguistics (2nd ed.). Oxford University Press. doi:10.1093/acref/9780195139778.001.0001. ISBN 9780195139778.