Applicative universal grammar

Applicative universal grammar, or AUG, is a universal semantic metalanguage intended for studying the semantic processes in particular languages.[1] It is a linguistic theory that views the formation of phrase structure by analogy with function application in an applicative programming language. Among the innovations of this approach to natural language processing are the ideas of functional superposition and stratified types.[2][3]

Example

The paper "Using Types to Parse Natural Language" by Mark P. Jones, Paul Hudak and Sebastian Shaumyan, which describes an implementation of AUG parsing in Haskell, gives a brief introduction to AUG. It is paraphrased here, using one of their examples:[4]

AUG has just two primitive types: T for terms and S for "sentences" (though AUG apparently allows sentence fragments to be of type S). Non-primitive types are built with a single constructor, O: for any types x and y, Oxy is the type of a function that takes something of type x and yields something of type y. Words can have function types of this kind. The type of a word like "my" is OTT: it takes something of type T and yields something of type T, so applying "my" to "friend" (of type T) gives the phrase "my friend", again of type T.
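
The type language is small enough to write down directly. The following Haskell sketch is illustrative only (it is not the code from the paper, and the names are assumptions), but it captures the two primitive types and the O constructor:

    -- A minimal sketch of AUG's type language, assuming this simplified encoding.
    -- T and S are the primitive types; O x y is the type of a phrase that takes
    -- an argument of type x and yields a phrase of type y.
    data Type = T              -- terms
              | S              -- sentences and sentence fragments
              | O Type Type    -- O x y, written Oxy in AUG's prefix notation
              deriving (Eq, Show)

    -- "my" has type OTT and "friend" has type T:
    myType, friendType :: Type
    myType     = O T T
    friendType = T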

An application rule combines p of type Oxy with q of type x, yielding the phrase pq of type y.

Note that writing O as a prefix operator obviates the need for the parentheses required in more conventional mathematical notation, and makes for a more compact presentation on the page.
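
Assuming the Type encoding sketched above, this application rule (forward application, with the function preceding its argument as in "my friend") can be expressed as a partial function on types. The name forward is illustrative rather than taken from the paper:

    -- Forward application: p of type O x y followed by q of type x
    -- reduces to the phrase pq of type y; Nothing signals a type mismatch.
    forward :: Type -> Type -> Maybe Type
    forward (O x y) x' | x == x' = Just y
    forward _       _            = Nothing

    -- Example: "my" (OTT) applied to "friend" (T) gives "my friend" (T):
    --   forward (O T T) T  ==  Just T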

Here is how "my friend lives in Boston" reduces in AUG.

[Figure: phenotype parse tree for "my friend lives in Boston"]

AUG allows both forward and backward application of O. In backward application the argument comes first: q of type x followed by p of type Oxy again reduces to pq of type y.
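
Continuing the sketch above (again with illustrative names), backward application only swaps the order in which the argument and function types appear:

    -- Backward application: q of type x followed by p of type O x y
    -- also reduces to pq of type y.
    backward :: Type -> Type -> Maybe Type
    backward x' (O x y) | x == x' = Just y
    backward _  _                 = Nothing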

The word "living" gets reduced to the more specific "living in Boston". AUG distinguishes between phenotype and genotype grammar; phenotype corresponds closely to the actual language, as in the previous graphic. Genotype is a more universal structure for the meaning. For the sake of clearer focus on genotype issues, tree branch order can be rendered so that functions are to the left of their arguments. A more genotypical parse tree looks like this:

[Figure: genotype-style parse tree for "my friend lives in Boston", with functions to the left of their arguments]

The intransitive "lives" is typed OTS: the subject is a T, but an S -- a sentence phrase -- must be the resulting type. Why does "in" have type OTOOTSOTS? "In" as a three-place function requires a location ("Boston"; type T), something happening ("lives"; type OTS), and in this example, a subject, someone who lives, also of type T.

See also

Related Research Articles

<span class="mw-page-title-main">Natural language processing</span> Field of linguistics and computer science

Natural language processing (NLP) is an interdisciplinary subfield of computer science and linguistics. It is primarily concerned with giving computers the ability to support and manipulate human language. It involves processing natural language datasets, such as text corpora or speech corpora, using either rule-based or probabilistic machine learning approaches. The goal is a computer capable of "understanding" the contents of documents, including the contextual nuances of the language within them. The technology can then accurately extract information and insights contained in the documents as well as categorize and organize the documents themselves.

In computer science, a compiler-compiler or compiler generator is a programming tool that creates a parser, interpreter, or compiler from some form of formal description of a programming language and machine.

Combinatory logic is a notation to eliminate the need for quantified variables in mathematical logic. It was introduced by Moses Schönfinkel and Haskell Curry, and has more recently been used in computer science as a theoretical model of computation and also as a basis for the design of functional programming languages. It is based on combinators, which were introduced by Schönfinkel in 1920 with the idea of providing an analogous way to build up functions—and to remove any mention of variables—particularly in predicate logic. A combinator is a higher-order function that uses only function application and earlier defined combinators to define a result from its arguments.

Parsing, syntax analysis, or syntactic analysis is the process of analyzing a string of symbols, either in natural language, computer languages or data structures, conforming to the rules of a formal grammar. The term parsing comes from Latin pars (orationis), meaning part.

An attribute grammar is a formal way to supplement a formal grammar with semantic information processing. Semantic information is stored in attributes associated with terminal and nonterminal symbols of the grammar. The values of the attributes are the result of attribute evaluation rules associated with the productions of the grammar. Attributes allow information to be transferred from anywhere in the abstract syntax tree to anywhere else in a controlled and formal way.

Categorial grammar is a family of formalisms in natural language syntax that share the central assumption that syntactic constituents combine as functions and arguments. Categorial grammar posits a close relationship between syntax and semantic composition, since it typically treats syntactic categories as corresponding to semantic types. Categorial grammars were developed in the 1930s by Kazimierz Ajdukiewicz and in the 1950s by Yehoshua Bar-Hillel and Joachim Lambek. The approach saw a surge of interest in the 1970s following the work of Richard Montague, whose Montague grammar assumed a similar view of syntax. It continues to be a major paradigm, particularly within formal semantics.

Frame semantics is a theory of linguistic meaning developed by Charles J. Fillmore that extends his earlier case grammar. It relates linguistic semantics to encyclopedic knowledge. The basic idea is that one cannot understand the meaning of a single word without access to all the essential knowledge that relates to that word. For example, one would not be able to understand the word "sell" without knowing anything about the situation of commercial transfer, which also involves, among other things, a seller, a buyer, goods, money, the relation between the money and the goods, the relations between the seller and the goods and the money, the relation between the buyer and the goods and the money and so on. Thus, a word activates, or evokes, a frame of semantic knowledge relating to the specific concept to which it refers.

<span class="mw-page-title-main">Syntax (programming languages)</span> Set of rules defining correctly structured programs

In computer science, the syntax of a computer language is the rules that define the combinations of symbols that are considered to be correctly structured statements or expressions in that language. This applies both to programming languages, where the document represents source code, and to markup languages, where the document represents data.

<span class="mw-page-title-main">Treebank</span>

In linguistics, a treebank is a parsed text corpus that annotates syntactic or semantic sentence structure. The construction of parsed corpora in the early 1990s revolutionized computational linguistics, which benefitted from large-scale empirical data.

<span class="mw-page-title-main">Sebastian Shaumyan</span> American linguist

Sebastian Konstantinovich Shaumyan was an Armenian American theoretician of linguistics and an outspoken adherent of structuralist analysis.

In programming language semantics, normalisation by evaluation (NBE) is a method of obtaining the normal form of terms in the λ-calculus by appealing to their denotational semantics. A term is first interpreted into a denotational model of the λ-term structure, and then a canonical (β-normal and η-long) representative is extracted by reifying the denotation. Such an essentially semantic, reduction-free, approach differs from the more traditional syntactic, reduction-based, description of normalisation as reductions in a term rewrite system where β-reductions are allowed deep inside λ-terms.

Donkey sentences are sentences that contain a pronoun with clear meaning but whose syntactical role in the sentence poses challenges to grammarians. Such sentences defy straightforward attempts to generate their formal language equivalents. The difficulty is with understanding how English speakers parse such sentences.

Combinatory categorial grammar (CCG) is an efficiently parsable, yet linguistically expressive grammar formalism. It has a transparent interface between surface syntax and underlying semantic representation, including predicate–argument structure, quantification and information structure. The formalism generates constituency-based structures and is therefore a type of phrase structure grammar.

In computer programming, a parser combinator is a higher-order function that accepts several parsers as input and returns a new parser as its output. In this context, a parser is a function accepting strings as input and returning some structure as output, typically a parse tree or a set of indices representing locations in the string where parsing stopped successfully. Parser combinators enable a recursive descent parsing strategy that facilitates modular piecewise construction and testing. This parsing technique is called combinatory parsing.

<span class="mw-page-title-main">Formal grammar</span> Structure of a formal language

A formal grammar describes how to form strings from an alphabet of a formal language that are valid according to the language's syntax. A grammar does not describe the meaning of the strings or what can be done with them in whatever context—only their form. A formal grammar is defined as a set of production rules for such strings in a formal language.


Minimalist grammars are a class of formal grammars that aim to provide a more rigorous, usually proof-theoretic, formalization of the Chomskyan Minimalist program than is normally provided in the mainstream Minimalist literature. A variety of particular formalizations exist, most of them developed by Edward Stabler, Alain Lecomte, Christian Retoré, or combinations thereof.

Deep linguistic processing is a natural language processing framework which draws on theoretical and descriptive linguistics. It models language predominantly by way of theoretical syntactic/semantic theory. Deep linguistic processing approaches differ from "shallower" methods in that they yield more expressive and structural representations which directly capture long-distance dependencies and underlying predicate-argument structures.
The knowledge-intensive approach of deep linguistic processing requires considerable computational power and has in the past sometimes been judged intractable. However, research in the early 2000s made considerable advances in the efficiency of deep processing, and today efficiency is no longer a major problem for applications using deep linguistic processing.

OMeta is a specialized object-oriented programming language for pattern matching, developed by Alessandro Warth and Ian Piumarta in 2007 at the Viewpoints Research Institute. The language is based on parsing expression grammars (PEGs) rather than context-free grammars, with the intent of providing "a natural and convenient way for programmers to implement tokenizers, parsers, visitors, and tree-transformers".

In mathematical logic, the intersection type discipline is a branch of type theory encompassing type systems that use the intersection type constructor to assign multiple types to a single term. In particular, if a term M can be assigned both the type φ and the type ψ, then M can be assigned the intersection type φ ∩ ψ. The intersection type constructor can therefore be used to express finite, heterogeneous ad hoc polymorphism. For example, the λ-term λx.(x x) can be assigned the type ((σ → τ) ∩ σ) → τ in most intersection type systems, assuming for the term variable x both the function type σ → τ and the corresponding argument type σ.

References

  1. Shaumyan, Sebastian (1987). A Semiotic Theory of Language (1st ed.). Bloomington and Indianapolis: Indiana University Press. ISBN 978-0253304728.
  2. Sypniewski, Bernard P. (2009) [1997], "Functional Superposition" (PDF), LACUS Forum (Linguistic Association of Canada and the United States), Brigham Young University, 23: 279–287
  3. Shaumyan, Sebastian; Segond, Frédérique (1994), "Long-Distance Dependencies and Applicative Universal Grammar" (PDF), COLING '94: Proceedings of the 15th Conference on Computational Linguistics, 2: 853–858, doi:10.3115/991250.991285
  4. Jones, Mark P.; Hudak, Paul; Shaumyan, Sebastian (1995), "Using Types to Parse Natural Language" (PDF), in Turner, David (ed.), Proceedings of the Glasgow Workshop on Functional Programming, Workshops in Computer Science Series (IFIP), Springer-Verlag, doi:10.14236/ewic/FP1995.0, ISBN 978-3540145806

Further reading