Applicative universal grammar

Applicative universal grammar, or AUG, is a universal semantic metalanguage intended for studying the semantic processes in particular languages.[1] It is a linguistic theory that views the formation of phrase structure by analogy to function application in an applicative programming language. Among the innovations in this approach to natural language processing are the ideas of functional superposition and stratified types.[2][3]

Example

The paper "Using Types to Parse Natural Language" by Mark P. Jones, Paul Hudak and Sebastian Shaumyan, which describes an implementation of AUG parsing in Haskell, includes a brief introduction to AUG. It is paraphrased here, using one of their examples.[4]

AUG has just two primitive types: T for terms and S for sentences (though AUG apparently allows sentence fragments to be of type S). There is one non-primitive type, the function type Oxy: a function of this type takes an argument of type x and yields a result of type y. Words can be functions of this type. The type of a word like "my" is OTT: it takes something of type T and yields something of type T. "My friend" is therefore the application of the function "my" (type OTT) to the argument "friend" (type T), yielding another term of type T.
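
These types can be written down directly as a small Haskell data type. This is a minimal sketch in the spirit of the paper's implementation, not the paper's actual code; the names Type, T, S and O are illustrative:

    -- The two primitive types T and S, plus the function type
    -- constructor O: a value of type `O x y` maps an argument
    -- of type x to a result of type y.
    data Type = T | S | O Type Type
      deriving (Eq, Show)

    -- "my" is a term modifier, "friend" is a plain term.
    my, friend :: Type
    my     = O T T
    friend = T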

Rules combine a function p of type Oxy with an argument q of type x, yielding the application pq of type y. The rule for forward application, where the function precedes its argument, is:

    p : Oxy,  q : x  ⇒  pq : y
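
In Haskell this rule is a one-line function over the Type sketch above. The helper apply is hypothetical, not the paper's parser:

    -- Forward application: a function of type `O x y` accepts
    -- an argument of type x and yields a y. Any other
    -- combination fails to type-check.
    apply :: Type -> Type -> Maybe Type
    apply (O x y) x' | x == x' = Just y
    apply _       _            = Nothing

    -- apply my friend == Just T   -- "my friend" is a term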

Note that making O a prefix operator obviates the need for parentheses (which would be required in more conventional mathematical notation) and makes for a more compact presentation on the page.
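
One way to see that the parenthesis-free prefix notation is unambiguous is to parse it: each O consumes exactly the next two types. A small recursive-descent sketch, with parseType a hypothetical helper:

    -- Read one type from the front of the string, returning the
    -- remainder; O recursively consumes two further types.
    parseType :: String -> Maybe (Type, String)
    parseType ('T':rest) = Just (T, rest)
    parseType ('S':rest) = Just (S, rest)
    parseType ('O':rest) = do
      (x, rest')  <- parseType rest
      (y, rest'') <- parseType rest'
      Just (O x y, rest'')
    parseType _ = Nothing

    -- parseType "OTOOTSOTS"
    --   == Just (O T (O (O T S) (O T S)), "")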

Here is how "my friend lives in Boston" reduces in AUG.

[Figure 3: phenotype parse tree of "my friend lives in Boston", showing each word's type and the type of each combined phrase]
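
Using the Type and apply sketches above, the reduction can be replayed step by step. The word types are as given in the paper; note that the last two steps combine an argument with a function to its right, i.e. backward application, whose rule is given below. The name inWord avoids the Haskell keyword `in`:

    lives, inWord, boston :: Type
    lives  = O T S                     -- intransitive verb
    inWord = O T (O (O T S) (O T S))   -- "in"
    boston = T

    sentence :: Maybe Type
    sentence = do
      np <- apply my friend      -- "my friend"        :: T
      pp <- apply inWord boston  -- "in Boston"        :: O(OTS)(OTS)
      vp <- apply pp lives       -- "lives in Boston"  :: OTS (backward)
      apply vp np                -- whole sentence     :: S   (backward)

    -- sentence == Just S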

AUG allows both forward and backward application of O. In backward application the argument precedes the function:

    q : x,  p : Oxy  ⇒  qp : y

The word "living" gets reduced to the more specific "living in Boston". AUG distinguishes between phenotype and genotype grammar; phenotype corresponds closely to the actual language, as in the previous graphic. Genotype is a more universal structure for the meaning. For the sake of clearer focus on genotype issues, tree branch order can be rendered so that functions are to the left of their arguments. A more genotypical parse tree looks like this:

[Figure 5: genotype-order parse tree of the same sentence, with each function drawn to the left of its arguments]

The intransitive "lives" is typed OTS: the subject is a T, but an S -- a sentence phrase -- must be the resulting type. Why does "in" have type OTOOTSOTS? "In" as a three-place function requires a location ("Boston"; type T), something happening ("lives"; type OTS), and in this example, a subject, someone who lives, also of type T.

Related Research Articles

In mathematics and computer science, currying is the technique of translating a function that takes multiple arguments into a sequence of families of functions, each taking a single argument.
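
Haskell functions are curried by default, so the idea fits in a few lines; the names here are illustrative:

    -- An "uncurried" two-argument function and its curried form.
    addPair :: (Int, Int) -> Int
    addPair (x, y) = x + y

    addCurried :: Int -> Int -> Int
    addCurried = curry addPair

    -- Partial application becomes possible: `addCurried 1`
    -- is itself a function of type Int -> Int.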

Natural language processing (NLP) is a subfield of computer science and especially artificial intelligence. It is primarily concerned with providing computers with the ability to process data encoded in natural language and is thus closely related to information retrieval, knowledge representation and computational linguistics, a subfield of linguistics. Typically data is collected in text corpora, using either rule-based, statistical or neural-based approaches in machine learning and deep learning.

In computer science, a compiler-compiler or compiler generator is a programming tool that creates a parser, interpreter, or compiler from some form of formal description of a programming language and machine.

Parsing, syntax analysis, or syntactic analysis is the process of analyzing a string of symbols, either in natural language, computer languages or data structures, conforming to the rules of a formal grammar. The term parsing comes from Latin pars (orationis), meaning part.

An attribute grammar is a formal way to supplement a formal grammar with semantic information processing. Semantic information is stored in attributes associated with terminal and nonterminal symbols of the grammar. The values of attributes are the result of attribute evaluation rules associated with productions of the grammar. Attributes allow the transfer of information from anywhere in the abstract syntax tree to anywhere else, in a controlled and formal way.
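
As a sketch, a synthesized attribute can be modeled in Haskell as a value computed bottom-up over the syntax tree. This is a minimal illustration, not tied to any particular attribute-grammar tool, and all names are illustrative:

    -- Abstract syntax for a toy expression grammar.
    data Expr = Num Int | Add Expr Expr

    -- `val` plays the role of a synthesized attribute: the value
    -- at each node is computed from its children, mirroring one
    -- attribute evaluation rule per production.
    val :: Expr -> Int
    val (Num n)   = n
    val (Add l r) = val l + val r

    -- val (Add (Num 2) (Num 3)) == 5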

In computing, memoization or memoisation is an optimization technique used primarily to speed up computer programs by storing the results of expensive function calls to pure functions and returning the cached result when the same inputs occur again. Memoization has also been used in other contexts, such as in simple mutually recursive descent parsing. It is a type of caching, distinct from other forms of caching such as buffering and page replacement. In the context of some logic programming languages, memoization is also known as tabling.
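
A classic Haskell sketch: Fibonacci memoized through a lazily built list, so each result is computed at most once (the names fib and fibs are illustrative):

    -- The shared list `fibs` caches every result; the recursive
    -- calls index into the list instead of recomputing.
    fibs :: [Integer]
    fibs = map fib [0 ..]

    fib :: Int -> Integer
    fib 0 = 0
    fib 1 = 1
    fib n = fibs !! (n - 1) + fibs !! (n - 2)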

Categorial grammar is a family of formalisms in natural language syntax that share the central assumption that syntactic constituents combine as functions and arguments. Categorial grammar posits a close relationship between syntax and semantic composition, since it typically treats syntactic categories as corresponding to semantic types. Categorial grammars were developed in the 1930s by Kazimierz Ajdukiewicz and in the 1950s by Yehoshua Bar-Hillel and Joachim Lambek. The approach saw a surge of interest in the 1970s following the work of Richard Montague, whose Montague grammar assumed a similar view of syntax. It continues to be a major paradigm, particularly within formal semantics.

Binary combinatory logic (BCL) is a computer programming language that provides a complete formulation of combinatory logic using only the symbols 0 and 1. Using the S and K combinators, complex boolean algebra functions can be made. BCL has applications in the theory of program-size complexity.
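
The S and K combinators themselves can be written directly in Haskell; a small sketch, with illustrative names:

    -- S duplicates and distributes its argument;
    -- K discards its second argument.
    s :: (a -> b -> c) -> (a -> b) -> a -> c
    s f g x = f x (g x)

    k :: a -> b -> a
    k x _ = x

    -- Identity is definable from S and K alone: s k k x == x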

Frame semantics is a theory of linguistic meaning developed by Charles J. Fillmore that extends his earlier case grammar. It relates linguistic semantics to encyclopedic knowledge. The basic idea is that one cannot understand the meaning of a single word without access to all the essential knowledge that relates to that word. For example, one would not be able to understand the word "sell" without knowing anything about the situation of commercial transfer, which also involves, among other things, a seller, a buyer, goods, money, the relation between the money and the goods, the relations between the seller and the goods and the money, the relation between the buyer and the goods and the money and so on. Thus, a word activates, or evokes, a frame of semantic knowledge relating to the specific concept to which it refers.

<span class="mw-page-title-main">Syntax (programming languages)</span> Set of rules defining correctly structured programs

In computer science, the syntax of a computer language is the rules that define the combinations of symbols that are considered to be correctly structured statements or expressions in that language. This applies both to programming languages, where the document represents source code, and to markup languages, where the document represents data.

<span class="mw-page-title-main">Treebank</span> Text corpus with tree annotations

In linguistics, a treebank is a parsed text corpus that annotates syntactic or semantic sentence structure. The construction of parsed corpora in the early 1990s revolutionized computational linguistics, which benefitted from large-scale empirical data.

Grammar induction is the process in machine learning of learning a formal grammar from a set of observations, thus constructing a model which accounts for the characteristics of the observed objects. More generally, grammatical inference is that branch of machine learning where the instance space consists of discrete combinatorial objects such as strings, trees and graphs.

<span class="mw-page-title-main">Sebastian Shaumyan</span> American linguist

Sebastian Konstantinovich Shaumyan was an Armenian American theoretician of linguistics and an outspoken adherent of structuralist analysis.

In programming language semantics, normalisation by evaluation (NBE) is a method of obtaining the normal form of terms in the λ-calculus by appealing to their denotational semantics. A term is first interpreted into a denotational model of the λ-term structure, and then a canonical (β-normal and η-long) representative is extracted by reifying the denotation. Such an essentially semantic, reduction-free, approach differs from the more traditional syntactic, reduction-based, description of normalisation as reductions in a term rewrite system where β-reductions are allowed deep inside λ-terms.
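
A minimal Haskell sketch of the idea for the untyped λ-calculus, using de Bruijn indices. This performs β-normalization only; producing η-long forms requires type information and is omitted. All names are illustrative:

    -- Syntax (de Bruijn indices) and semantics. Functions are
    -- interpreted as real Haskell functions; `Ne` tracks stuck
    -- ("neutral") terms headed by a free variable.
    data Tm  = Var Int | Lam Tm | App Tm Tm   deriving Show
    data Val = VLam (Val -> Val) | VNe Ne
    data Ne  = NVar Int | NApp Ne Val

    eval :: [Val] -> Tm -> Val
    eval env (Var i)   = env !! i
    eval env (Lam t)   = VLam (\v -> eval (v : env) t)
    eval env (App t u) = case eval env t of
      VLam f -> f (eval env u)             -- a real β-step
      VNe n  -> VNe (NApp n (eval env u))  -- stuck application

    -- Reify a value back into syntax; `k` counts binders passed
    -- (de Bruijn levels), converted to indices at variables.
    quote :: Int -> Val -> Tm
    quote k (VLam f) = Lam (quote (k + 1) (f (VNe (NVar k))))
    quote k (VNe n)  = quoteNe k n

    quoteNe :: Int -> Ne -> Tm
    quoteNe k (NVar i)   = Var (k - i - 1)
    quoteNe k (NApp n v) = App (quoteNe k n) (quote k v)

    -- Normal form of a closed term: evaluate, then read back.
    nf :: Tm -> Tm
    nf = quote 0 . eval []

    -- nf (App (Lam (Var 0)) (Lam (Var 0))) == Lam (Var 0)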

Combinatory categorial grammar (CCG) is an efficiently parsable, yet linguistically expressive grammar formalism. It has a transparent interface between surface syntax and underlying semantic representation, including predicate–argument structure, quantification and information structure. The formalism generates constituency-based structures and is therefore a type of phrase structure grammar.

In computer programming, a parser combinator is a higher-order function that accepts several parsers as input and returns a new parser as its output. In this context, a parser is a function accepting strings as input and returning some structure as output, typically a parse tree or a set of indices representing locations in the string where parsing stopped successfully. Parser combinators enable a recursive descent parsing strategy that facilitates modular piecewise construction and testing. This parsing technique is called combinatory parsing.
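
A small Haskell sketch of the idea, modeling a parser as a function from input to all possible results; the names are illustrative:

    import Data.Char (isDigit)

    -- A parser consumes a prefix of the input and returns every
    -- (result, remaining input) pair it could produce.
    newtype Parser a = Parser { runParser :: String -> [(a, String)] }

    -- Match one character satisfying a predicate.
    satisfy :: (Char -> Bool) -> Parser Char
    satisfy p = Parser $ \s -> case s of
      (c : rest) | p c -> [(c, rest)]
      _                -> []

    -- Combinator: run two parsers in sequence, pairing results.
    andThen :: Parser a -> Parser b -> Parser (a, b)
    andThen pa pb = Parser $ \s ->
      [ ((a, b), s'') | (a, s')  <- runParser pa s
                      , (b, s'') <- runParser pb s' ]

    -- runParser (satisfy isDigit `andThen` satisfy isDigit) "42!"
    --   == [(('4','2'), "!")]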

This is an index of Wikipedia articles in philosophy of language

Minimalist grammars are a class of formal grammars that aim to provide a more rigorous, usually proof-theoretic, formalization of the Chomskyan Minimalist program than is normally provided in the mainstream Minimalist literature. A variety of particular formalizations exist, most of them developed by Edward Stabler, Alain Lecomte, Christian Retoré, or combinations thereof.

OMeta is a specialized object-oriented programming language for pattern matching, developed by Alessandro Warth and Ian Piumarta in 2007 at the Viewpoints Research Institute. The language is based on parsing expression grammars (PEGs), rather than context-free grammars, with the intent to provide "a natural and convenient way for programmers to implement tokenizers, parsers, visitors, and tree-transformers".

In mathematical logic, the intersection type discipline is a branch of type theory encompassing type systems that use the intersection type constructor ∧ to assign multiple types to a single term. In particular, if a term M can be assigned both the type φ and the type ψ, then M can be assigned the intersection type φ ∧ ψ. Therefore, the intersection type constructor can be used to express finite heterogeneous ad hoc polymorphism. For example, the λ-term λx.(x x) can be assigned the type ((σ → τ) ∧ σ) → τ in most intersection type systems, assuming for the term variable x both the function type σ → τ and the corresponding argument type σ.

References

  1. Shaumyan, Sebastian (1987). A Semiotic Theory of Language (1st ed.). Bloomington and Indianapolis: Indiana University Press. ISBN 978-0253304728.
  2. Sypniewski, B. P. (1996). "Functional Superposition" (PDF). The LACUS Forum (23): 279–287. ISSN 0195-377X.
  3. Shaumyan, Sebastian; Segond, Frédérique (1994). "Long-Distance Dependencies and Applicative Universal Grammar" (PDF). Proceedings of the 15th Conference on Computational Linguistics. Vol. 2. Association for Computational Linguistics. p. 853. doi:10.3115/991250.991285.
  4. Jones, Mark P.; Hudak, Paul; Shaumyan, Sebastian (1995). "Using Types to Parse Natural Language" (PDF). In Turner, David (ed.). Proceedings of the Glasgow Workshop on Functional Programming. Workshops in Computing. Springer-Verlag. doi:10.14236/ewic/FP1995.0. ISBN 978-3540145806.