Discourse representation theory

In formal linguistics, discourse representation theory (DRT) is a framework for exploring meaning under a formal semantics approach. One of the main differences between DRT-style approaches and traditional Montagovian approaches is that DRT includes a level of abstract mental representations (discourse representation structures, DRS) within its formalism, which gives it an intrinsic ability to handle meaning across sentence boundaries. DRT was created by Hans Kamp in 1981. [1] A very similar theory was developed independently by Irene Heim in 1982, under the name of File Change Semantics (FCS). [2] Discourse representation theories have been used to implement semantic parsers [3] and natural language understanding systems. [4] [5] [6]

Discourse representation structures

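DRT uses discourse representation structures (DRS) to represent a hearer's mental representation of a discourse as it unfolds over time. There are two critical components to a DRS: a set of discourse referents, representing the entities under discussion, and a set of DRS conditions, representing the information that has been given about those discourse referents.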

Consider Sentence (1) below:

(1) A farmer owns a donkey.

The DRS of (1) can be notated as (2) below:

(2) [x,y: farmer(x), donkey(y), owns(x,y)]

What (2) says is that there are two discourse referents, x and y, and three discourse conditions farmer, donkey, and owns, such that the condition farmer holds of x, donkey holds of y, and owns holds of the pair x and y.

Informally, the DRS in (2) is true in a given model of evaluation if and only if there are entities in that model that satisfy the conditions. So, if a model contains two individuals, and one is a farmer, the other is a donkey, and the first owns the second, the DRS in (2) is true in that model.
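The informal truth definition above can be sketched in Python. This is a minimal illustration, not a standard implementation. It assumes that a DRS is just a list of referents plus a list of atomic conditions, and that a model is a finite domain with a set of tuples for each predicate. The individuals pedro and chiquita and all class and function names are hypothetical. A DRS counts as true when some assignment of individuals to its referents (an embedding) satisfies every condition.

```python
from dataclasses import dataclass, field
from itertools import product

# An atomic condition: a predicate name applied to discourse referents,
# e.g. ("owns", ("x", "y")). Illustrative encoding only.
Condition = tuple[str, tuple[str, ...]]


@dataclass
class DRS:
    referents: list[str]
    conditions: list[Condition] = field(default_factory=list)


@dataclass
class Model:
    domain: set[str]
    # Each predicate maps to the set of individual tuples it holds of.
    extensions: dict[str, set[tuple[str, ...]]]


def holds(model: Model, cond: Condition, g: dict[str, str]) -> bool:
    """Check one atomic condition under the assignment g."""
    pred, args = cond
    return tuple(g[a] for a in args) in model.extensions.get(pred, set())


def embeddings(drs: DRS, model: Model):
    """Yield every assignment of referents to individuals that satisfies all conditions."""
    for values in product(sorted(model.domain), repeat=len(drs.referents)):
        g = dict(zip(drs.referents, values))
        if all(holds(model, c, g) for c in drs.conditions):
            yield g


def is_true(drs: DRS, model: Model) -> bool:
    """A DRS is true in a model iff at least one verifying embedding exists."""
    return next(embeddings(drs, model), None) is not None


# DRS (2): [x,y: farmer(x), donkey(y), owns(x,y)]
drs2 = DRS(["x", "y"], [("farmer", ("x",)), ("donkey", ("y",)), ("owns", ("x", "y"))])

# Hypothetical model: one farmer who owns one donkey.
model = Model(
    domain={"pedro", "chiquita"},
    extensions={
        "farmer": {("pedro",)},
        "donkey": {("chiquita",)},
        "owns": {("pedro", "chiquita")},
    },
)
print(is_true(drs2, model))  # True
```

Enumerating every assignment is exponential in the number of referents. That is fine for toy models like this one but not for realistic ones.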

Uttering subsequent sentences results in the existing DRS being updated.

(3) He beats it.

Uttering (3) after (1) results in the DRS in (2) being updated as follows, in (4) (assuming a way to disambiguate which pronoun refers to which individual).

(4) [x,y: farmer(x), donkey(y), owns(x,y), beats(x,y)]
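The update from (2) to (4) can be sketched as a merge of two DRSs. In this simplified Python illustration, the pronouns of (3) appear as hypothetical placeholders he and it. Pronoun resolution is not modeled: the antecedent mapping is supplied by hand, just as the text assumes a way to disambiguate the pronouns. Standard DRT instead introduces a new referent for each pronoun together with an equation linking it to its antecedent. The function names resolve and merge are hypothetical.

```python
from dataclasses import dataclass, field

Condition = tuple[str, tuple[str, ...]]


@dataclass
class DRS:
    referents: list[str]
    conditions: list[Condition] = field(default_factory=list)


def resolve(drs: DRS, antecedents: dict[str, str]) -> DRS:
    """Replace pronoun placeholders with the referents they were resolved to."""
    conditions = [
        (pred, tuple(antecedents.get(a, a) for a in args)) for pred, args in drs.conditions
    ]
    referents = [r for r in drs.referents if r not in antecedents]
    return DRS(referents, conditions)


def merge(context: DRS, update: DRS) -> DRS:
    """Add the new sentence's referents and conditions to the discourse DRS."""
    referents = context.referents + [r for r in update.referents if r not in context.referents]
    return DRS(referents, context.conditions + update.conditions)


# DRS (2), built from sentence (1).
drs2 = DRS(["x", "y"], [("farmer", ("x",)), ("donkey", ("y",)), ("owns", ("x", "y"))])

# Sentence (3), "He beats it", with unresolved pronoun placeholders.
drs3 = DRS([], [("beats", ("he", "it"))])

# Resolution is assumed, not computed: "he" -> x, "it" -> y.
drs4 = merge(drs2, resolve(drs3, {"he": "x", "it": "y"}))
print(drs4.conditions[-1])  # ('beats', ('x', 'y'))
```

The merge builds a new DRS rather than mutating the old one, so the state of the discourse before (3) remains available for comparison.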

Successive utterances of sentences work in a similar way, although the process is somewhat more complicated for more complex sentences, such as those containing negation or conditionals.

Donkey anaphora

In one sense, DRT offers a variation of first-order predicate calculus: its forms are pairs of first-order formulae and the free variables that occur in them. Traditional natural language semantics examines only individual sentences, but the context of a discourse also plays a role in meaning. For example, anaphoric pronouns such as he and she depend on previously introduced individuals in order to have meaning. DRT accounts for this by using a variable (a discourse referent) for every individual introduced in the discourse. A discourse is represented in a discourse representation structure (DRS), a box with the variables at the top and the conditions, written in the formal language, below them in the order of the original discourse. Sub-DRSs are used for embedded structures, such as those introduced by negation, conditionals, and quantifiers.
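The correspondence with first-order logic can be sketched as a translation function. This Python sketch follows the standard translation of simple DRSs into first-order formulas. A DRS existentially closes its referents over the conjunction of its conditions, and a K1 → K2 condition (the form described below for every-NPs) universally binds the referents of K1. The class and function names are hypothetical, and negation and other complex conditions are left out.

```python
from dataclasses import dataclass, field
from typing import Union


@dataclass
class Atom:
    pred: str
    args: tuple[str, ...]


@dataclass
class DRS:
    referents: list[str]
    conditions: list["Condition"] = field(default_factory=list)


@dataclass
class Implies:
    restriction: DRS  # K1
    scope: DRS        # K2


Condition = Union[Atom, Implies]


def conj(parts: list[str]) -> str:
    # An empty conjunction is rendered as the trivially true formula ⊤.
    return " ∧ ".join(parts) if parts else "⊤"


def condition_to_fol(c: Condition) -> str:
    if isinstance(c, Atom):
        return f"{c.pred}({', '.join(c.args)})"
    # K1 → K2: universally bind K1's referents over (K1's conditions → K2).
    antecedent = conj([condition_to_fol(d) for d in c.restriction.conditions])
    formula = f"({antecedent} → {drs_to_fol(c.scope)})"
    for r in reversed(c.restriction.referents):
        formula = f"∀{r}{formula}"
    return formula


def drs_to_fol(k: DRS) -> str:
    # [x1,...,xn: C1,...,Cm] becomes ∃x1...∃xn(C1 ∧ ... ∧ Cm).
    body = conj([condition_to_fol(c) for c in k.conditions])
    if not k.referents:
        return body
    return "".join(f"∃{r}" for r in k.referents) + f"({body})"


# DRS (2) and the DRS for sentence (5).
farmer_owns_donkey = DRS(
    ["x", "y"], [Atom("farmer", ("x",)), Atom("donkey", ("y",)), Atom("owns", ("x", "y"))]
)
donkey_sentence = DRS([], [Implies(farmer_owns_donkey, DRS([], [Atom("beats", ("x", "y"))]))])

print(drs_to_fol(farmer_owns_donkey))  # ∃x∃y(farmer(x) ∧ donkey(y) ∧ owns(x, y))
print(drs_to_fol(donkey_sentence))     # ∀x∀y(farmer(x) ∧ donkey(y) ∧ owns(x, y) → beats(x, y))
```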

One of the major advantages of DRT is its ability to account for donkey sentences (Geach 1962) in a principled fashion:

(5) Every farmer who owns a donkey beats it.

Sentence (5) can be paraphrased as follows: Every farmer who owns a donkey beats the donkey that he/she owns. Under a Montagovian approach, the indefinite a donkey, which is assumed to be inherently an existential quantifier, ends up becoming a universal quantifier, an unwelcome result because the change in quantificational force cannot be accounted for in any principled way.
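A simplified first-order rendering makes the problem visible. This sets aside Montague's intensional logic and is offered only as an illustration. If a donkey contributes an existential quantifier inside the relative clause, the first formula results. There, the final occurrence of y lies outside the scope of ∃y, so the pronoun is left free. The reading speakers actually get is the second formula, in which the indefinite behaves like a universal quantifier:

```latex
% Compositional translation: the y of "it" is outside the scope of \exists y
\forall x\,\big[(\mathit{farmer}(x) \land \exists y\,(\mathit{donkey}(y) \land \mathit{owns}(x,y))) \rightarrow \mathit{beats}(x,y)\big]

% Intended reading: the indefinite has universal force
\forall x\,\forall y\,\big[(\mathit{farmer}(x) \land \mathit{donkey}(y) \land \mathit{owns}(x,y)) \rightarrow \mathit{beats}(x,y)\big]
```

Giving ∃y wide scope over the conditional would bind the final y. However, the resulting formula is true whenever each x has some y that fails the antecedent (for instance, any individual that is not a donkey), which is not what (5) means.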

DRT avoids this problem by assuming that indefinites introduce discourse referents (DRs), which are stored in the mental representation and are accessible (or not, depending on the conditions) to expressions like pronouns and other anaphoric elements. Furthermore, indefinites are inherently non-quantificational and pick up their quantificational force from the context.

On the other hand, genuine quantifiers (e.g., 'every professor') bear scope. An 'every-NP' triggers the introduction of a complex condition of the form K1 → K2, where K1 and K2 are sub-DRSs representing the restriction and the scope of the quantification respectively.
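The truth conditions of K1 → K2 can be sketched by extending embedding-based evaluation. The condition is satisfied when every way of assigning individuals to K1's referents that verifies K1 can be extended to verify K2. This Python sketch assumes the following:

* only atomic and → conditions;
* brute-force enumeration over a finite domain;
* a small hypothetical model with farmers f1 and f2 and donkeys d1 to d3.

It shows that the referent y introduced by a donkey ends up with universal force. Sentence (5) comes out false as soon as one farmer fails to beat one of the donkeys he owns.

```python
from dataclasses import dataclass, field
from itertools import product


@dataclass
class Atom:
    pred: str
    args: tuple[str, ...]


@dataclass
class DRS:
    referents: list[str]
    conditions: list = field(default_factory=list)  # Atom or Implies


@dataclass
class Implies:
    restriction: DRS  # K1
    scope: DRS        # K2


def extend(g: dict, drs: DRS, domain: set):
    """Yield every extension of assignment g to the referents introduced by drs."""
    for values in product(sorted(domain), repeat=len(drs.referents)):
        yield {**g, **dict(zip(drs.referents, values))}


def satisfies(g: dict, cond, model: dict, domain: set) -> bool:
    if isinstance(cond, Atom):
        return tuple(g[a] for a in cond.args) in model.get(cond.pred, set())
    # K1 → K2: every extension of g that verifies K1 must be extendable to verify K2.
    return all(
        verifies(h, cond.scope, model, domain)
        for h in extend(g, cond.restriction, domain)
        if all(satisfies(h, c, model, domain) for c in cond.restriction.conditions)
    )


def verifies(g: dict, drs: DRS, model: dict, domain: set) -> bool:
    """True iff some extension of g to drs's referents satisfies all of its conditions."""
    return any(
        all(satisfies(h, c, model, domain) for c in drs.conditions)
        for h in extend(g, drs, domain)
    )


# (5): [: [x,y: farmer(x), donkey(y), owns(x,y)] → [: beats(x,y)]]
k5 = DRS([], [Implies(
    DRS(["x", "y"], [Atom("farmer", ("x",)), Atom("donkey", ("y",)), Atom("owns", ("x", "y"))]),
    DRS([], [Atom("beats", ("x", "y"))]),
)])

domain = {"f1", "f2", "d1", "d2", "d3"}
model = {
    "farmer": {("f1",), ("f2",)},
    "donkey": {("d1",), ("d2",), ("d3",)},
    "owns": {("f1", "d1"), ("f1", "d2"), ("f2", "d3")},
    "beats": {("f1", "d1"), ("f1", "d2"), ("f2", "d3")},
}
print(verifies({}, k5, model, domain))  # True

# f1 no longer beats d2, one of the donkeys he owns.
model_missing = {**model, "beats": {("f1", "d1"), ("f2", "d3")}}
print(verifies({}, k5, model_missing, domain))  # False
```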

Unlike true quantifiers, indefinite noun phrases just contribute a new DR (together with some descriptive material in the form of conditions on the DR), which is placed in a larger structure. This larger structure can be the top-level DRS or some sub-DRS, depending on the sentence-internal environment of the analyzed noun phrase. The level at which the DR is placed determines whether an anaphor that comes later can access it.
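The effect of placement on later anaphora can be sketched with a simple accessibility check. This Python illustration covers only → conditions. It assumes the usual rule that a sub-DRS can see the referents of the DRSs that contain it, and that the scope K2 can also see the referents of its restriction K1. Negation and other complex conditions are left out, and the function name accessible is hypothetical. In (5), the donkey referent y sits in K1, so the pronoun it inside K2 can access it, but a pronoun in a later top-level sentence cannot.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Atom:
    pred: str
    args: tuple[str, ...]


@dataclass
class DRS:
    referents: list[str]
    conditions: list = field(default_factory=list)  # Atom or Implies


@dataclass
class Implies:
    restriction: DRS  # K1
    scope: DRS        # K2


def accessible(drs: DRS, target: DRS, outer: frozenset = frozenset()) -> Optional[frozenset]:
    """Return the referents accessible inside `target`, a sub-DRS of `drs` (None if absent)."""
    here = outer | frozenset(drs.referents)
    if drs is target:
        return here
    for cond in drs.conditions:
        if isinstance(cond, Implies):
            found = accessible(cond.restriction, target, here)
            if found is None:
                # K1 subordinates K2, so K2 also sees the referents K1 introduces.
                found = accessible(cond.scope, target, here | frozenset(cond.restriction.referents))
            if found is not None:
                return found
    return None


k1 = DRS(["x", "y"], [Atom("farmer", ("x",)), Atom("donkey", ("y",)), Atom("owns", ("x", "y"))])
k2 = DRS([], [Atom("beats", ("x", "y"))])
k5 = DRS([], [Implies(k1, k2)])

print(sorted(accessible(k5, k2)))  # ['x', 'y']: "it" in K2 can pick up y
print(sorted(accessible(k5, k5)))  # []: a later top-level "it" cannot
```

By contrast, in (1) the referents x and y are introduced at the top level, so they remain accessible to the pronouns in (3).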

References

  1. Kamp, Hans; Reyle, U. (1993). From Discourse to Logic. Kluwer, Dordrecht.
  2. Gutiérrez-Rexach, Javier (2003). Semantics: Noun phrase classes. Taylor & Francis. ISBN 978-0-415-26635-2.
  3. Guzmán, Francisco, et al. (2014). "Using discourse structure improves machine translation evaluation." Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers).
  4. Ahrenberg, Lars; Jönsson, Arne; Dahlbäck, Nils (1991). Discourse representation and discourse management for a natural language dialogue system. Universitetet i Linköping/Tekniska Högskolan i Linköping, Institutionen för Datavetenskap.
  5. Rapaport, William J. (1994). "Syntactic semantics: Foundations of computational natural-language understanding." Thinking Computers and Virtual Persons, pp. 225–273.
  6. Augusto, Juan Carlos; Wichert, Reiner; Collier, Rem; Keyson, David; Salah, Albert A.; Tan, Ah-Hwee (23 November 2013). Ambient Intelligence: 4th International Joint Conference, AmI 2013, Dublin, Ireland, December 3–5, 2013. Proceedings. Springer. ISBN 978-3-319-03647-2.