Abstract Meaning Representation

Abstract Meaning Representation (AMR) [1] [2] is a semantic representation language. AMR graphs are rooted, labeled, directed, acyclic graphs (DAGs) that each cover a whole sentence. They are intended to abstract away from syntactic representations, in the sense that sentences with similar meanings should be assigned the same AMR even if they are not identically worded. By design, the AMR language is biased towards English – it is not meant to function as an international auxiliary language.

History

Abstract Meaning Representations were originally introduced by Langkilde and Knight (1998) [3] as a derivative of the Penman Sentence Plan Language. [4] They thus continue a long tradition in natural language generation, which was also their original domain of application. AMRs have regained attention since Banarescu et al. (2013), [1] who in particular extended them to novel tasks such as machine translation and natural language understanding. The modern (post-2010) AMR format preserves the syntax and many syntactic conceptions of the original AMR format but has been thoroughly revised to better align with PropBank. Moreover, AMR has been extended with formal conventions for metadata and with conventions for entity linking (here, linking with Wikipedia entries).

Existing AMR technology includes tools and libraries for parsing, [5] visualization, [6] and surface generation, [7] as well as a considerable number of publicly available data sets. Many of these resources are collected at the AMR homepage [8] at ISI/USC, where AMR technology was originally developed.
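For instance, the amrlib library [6] wraps pretrained models for both directions: parsing (sentence-to-graph) and surface generation (graph-to-sentence). A minimal sketch, assuming amrlib is installed and its pretrained models have been downloaded as described in its documentation:

```python
import amrlib

# Sentence-to-graph: parse English text into an AMR in PENMAN notation.
# Assumes a pretrained parse (stog) model is installed per the amrlib docs.
stog = amrlib.load_stog_model()
graphs = stog.parse_sents(['The boy wants to go.'])
print(graphs[0])  # PENMAN-notation string of the AMR

# Graph-to-sentence: generate English text back from the AMR.
# Likewise assumes a pretrained generation (gtos) model is installed.
gtos = amrlib.load_gtos_model()
sentences, _ = gtos.generate(graphs)
print(sentences[0])
```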

Example

Example sentence: The boy wants to go.

(w / want-01
   :arg0 (b / boy)
   :arg1 (g / go-01
      :arg0 b))

As far as predicate semantics are concerned, the role inventory of AMR is largely based on semantic role annotations in the style of PropBank: `want-01` and `go-01` are PropBank framesets, and `:arg0` and `:arg1` are their numbered arguments. Note that in the pre-2010 AMR format, `:arg0` would be `:agent`, etc.
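The PENMAN-notation string above can also be read and manipulated programmatically, for example with the Python penman library. [5] A minimal sketch (the variable name `amr` is ours):

```python
import penman

# Decode the PENMAN string into a graph object.
amr = penman.decode('(w / want-01 :arg0 (b / boy) :arg1 (g / go-01 :arg0 b))')

print(amr.top)      # 'w' -- the root of the DAG
print(amr.triples)  # [('w', ':instance', 'want-01'), ('w', ':arg0', 'b'), ...]

# Re-encode the graph with standard indentation.
print(penman.encode(amr, indent=3))
```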


Banarescu et al. (2013) [1] claim that this is equivalent to the following logical formula:

∃ w, b, g: instance(w, want-01) ∧ instance(g, go-01) ∧ instance(b, boy) ∧ arg0(w, b) ∧ arg1(w, g) ∧ arg0(g, b)

In addition, they claim that this representation makes the boy's wish explicit, highlighting that the boy intends to go himself: `want-01` is the type of the top-level predicate, and the variable `b` recurs as the `:arg0` of `go-01`.
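The correspondence between the graph and the formula is mechanical: every `:instance` triple becomes an instance(...) predicate, every other triple becomes a binary relation, and all variables are existentially quantified. A toy sketch of this translation, again using the penman library (the function name `amr_to_logic` is ours):

```python
import penman

def amr_to_logic(amr: penman.Graph) -> str:
    """Translate an AMR graph into a first-order-logic style conjunction."""
    variables = sorted(amr.variables())
    conjuncts = []
    for source, role, target in amr.triples:
        if role == ':instance':
            conjuncts.append(f'instance({source}, {target})')
        else:
            # Strip the leading ':' so ':arg0' becomes the relation 'arg0'.
            conjuncts.append(f'{role[1:]}({source}, {target})')
    return 'EXISTS ' + ', '.join(variables) + ': ' + ' AND '.join(conjuncts)

amr = penman.decode('(w / want-01 :arg0 (b / boy) :arg1 (g / go-01 :arg0 b))')
print(amr_to_logic(amr))
# EXISTS b, g, w: instance(w, want-01) AND arg0(w, b) AND instance(b, boy) AND ...
```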

Uniform Meaning Representations

In an extension of the original AMR formalism, Uniform Meaning Representations (UMR) have been proposed. [9] While grounded in AMR, they eliminate specific characteristics of the English language that are featured in AMR, and are thus more easily applicable cross-linguistically. [9]

Related Research Articles


Natural language processing (NLP) is a subfield of computer science and especially artificial intelligence. It is primarily concerned with providing computers with the ability to process data encoded in natural language and is thus closely related to information retrieval, knowledge representation and computational linguistics, a subfield of linguistics. Typically, data is collected in text corpora and processed using rule-based, statistical, or neural approaches from machine learning and deep learning.

Lexical functional grammar (LFG) is a constraint-based grammar framework in theoretical linguistics. It posits two separate levels of syntactic structure, a phrase structure grammar representation of word order and constituency, and a representation of grammatical functions such as subject and object, similar to dependency grammar. The development of the theory was initiated by Joan Bresnan and Ronald Kaplan in the 1970s, in reaction to the theory of transformational grammar which was current in the late 1970s. It mainly focuses on syntax, including its relation with morphology and semantics. There has been little LFG work on phonology.

Parsing, syntax analysis, or syntactic analysis is the process of analyzing a string of symbols, either in natural language, computer languages or data structures, conforming to the rules of a formal grammar. The term parsing comes from Latin pars (orationis), meaning part.

A symbolic linguistic representation is a representation of an utterance that uses symbols to represent linguistic information about the utterance, such as information about phonetics, phonology, morphology, syntax, or semantics. Symbolic linguistic representations are different from non-symbolic representations, such as recordings, because they use symbols to represent linguistic information rather than measurements.

The language of thought hypothesis (LOTH), sometimes known as thought ordered mental expression (TOME), is a view in linguistics, philosophy of mind and cognitive science, forwarded by American philosopher Jerry Fodor. It describes the nature of thought as possessing "language-like" or compositional structure. On this view, simple concepts combine in systematic ways to build thoughts. In its most basic form, the theory states that thought, like language, has syntax.

Montague grammar is an approach to natural language semantics, named after American logician Richard Montague. Montague grammar is based on mathematical logic, especially higher-order predicate logic and lambda calculus, and makes use of the notions of intensional logic, via Kripke models. Montague pioneered this approach in the 1960s and early 1970s.

In semantics, mathematical logic and related disciplines, the principle of compositionality is the principle that the meaning of a complex expression is determined by the meanings of its constituent expressions and the rules used to combine them. The principle is also called Frege's principle, because Gottlob Frege is widely credited for the first modern formulation of it. However, the principle has never been explicitly stated by Frege, and arguably it was already assumed by George Boole decades before Frege's work.

In generative grammar and related frameworks, a node in a parse tree c-commands its sister node and all of its sister's descendants. In these frameworks, c-command plays a central role in defining and constraining operations such as syntactic movement, binding, and scope. Tanya Reinhart introduced c-command in 1976 as a key component of her theory of anaphora. The term is short for "constituent command".

The Sentence Plan Language (SPL) is an abstract notation representing the semantics of a sentence in natural language. In a classical natural language generation (NLG) workflow, an initial text plan is transformed by a sentence-planner (generator) component into a sequence of sentence plans modelled in SPL. A surface generator can then transform the SPL notation into natural language sentences.

In formal linguistics, discourse representation theory (DRT) is a framework for exploring meaning under a formal semantics approach. One of the main differences between DRT-style approaches and traditional Montagovian approaches is that DRT includes a level of abstract mental representations within its formalism, which gives it an intrinsic ability to handle meaning across sentence boundaries. DRT was created by Hans Kamp in 1981. A very similar theory was developed independently by Irene Heim in 1982, under the name of File Change Semantics (FCS). Discourse representation theories have been used to implement semantic parsers and natural language understanding systems.


In linguistics, a treebank is a parsed text corpus that annotates syntactic or semantic sentence structure. The construction of parsed corpora in the early 1990s revolutionized computational linguistics, which benefitted from large-scale empirical data.

The linguistics wars were extended disputes among American theoretical linguists that occurred mostly during the 1960s and 1970s, stemming from a disagreement between Noam Chomsky and several of his associates and students. The debates started in 1967 when linguists Paul Postal, John R. Ross, George Lakoff, and James D. McCawley, self-dubbed the "Four Horsemen of the Apocalypse", proposed an alternative view of the relation between semantics and syntax, treating deep structures as meaning rather than as syntactic objects. While Chomsky and other generative grammarians argued that meaning is driven by an underlying syntax, generative semanticists posited that syntax is shaped by an underlying meaning. This intellectual divergence led to two competing frameworks: generative semantics and interpretive semantics.

Meaning–text theory (MTT) is a theoretical linguistic framework, first put forward in Moscow by Aleksandr Žolkovskij and Igor Mel’čuk, for the construction of models of natural language. The theory provides a large and elaborate basis for linguistic description and, due to its formal character, lends itself particularly well to computer applications, including machine translation, phraseology, and lexicography.

Formal semantics is the study of grammatical meaning in natural languages using formal concepts from logic, mathematics and theoretical computer science. It is an interdisciplinary field, sometimes regarded as a subfield of both linguistics and philosophy of language. It provides accounts of what linguistic expressions mean and how their meanings are composed from the meanings of their parts. The enterprise of formal semantics can be thought of as that of reverse-engineering the semantic components of natural languages' grammars.

In linguistics, selection denotes the ability of predicates to determine the semantic content of their arguments: predicates select their arguments, which means they limit the semantic content of those arguments. A distinction is sometimes drawn between types of selection, namely s(emantic)-selection and c(ategory)-selection. Selection in general stands in contrast to subcategorization: predicates both select and subcategorize for their complement arguments, whereas they only select their subject arguments. Selection is a semantic concept, whereas subcategorization is a syntactic one. Selection is closely related to valency, a term used for a similar phenomenon in grammar frameworks other than Chomskyan generative grammar.

Deep linguistic processing is a natural language processing framework which draws on theoretical and descriptive linguistics. It models language predominantly by way of theoretical syntactic and semantic theory. Deep linguistic processing approaches differ from "shallower" methods in that they yield more expressive and structural representations which directly capture long-distance dependencies and underlying predicate–argument structures. The knowledge-intensive approach of deep linguistic processing requires considerable computational power and has in the past sometimes been judged intractable. However, research in the early 2000s made considerable advances in the efficiency of deep processing, and today efficiency is no longer a major problem for applications using it.

In natural language processing, a word embedding is a representation of a word. The embedding is used in text analysis. Typically, the representation is a real-valued vector that encodes the meaning of the word in such a way that the words that are closer in the vector space are expected to be similar in meaning. Word embeddings can be obtained using language modeling and feature learning techniques, where words or phrases from the vocabulary are mapped to vectors of real numbers.
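Closeness in the vector space is commonly measured by cosine similarity. A minimal sketch with made-up three-dimensional vectors (real embeddings are learned from corpora and typically have hundreds of dimensions):

```python
import math

def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy vectors for illustration only.
boy = [0.8, 0.1, 0.3]
girl = [0.7, 0.2, 0.3]
go = [0.1, 0.9, 0.2]

print(cosine_similarity(boy, girl))  # ~0.99: vectors point the same way
print(cosine_similarity(boy, go))    # ~0.29: vectors diverge
```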

Dynamic Syntax (DS) is a grammar formalism and linguistic theory whose overall aim is to explain the real-time processes of language understanding and production, and describe linguistic structures as happening step-by-step over time. Under the DS approach, syntactic knowledge is understood as the ability to incrementally analyse the structure and content of spoken and written language in context and in real-time. While it posits representations similar to those used in Combinatory categorial grammars (CCG), it builds those representations left-to-right going word-by-word. Thus it differs from other syntactic models which generally abstract away from features of everyday conversation such as interruption, backtracking, and self-correction. Moreover, it differs from other approaches in that it does not postulate an independent level of syntactic structure over words.


Semantic parsing is the task of converting a natural language utterance to a logical form: a machine-understandable representation of its meaning. Semantic parsing can thus be understood as extracting the precise meaning of an utterance. Applications of semantic parsing include machine translation, question answering, ontology induction, automated reasoning, and code generation. The phrase was first used in the 1970s by Yorick Wilks as the basis for machine translation programs working with only semantic representations. Semantic parsing is one of the important tasks in computational linguistics and natural language processing.

References

  1. Banarescu, Laura; Bonial, Claire; Cai, Shu; Georgescu, Madalina; Griffitt, Kira; Hermjakob, Ulf; Knight, Kevin; Koehn, Philipp; Palmer, Martha; Schneider, Nathan (2013). "Abstract Meaning Representation for Sembanking" (PDF). Sofia, Bulgaria: Association for Computational Linguistics. pp. 178–186. Retrieved 28 June 2019.
  2. "Abstract Meaning Representation (AMR)".
  3. Langkilde, Irene; Knight, Kevin (1998). "Generation that Exploits Corpus-Based Statistical Knowledge". COLING 1998 Volume 1: The 17th International Conference on Computational Linguistics.
  4. Kasper, Robert T. (1989). "A flexible interface for linking applications to Penman's sentence generator". Proceedings of the Workshop on Speech and Natural Language (HLT '89). Philadelphia, Pennsylvania: Association for Computational Linguistics. pp. 153–158. doi:10.3115/100964.100979.
  5. "Penman 1.2.1 documentation".
  6. Jascob, Brad (2022-03-07). amrlib. Retrieved 2022-03-16.
  7. RALI, Université de Montréal (2021-06-25). GoPhi: an AMR to English verbalizer. Retrieved 2022-03-16.
  8. "Abstract Meaning Representation (AMR)". amr.isi.edu. Retrieved 2022-03-16.
  9. "umr-guidelines/guidelines.md at master · umr4nlp/umr-guidelines". GitHub. Retrieved 2023-10-05.