Abstract Meaning Representation

Abstract Meaning Representation (AMR) [1] [2] is a semantic representation language. AMR graphs are rooted, labeled, directed, acyclic graphs (DAGs), each covering a whole sentence. They are intended to abstract away from syntactic representations, in the sense that sentences which are similar in meaning should be assigned the same AMR, even if they are not identically worded. By nature, the AMR language is biased towards English – it is not meant to function as an international auxiliary language.

History

Abstract Meaning Representations were originally introduced by Langkilde and Knight (1998) [3] as a derivation from the Penman Sentence Plan Language; [4] they thus continue a long tradition in natural language generation, which was their original domain of application. AMRs have regained attention since Banarescu et al. (2013), [1] in particular through their extension to novel tasks such as machine translation and natural language understanding. The modern (post-2010) AMR format preserves the syntax and many of the conceptions of the original format but has been thoroughly revised to align more closely with PropBank. Moreover, AMR has been extended with formal conventions for metadata and for entity linking (specifically, linking to Wikipedia entries).

Existing AMR technology includes tools and libraries for parsing, [5] visualization, [6] and surface generation, [7] as well as a considerable number of publicly available data sets. Many of these resources are collected at the AMR homepage [8] at ISI/USC, where AMR technology was originally developed.
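
For illustration, parsing and surface generation can be scripted from Python with the amrlib library. [6] The following is a minimal sketch; it assumes `pip install amrlib` and that the pretrained sentence-to-graph ("stog") and graph-to-sentence ("gtos") models have been downloaded as described in the amrlib documentation, whose loader calls it follows and which may differ across versions.

```python
# Minimal sketch using amrlib (assumes the library and its pretrained
# parse/generate models are installed per the amrlib documentation).
import amrlib

# Sentence-to-graph: parse an English sentence into an AMR in PENMAN notation.
stog = amrlib.load_stog_model()
graphs = stog.parse_sents(["The boy wants to go."])
print(graphs[0])  # e.g. (w / want-01 :ARG0 (b / boy) :ARG1 (g / go-01 :ARG0 b))

# Graph-to-sentence: generate an English surface realization from the AMR.
gtos = amrlib.load_gtos_model()
sentences, _ = gtos.generate(graphs)
print(sentences[0])
```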

Example

Example sentence: The boy wants to go.

```
(w / want-01
   :arg0 (b / boy)
   :arg1 (g / go-01
      :arg0 b))
```

As far as predicate semantics are concerned, AMR's role inventory is largely based on semantic role annotations in the style of PropBank: `want-01` and `go-01` are PropBank predicate senses, and `:arg0` and `:arg1` are their numbered arguments. Note that in the pre-2010 AMR format, `:arg0` would be `:agent`, etc.


Banarescu et al. (2013) [1] claim that this graph is equivalent to a conjunction of logical triples; in the paper's notation, the formula reads:
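
$$\exists\, w, b, g:\; \mathrm{instance}(w, \text{want-01}) \wedge \mathrm{instance}(g, \text{go-01}) \wedge \mathrm{instance}(b, \text{boy}) \wedge \mathrm{arg0}(w, b) \wedge \mathrm{arg1}(w, g) \wedge \mathrm{arg0}(g, b)$$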

In addition, they claim that this representation makes the boy's desire explicit: the variable `b` fills the `:arg0` role of both `want-01` and `go-01`, so the graph states that it is the boy himself who is to go, while the wanting remains the focus of the sentence (because `want-01` is the concept of the top-level node).
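
The role structure, including the reuse of `b` as an argument of both predicates, can also be inspected programmatically. Below is a minimal sketch using the penman Python library, [5] assuming it has been installed with `pip install penman`:

```python
# Decode the example AMR with the penman library and list its triples.
import penman

amr_string = """
(w / want-01
   :arg0 (b / boy)
   :arg1 (g / go-01
      :arg0 b))
"""

graph = penman.decode(amr_string)
for source, role, target in graph.triples:
    print(source, role, target)

# Expected output (roughly):
#   w :instance want-01
#   w :arg0 b
#   b :instance boy
#   w :arg1 g
#   g :instance go-01
#   g :arg0 b
```

The variable `b` appears as the target of two role triples; this is the re-entrancy that encodes the boy being both the wanter and the intended goer.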

Uniform Meaning Representations

In an extension of the original AMR formalism, Uniform Meaning Representations (UMR) have been proposed. [9] While grounded in AMR, they eliminate English-specific characteristics of the AMR design and are thus more easily applicable cross-linguistically. [9]

Related Research Articles

Natural language processing (NLP) is an interdisciplinary subfield of computer science and linguistics. It is primarily concerned with giving computers the ability to support and manipulate human language. It involves processing natural language datasets, such as text corpora or speech corpora, using either rule-based or probabilistic machine learning approaches. The goal is a computer capable of "understanding" the contents of documents, including the contextual nuances of the language within them. The technology can then accurately extract information and insights contained in the documents as well as categorize and organize the documents themselves.

Lexical functional grammar (LFG) is a constraint-based grammar framework in theoretical linguistics. It posits two separate levels of syntactic structure, a phrase structure grammar representation of word order and constituency, and a representation of grammatical functions such as subject and object, similar to dependency grammar. The development of the theory was initiated by Joan Bresnan and Ronald Kaplan in the 1970s, in reaction to the theory of transformational grammar which was current in the late 1970s. It mainly focuses on syntax, including its relation with morphology and semantics. There has been little LFG work on phonology.

Parsing, syntax analysis, or syntactic analysis is the process of analyzing a string of symbols, either in natural language, computer languages or data structures, conforming to the rules of a formal grammar. The term parsing comes from Latin pars (orationis), meaning part.

The language of thought hypothesis (LOTH), sometimes known as thought ordered mental expression (TOME), is a view in linguistics, philosophy of mind and cognitive science, forwarded by American philosopher Jerry Fodor. It describes the nature of thought as possessing "language-like" or compositional structure. On this view, simple concepts combine in systematic ways to build thoughts. In its most basic form, the theory states that thought, like language, has syntax.

Montague grammar is an approach to natural language semantics, named after American logician Richard Montague. The Montague grammar is based on mathematical logic, especially higher-order predicate logic and lambda calculus, and makes use of the notions of intensional logic, via Kripke models. Montague pioneered this approach in the 1960s and early 1970s.

In semantics, mathematical logic and related disciplines, the principle of compositionality is the principle that the meaning of a complex expression is determined by the meanings of its constituent expressions and the rules used to combine them. The principle is also called Frege's principle, because Gottlob Frege is widely credited for the first modern formulation of it. However, the principle has never been explicitly stated by Frege, and arguably it was already assumed by George Boole decades before Frege's work.

Dependency grammar (DG) is a class of modern grammatical theories that are all based on the dependency relation and that can be traced back primarily to the work of Lucien Tesnière. Dependency is the notion that linguistic units, e.g. words, are connected to each other by directed links. The (finite) verb is taken to be the structural center of clause structure. All other syntactic units (words) are either directly or indirectly connected to the verb in terms of the directed links, which are called dependencies. Dependency grammar differs from phrase structure grammar in that while it can identify phrases it tends to overlook phrasal nodes. A dependency structure is determined by the relation between a word and its dependents. Dependency structures are flatter than phrase structures in part because they lack a finite verb phrase constituent, and they are thus well suited for the analysis of languages with free word order, such as Czech or Warlpiri.

In generative grammar and related frameworks, a node in a parse tree c-commands its sister node and all of its sister's descendants. In these frameworks, c-command plays a central role in defining and constraining operations such as syntactic movement, binding, and scope. Tanya Reinhart introduced c-command in 1976 as a key component of her theory of anaphora. The term is short for "constituent command".

The Sentence Plan Language (SPL) is an abstract notation representing the semantics of a sentence in natural language. In a classical Natural Language Generation (NLG) workflow, an initial text plan is transformed by a sentence planner (generator) component into a sequence of sentence plans modelled in SPL. A surface generator can then be used to transform the SPL notation into natural language sentences.

In formal linguistics, discourse representation theory (DRT) is a framework for exploring meaning under a formal semantics approach. One of the main differences between DRT-style approaches and traditional Montagovian approaches is that DRT includes a level of abstract mental representations within its formalism, which gives it an intrinsic ability to handle meaning across sentence boundaries. DRT was created by Hans Kamp in 1981. A very similar theory was developed independently by Irene Heim in 1982, under the name of File Change Semantics (FCS). Discourse representation theories have been used to implement semantic parsers and natural language understanding systems.

In linguistics, a treebank is a parsed text corpus that annotates syntactic or semantic sentence structure. The construction of parsed corpora in the early 1990s revolutionized computational linguistics, which benefitted from large-scale empirical data.

In linguistics, an argument is an expression that helps complete the meaning of a predicate, the latter referring in this context to a main verb and its auxiliaries. In this regard, the complement is a closely related concept. Most predicates take one, two, or three arguments. A predicate and its arguments form a predicate-argument structure. The discussion of predicates and arguments is associated most with (content) verbs and noun phrases (NPs), although other syntactic categories can also be construed as predicates and as arguments. Arguments must be distinguished from adjuncts. While a predicate needs its arguments to complete its meaning, the adjuncts that appear with a predicate are optional; they are not necessary to complete the meaning of the predicate. Most theories of syntax and semantics acknowledge arguments and adjuncts, although the terminology varies, and the distinction is generally believed to exist in all languages. Dependency grammars sometimes call arguments actants, following Lucien Tesnière (1959).

Meaning–text theory (MTT) is a theoretical linguistic framework, first put forward in Moscow by Aleksandr Žolkovskij and Igor Mel’čuk, for the construction of models of natural language. The theory provides a large and elaborate basis for linguistic description and, due to its formal character, lends itself particularly well to computer applications, including machine translation, phraseology, and lexicography.

In natural language processing, semantic role labeling is the process that assigns labels to words or phrases in a sentence that indicate their semantic role in the sentence, such as that of an agent, goal, or result.

Formal semantics is the study of grammatical meaning in natural languages using formal tools from logic, mathematics and theoretical computer science. It is an interdisciplinary field, sometimes regarded as a subfield of both linguistics and philosophy of language. It provides accounts of what linguistic expressions mean and how their meanings are composed from the meanings of their parts. The enterprise of formal semantics can be thought of as that of reverse-engineering the semantic components of natural languages' grammars.

In linguistics, selection denotes the ability of predicates to determine the semantic content of their arguments. Predicates select their arguments, which means they limit the semantic content of their arguments. One sometimes draws a distinction between types of selection; one acknowledges both s(emantic)-selection and c(ategory)-selection. Selection in general stands in contrast to subcategorization: predicates both select and subcategorize for their complement arguments, whereas they only select their subject arguments. Selection is a semantic concept, whereas subcategorization is a syntactic one. Selection is closely related to valency, a term used in other grammars than the Chomskian generative grammar, for a similar phenomenon.

Deep linguistic processing is a natural language processing framework which draws on theoretical and descriptive linguistics. It models language predominantly by way of theoretical syntactic/semantic theory. Deep linguistic processing approaches differ from "shallower" methods in that they yield more expressive and structural representations which directly capture long-distance dependencies and underlying predicate-argument structures.
The knowledge-intensive approach of deep linguistic processing requires considerable computational power, and has in the past sometimes been judged as being intractable. However, research in the early 2000s had made considerable advancement in efficiency of deep processing. Today, efficiency is no longer a major problem for applications using deep linguistic processing.

Dynamic Syntax (DS) is a grammar formalism and linguistic theory whose overall aim is to explain the real-time processes of language understanding and production, and describe linguistic structures as happening step-by-step over time. Under the DS approach, syntactic knowledge is understood as the ability to incrementally analyse the structure and content of spoken and written language in context and in real-time. While it posits representations similar to those used in Combinatory categorial grammars (CCG), it builds those representations left-to-right going word-by-word. Thus it differs from other syntactic models which generally abstract away from features of everyday conversation such as interruption, backtracking, and self-correction. Moreover, it differs from other approaches in that it does not postulate an independent level of syntactic structure over words.

Semantic parsing is the task of converting a natural language utterance to a logical form: a machine-understandable representation of its meaning. Semantic parsing can thus be understood as extracting the precise meaning of an utterance. Applications of semantic parsing include machine translation, question answering, ontology induction, automated reasoning, and code generation. The phrase was first used in the 1970s by Yorick Wilks as the basis for machine translation programs working with only semantic representations. Semantic parsing is one of the important tasks in computational linguistics and natural language processing.

References

  1. Banarescu, Laura; Bonial, Claire; Cai, Shu; Georgescu, Madalina; Griffitt, Kira; Hermjakob, Ulf; Knight, Kevin; Koehn, Philipp; Palmer, Martha; Schneider, Nathan (2013). Abstract Meaning Representation for Sembanking (PDF). Sofia, Bulgaria: Association for Computational Linguistics. pp. 178–186. Retrieved 28 June 2019.
  2. "Abstract Meaning Representation (AMR)".
  3. Langkilde, Irene; Knight, Kevin (1998). Generation that Exploits Corpus-Based Statistical Knowledge. In COLING 1998 Volume 1: The 17th International Conference on Computational Linguistics.
  4. Kasper, Robert T. (1989). "A flexible interface for linking applications to Penman's sentence generator". Proceedings of the Workshop on Speech and Natural Language - HLT '89. Philadelphia, Pennsylvania: Association for Computational Linguistics: 153–158. doi:10.3115/100964.100979.
  5. "Penman 1.2.1 documentation".
  6. Jascob, Brad (2022-03-07). amrlib. Retrieved 2022-03-16.
  7. RALI, Université de Montréal (2021-06-25). GoPhi: an AMR to English verbalizer. Retrieved 2022-03-16.
  8. "Abstract Meaning Representation (AMR)". amr.isi.edu. Retrieved 2022-03-16.
  9. "umr-guidelines/guidelines.md at master · umr4nlp/umr-guidelines". GitHub. Retrieved 2023-10-05.