Semantic role labeling


In natural language processing, semantic role labeling (also called shallow semantic parsing or slot-filling) is the process of assigning labels to words or phrases in a sentence that indicate their semantic role in the sentence, such as that of an agent, goal, or result.


It serves to find the meaning of the sentence. To do this, it detects the arguments associated with the predicate or verb of a sentence and classifies them into their specific roles. A common example is the sentence "Mary sold the book to John": the agent is "Mary," the predicate is "sold" (or rather, "to sell"), the theme is "the book," and the recipient is "John." The roles are semantic rather than grammatical: "the book belongs to me" would need labels such as "possessed" and "possessor," while "the book was sold to John" would need labels such as "theme" and "recipient," even though the phrases in question fill similar "subject" and "object" grammatical functions in the two clauses. [1]
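The example can be made concrete as a data structure. The following sketch (plain Python; the Frame class and the role names are illustrative, not taken from any particular SRL toolkit) shows the predicate-argument structure a labeler would produce:

```python
# A minimal sketch of the predicate-argument structure a semantic role
# labeler would produce for "Mary sold the book to John." The role names
# (agent, theme, recipient) follow the example in the text; the dataclass
# is invented for illustration.
from dataclasses import dataclass

@dataclass
class Frame:
    predicate: str          # the verb (lemma) that anchors the frame
    roles: dict[str, str]   # role name -> text span filling that role

sold = Frame(
    predicate="sell",
    roles={
        "agent": "Mary",       # who performs the selling
        "theme": "the book",   # what is sold
        "recipient": "John",   # who receives it
    },
)

# Different predicates select different role labels for similar-looking
# subjects and objects, as noted above:
belongs = Frame(predicate="belong",
                roles={"possessed": "the book", "possessor": "me"})
```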

History

The first idea for semantic role labeling was proposed by Charles J. Fillmore in 1968. [2] His proposal led to the FrameNet project, which produced the first major computational lexicon that systematically described many predicates and their corresponding roles. Daniel Gildea (currently at the University of Rochester, previously at the University of California, Berkeley / International Computer Science Institute) and Daniel Jurafsky (currently at Stanford University, previously at the University of Colorado and UC Berkeley) developed the first automatic semantic role labeling system based on FrameNet. The PropBank corpus added manually created semantic role annotations to the Penn Treebank corpus of Wall Street Journal texts. Many automatic semantic role labeling systems have used PropBank as a training dataset to learn how to annotate new sentences automatically. [3]

Uses

Semantic role labeling is mostly used to help machines understand the roles of words within sentences. [4] This benefits applications such as natural language processing programs that need to understand not just the words of a language, but how they are used in varying sentences. [5] A better understanding of semantic role labeling could lead to advancements in question answering, information extraction, automatic text summarization, text data mining, and speech recognition. [6]
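As a toy illustration of why role labels help a downstream task like question answering (the frame dictionary and the helper function are invented for this example, not from any real QA system), once a sentence is labeled, a "who" question about the seller reduces to a role lookup:

```python
# Toy sketch: answering "Who sold the book?" from SRL output.
# The dictionary below stands in for the output of a semantic role labeler
# on "Mary sold the book to John."
srl_output = {"predicate": "sell",
              "agent": "Mary", "theme": "the book", "recipient": "John"}

def answer_who_sold(frame: dict) -> str | None:
    """Return the agent of a 'sell' frame, i.e. the answer to 'Who sold ...?'."""
    if frame.get("predicate") == "sell":
        return frame.get("agent")
    return None

print(answer_who_sold(srl_output))  # -> Mary
```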


Related Research Articles

Computational linguistics is an interdisciplinary field concerned with the computational modelling of natural language, as well as the study of appropriate computational approaches to linguistic questions. In general, computational linguistics draws upon linguistics, computer science, artificial intelligence, mathematics, logic, philosophy, cognitive science, cognitive psychology, psycholinguistics, anthropology and neuroscience, among others.


Natural language processing (NLP) is a subfield of linguistics, computer science, and artificial intelligence concerned with the interactions between computers and human language, in particular how to program computers to process and analyze large amounts of natural language data. The goal is a computer capable of "understanding" the contents of documents, including the contextual nuances of the language within them. The technology can then accurately extract information and insights contained in the documents as well as categorize and organize the documents themselves.

Word-sense disambiguation (WSD) is the process of identifying which sense of a word is meant in a sentence or other segment of context. In human language processing and cognition, it is usually subconscious/automatic but can often come to conscious attention when ambiguity impairs clarity of communication, given the pervasive polysemy in natural language. In computational linguistics, it is an open problem that affects other language-processing tasks, such as discourse analysis, improving the relevance of search engines, anaphora resolution, coherence, and inference.

Natural-language understanding (NLU) or natural-language interpretation (NLI) is a subtopic of natural-language processing in artificial intelligence that deals with machine reading comprehension. Natural-language understanding is considered an AI-hard problem.

Parsing, syntax analysis, or syntactic analysis is the process of analyzing a string of symbols, either in natural language, computer languages or data structures, conforming to the rules of a formal grammar. The term parsing comes from Latin pars (orationis), meaning part.

Question answering (QA) is a computer science discipline within the fields of information retrieval and natural language processing (NLP), which is concerned with building systems that automatically answer questions posed by humans in a natural language.

Shallow parsing is an analysis of a sentence which first identifies constituent parts of sentences and then links them to higher order units that have discrete grammatical meanings. While the most elementary chunking algorithms simply link constituent parts on the basis of elementary search patterns, approaches that use machine learning techniques can take contextual information into account and thus compose chunks in such a way that they better reflect the semantic relations between the basic constituents. That is, these more advanced methods get around the problem that combinations of elementary constituents can have different higher level meanings depending on the context of the sentence.
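For the elementary, pattern-based end of this spectrum, a chunker can be sketched with NLTK's RegexpParser (a real NLTK class; the POS tags are supplied by hand here to keep the example self-contained, and the noun-phrase pattern is an assumed, minimal one):

```python
# A minimal chunking (shallow parsing) sketch with NLTK's RegexpParser,
# which links POS-tagged tokens into noun-phrase chunks using an
# elementary search pattern. Requires: pip install nltk
import nltk

# Each token paired with its part-of-speech tag (hand-tagged for brevity).
tagged = [("Mary", "NNP"), ("sold", "VBD"),
          ("the", "DT"), ("book", "NN"),
          ("to", "TO"), ("John", "NNP")]

# An elementary pattern: a noun phrase is an optional determiner,
# any number of adjectives, then a noun of any subtype.
grammar = "NP: {<DT>?<JJ>*<NN.*>}"
chunker = nltk.RegexpParser(grammar)
print(chunker.parse(tagged))  # tree with (NP Mary), (NP the book), (NP John)
```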


Charles J. Fillmore was an American linguist and Professor of Linguistics at the University of California, Berkeley. He received his Ph.D. in Linguistics from the University of Michigan in 1961. Fillmore spent ten years at Ohio State University and a year as a Fellow at the Center for Advanced Study in the Behavioral Sciences at Stanford University before joining Berkeley's Department of Linguistics in 1971. Fillmore was extremely influential in the areas of syntax and lexical semantics.

FrameNet is a research and resource development project based at the International Computer Science Institute (ICSI) in Berkeley, California, which has produced an electronic resource based on a theory of meaning called frame semantics. The data that FrameNet has analyzed show that the sentence "John sold a car to Mary" essentially describes the same basic situation as "Mary bought a car from John", just from a different perspective. A semantic frame is a conceptual structure describing an event, relation, or object along with its participants. The FrameNet lexical database contains over 1,200 semantic frames, 13,000 lexical units and 202,000 example sentences. Charles J. Fillmore, who developed the theory of frame semantics that serves as the theoretical basis of FrameNet, founded the project in 1997 and continued to lead the effort until he died in 2014. Frame semantic theory and FrameNet have been influential in linguistics and natural language processing, where they led to the task of automatic semantic role labeling.
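The FrameNet data can be queried through NLTK's corpus reader. A minimal sketch, assuming the FrameNet 1.7 package has been downloaded ("Commerce_sell" is a real FrameNet frame covering "sell"):

```python
# Querying the FrameNet lexical database via NLTK's corpus reader.
# Requires: pip install nltk, plus a one-time data download.
import nltk
nltk.download("framenet_v17")  # one-time download of the FrameNet 1.7 data
from nltk.corpus import framenet as fn

frame = fn.frame("Commerce_sell")
print(frame.name)                        # Commerce_sell
print(sorted(frame.FE.keys())[:5])       # frame elements, e.g. Buyer, Goods, Seller
print(sorted(frame.lexUnit.keys())[:5])  # lexical units that evoke the frame
```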


In linguistics, a treebank is a parsed text corpus that annotates syntactic or semantic sentence structure. The construction of parsed corpora in the early 1990s revolutionized computational linguistics, which benefitted from large-scale empirical data.

Daniel Jurafsky is a professor of linguistics and computer science at Stanford University, and also an author. With Daniel Gildea, he is known for developing the first automatic system for semantic role labeling (SRL). He is the author of The Language of Food: A Linguist Reads the Menu (2014) and a textbook on speech and language processing (2000). Jurafsky was given a MacArthur Fellowship in 2002.

PropBank is a corpus that is annotated with verbal propositions and their arguments—a "proposition bank". Although "PropBank" refers to a specific corpus produced by Martha Palmer et al., the term propbank is also coming to be used as a common noun referring to any corpus that has been annotated with propositions and their arguments.
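NLTK also includes a reader for the PropBank corpus. The sketch below, assuming the PropBank data has been downloaded, lists the numbered arguments of the roleset "sell.01" (a real PropBank roleset for the ordinary sense of "sell"):

```python
# Reading a PropBank roleset with NLTK's corpus reader.
# Requires: pip install nltk, plus a one-time data download.
import nltk
nltk.download("propbank")  # one-time download of the PropBank data
from nltk.corpus import propbank

roleset = propbank.roleset("sell.01")  # an XML element describing the frame
for role in roleset.findall("roles/role"):
    # Numbered arguments (Arg0, Arg1, ...) with human-readable descriptions
    print(f"Arg{role.attrib['n']}: {role.attrib['descr']}")
# e.g. Arg0: Seller, Arg1: Thing Sold, Arg2: Buyer
```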


SemEval is an ongoing series of evaluations of computational semantic analysis systems; it evolved from the Senseval word sense evaluation series. The evaluations are intended to explore the nature of meaning in language. While meaning is intuitive to humans, transferring those intuitions to computational analysis has proved elusive.

The following outline is provided as an overview of and topical guide to natural-language processing.


In natural language processing (NLP), word embedding is a term used for the representation of words for text analysis, typically in the form of a real-valued vector that encodes the meaning of the word such that the words that are closer in the vector space are expected to be similar in meaning. Word embeddings can be obtained using a set of language modeling and feature learning techniques where words or phrases from the vocabulary are mapped to vectors of real numbers.
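The geometric idea can be shown in a few lines. In the sketch below, the three-dimensional vectors are made up for illustration; real embeddings such as word2vec or GloVe have hundreds of dimensions and are learned from corpora:

```python
# Illustration of the core idea of word embeddings: words are real-valued
# vectors, and similarity in meaning is approximated by closeness in the
# vector space, measured here with cosine similarity. The tiny 3-d vectors
# are invented for the example.
import numpy as np

vectors = {
    "book":  np.array([0.9, 0.1, 0.0]),
    "novel": np.array([0.8, 0.2, 0.1]),
    "sold":  np.array([0.1, 0.9, 0.3]),
}

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

print(cosine(vectors["book"], vectors["novel"]))  # high: similar meaning
print(cosine(vectors["book"], vectors["sold"]))   # lower: different meaning
```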

Abstract Meaning Representation (AMR) is a semantic representation language. AMR graphs are rooted, labeled, directed, acyclic graphs (DAGs), comprising whole sentences. They are intended to abstract away from syntactic representations, in the sense that sentences which are similar in meaning should be assigned the same AMR, even if they are not identically worded. By nature, the AMR language is biased towards English – it is not meant to function as an international auxiliary language.
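AMR graphs are conventionally written in PENMAN notation. The sketch below uses the textbook example "The boy wants to go" and the penman Python library (an assumption of this sketch: pip install penman) to decode it into triples; note the reentrancy, where the node b / boy is an argument of both want-01 and go-01, which a plain tree could not express:

```python
# Decoding an AMR graph written in PENMAN notation into its triples.
# Requires: pip install penman
import penman

# The classic AMR example for "The boy wants to go"; the variable b is
# reused, making the graph a DAG rather than a tree.
amr = """
(w / want-01
   :ARG0 (b / boy)
   :ARG1 (g / go-01
            :ARG0 b))
"""
graph = penman.decode(amr)
for source, role, target in graph.triples:
    print(source, role, target)
# e.g. w :instance want-01 / w :ARG0 b / b :instance boy / ...
```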

Semantic parsing is the task of converting a natural language utterance to a logical form: a machine-understandable representation of its meaning. Semantic parsing can thus be understood as extracting the precise meaning of an utterance. Applications of semantic parsing include machine translation, question answering, ontology induction, automated reasoning, and code generation. The phrase was first used in the 1970s by Yorick Wilks as the basis for machine translation programs working with only semantic representations.
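As a toy illustration of the utterance-to-logical-form mapping (the pattern and the logical-form syntax below are invented for the example; real semantic parsers are learned from data or grammar-driven):

```python
# Toy semantic parser: map one fixed question pattern to a logical form.
import re

def parse(utterance: str) -> str | None:
    m = re.fullmatch(r"who sold (.+) to (.+)\?", utterance.lower())
    if m:
        theme, recipient = m.groups()
        # A machine-interpretable query over sell(agent, theme, recipient)
        return f"λx. sell(x, '{theme}', '{recipient}')"
    return None

print(parse("Who sold the book to John?"))
# -> λx. sell(x, 'the book', 'john')
```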

Universal Dependencies, frequently abbreviated as UD, is an international cooperative project to create treebanks of the world's languages. These treebanks are openly accessible and available. Core applications are automated text processing in the field of natural language processing (NLP) and research into natural language syntax and grammar, especially within linguistic typology. The project's primary aim is to achieve cross-linguistic consistency of annotation, while still permitting language-specific extensions when necessary. The annotation scheme has its roots in three related projects: Stanford Dependencies, Google universal part-of-speech tags, and the Interset interlingua for morphosyntactic tagsets. The UD annotation scheme uses a representation in the form of dependency trees as opposed to phrase structure trees. At the present time, there are just over 200 treebanks of more than 100 languages available in the UD inventory.
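A minimal sketch of what a UD annotation looks like, using a hand-written CoNLL-U-style fragment for "Mary sold the book" (only six of the ten CoNLL-U columns are shown, for brevity):

```python
# A hand-written CoNLL-U-style fragment: one token per line with
# index, form, lemma, universal POS tag, head index, and relation.
conllu = (
    "1\tMary\tMary\tPROPN\t2\tnsubj\n"
    "2\tsold\tsell\tVERB\t0\troot\n"
    "3\tthe\tthe\tDET\t4\tdet\n"
    "4\tbook\tbook\tNOUN\t2\tobj"
)
for line in conllu.splitlines():
    idx, form, lemma, upos, head, rel = line.split("\t")
    print(f"{form} --{rel}--> head index {head}")
```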

Syntactic parsing is the automatic analysis of the syntactic structure of natural language, especially syntactic relations and the labelling of spans of constituents. It is motivated by the problem of structural ambiguity in natural language: a sentence can be assigned multiple grammatical parses, so some kind of knowledge beyond computational grammar rules is needed to tell which parse is intended. Syntactic parsing is one of the important tasks in computational linguistics and natural language processing, and has been a subject of research since the mid-20th century with the advent of computers.
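The ambiguity problem can be reproduced with NLTK's chart parser and a small hand-written grammar: the classic sentence below receives two parses, one attaching the prepositional phrase to the verb (the instrument reading) and one to the noun phrase (the modifier reading):

```python
# Demonstrating structural ambiguity with NLTK's chart parser.
# Requires: pip install nltk
import nltk

grammar = nltk.CFG.fromstring("""
S -> NP VP
VP -> V NP | VP PP
NP -> Pro | Det N | NP PP
PP -> P NP
Pro -> 'I'
V -> 'saw'
Det -> 'the'
N -> 'man' | 'telescope'
P -> 'with'
""")

parser = nltk.ChartParser(grammar)
tokens = "I saw the man with the telescope".split()
for tree in parser.parse(tokens):
    print(tree)  # two trees: PP attached to the VP vs. to the NP
```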

References

  1. Laux, Michael (2019-01-13). "If you did not already know". SunJackson Blog (in Simplified Chinese). Retrieved 2020-12-08.
  2. Boas, Hans; Dux, Ryan. "From the past into the present: From case frames to semantic frames" (PDF).
  3. Gildea, Daniel; Jurafsky, Daniel (2000). "Automatic labeling of semantic roles". Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics (ACL '00). Hong Kong: Association for Computational Linguistics: 512–520. doi:10.3115/1075218.1075283.
  4. Nizamani, Sarwat; Memon, Nasrullah; Nizamani, Saad; Nizamani, Sehrish (August 2017). "TDC: Typed Dependencies-Based Chunking Model". Arabian Journal for Science and Engineering. 42 (8): 3585–3595. doi:10.1007/s13369-017-2587-y. ISSN 2193-567X. S2CID 67233431.
  5. Park, Jaehui (2019). "Selectively Connected Self-Attentions for Semantic Role Labeling". Applied Sciences. 9 (8) – via ProQuest.
  6. Gildea, Daniel; Jurafsky, Daniel (2002). "Automatic Labeling of Semantic Roles" (PDF). Computational Linguistics. 28 (3): 245–288.