Academics | |
---|---|
Disciplines: | Natural language processing, Computational linguistics, Semantics |
Umbrella Organization: | ACL-SIGLEX |
Workshop Overview | |
Founded: | 1998 (Senseval) |
Latest: | SemEval-2015 NAACL @ Denver, USA |
Upcoming: | SemEval-2018 |
History | |
Senseval-1 | 1998 @ Sussex |
Senseval-2 | 2001 @ Toulouse |
Senseval-3 | 2004 @ Barcelona |
SemEval-2007 | 2007 @ Prague |
SemEval-2010 | 2010 @ Uppsala |
SemEval-2012 | 2012 @ Montreal |
SemEval-2013 | 2013 @ Atlanta |
SemEval-2014 | 2014 @ Dublin |
SemEval-2015 | 2015 @ Denver |
SemEval-2016 | 2016 @ San Diego |
SemEval (Semantic Evaluation) is an ongoing series of evaluations of computational semantic analysis systems; it evolved from the Senseval word sense evaluation series. The evaluations are intended to explore the nature of meaning in language. While meaning is intuitive to humans, transferring those intuitions to computational analysis has proved elusive.
This series of evaluations provides a mechanism for characterizing, in more precise terms, exactly what must be computed to capture meaning. As such, the evaluations offer an emerging way to identify the problems, and possible solutions, involved in computing with meaning. The exercises have evolved to articulate more of the dimensions involved in our use of language. They began with apparently simple attempts to identify word senses computationally, and have since expanded to investigate the interrelationships among the elements of a sentence (e.g., semantic role labeling), relations between sentences (e.g., coreference), and the nature of what we are saying (semantic relations and sentiment analysis).
The purpose of the SemEval and Senseval exercises is to evaluate computational semantic analysis systems. "Semantic analysis" refers to a formal analysis of meaning, and "computational" refers to approaches that in principle support effective implementation. [1]
The first three evaluations, Senseval-1 through Senseval-3, were focused on word sense disambiguation (WSD), each time growing in the number of languages offered in the tasks and in the number of participating teams. Beginning with the fourth workshop, SemEval-2007 (SemEval-1), the nature of the tasks evolved to include semantic analysis tasks outside of word sense disambiguation. [2]
With the creation of the *SEM conference, the SemEval community decided to hold the evaluation workshops yearly in association with *SEM. It was also decided that not every evaluation task would be run every year; for example, none of the WSD tasks were included in the SemEval-2012 workshop.
From the earliest days, assessing the quality of word sense disambiguation algorithms had been primarily a matter of intrinsic evaluation, and “almost no attempts had been made to evaluate embedded WSD components”. [3] Only around 2006 did extrinsic evaluations begin to provide some evidence for the value of WSD in end-user applications. [4] Until about 1990, discussions of the sense disambiguation task focused mainly on illustrative examples rather than comprehensive evaluation. The early 1990s saw the beginnings of more systematic and rigorous intrinsic evaluations, including more formal experimentation on small sets of ambiguous words. [5]
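As a concrete illustration of intrinsic evaluation, the sketch below scores a system's sense assignments against a hand-annotated gold standard using precision (correct answers over attempted instances), recall (correct answers over all instances) and coverage (attempted instances over all instances), the measures typically reported in Senseval-style WSD scoring. It assumes a single gold sense per instance and exact-match scoring; the function name `score_wsd` and the sense labels are hypothetical, and official scorers may also handle multiple gold senses and weighted answers, which this sketch omits.

```python
from typing import Dict, Optional


def score_wsd(gold: Dict[str, str], predicted: Dict[str, Optional[str]]) -> Dict[str, float]:
    """Score sense assignments against a gold standard.

    gold maps each instance ID to its correct sense label; predicted maps
    instance IDs to a system's sense label, or to None when the system
    abstains (a missing ID is also treated as an abstention).
    """
    total = len(gold)
    attempted = 0
    correct = 0
    for instance_id, gold_sense in gold.items():
        guess = predicted.get(instance_id)
        if guess is None:
            continue  # abstention: lowers recall and coverage, not precision
        attempted += 1
        if guess == gold_sense:
            correct += 1
    return {
        "precision": correct / attempted if attempted else 0.0,
        "recall": correct / total if total else 0.0,
        "coverage": attempted / total if total else 0.0,
    }


# Hypothetical lexical-sample data for the target word "bank".
gold = {"bank.1": "bank%finance", "bank.2": "bank%river",
        "bank.3": "bank%finance", "bank.4": "bank%river"}
predicted = {"bank.1": "bank%finance", "bank.2": "bank%finance",
             "bank.3": "bank%finance", "bank.4": None}
print(score_wsd(gold, predicted))
```

Here the system abstains on one of four instances and gets two of the remaining three right, so precision is 2/3, recall is 0.5 and coverage is 0.75. Because abstaining lowers recall but not precision, the two are usually reported together.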
In April 1997, Martha Palmer and Marc Light organized a workshop entitled Tagging with Lexical Semantics: Why, What, and How? in conjunction with the Conference on Applied Natural Language Processing. [6] At the time, there was a clear recognition that manually annotated corpora had revolutionized other areas of NLP, such as part-of-speech tagging and parsing, and that corpus-driven approaches had the potential to revolutionize automatic semantic analysis as well. [7] Kilgarriff recalled that there was "a high degree of consensus that the field needed evaluation", and several practical proposals by Resnik and Yarowsky kicked off a discussion that led to the creation of the Senseval evaluation exercises. [8] [9] [10]
After SemEval-2010, many participants felt that the 3-year cycle was too long a wait, since many other shared tasks, such as the Conference on Natural Language Learning (CoNLL) and Recognizing Textual Entailment (RTE), run annually. For this reason, the SemEval coordinators gave task organizers the choice between a 2-year and a 3-year cycle. [11] The SemEval community favored the 3-year cycle.
Although the votes within the SemEval community favored a 3-year cycle, the organizers and coordinators settled on splitting the SemEval task into two evaluation workshops. This was triggered by the introduction of the new *SEM conference: the SemEval organizers thought it appropriate to associate the event with *SEM and to collocate the SemEval workshop with the conference. The organizers received very positive responses (from task coordinators, organizers and participants) about the association with the yearly *SEM, and 8 tasks were willing to switch to 2012. Thus SemEval-2012 and SemEval-2013 were born. The plan at the time was to move to a yearly SemEval schedule associated with *SEM, though not every task would need to run every year. [12]
The framework of the SemEval/Senseval evaluation workshops emulates the Message Understanding Conferences (MUCs) and other evaluation workshops run by ARPA (the Advanced Research Projects Agency, renamed the Defense Advanced Research Projects Agency (DARPA)).
Stages of SemEval/Senseval evaluation workshops [14]
Senseval-1 and Senseval-2 focused on evaluating WSD systems for major languages for which corpora and computerized dictionaries were available. Senseval-3 looked beyond lexemes and began to evaluate systems that addressed wider areas of semantics, such as semantic roles (technically known as theta roles in formal semantics) and logic form transformation (in which the semantics of phrases, clauses or sentences is represented as first-order logic forms). Senseval-3 also explored the performance of semantic analysis in machine translation.
As the range of computational semantic systems grew beyond WSD, Senseval evolved into SemEval, where more aspects of computational semantic systems were evaluated.
The SemEval exercises provide a mechanism for examining issues in semantic analysis of texts. The topics of interest fall short of the logical rigor that is found in formal computational semantics, attempting to identify and characterize the kinds of issues relevant to human understanding of language. The primary goal is to replicate human processing by means of computer systems. The tasks (shown below) are developed by individuals and groups to deal with identifiable issues, as they take on some concrete form.
The first major area in semantic analysis is the identification of the intended meaning at the word level (taken to include idiomatic expressions). This is word-sense disambiguation, a concept that is evolving away from the notion that words have discrete senses toward the view that they are characterized by the ways in which they are used, i.e., their contexts. The tasks in this area include lexical-sample and all-words disambiguation, multi- and cross-lingual disambiguation, and lexical substitution. Given the difficulty of identifying word senses, other tasks relevant to this topic include word-sense induction, subcategorization acquisition, and the evaluation of lexical resources.
The second major area in semantic analysis is the understanding of how different sentence and textual elements fit together. Tasks in this area include semantic role labeling, semantic relation analysis, and coreference resolution. Other tasks in this area look at more specialized issues of semantic analysis, such as temporal information processing, metonymy resolution, and sentiment analysis. The tasks in this area have many potential applications, such as information extraction, question answering, document summarization, machine translation, construction of thesauri and semantic networks, language modeling, paraphrasing, and recognizing textual entailment. For each of these applications, the contribution that each type of semantic analysis can make remains the most outstanding research issue.
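To make the lexical-sample setting concrete, the sketch below disambiguates one target word against a small, fixed sense inventory using the simplified Lesk gloss-overlap heuristic, a classic knowledge-based baseline. It is not a method prescribed by Senseval or SemEval; the sense labels, glosses, stopword list and function names are hypothetical stand-ins for a real inventory such as WordNet.

```python
from typing import Dict, Optional, Set

# Hypothetical sense inventory: each sense of the target word has a short gloss.
SENSE_INVENTORY: Dict[str, Dict[str, str]] = {
    "bank": {
        "bank#finance": "an institution that accepts deposits of money and makes loans",
        "bank#river": "the sloping land beside a river or stream",
    }
}

STOPWORDS: Set[str] = {"a", "an", "the", "of", "and", "that", "or", "to",
                       "in", "on", "at", "by", "he", "she", "his", "her"}


def tokens(text: str) -> Set[str]:
    """Lower-case the text, strip simple punctuation and drop stopwords."""
    words = (w.strip(".,;:!?\"'").lower() for w in text.split())
    return {w for w in words if w and w not in STOPWORDS}


def simplified_lesk(word: str, context: str) -> Optional[str]:
    """Return the sense whose gloss shares the most words with the context.

    Ties go to the sense listed first in the inventory.
    """
    context_words = tokens(context)
    best_sense, best_overlap = None, -1
    for sense, gloss in SENSE_INVENTORY[word].items():
        overlap = len(context_words & tokens(gloss))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


print(simplified_lesk("bank", "She went to the bank to deposit money and ask about loans"))
print(simplified_lesk("bank", "They sat on the bank of the river, watching the stream."))
```

The first context overlaps the financial gloss on "money" and "loans", the second overlaps the river gloss on "river" and "stream". The heuristic depends entirely on a predefined inventory with discrete senses, which is exactly the assumption the usage-based view described above calls into question.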
For example, in the word sense induction and disambiguation task, there are three separate phases:
The unsupervised evaluation for WSI considered two measures: V-measure (Rosenberg and Hirschberg, 2007) and paired F-score (Artiles et al., 2009). This evaluation follows the supervised evaluation of the SemEval-2007 WSI task (Agirre and Soroa, 2007).
The table below reflects the growth of the workshop from Senseval to SemEval and gives an overview of which areas of computational semantics were evaluated throughout the Senseval/SemEval workshops.
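The two unsupervised measures can be sketched as follows, treating the gold senses and the induced clusters as two labelings of the same instances. V-measure is computed as the harmonic mean of homogeneity and completeness with β = 1, following Rosenberg and Hirschberg's definition; paired F-score compares the set of instance pairs placed in the same cluster with the set of pairs sharing a gold sense. The function names are illustrative, and the handling of degenerate cases (a labeling with a single class, or no shared pairs) is an assumption of this sketch rather than taken from the task's official scorer.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations
from typing import Hashable, List, Set, Tuple


def _entropy(counts) -> float:
    """Shannon entropy (natural log) of a distribution given as raw counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c)


def v_measure(gold: List[Hashable], clusters: List[Hashable]) -> float:
    """Harmonic mean of homogeneity and completeness (beta = 1)."""
    n = len(gold)
    class_sizes = Counter(gold)            # n_c for each gold sense c
    cluster_sizes = Counter(clusters)      # n_k for each induced cluster k
    joint = Counter(zip(gold, clusters))   # n_ck
    h_c = _entropy(class_sizes.values())
    h_k = _entropy(cluster_sizes.values())
    # Conditional entropies H(C|K) and H(K|C) from the joint counts.
    h_c_given_k = -sum(m / n * math.log(m / cluster_sizes[k]) for (c, k), m in joint.items())
    h_k_given_c = -sum(m / n * math.log(m / class_sizes[c]) for (c, k), m in joint.items())
    homogeneity = 1.0 if h_c == 0 else 1.0 - h_c_given_k / h_c
    completeness = 1.0 if h_k == 0 else 1.0 - h_k_given_c / h_k
    if homogeneity + completeness == 0:
        return 0.0
    return 2 * homogeneity * completeness / (homogeneity + completeness)


def _same_group_pairs(labels: List[Hashable]) -> Set[Tuple[int, int]]:
    """All index pairs (i, j) with i < j whose instances share a label."""
    groups = defaultdict(list)
    for idx, label in enumerate(labels):
        groups[label].append(idx)
    return {pair for members in groups.values() for pair in combinations(members, 2)}


def paired_f_score(gold: List[Hashable], clusters: List[Hashable]) -> float:
    """F-score over instance pairs grouped together by clusters vs. gold senses."""
    gold_pairs = _same_group_pairs(gold)
    cluster_pairs = _same_group_pairs(clusters)
    shared = len(gold_pairs & cluster_pairs)
    if shared == 0:
        return 0.0  # also covers the degenerate case with no pairs at all
    precision = shared / len(cluster_pairs)
    recall = shared / len(gold_pairs)
    return 2 * precision * recall / (precision + recall)


senses = ["bank%finance", "bank%finance", "bank%river", "bank%river"]
print(v_measure(senses, [1, 1, 2, 2]), paired_f_score(senses, [1, 1, 2, 2]))
print(v_measure(senses, [0, 0, 0, 0]), paired_f_score(senses, [0, 0, 0, 0]))
```

A clustering that matches the gold senses up to relabeling scores 1.0 on both measures. Putting every instance into one cluster shows why the two are reported together: V-measure gives it 0.0, while paired F-score still gives 0.5, because every gold pair is recovered even though most cluster pairs are wrong.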
Workshop | No. of Tasks | Areas of study | Languages of Data Evaluated |
---|---|---|---|
Senseval-1 | 3 | Word Sense Disambiguation (WSD) - Lexical Sample WSD tasks | English, French, Italian |
Senseval-2 | 12 | Word Sense Disambiguation (WSD) - Lexical Sample, All Words, Translation WSD tasks | Czech, Dutch, English, Estonian, Basque, Chinese, Danish, Italian, Japanese, Korean, Spanish, Swedish |
Senseval-3 | 16 (incl. 2 cancelled) | Logic Form Transformation, Machine Translation (MT) Evaluation, Semantic Role Labelling, WSD | Basque, Catalan, Chinese, English, Italian, Romanian, Spanish |
SemEval2007 | 19 (incl. 1 cancelled) | Cross-lingual, Frame Extraction, Information Extraction, Lexical Substitution, Lexical Sample, Metonymy, Semantic Annotation, Semantic Relations, Semantic Role Labelling, Sentiment Analysis, Time Expression, WSD | Arabic, Catalan, Chinese, English, Spanish, Turkish |
SemEval2010 | 18 (incl. 1 cancelled) | Coreference, Cross-lingual, Ellipsis, Information Extraction, Lexical Substitution, Metonymy, Noun Compounds, Parsing, Semantic Relations, Semantic Role Labeling, Sentiment Analysis, Textual Entailment, Time Expressions, WSD | Catalan, Chinese, Dutch, English, French, German, Italian, Japanese, Spanish |
SemEval2012 | 8 | Common Sense Reasoning, Lexical Simplification, Relational Similarity, Spatial Role Labelling, Semantic Dependency Parsing, Semantic and Textual Similarity | Chinese, English |
SemEval2013 | 14 | Temporal Annotation, Sentiment Analysis, Spatial Role Labeling, Noun Compounds, Phrasal Semantics, Textual Similarity, Response Analysis, Cross-lingual Textual Entailment, BioMedical Texts, Cross and Multilingual WSD, Word Sense Induction, and Lexical Sample | Catalan, French, German, English, Italian, Spanish |
SemEval2014 | 10 | Compositional Distributional Semantic, Grammar Induction for Spoken Dialogue Systems, Cross-Level Semantic Similarity, Sentiment Analysis, L2 Writing Assistant, Supervised Semantic Parsing, Clinical Text Analysis, Semantic Dependency Parsing, Sentiment Analysis in Twitter, Multilingual Semantic Textual Similarity | English, Spanish, French, German, Dutch |
SemEval2015 | 18 (incl. 1 cancelled) | Text Similarity and Question Answering, Time and Space, Sentiment, Word Sense Disambiguation and Induction, Learning Semantic Relations | English, Spanish, Arabic, Italian |
SemEval2016 | 14 | Textual Similarity and Question Answering, Sentiment Analysis, Semantic Parsing, Semantic Analysis, Semantic Taxonomy | |
SemEval2017 | 12 [15] | Semantic comparison for words and texts, Detecting sentiment, humor, and truth and Parsing semantic structures | |
SemEval2018 | 12 [16] | Affect and Creative Language in Tweets, Coreference, Information Extraction, Lexical Semantics, Reading Comprehension and Reasoning | |
The Multilingual WSD task was introduced for the SemEval-2013 workshop. [17] The task aims to evaluate Word Sense Disambiguation systems in a multilingual scenario, using BabelNet as its sense inventory. Unlike similar tasks such as cross-lingual WSD or the multilingual lexical substitution task, where no fixed sense inventory is specified, Multilingual WSD relies on this fixed inventory. Prior to the development of BabelNet, a bilingual lexical sample WSD evaluation task was carried out in SemEval-2007 on Chinese-English bitexts. [18]
The Cross-lingual WSD task was introduced in the SemEval-2007 evaluation workshop and re-proposed in the SemEval-2013 workshop. [19] To make it easier to integrate WSD systems into other Natural Language Processing (NLP) applications, such as Machine Translation and multilingual Information Retrieval, the cross-lingual WSD evaluation task was introduced as a language-independent and knowledge-lean approach to WSD. The task is an unsupervised Word Sense Disambiguation task for English nouns by means of parallel corpora. It follows the lexical-sample variant of the classic WSD task, restricted to only 20 polysemous nouns.
It is worth noting that SemEval-2014 had only two multilingual/cross-lingual tasks: (i) the L2 Writing Assistant task, a cross-lingual WSD task covering English, Spanish, German, French and Dutch; and (ii) the Multilingual Semantic Textual Similarity task, which evaluates systems on English and Spanish texts.
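The sketch below illustrates the intuition behind disambiguating by means of parallel corpora: the translations a noun receives in aligned text act roughly as sense labels, so disambiguation can be cast as predicting the right translation from the English context. It is a hypothetical knowledge-lean baseline, not the task's official method or data; the aligned examples, the French translations of "plant", the stopword list and the function names are all illustrative assumptions.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

STOPWORDS = {"the", "a", "an", "and", "in", "of", "for", "its", "this", "are", "is"}

# Hypothetical word-aligned training data: an English context containing the
# ambiguous noun "plant", paired with the French word it was aligned to.
ALIGNED_EXAMPLES: List[Tuple[str, str]] = [
    ("the chemical plant employs workers in the factory", "usine"),
    ("the power plant produces electricity for the factory district", "usine"),
    ("water the plant and keep its leaves in the sun", "plante"),
    ("this plant grows green leaves in spring", "plante"),
]


def content_words(text: str) -> List[str]:
    """Lower-case, whitespace-tokenize and drop stopwords."""
    return [w for w in text.lower().split() if w not in STOPWORDS]


def build_profiles(examples: List[Tuple[str, str]]) -> Dict[str, Counter]:
    """Count the context words observed with each aligned translation."""
    profiles: Dict[str, Counter] = defaultdict(Counter)
    for context, translation in examples:
        profiles[translation].update(content_words(context))
    return profiles


def predict_translation(context: str, profiles: Dict[str, Counter]) -> str:
    """Return the translation whose context profile best matches the new context."""
    words = content_words(context)
    return max(profiles, key=lambda t: sum(profiles[t][w] for w in words))


profiles = build_profiles(ALIGNED_EXAMPLES)
print(predict_translation("the factory plant produces steel", profiles))
print(predict_translation("the leaves of the plant are green", profiles))
```

No sense inventory is consulted at any point: the "senses" are whatever distinct translations the aligned corpus provides, which is what makes this style of evaluation language-independent and knowledge-lean.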
The major tasks in semantic evaluation include the following areas of natural language processing. This list is expected to grow as the field progresses. [20]
The following table shows the areas of study involved in Senseval-1 through SemEval-2017 (S refers to Senseval and SE refers to SemEval; e.g., S1 refers to Senseval-1 and SE07 refers to SemEval-2007):
Areas of Study | S1 | S2 | S3 | SE07 | SE10 | SE12 | SE13 | SE14 | SE15 | SE16 | SE17 |
---|---|---|---|---|---|---|---|---|---|---|---|
Bioinformatics / Clinical Text Analysis | |||||||||||
Common Sense Reasoning (COPA) | |||||||||||
Coreference Resolution | |||||||||||
Noun Compounds (Information Extraction) | |||||||||||
Ellipsis | |||||||||||
Grammar Induction | |||||||||||
Keyphrase Extraction (Information Extraction) | |||||||||||
Lexical Simplification | |||||||||||
Lexical Substitution (Multilingual or Crosslingual) | |||||||||||
Lexical Complexity | |||||||||||
Metonymy (Information Extraction) | |||||||||||
Paraphrases | |||||||||||
Question Answering | |||||||||||
Relational Similarity | |||||||||||
Rumour and veracity | |||||||||||
Semantic Parsing | |||||||||||
Semantic Relation Identification | |||||||||||
Semantic Role Labeling | |||||||||||
Semantic Similarity | |||||||||||
Semantic Similarity (Crosslingual) | |||||||||||
Semantic Similarity (Multilingual) | |||||||||||
Sentiment Analysis | |||||||||||
Spatial Role Labelling | |||||||||||
Taxonomy Induction/Enrichment | |||||||||||
Textual Entailment | |||||||||||
Textual Entailment (Cross-lingual) | |||||||||||
Temporal annotation | |||||||||||
Twitter Analysis | |||||||||||
Word sense disambiguation (Lexical Sample) | |||||||||||
Word sense disambiguation (All-Words) | |||||||||||
Word sense disambiguation (Multilingual) | |||||||||||
Word sense disambiguation (Cross-lingual) | |||||||||||
Word sense induction | |||||||||||
SemEval tasks have created many types of semantic annotation, each type with various schemas. In SemEval-2015, the organizers decided to group tasks into several tracks according to the type of semantic annotation each task aims to produce. [21] The types of semantic annotation involved in the SemEval workshops are as follows:
The allocation of tasks to tracks is flexible, and a task may develop into its own track. For example, the taxonomy evaluation task in SemEval-2015 was under the Learning Semantic Relations track, while SemEval-2016 had a dedicated Semantic Taxonomy track with a new Semantic Taxonomy Enrichment task. [22] [23]