| Ruslan Mitkov | |
|---|---|
| Alma mater | Humboldt University; Technical University of Dresden |
| Known for | Contributions to anaphora resolution and the automatic generation of multiple-choice questions |
| **Scientific career** | |
| Institutions | Lancaster University; Institute of Mathematics and Informatics (Bulgarian Academy of Sciences); Saarland University; University of Hamburg |
| Thesis | *Beiträge zum computergestützten Wissenstesten* ("Contributions to Computer-Assisted Knowledge Testing", 1987) |
| Doctoral advisor | Nikolaus Joachim Lehmann |
Ruslan Mitkov is a professor at Lancaster University and a researcher in Natural Language Processing and Computational Linguistics. He completed his PhD at the Technical University of Dresden under the supervision of Nikolaus Joachim Lehmann. He has published more than 240 refereed papers and is best known for his contributions to anaphora resolution [1] [2] [3] and his seminal work on the computer-aided generation of multiple-choice tests. [4] [5]
Mitkov is the sole editor of the Oxford Handbook of Computational Linguistics (Oxford University Press) [6] and the author of the book Anaphora Resolution (published by Longman), which have become standard textbooks in their fields. [7] [8] [9] He is also the co-founder and editor-in-chief of the Cambridge journal Natural Language Engineering [10] and the editor-in-chief of John Benjamins’ book series in Natural Language Processing. [11]
Natural language processing (NLP) is an interdisciplinary subfield of linguistics, computer science, and artificial intelligence concerned with the interactions between computers and human language, in particular how to program computers to process and analyze large amounts of natural language data. The goal is a computer capable of "understanding" the contents of documents, including the contextual nuances of the language within them. The technology can then accurately extract information and insights contained in the documents as well as categorize and organize the documents themselves.
In linguistics and related fields, pragmatics is the study of how context contributes to meaning. The field of study evaluates how human language is utilized in social interactions, as well as the relationship between the interpreter and the interpreted. Linguists who specialize in pragmatics are called pragmaticians. The field has been represented since 1986 by the International Pragmatics Association (IPrA).
In linguistics, deixis is the use of general words and phrases to refer to a specific time, place, or person in context, e.g., the words tomorrow, there, and they. Words are deictic if their semantic meaning is fixed but their denoted meaning varies depending on time and/or place. Words or phrases that require contextual information to be fully understood—for example, English pronouns—are deictic. Deixis is closely related to anaphora. Although deixis is discussed primarily with respect to spoken language, the concept is sometimes applied to written language, gestures, and communication media as well. In linguistic anthropology, deixis is treated as a particular subclass of the more general semiotic phenomenon of indexicality, a sign "pointing to" some aspect of its context of occurrence.
Word-sense disambiguation (WSD) is the process of identifying which sense of a word is meant in a sentence or other segment of context. In human language processing and cognition, it is usually subconscious and automatic, but it can come to conscious attention when ambiguity impairs clarity of communication, given the pervasive polysemy in natural language. In computational linguistics, it is an open problem that affects related tasks such as discourse analysis, improving the relevance of search engines, anaphora resolution, coherence, and inference.
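A classic baseline for WSD is the simplified Lesk algorithm: choose the sense whose dictionary gloss shares the most words with the surrounding context. The sketch below illustrates the idea with a tiny hand-made sense inventory (the `SENSES` dictionary and its glosses are illustrative assumptions, not a real lexicon):

```python
# Simplified Lesk word-sense disambiguation: pick the sense whose
# gloss has the largest word overlap with the surrounding context.
# The sense inventory below is a toy example, not a real dictionary.

SENSES = {
    "bank": {
        "finance": "a financial institution that accepts deposits and lends money",
        "river": "the sloping land alongside a river or stream",
    }
}

def simplified_lesk(word, context):
    """Return the sense of `word` whose gloss overlaps most with `context`."""
    context_words = set(context.lower().split())
    best_sense, best_overlap = None, -1
    for sense, gloss in SENSES[word].items():
        overlap = len(context_words & set(gloss.lower().split()))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

print(simplified_lesk("bank", "she sat on the bank of the river watching the stream"))
# the overlapping words "river" and "stream" select the "river" sense
```

Real systems replace the toy inventory with a lexical resource such as WordNet and add stop-word filtering and stemming, but the overlap principle is the same.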
In linguistics, anaphora is the use of an expression whose interpretation depends upon another expression in context. In a narrower sense, anaphora is the use of an expression that depends specifically upon an antecedent expression and thus is contrasted with cataphora, which is the use of an expression that depends upon a postcedent expression. The anaphoric (referring) term is called an anaphor. For example, in the sentence Sally arrived, but nobody saw her, the pronoun her is an anaphor, referring back to the antecedent Sally. In the sentence Before her arrival, nobody saw Sally, the pronoun her refers forward to the postcedent Sally, so her is now a cataphor. Usually, an anaphoric expression is a pro-form or some other kind of deictic expression. Both anaphora and cataphora are species of endophora, referring to something mentioned elsewhere in a dialog or text.
In linguistics, coreference, sometimes written co-reference, occurs when two or more expressions refer to the same person or thing; they have the same referent. For example, in Bill said Alice would arrive soon, and she did, the words Alice and she refer to the same person.
Aravind Krishna Joshi was the Henry Salvatori Professor of Computer and Cognitive Science in the computer science department of the University of Pennsylvania. Joshi defined the tree-adjoining grammar formalism which is often used in computational linguistics and natural language processing.
Computational semantics is the study of how to automate the process of constructing and reasoning with meaning representations of natural language expressions. It consequently plays an important role in natural-language processing and computational linguistics.
Computational humor is a branch of computational linguistics and artificial intelligence which uses computers in humor research. It is a relatively new area, with the first dedicated conference organized in 1996.
Donkey sentences are sentences that contain a pronoun with clear meaning but whose syntactical role in the sentence poses challenges to grammarians, as in the classic example Every farmer who owns a donkey beats it. Such sentences defy straightforward attempts to generate their formal-language equivalents. The difficulty lies in understanding how English speakers parse such sentences.
CICLing is an annual conference on computational linguistics (CL) and natural language processing (NLP). The first CICLing conference was held in 2000 in Mexico City. The conference is attended by one to two hundred NLP and CL researchers and students every year. As of 2017, it is ranked within the top 20 sources on computational linguistics by Google Scholar. Past CICLing conferences have been held in Mexico, Korea, Israel, Romania, Japan, India, Greece, Nepal, Egypt, Turkey, Hungary, and Vietnam; the 2019 event was held in France.
Text, Speech and Dialogue (TSD) is an annual conference involving topics on natural language processing and computational linguistics. The meeting is held every September, alternating between Brno and Plzeň, Czech Republic.
The following outline is provided as an overview of and topical guide to natural-language processing:
The Winograd schema challenge (WSC) is a test of machine intelligence proposed in 2012 by Hector Levesque, a computer scientist at the University of Toronto. Designed to be an improvement on the Turing test, it is a multiple-choice test that employs questions of a very specific structure: they are instances of what are called Winograd schemas, named after Terry Winograd, professor of computer science at Stanford University.
Natural Language Engineering is a bimonthly peer-reviewed academic journal published by Cambridge University Press which covers research and software in natural language processing. Its aim is to "bridge the gap between traditional computational linguistics research and the implementation of practical applications with potential real-world use". In addition to original publications on theoretical and applied aspects of computational linguistics, the journal contains Industry Watch and Emerging Trends columns tracking developments in the field. The editor-in-chief is Ruslan Mitkov from Lancaster University. According to the Journal Citation Reports, the journal has a 2016 impact factor of 1.065. Its current 5-year impact factor is 2.203, and it will be fully open access starting in 2024.
Universal Dependencies, frequently abbreviated as UD, is an international cooperative project to create treebanks of the world's languages. These treebanks are openly accessible and available. Core applications are automated text processing in the field of natural language processing (NLP) and research into natural language syntax and grammar, especially within linguistic typology. The project's primary aim is to achieve cross-linguistic consistency of annotation, while still permitting language-specific extensions when necessary. The annotation scheme has its roots in three related projects: Stanford Dependencies, Google universal part-of-speech tags, and the Interset interlingua for morphosyntactic tagsets. The UD annotation scheme uses a representation in the form of dependency trees as opposed to phrase-structure trees. At present, there are just over 200 treebanks of more than 100 languages available in the UD inventory.
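UD treebanks are distributed in the plain-text CoNLL-U format: one token per line with tab-separated columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC), where HEAD points at the governing token's ID (0 for the root). A minimal parser sketch for a single sentence, with `parse_conllu` being an illustrative helper rather than part of any UD tooling:

```python
# Reading a miniature CoNLL-U fragment (the Universal Dependencies
# distribution format): one token per line, ten tab-separated columns.
# Only a handful of columns are extracted here for brevity.

CONLLU = (
    "1\tSally\tSally\tPROPN\t_\t_\t2\tnsubj\t_\t_\n"
    "2\tarrived\tarrive\tVERB\t_\t_\t0\troot\t_\t_\n"
)

def parse_conllu(text):
    """Return a list of token dicts for a single CoNLL-U sentence."""
    tokens = []
    for line in text.strip().splitlines():
        cols = line.split("\t")
        tokens.append({
            "id": int(cols[0]), "form": cols[1], "lemma": cols[2],
            "upos": cols[3], "head": int(cols[6]), "deprel": cols[7],
        })
    return tokens

for tok in parse_conllu(CONLLU):
    print(tok["form"], tok["upos"], tok["head"], tok["deprel"])
# Sally PROPN 2 nsubj
# arrived VERB 0 root
```

A full reader would also handle sentence-boundary blank lines, comment lines beginning with `#`, and multiword-token ranges, which this sketch omits.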
Bonnie Lynn Nash-Webber is a computational linguist. She is an honorary professor of intelligent systems in the Institute for Language, Cognition and Computation (ILCC) at the University of Edinburgh.
Mira Ariel is a professor of linguistics at Tel Aviv University, specializing in pragmatics. A pioneer of the study of information structure, she is best known for creating and developing Accessibility Theory.
DisCoCat is a mathematical framework for natural language processing which uses category theory to unify distributional semantics with the principle of compositionality. The grammatical derivations in a categorial grammar are interpreted as linear maps acting on the tensor product of word vectors to produce the meaning of a sentence or a piece of text. String diagrams are used to visualise information flow and reason about natural language semantics.
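The core DisCoCat move can be shown in miniature: nouns are vectors, a transitive verb is a higher-order tensor, and the sentence meaning is the contraction of the verb tensor with its subject and object vectors. The two-dimensional "meaning space" and the particular tensor values below are purely illustrative assumptions:

```python
# A toy DisCoCat-style composition: noun vectors are contracted with
# a transitive-verb tensor to yield a sentence vector. The dimensions
# and numbers are illustrative only.

alice = [1.0, 0.0]   # toy noun vectors in a 2-d meaning space
bob   = [0.0, 1.0]

# verb[s][m][o]: contribution to sentence dimension m when the
# subject lies along axis s and the object along axis o
loves = [[[0.9, 0.1], [0.2, 0.8]],
         [[0.3, 0.7], [0.5, 0.5]]]

def compose(subj, verb, obj):
    """Contract the verb tensor with subject and object vectors."""
    dims = len(verb[0])
    return [sum(subj[s] * verb[s][m][o] * obj[o]
                for s in range(len(subj))
                for o in range(len(obj)))
            for m in range(dims)]

print(compose(alice, loves, bob))  # -> [0.1, 0.8]
```

In the full framework, the shape of each word's tensor is dictated by its grammatical type in a categorial grammar, so that the contraction pattern mirrors the sentence's grammatical derivation.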