| Preslav Nakov | |
| --- | --- |
| Born | 26 January 1977 |
| Nationality | Bulgarian |
| Alma mater | University of California, Berkeley (PhD in Computer Science); Sofia University (MSc in Computer Science) |
| Known for | Natural language processing; detecting fake news online; sentiment analysis |
| Scientific career | |
| Fields | Computer science |
| Institutions | Qatar Computing Research Institute; National University of Singapore; Sofia University; Bulgarian Academy of Sciences; University of California, Berkeley; University College London |
| Thesis | Using the Web as an Implicit Training Set: Application to Noun Compound Syntax and Semantics (2007) |
| Doctoral advisor | Marti Hearst |
| Website | Personal website |
Preslav Nakov (born 26 January 1977 in Veliko Turnovo, Bulgaria) is a computer scientist who works on natural language processing. He is particularly known for his research on fake news detection,[1] automatic detection of offensive language,[2] and biomedical text mining.[3] Nakov obtained a PhD in computer science from the University of California, Berkeley, under the supervision of Marti Hearst. He was the first recipient of the prestigious John Atanasov Presidential Award, given by the President of Bulgaria for achievements in the development of the information society.[4]
Preslav Nakov grew up in Veliko Turnovo, Bulgaria, where he attended primary and secondary school, obtaining a Diploma in Mathematics from the Secondary School of Mathematics and Natural Sciences 'Vassil Drumev' in 1996. He then obtained an MSc degree in Informatics (Computer Science), with specialisations in Artificial Intelligence and in Information and Communication Technologies, from Sofia University in 2001. During his MSc studies, he worked as a teaching assistant at Sofia University and the Bulgarian Academy of Sciences, and was a guest lecturer at University College London during a visit in spring 1999. Subsequently, he enrolled in the PhD program at the Department of Electrical Engineering and Computer Science, University of California, Berkeley, partly supported by a Fulbright Scholarship. Under the supervision of Marti Hearst, he wrote a thesis on mining text from the Web, and graduated with a PhD in Computer Science from UC Berkeley in 2007.[5]
Upon graduating from the University of California, Berkeley, Nakov began working as a Research Fellow at the National University of Singapore. Since 2012, he has been a Senior Scientist at the Qatar Computing Research Institute (QCRI). He also holds an honorary lecturer position at Sofia University.
Preslav Nakov works in the areas of natural language processing and text mining, and has published over 300 peer-reviewed research papers.[6] His early research was on lexical semantics and text mining; he published influential papers on biomedical text mining, most prominently on methods to identify citation sentences in biomedical papers.[3] He is, however, best known for his research on fake news detection, such as his work on predicting the factuality and bias of news sources,[1] and for his research on the automatic detection of offensive language.[2] Nakov also led the organisation of a popular evaluation campaign on sentiment analysis systems as part of SemEval between 2015 and 2017.[7] He currently coordinates the Tanbih News Aggregator, a large project with partners at the Qatar Computing Research Institute and the MIT Computer Science and Artificial Intelligence Laboratory, which aims to uncover stance, bias, and propaganda in news.[8]
Natural language processing (NLP) is a subfield of linguistics, computer science, and artificial intelligence concerned with the interactions between computers and human language, in particular how to program computers to process and analyze large amounts of natural language data. The goal is a computer capable of "understanding" the contents of documents, including the contextual nuances of the language within them. The technology can then accurately extract information and insights contained in the documents as well as categorize and organize the documents themselves.
Natural-language understanding (NLU) or natural-language interpretation (NLI) is a subtopic of natural-language processing in artificial intelligence that deals with machine reading comprehension. Natural-language understanding is considered an AI-hard problem.
Text mining, also referred to as text data mining and similar to text analytics, is the process of deriving high-quality information from text. It involves "the discovery by computer of new, previously unknown information, by automatically extracting information from different written resources." Written resources may include websites, books, emails, reviews, and articles. High-quality information is typically obtained by identifying patterns and trends through means such as statistical pattern learning. According to Hotho et al. (2005), we can distinguish three perspectives on text mining: information extraction, data mining, and a KDD process. Text mining usually involves structuring the input text, deriving patterns within the structured data, and finally evaluating and interpreting the output. 'High quality' in text mining usually refers to some combination of relevance, novelty, and interest. Typical text mining tasks include text categorization, text clustering, concept/entity extraction, production of granular taxonomies, sentiment analysis, document summarization, and entity relation modeling.
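The structure-derive-interpret loop described above can be sketched in miniature with a keyword-based text categorizer; the categories, keyword lists, and test sentence here are invented for illustration and stand in for the statistical patterns a real system would learn:

```python
import re
from collections import Counter

# Minimal text-mining sketch: structure the raw text (tokenize and count),
# apply a pattern (keyword overlap per category), and interpret the output
# (pick the best-scoring category). All categories/keywords are invented.
CATEGORY_KEYWORDS = {
    "sports": {"match", "team", "score"},
    "finance": {"stock", "market", "shares"},
}

def categorize(text):
    """Return the category whose keywords best cover the text."""
    tokens = Counter(re.findall(r"[a-z]+", text.lower()))
    scores = {
        cat: sum(tokens[w] for w in kws)
        for cat, kws in CATEGORY_KEYWORDS.items()
    }
    return max(scores, key=scores.get)

print(categorize("The stock market fell as shares dropped."))  # finance
```

A production system would replace the hand-written keyword sets with patterns learned statistically from labeled or unlabeled corpora.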
Charles J. Fillmore was an American linguist and Professor of Linguistics at the University of California, Berkeley. He received his Ph.D. in Linguistics from the University of Michigan in 1961. Fillmore spent ten years at Ohio State University and a year as a Fellow at the Center for Advanced Study in the Behavioral Sciences at Stanford University before joining Berkeley's Department of Linguistics in 1971. Fillmore was extremely influential in the areas of syntax and lexical semantics.
Jerry R. Hobbs is an American researcher in the fields of computational linguistics, discourse analysis, and artificial intelligence.
Computational semantics is the study of how to automate the process of constructing and reasoning with meaning representations of natural language expressions. It consequently plays an important role in natural-language processing and computational linguistics.
In linguistics, statistical semantics applies the methods of statistics to the problem of determining the meaning of words or phrases, ideally through unsupervised learning, to a degree of precision at least sufficient for the purpose of information retrieval.
Distributional semantics is a research area that develops and studies theories and methods for quantifying and categorizing semantic similarities between linguistic items based on their distributional properties in large samples of language data. The basic idea of distributional semantics can be summed up in the so-called Distributional hypothesis: linguistic items with similar distributions have similar meanings.
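The distributional hypothesis can be made concrete with a toy computation: build co-occurrence vectors from a tiny corpus and compare them with cosine similarity. The corpus, window size, and word choices below are invented for the example:

```python
import math
from collections import Counter

# Toy distributional semantics: words appearing in similar contexts get
# similar co-occurrence vectors. Corpus and window size are invented.
corpus = "the cat drinks milk the dog drinks water the cat chases the dog".split()

def context_vector(word, tokens, window=2):
    """Count the words appearing within `window` positions of `word`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == word:
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            counts.update(t for t in tokens[lo:hi] if t != word)
    return counts

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in set(u) | set(v))
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

cat, dog, milk = (context_vector(w, corpus) for w in ("cat", "dog", "milk"))
print(cosine(cat, dog) > cosine(cat, milk))  # cat and dog share more contexts
```

Even in this tiny corpus, "cat" and "dog" end up closer to each other than to "milk", because they occur in similar contexts; scaled to large corpora, this is the intuition behind distributional word representations.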
Terminology extraction is a subtask of information extraction. The goal of terminology extraction is to automatically extract relevant terms from a given corpus.
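A deliberately naive sketch of terminology extraction ranks non-stopword tokens by corpus frequency and keeps those above a threshold; the stoplist, example corpus, and threshold are invented, and real systems add linguistic filters (e.g. noun-phrase patterns) and statistical termhood scores:

```python
import re
from collections import Counter

# Naive terminology extraction: candidate terms are the frequent
# non-stopword tokens of the corpus. Stoplist and corpus are invented.
STOPWORDS = {"the", "a", "of", "and", "is", "to", "in", "then", "each"}

def candidate_terms(corpus, min_count=2):
    """Return corpus tokens (alphabetized) occurring at least min_count times."""
    words = re.findall(r"[a-z]+", corpus.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return sorted(w for w, c in counts.items() if c >= min_count)

print(candidate_terms(
    "the parser builds a parse tree; the parser then scores each parse tree"
))
```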
Yorick Wilks FBCS, a British computer scientist, is Emeritus Professor of Artificial Intelligence at the University of Sheffield, Visiting Professor of Artificial Intelligence at Gresham College, Former Senior Research Fellow at the Oxford Internet Institute, Senior Scientist at the Florida Institute for Human and Machine Cognition, and a member of the Epiphany Philosophers.
In natural language processing, semantic role labeling is the process that assigns labels to words or phrases in a sentence that indicates their semantic role in the sentence, such as that of an agent, goal, or result.
Semantic computing is a field of computing that combines elements of semantic analysis, natural language processing, data mining, knowledge graphs, and related fields.
SemEval is an ongoing series of evaluations of computational semantic analysis systems; it evolved from the Senseval word sense evaluation series. The evaluations are intended to explore the nature of meaning in language. While meaning is intuitive to humans, transferring those intuitions to computational analysis has proved elusive.
In applied linguistics and computer science, computational semantic analysis is a composite of semantic analysis and computational components: semantic analysis refers to a formal analysis of meaning, while 'computational' refers to approaches that in principle support effective implementation on digital computers.
Dragomir R. Radev is a Yale University professor of computer science working on natural language processing and information retrieval. He previously served as a University of Michigan computer science professor and Columbia University computer science adjunct professor. Radev serves as Member of the Advisory Board of Lawyaw.
Marti Hearst is a professor in the School of Information at the University of California, Berkeley. She did early work in corpus-based computational linguistics, including some of the first work in automating sentiment analysis, and word sense disambiguation. She invented an algorithm that became known as "Hearst patterns" which applies lexico-syntactic patterns to recognize hyponymy (ISA) relations with high accuracy in large text collections, including an early application of it to WordNet; this algorithm is widely used in commercial text mining applications including ontology learning. Hearst also developed early work in automatic segmentation of text into topical discourse boundaries, inventing a now well-known approach called TextTiling.
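A single Hearst pattern can be sketched as a regular expression over raw text; the pattern shape ("X such as Y, Z and W"), helper name, and test sentence below are illustrative rather than taken from Hearst's papers:

```python
import re

# One toy Hearst pattern: "<hypernym> such as <hyponym>(, <hyponym>)* (and <hyponym>)?"
# Pattern, function name, and example sentence are illustrative only.
HYPONYM_PATTERN = re.compile(
    r"(\w+)\s+such as\s+((?:\w+(?:,\s*|\s+and\s+)?)+)"
)

def extract_hyponyms(text):
    """Return (hypernym, hyponym) pairs matched by the toy pattern."""
    pairs = []
    for match in HYPONYM_PATTERN.finditer(text):
        hypernym = match.group(1)
        hyponyms = re.split(r",\s*|\s+and\s+", match.group(2))
        pairs.extend((hypernym, h) for h in hyponyms if h)
    return pairs

print(extract_hyponyms("He studied languages such as French, Spanish and Italian."))
```

Each extracted pair asserts an ISA relation (French is-a language); Hearst's original work used several such lexico-syntactic patterns ("including", "or other", "especially", ...) over parsed noun phrases rather than raw tokens.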
Semantic parsing is the task of converting a natural language utterance to a logical form: a machine-understandable representation of its meaning. Semantic parsing can thus be understood as extracting the precise meaning of an utterance. Applications of semantic parsing include machine translation, question answering, ontology induction, automated reasoning, and code generation. The phrase was first used in the 1970s by Yorick Wilks as the basis for machine translation programs working with only semantic representations.
Mona Talat Diab is a computer science professor at George Washington University and a research scientist with Facebook AI. Her research focuses on natural language processing, computational linguistics, cross lingual/multilingual processing, computational socio-pragmatics, Arabic language processing, and applied machine learning.
Ellen Riloff is an American computer scientist currently serving as a professor at the School of Computing at the University of Utah. Her research focuses on Natural Language Processing and Computational Linguistics, specifically information extraction, sentiment analysis, semantic class induction, and bootstrapping methods that learn from unannotated texts.