Preslav Nakov

Born: 26 January 1977, Veliko Turnovo, Bulgaria
Nationality: Bulgarian
Alma mater: University of California, Berkeley (PhD in Computer Science); Sofia University (MSc in Computer Science)
Known for: Natural language processing; fake news detection; sentiment analysis
Fields: Computer science
Institutions: Qatar Computing Research Institute; National University of Singapore; Sofia University; Bulgarian Academy of Sciences; University of California, Berkeley; University College London
Thesis: Using the Web as an Implicit Training Set: Application to Noun Compound Syntax and Semantics (2007)
Doctoral advisor: Marti Hearst
Website: Personal website

Preslav Nakov (born 26 January 1977 in Veliko Turnovo, Bulgaria) is a computer scientist who works on natural language processing. He is particularly known for his research on fake news detection,[1] automatic detection of offensive language,[2] and biomedical text mining.[3] Nakov obtained a PhD in computer science from the University of California, Berkeley, under the supervision of Marti Hearst. He was the first recipient of the John Atanasov Presidential Award, given by the President of Bulgaria for achievements in the development of the information society.[4]

Education

Preslav Nakov grew up in Veliko Turnovo, Bulgaria, where he attended primary and secondary school, obtaining a Diploma in Mathematics from the 'Vassil Drumev' Secondary School of Mathematics and Natural Sciences in 1996. He then obtained an MSc degree in Informatics (Computer Science), with specialisations in Artificial Intelligence and in Information and Communication Technologies, from Sofia University in 2001. During his MSc studies, he worked as a teaching assistant at Sofia University and the Bulgarian Academy of Sciences, and was a guest lecturer at University College London during a visit in Spring 1999. He subsequently enrolled in the PhD programme at the Department of Electrical Engineering and Computer Science of the University of California, Berkeley, supported in part by a Fulbright Scholarship. Under the supervision of Marti Hearst, he wrote a thesis on text mining from the Web and graduated with a PhD in Computer Science from UC Berkeley in 2007.[5]

Career

Upon graduating from the University of California, Berkeley, Nakov worked as a Research Fellow at the National University of Singapore. Since 2012, he has been a Senior Scientist at the Qatar Computing Research Institute (QCRI). He also holds a position as an honorary lecturer at Sofia University.

Research

Preslav Nakov works in the areas of natural language processing and text mining, and has published over 300 peer-reviewed research papers.[6] His early research was on lexical semantics and text mining; he published influential papers on biomedical text mining, most prominently on methods to identify citation sentences in biomedical papers.[3] He is best known, however, for his research on fake news detection, such as his work on predicting the factuality and bias of news sources,[1] and for his research on the automatic detection of offensive language.[2] Nakov also led the organisation of a popular evaluation campaign on sentiment analysis systems, run as part of SemEval from 2015 to 2017.[7] He currently coordinates the Tanbih News Aggregator, a large project with partners at the Qatar Computing Research Institute and the MIT Computer Science and Artificial Intelligence Laboratory that aims to uncover stance, bias, and propaganda in the news.[8]

Selected honors and distinctions

John Atanasov Presidential Award of the President of the Republic of Bulgaria (2013)[4]
RANLP Young Researcher Award (2011)[9]
Best paper award nominee at CIKM (2020)[10]

References

  1. Baly, Ramy; Karadzhov, Georgi; Alexandrov, Dimitar; Glass, James; Nakov, Preslav (2018-11-01). "Predicting Factuality of Reporting and Bias of News Media Sources". Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Conference on Empirical Methods in Natural Language Processing. Brussels, Belgium: Association for Computational Linguistics. pp. 3528–3539.
  2. Zampieri, Marcos; Malmasi, Shervin; Nakov, Preslav; Rosenthal, Sara; Farra, Noura; Kumar, Ritesh (2019-06-01). "Predicting the Type and Target of Offensive Posts in Social Media". Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). North American Chapter of the Association for Computational Linguistics. Minneapolis, Minnesota: Association for Computational Linguistics. pp. 1415–1420.
  3. Nakov, Preslav; Schwartz, Ariel; Hearst, Marti (2004-07-25). "Citances: Citation Sentences for Semantic Analysis of Bioscience Text". Proceedings of SIGIR. International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM. pp. 81–88. CiteSeerX 10.1.1.59.2666.
  4. "John Atanasov Presidential Award – Preslav Nakov (2013)". Administration of the President of the Republic of Bulgaria. 2013. Retrieved February 21, 2021.
  5. Nakov, Preslav (2007). Using the Web as an Implicit Training Set: Application to Noun Compound Syntax and Semantics (PhD thesis). University of California, Berkeley.
  6. "Preslav Nakov - Google Scholar Citations". scholar.google.com. Retrieved February 21, 2021.
  7. Rosenthal, Sara; Farra, Noura; Nakov, Preslav (2017-08-01). "SemEval-2017 Task 4: Sentiment Analysis in Twitter". Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017). Workshop on Lexical and Computational Semantics and Semantic Evaluation. Association for Computational Linguistics. pp. 502–518.
  8. "TANBIH News Aggregator". Retrieved February 21, 2021.
  9. "RANLP-2011 Young Researcher Award" (PDF). 2011. Retrieved February 21, 2021.
  10. "CIKM – Best Papers Nominees". 2020. Retrieved February 21, 2021.