Mona Diab

Mona T. Diab
Born: Egypt
Alma mater: University of Maryland (PhD in Computational Linguistics)
The George Washington University (MSc in Computer Science)
The American University in Cairo (BSc in Computer Science)
Helwan University in Cairo (BSc in Tourist Guidance/Egyptology & Archaeology)
Scientific career
Fields: Computer science
Computational linguistics
Applied machine learning
Institutions: Carnegie Mellon University
The George Washington University
Facebook AI
Columbia University
Stanford University
Thesis: Word Sense Disambiguation within a Multilingual Framework (2003)
Doctoral advisor: Philip Resnik (University of Maryland, College Park)
Postdoctoral advisor: Dan Jurafsky (Stanford University)
Website: Personal website

Mona Talat Diab is a computer science professor and the director of Carnegie Mellon University's Language Technologies Institute. Previously, she was a professor at The George Washington University and a research scientist at Facebook AI. Her research focuses on natural language processing, computational linguistics, cross-lingual and multilingual processing, computational socio-pragmatics, Arabic language processing, and applied machine learning. [1]

Education

Diab completed her M.Sc. in computer science, with a focus on machine learning and artificial intelligence, at The George Washington University (1997), and her Ph.D. in computational linguistics at the University of Maryland in 2003, in the Department of Linguistics and the University of Maryland Institute for Advanced Computer Studies (UMIACS), under the supervision of Philip Resnik. She was then a postdoctoral research scientist at Stanford University (2003–2005) under the mentorship of Dan Jurafsky, where she was part of the Stanford NLP Group. [2] [3]

Career

After her postdoc at Stanford, Diab took a position as a research scientist (principal investigator) at the Center for Computational Learning Systems (CCLS) at Columbia University, where she was also an adjunct professor in the computer science department. In 2013 she joined The George Washington University as an associate professor and was promoted to full professor in 2017. She is the founder and director of the GW NLP lab, CARE4Lang.

Diab served as an elected faculty senator at Columbia University for six years (2007–2012) and as an elected faculty senator at GW (2013–2014). She has served the computational linguistics community as an elected member, secretary, and president of ACL SIGLEX (2005–2016) and as elected president of ACL SIGSemitic, and she currently serves as vice president-elect of ACL SIGDAT.

In 2017 Diab joined the Amazon AWS AI Deep Learning group for human language technologies, where she led the AWS Lex project on task-oriented dialogue systems for enterprises. A couple of years later, she moved to Facebook AI as a research scientist. [2] In the fall of 2023, she became the director of CMU's Language Technologies Institute, the first full-time director since the passing of its founder, Jaime Carbonell. [citation needed]

Research

Diab's research interests span several areas of computational linguistics and natural language processing, including conversational AI, computational lexical semantics, multilingual and cross-lingual processing, social media processing with an emphasis on computational socio-pragmatics, information extraction and text analytics, and machine translation. [4] She also has a particular interest in Arabic NLP and low-resource scenarios. [5]

Diab co-established two research trends in computational linguistics: computational approaches to linguistic code switching, in 2007, and semantic textual similarity, in 2010. [2]

In 2005, Diab co-founded CADIM with Nizar Habash and Owen Rambow; it became a global reference point for Arabic dialect processing. [citation needed]

In 2012, Diab, together with Eneko Agirre and Johan Bos, brought together two ACL communities, SIGLEX and SIGSEM, and established the first-tier *SEM conference. [citation needed]

Awards and recognition

Publications

Diab has over 250 publications and serves as an editor for several scientific journals. [6]

Selected publications

References

  1. "Mona Diab". scholar.google.com. Retrieved 2021-03-12.
  2. "Mona T. Diab". web.seas.gwu.edu. Retrieved 2021-03-12.
  3. "Home Page for Mona Talat Diab". www1.cs.columbia.edu. Retrieved 2021-03-12.
  4. "Mona Diab | School of Engineering & Applied Science | The George Washington University". www.seas.gwu.edu. Retrieved 2021-03-12.
  5. "KDD 2020 | Invited Speakers: Mona Diab - Mona Diab". www.kdd.org. Retrieved 2021-03-12.
  6. "Mona Diab". scholar.google.com. Retrieved 2021-03-12.
  7. "Mona Diab". Amazon Science. Retrieved 2021-03-12.