Roberto Navigli

Awards: AAAI Fellow (2025), ACL Fellow (2023), ELLIS Fellow (2024), EurAI Fellow (2024)
Institutions: Sapienza University of Rome
Thesis: Structural Semantic Interconnections: a Knowledge-Based WSD Algorithm, its Evaluation and Applications (2007)
Website: www.diag.uniroma1.it/navigli

Roberto Navigli (born 1978) is an Italian computer scientist and professor in the Department of Computer, Control and Management Engineering "Antonio Ruberti" at the Sapienza University of Rome, [3] where he also directs the Sapienza NLP Group. [4] His research in Artificial Intelligence focuses on enabling computers to understand and represent meaning across hundreds of languages. He has contributed to several areas of Natural Language Processing, including Word Sense Disambiguation, Entity Linking, Semantic Role Labeling and semantic parsing. [1] He created BabelNet, a multilingual knowledge graph that integrates knowledge from resources including WordNet, Wikipedia, Wiktionary and Wikidata. A central goal of his research is to make semantic representations of words and sentences independent of the language in which they are written. More recently, he has focused on Large Language Models (LLMs), leading the Minerva project, [5] [6] the first Italian effort to pretrain an LLM from scratch. [7]

Education

Navigli obtained his Master of Science degree in Computer Science in 2001 at Sapienza University of Rome, followed, in 2007, by a PhD from the same institution, under the supervision of Paola Velardi. [2] Navigli's doctoral thesis focused on devising and evaluating an innovative knowledge-based algorithm for Word Sense Disambiguation, named Structural Semantic Interconnections. [8]

Career and research

Navigli was a visiting research fellow and visiting professor [9] at the School of Informatics, University of Edinburgh, the University of Sussex, the University of Wolverhampton and the Center for Advanced Studies of Ludwig Maximilian University of Munich. [10] He then held academic positions as researcher, and later as associate and full professor, at the Sapienza University of Rome, where he established the Sapienza NLP group. [11] Between 2017 and 2023, Navigli served as a member of the ERC Starting Grant panel for Computer Science and Informatics (PE6). [12]

Navigli was granted a European Research Council (ERC) Starting Grant [13] to fund his work on the creation of BabelNet and multilingual Word Sense Disambiguation, most notably Babelfy, and a subsequent ERC Consolidator Grant [14] to work on sentence-level, language-independent semantic representations, leading to the BabelNet Meaning Representation and its semantic parser, with the goal of creating 'the DNA of language'. [15] These two grants have been highlighted among the 15 projects through which the ERC transformed science. [16]

In 2016, Navigli founded Babelscape, [17] a successful university spinoff company focused on multilingual neuro-symbolic Natural Language Understanding. [18]

Awards

Selected publications

References

  1. Roberto Navigli publications indexed by Google Scholar
  2. "Roberto Navigli's institutional page - Publications".
  3. "Official DIAG Sapienza Page" (in Italian). Retrieved 2024-11-03.
  4. "Sapienza NLP Page". Retrieved 2024-11-03.
  5. "Minerva". Retrieved 2024-11-03.
  6. "Minerva 7B, l'IA generativa italiana è diventata grande" (in Italian). 26 November 2024. Retrieved 2024-11-28.
  7. "Ecco Minerva, la prima famiglia di LLM addestrati da zero in italiano" (in Italian). 23 April 2024. Retrieved 2024-11-03.
  8. Navigli, Roberto; Velardi, Paola (2005). "Structural Semantic Interconnections: A Knowledge-Based Approach to Word Sense Disambiguation". IEEE Transactions on Pattern Analysis and Machine Intelligence. 27 (7): 1075–1086. doi:10.1109/TPAMI.2005.149. PMID 16013755. Retrieved 2024-11-03.
  9. "Roberto Navigli's institutional page - CV".
  10. "LMU visiting fellow page". Retrieved 2024-11-03.
  11. "Sapienza NLP page".
  12. "ERC Starting Grant Panels 2023" (PDF). Retrieved 2024-11-03; "ERC Starting Grant Panels 2021" (PDF). Retrieved 2024-11-03; "ERC Starting Grant Panels 2019" (PDF). Retrieved 2024-11-03; "ERC Starting Grant Panels 2017" (PDF). Retrieved 2024-11-03.
  13. "MultiJEDI on CORDIS". CORDIS. Retrieved 2024-08-24.
  14. "MOUSSE". CORDIS. Retrieved 2024-08-24.
  15. "Project breaks new grounds in AI to create 'DNA of language'". CORDIS.
  16. "How the ERC transformed science". Retrieved 2024-08-24.
  17. "Babelscape - about".
  18. "Dalla ricerca arriva l'IA tutta made in Italy" (in Italian). 24 October 2024. Retrieved 2024-11-06.
  19. "Elected AAAI Fellows".
  20. "Current EurAI Fellows".
  21. "EurAI Fellow motivation on X". Retrieved 2024-11-03.
  22. "Fellows & Scholars of the ELLIS Society - Natural Language Processing". Retrieved 2024-12-04.
  23. "Current ACL Fellows".
  24. "ACL 2024 best paper awards".
  25. "ACL 2023 best paper awards".
  26. "NAACL 2021 best paper awards". 2 June 2021.
  27. "AIJ Awards: List of Current and Previous Winners".
  28. "Sapienza DIAG ACL 2022 best resource paper announcement".
  29. "META Prize page". Archived from the original on 2023-03-06.
  30. "Marco Somalvico awards page".
  31. "Marco Cadoli awards page".