Natasha Noy

Born: Natalya Fridman Noy, Russia
Alma mater: Moscow State University; Boston University; Northeastern University
Known for: Protégé ontology editor; Google Dataset Search
Awards: AAAI Fellow (2020); ACM Fellow (2023) [1]
Scientific career
Fields: Semantic Web; Ontologies; Structured data; Data integration [2]
Institutions: Google; Stanford University
Thesis: Knowledge representation for intelligent information retrieval in experimental sciences (1997)
Website: research.google.com/pubs/NatalyaNoy.html

Natasha Fridman Noy is a Russian-born American research scientist [3] at Google Research in Mountain View, CA, [4] whose work focuses on making structured data more accessible and usable. [5] [2] [6] She leads the team behind Dataset Search, a web-based search engine for datasets. [7] Before joining Google, Noy worked at the Stanford Center for Biomedical Informatics Research, where she made significant contributions to ontology building and alignment, as well as collaborative ontology engineering. [4] She serves on the editorial boards of many Semantic Web and information systems publications. [4] She was president of the Semantic Web Science Association from 2011 to 2017 [7] and is its immediate past president. [4]

Education

Natasha Noy earned a bachelor's degree in applied mathematics from Moscow State University, a master's degree in computer science from Boston University and a doctorate from Northeastern University. [2] Her thesis focused on knowledge-rich documents, in particular information retrieval for scientific articles. [8]

Career and research

Noy moved from Northeastern to Stanford University to work with Mark Musen on the Protégé ontology editor, first as a postdoctoral researcher and later as a research scientist. There she completed her work on Prompt, an environment for automated ontology alignment, which was published in 2002. [9] [10] This study received the AAAI Classic Paper Award in 2018 for recognizing the specifics of the problem and providing an inventive solution. By far her most widely distributed work, [2] however, is the Ontology 101 tutorial, [11] which Noy developed as part of the education program for Protégé customers. The tutorial became a standard introductory document for the semantic web and ontologies; as of 2018, it had been cited nearly 6800 times and downloaded often. [2]

In April 2014, Noy joined Google Research. Google has since released Dataset Search, a search engine to help researchers find publicly available online data. The service was launched on September 5, 2018, and is aimed at "scientists, data journalists, data geeks, or anybody else." [7] Dataset Search joins Google's other specialized search engines, including news and image search, Google Scholar, and Google Books. It locates files and databases based on how their owners have categorised them; it does not read the content of the files in the way that search engines read web pages. [7]

According to Noy, researchers who want to know what kinds of data are accessible, or who want to find data that they already know exists, must often rely on word of mouth. The problem is particularly acute for early-career academics who have yet to "connect" into a network of professional ties.

Noy and her Google colleague Dan Brickley wrote a blog post in January 2017 proposing a solution to the problem. Typical search engines operate in two stages: the first is to regularly search the Internet for sites to index, and the second is to rank those indexed sites so that the engine can return relevant results in order when a user enters a search term. Under the proposal, owners of datasets should 'tag' them using a standardized vocabulary called Schema.org. According to Noy and Brickley, Google and three other search engine giants (Microsoft, Yahoo, and Yandex) created Schema.org to help search engines scan existing data sets. [7]
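
As an illustration of the tagging described above, a dataset owner might embed a Schema.org description in the web page that hosts the dataset. The Python sketch below builds such a description and serializes it as JSON-LD, which is one common way to publish Schema.org markup. The `Dataset` and `DataDownload` types and the property names come from the Schema.org vocabulary; the helper functions, the example.org URLs, and the dataset itself are hypothetical and are not taken from the cited sources.

```python
import json


def dataset_jsonld(name, description, url, license_url, download_url, encoding_format):
    """Return a Schema.org Dataset description as a JSON-LD dictionary."""
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "license": license_url,
        # One downloadable file that belongs to the dataset.
        "distribution": [
            {
                "@type": "DataDownload",
                "encodingFormat": encoding_format,
                "contentUrl": download_url,
            }
        ],
    }


def to_script_tag(metadata):
    """Wrap the JSON-LD in a script element suitable for embedding in a page."""
    body = json.dumps(metadata, indent=2)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


# Hypothetical dataset used only for illustration.
example = dataset_jsonld(
    name="Hypothetical river temperature readings",
    description="Daily water temperature readings from an imaginary monitoring station.",
    url="https://example.org/datasets/river-temperature",
    license_url="https://creativecommons.org/licenses/by/4.0/",
    download_url="https://example.org/datasets/river-temperature.csv",
    encoding_format="text/csv",
)

print(to_script_tag(example))
```

A search engine that indexes the page can then read the owner's own description of the dataset (its name, license, and download location) without opening the data file, which matches the approach of relying on how owners categorise their data.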

Awards and honors

Noy is best known for her work on the Protégé ontology editor and the Prompt alignment tool, for which she and co-author Mark Musen received the AAAI Classic Paper Award in 2018. The award honors the author(s) of the most influential paper(s) from a specific conference year, with the time period examined advancing by one year per year. [12] She was elected an AAAI Fellow in 2020 [13] and an ACM Fellow in 2023. [1]

References

  1. Anon (2024). "2023 ACM Fellows Celebrated for Contributions to Computing That Underpin Our Daily Lives". acm.org. New York: Association for Computing Machinery.
  2. Natasha Noy publications indexed by Google Scholar.
  3. Anon (2021). "SIMBig Conference 2021". simbig.org. Retrieved 2022-04-08.
  4. Anon (2018). "Natasha Noy at the International Semantic Web Conference (ISWC)". semanticweb.org. Retrieved 2022-04-08.
  5. Natasha Noy publications from Europe PubMed Central.
  6. "Google launches search engine for open datasets - The Tartan". thetartan.org. Retrieved 13 October 2018.
  7. Castelvecchi, Davide (2018). "Google unveils search engine for open data". Nature. 561 (7722): 161–162. Bibcode:2018Natur.561..161C. doi:10.1038/d41586-018-06201-x. PMID 30206390. S2CID 52190512.
  8. Noy, Natalya Fridman (1997). Knowledge Representation for Intelligent Information Retrieval in Experimental Sciences (PDF). semanticscholar.org (PhD thesis). Northeastern University. S2CID 23795878. ProQuest 9826118. Archived from the original (PDF) on 2018-10-13. Retrieved 13 October 2018.
  9. Noy, Natalya Fridman; Musen, Mark A. (2000). "Algorithm and Tool for Automated Ontology Merging and Alignment" (PDF). aaai.org. Retrieved 13 October 2018.
  10. "Ontology Mapping and Alignment". videolectures.net. Retrieved 13 October 2018.
  11. Noy, Natalya F.; McGuinness, Deborah L. (2001). "Ontology Development 101: A Guide to Creating Your First Ontology" (PDF). protege.stanford.edu. Retrieved 13 October 2018.
  12. Anon (2018). "AAAI Classic Paper Award". aaai.org. Retrieved 2022-04-18.
  13. Anon (2020). "Elected AAAI Fellows". aaai.org. Association for the Advancement of Artificial Intelligence. Retrieved 2024-01-04.