Semantic Scholar

Type of site: Search engine
Created by: Allen Institute for Artificial Intelligence
Launched: November 2015

Semantic Scholar is a project developed at the Allen Institute for Artificial Intelligence. Publicly released in November 2015, it is designed to be an AI-backed search engine for scientific journal articles. [1] The project uses a combination of machine learning, natural language processing, and machine vision to add a layer of semantic analysis to traditional methods of citation analysis, and to extract relevant figures, entities, and venues from papers. [2] Compared with Google Scholar and PubMed, Semantic Scholar is designed to highlight the most important and influential papers and to identify the connections between them.


As of January 2018, following a 2017 project that added biomedical papers and topic summaries, the Semantic Scholar corpus included more than 40 million papers from computer science and biomedicine. [3] In March 2018, Doug Raymond, who developed machine learning initiatives for the Amazon Alexa platform, was hired to lead the Semantic Scholar project. [4] As of August 2019, the corpus had grown to more than 173 million papers [5] following the addition of records from the Microsoft Academic Graph. [6]

Related Research Articles

In computer science and information science, an ontology encompasses a representation, formal naming and definition of the categories, properties and relations between the concepts, data and entities that substantiate one, many or all domains of discourse. More simply, an ontology is a way of showing the properties of a subject area and how they are related, by defining a set of concepts and categories that represent the subject.
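
As a rough illustration of concepts, categories and relations, the sketch below stores a tiny hypothetical ontology as (subject, relation, object) triples, loosely in the spirit of RDF but without any RDF library, and infers which categories and properties a concept inherits. The concept names and the `is_a` and `has_property` relation names are invented for this example.

```python
from collections import defaultdict

# A toy ontology stored as (subject, relation, object) triples.
# Concepts and relation names are hypothetical examples.
TRIPLES = [
    ("Dog", "is_a", "Mammal"),
    ("Cat", "is_a", "Mammal"),
    ("Mammal", "is_a", "Animal"),
    ("Mammal", "has_property", "fur"),
    ("Animal", "has_property", "metabolism"),
]

def build_index(triples):
    """Group triple objects by (subject, relation) for quick lookup."""
    index = defaultdict(list)
    for subject, relation, obj in triples:
        index[(subject, relation)].append(obj)
    return index

def ancestors(index, concept):
    """Return every category reachable from `concept` through is_a links."""
    seen = []
    stack = list(index.get((concept, "is_a"), []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.append(parent)
            stack.extend(index.get((parent, "is_a"), []))
    return seen

def properties(index, concept):
    """Collect properties declared on a concept or inherited from its categories."""
    props = list(index.get((concept, "has_property"), []))
    for parent in ancestors(index, concept):
        props.extend(index.get((parent, "has_property"), []))
    return props

index = build_index(TRIPLES)
print(ancestors(index, "Dog"))   # ['Mammal', 'Animal']
print(properties(index, "Dog"))  # ['fur', 'metabolism']
```

Inheritance falls out of the category links: nothing states directly that a dog has fur, but the ontology implies it.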

CiteSeerX is a public search engine and digital library for scientific and academic papers, primarily in the fields of computer and information science. CiteSeer is considered a predecessor of academic search tools such as Google Scholar and Microsoft Academic Search. CiteSeer-like engines and archives usually harvest documents only from publicly available websites and do not crawl publisher websites. For this reason, authors whose documents are freely available are more likely to be represented in the index.

Google Scholar: academic search service by Google

Google Scholar is a freely accessible web search engine that indexes the full text or metadata of scholarly literature across an array of publishing formats and disciplines. Released in beta in November 2004, the Google Scholar index includes most peer-reviewed online academic journals and books, conference papers, theses and dissertations, preprints, abstracts, technical reports, and other scholarly literature, including court opinions and patents. While Google does not publish the size of Google Scholar's database, scientometric researchers estimated in January 2018 that it contained roughly 389 million documents, including articles, citations and patents, making it the world's largest academic search engine. An earlier estimate put its size at 160 million documents as of May 2014. A still earlier statistical estimate published in PLOS ONE, using a mark-and-recapture method, put Google Scholar's coverage at approximately 80–90% of all articles published in English, corresponding to an estimated 100 million documents. This estimate also determined how many documents were freely available on the web.
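
The mark-and-recapture idea can be sketched with the classic Lincoln–Petersen estimator: if two independent sources find n1 and n2 items and m items appear in both, the total population is estimated as N ≈ n1·n2 / m. This is a minimal sketch under that assumption, not the exact procedure of the PLOS ONE study, and the counts are hypothetical.

```python
def lincoln_petersen(n1: int, n2: int, overlap: int) -> float:
    """Estimate total population size N from two independent samples.

    n1: items found by the first source (the "marked" set)
    n2: items found by the second source (the "recaptured" set)
    overlap: items found by both sources
    """
    if overlap == 0:
        raise ValueError("no overlap: the estimate is undefined")
    return n1 * n2 / overlap

def coverage(n_source: int, population: float) -> float:
    """Fraction of the estimated population that one source covers."""
    return n_source / population

# Hypothetical counts (not real data).
n_scholar, n_other, both = 850, 600, 510
total = lincoln_petersen(n_scholar, n_other, both)
print(round(total))                          # 1000
print(round(coverage(n_scholar, total), 2))  # 0.85
```

Because n1/N simplifies to m/n2, the estimated coverage of the first source is simply the share of the second source's items that it also found (here 510/600 = 0.85).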

Semantic analytics, also termed semantic relatedness, is the use of ontologies to analyze content in web resources. This field of research combines text analytics and Semantic Web technologies like RDF. Semantic analytics measures the relatedness of different ontological concepts.
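
One simple way to turn an ontology into a relatedness score, assumed here for illustration, is to count the is-a links separating two concepts and map shorter paths to higher scores. The toy hierarchy and the 1 / (1 + distance) scoring are hypothetical choices; many other relatedness measures exist.

```python
from collections import deque

# Hypothetical is-a edges (child -> parent) in a toy ontology.
IS_A = {
    "Dog": "Mammal",
    "Cat": "Mammal",
    "Mammal": "Animal",
    "Sparrow": "Bird",
    "Bird": "Animal",
}

def neighbors(concept):
    """Concepts linked to `concept` by an is-a edge in either direction."""
    linked = [child for child, parent in IS_A.items() if parent == concept]
    if concept in IS_A:
        linked.append(IS_A[concept])
    return linked

def path_length(a, b):
    """Breadth-first search for the fewest is-a edges between two concepts."""
    queue, seen = deque([(a, 0)]), {a}
    while queue:
        node, dist = queue.popleft()
        if node == b:
            return dist
        for nxt in neighbors(node):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # the concepts are not connected

def relatedness(a, b):
    """Map path length to a score in (0, 1]; shorter paths mean more related."""
    dist = path_length(a, b)
    return 0.0 if dist is None else 1.0 / (1.0 + dist)

print(relatedness("Dog", "Cat"))      # 0.3333333333333333 (Dog -> Mammal -> Cat)
print(relatedness("Dog", "Sparrow"))  # 0.2 (four edges via Animal)
```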

Figure Eight Inc.: American software company

Figure Eight is a human-in-the-loop machine learning and artificial intelligence company based in San Francisco. The company raised $58 million in venture capital and was acquired by Appen in March 2019 for $300 million.

Andrew Ng: American artificial intelligence researcher

Andrew Yan-Tak Ng is a Chinese-American businessman, computer scientist, investor, and writer. His work focuses on machine learning and AI. As a businessman and investor, Ng co-founded and led Google Brain and was formerly Vice President and Chief Scientist at Baidu, where he built the company's Artificial Intelligence Group into a team of several thousand people.

Kaggle: internet platform for data science competitions

Kaggle, a subsidiary of Google LLC, is an online community of data scientists and machine learning practitioners. Kaggle allows users to find and publish data sets, explore and build models in a web-based data-science environment, work with other data scientists and machine learning engineers, and enter competitions to solve data science challenges.

International Aging Research Portfolio (IARP) is a non-profit, open-access knowledge management system covering grants, publications, and conferences in the natural, social, and behavioral sciences. In addition to advanced search and visual trend analysis tools, the system includes a directory of research projects classified into categories related to aging research. The system uses automatic classification algorithms with elements of machine learning to assign research projects to the relevant categories. The directory is curated by expert category editors and members of the science advisory board. The chair of the science advisory board is Dr. Charles Cantor.
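
IARP's actual classification algorithms are not described here, so the following is only a generic sketch of how text can be assigned to categories with machine learning: a multinomial naive Bayes classifier with add-one smoothing, trained on a few hypothetical project titles. The category names and training snippets are invented for illustration.

```python
import math
from collections import Counter, defaultdict

# Hypothetical training data: (project title, category) pairs.
TRAINING = [
    ("telomere shortening in dividing cells", "cellular aging"),
    ("senescent cells and telomere length", "cellular aging"),
    ("retirement income and elderly households", "economics of aging"),
    ("pension reform for an ageing workforce", "economics of aging"),
]

def train(labelled_docs):
    """Count words per category, documents per category, and the vocabulary."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    vocab = set()
    for text, category in labelled_docs:
        words = text.lower().split()
        word_counts[category].update(words)
        doc_counts[category] += 1
        vocab.update(words)
    return word_counts, doc_counts, vocab

def classify(model, text):
    """Return the category with the highest naive Bayes log-probability."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best, best_score = None, -math.inf
    for category, n_docs in doc_counts.items():
        total_words = sum(word_counts[category].values())
        score = math.log(n_docs / total_docs)  # log prior
        for word in text.lower().split():
            # Add-one smoothing keeps unseen words from zeroing the score.
            count = word_counts[category][word]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best, best_score = category, score
    return best

model = train(TRAINING)
print(classify(model, "telomere length in stem cells"))  # cellular aging
print(classify(model, "pension income for retirees"))    # economics of aging
```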

The Allen Institute for AI is a research institute founded by the late Microsoft co-founder Paul Allen. The institute seeks to achieve scientific breakthroughs by building AI systems with reasoning, learning, and reading capabilities. In September 2013, Paul Allen appointed Oren Etzioni to direct the research at the institute.

Oren Etzioni: American businessman

Oren Etzioni is an American entrepreneur, professor of computer science, and CEO of the Allen Institute for Artificial Intelligence. He joined the University of Washington faculty in 1991, where he became the Washington Research Foundation Entrepreneurship Professor in the Department of Computer Science and Engineering. In May 2005, he founded and became the director of the university's Turing Center. The center investigated problems in data mining, natural language processing, the Semantic Web and other web search topics. Etzioni coined the term machine reading and helped to create the first commercial comparison shopping agent.

Farshad Fotouhi: American computer scientist

Farshad Fotouhi is the dean of Wayne State University's College of Engineering. He joined Wayne State University in 1988 as a faculty member in the Department of Computer Science. From 2000 to 2004, he was associate chair of the department, and from 2004 to 2010 he was its chair.

Meta (academic company): artificial intelligence company specializing in big data analysis of scientific and technical literature

Meta is a company that performs big data analysis of scientific literature. The company is headquartered in Redwood City, California, and operates Meta Science, a literature discovery platform. It was acquired by the Chan Zuckerberg Initiative in 2017.

Sparrho combines human and artificial intelligence to help research professionals and lay users stay up to date with new scientific publications and patents. Sparrho's recommendation engine provides personalized scientific news feeds by using proprietary machine learning algorithms to "aggregate, distill and recommend" relevant content. The platform aims to complement traditional methods of finding relevant academic material, such as Google Scholar and PubMed, with a system that enables the serendipitous discovery of pertinent content across relevant scientific fields through user-input-aided machine learning and contextual analysis.
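
Sparrho's algorithms are proprietary, so this is only a generic content-based recommender under stated assumptions: a user's interests are represented as a bag of words built from papers they have read, and candidate titles are ranked by cosine similarity to that profile. The `recommend` function and the sample titles are hypothetical.

```python
import math
from collections import Counter

def vectorize(text):
    """Bag-of-words vector: word -> count."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def recommend(read_titles, candidates, k=2):
    """Rank candidate titles by similarity to a profile built from read_titles."""
    profile = Counter()
    for title in read_titles:
        profile.update(vectorize(title))
    scored = [(cosine(profile, vectorize(title)), title) for title in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [title for _, title in scored[:k]]

history = ["graphene battery electrodes", "lithium battery degradation"]
new_papers = [
    "solid state lithium battery",
    "graphene sensors for gas detection",
    "medieval trade routes",
]
print(recommend(history, new_papers))
# ['solid state lithium battery', 'graphene sensors for gas detection']
```

Summing counts across the reading history means recurring topics (here "battery") weigh more in the profile; a production system would likely add term weighting and user feedback on top of this.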

Fei-Fei Li: American computer scientist

Fei-Fei Li is a Chinese-born American computer scientist, non-profit executive, and writer. She is a professor at Stanford University and the co-director of Stanford's Human-Centered AI Institute and the Stanford Vision and Learning Lab. She served as the director of the Stanford Artificial Intelligence Laboratory (SAIL) from 2013 to 2018. In 2017, she co-founded AI4ALL, a nonprofit organization working to increase diversity and inclusion in the field of artificial intelligence. Her research expertise includes artificial intelligence (AI), machine learning, deep learning, computer vision and cognitive neuroscience. She was the leading scientist and principal investigator of ImageNet.

Gregory Piatetsky-Shapiro: data scientist and co-founder of KDD conferences and ACM SIGKDD association

Gregory I. Piatetsky-Shapiro is a data scientist, co-founder of the KDD conferences, and co-founder and past chair of the Association for Computing Machinery's SIGKDD group for Knowledge Discovery, Data Mining and Data Science. He is the founder and president of KDnuggets, a discussion and learning website for business analytics, data mining and data science.

Microsoft Academic: online bibliographic database

Microsoft Academic is a free public web search engine for academic publications and literature, developed by Microsoft Research. Re-launched in 2016, the tool features an entirely new data structure and search engine using semantic search technologies. It currently indexes over 220 million publications, 88 million of which are journal articles. The Academic Knowledge API offers information retrieval from the underlying database using REST endpoints for advanced research purposes.

Paola Velardi

Paola Velardi is a full professor of computer science at Sapienza University in Rome, Italy. She is an Italian scientist, born in Rome on April 24, 1955. Her research encompasses natural language processing, machine learning, business intelligence and the Semantic Web, in particular web information extraction. Velardi is one of the hundred female scientists included in the database "". This open online database promotes the recognition of top-rated female scientists in science, technology, engineering and mathematics (STEM).

Andrea Frome is an American computer scientist who works in computer vision and machine learning.


References

  1. "Paul Allen's AI research group unveils program that aims to shake up how we search scientific knowledge. Give it a try". The Washington Post. Retrieved 3 November 2015.
  2. Bohannon, John (11 November 2016). "A computer program just ranked the most influential brain scientists of the modern era". Science. doi:10.1126/science.aal0371. Retrieved 12 November 2016.
  3. "AI2 scales up Semantic Scholar search engine to encompass biomedical research". GeekWire. 17 October 2017. Retrieved 18 January 2018.
  4. "Tech Moves: Allen Institute Hires Amazon Alexa Machine Learning Leader; Microsoft Chairman Takes on New Investor Role; and More". GeekWire. 2 May 2018.
  5. "Main page". Semantic Scholar. Retrieved 11 August 2019.
  6. "AI2 joins forces with Microsoft Research to upgrade search tools for scientific studies". GeekWire. 5 December 2018. Retrieved 25 August 2019.