Semantic Scholar

Type of site: Search engine
Created by: Allen Institute for Artificial Intelligence
URL: semanticscholar.org
Launched: November 2, 2015 [1]

Semantic Scholar is a research tool for scientific literature powered by artificial intelligence. It is developed at the Allen Institute for AI and was publicly released in November 2015. [2] Semantic Scholar uses modern techniques in natural language processing to support the research process, for example by providing automatically generated summaries of scholarly papers. [3] The Semantic Scholar team is actively researching the use of artificial intelligence in natural language processing, machine learning, human–computer interaction, and information retrieval. [4]

Semantic Scholar began as a database for the topics of computer science, geoscience, and neuroscience. [5] In 2017, the system began including biomedical literature in its corpus. [5] As of September 2022, it includes over 200 million publications from all fields of science. [6]

Technology

Semantic Scholar provides a one-sentence summary of scientific literature. One of its aims was to address the challenge of reading numerous titles and lengthy abstracts on mobile devices. [7] It also seeks to ensure that the three million scientific papers published yearly reach readers, since it is estimated that only half of this literature is ever read. [8]

Artificial intelligence is used to capture the essence of a paper, and the summary is generated through an "abstractive" technique. [3] The project uses a combination of machine learning, natural language processing, and machine vision to add a layer of semantic analysis to the traditional methods of citation analysis, and to extract relevant figures, tables, entities, and venues from papers. [9] [10]

Another key AI-powered feature is Research Feeds, an adaptive research recommender that uses AI to quickly learn what papers users care about reading and recommends the latest research to help scholars stay up to date. It uses a state-of-the-art paper embedding model trained using contrastive learning to find papers similar to those in each Library folder. [11]
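The idea of finding papers similar to those in a Library folder can be illustrated with a minimal sketch. It is not Semantic Scholar's implementation: the three-dimensional vectors, the paper names, and the helper functions `centroid`, `cosine`, and `recommend` are all hypothetical, and the contrastive training of the embedding model is not shown. The sketch only assumes that each paper already has an embedding vector, and ranks candidates by cosine similarity to the average of the folder's vectors.

```python
import math

def centroid(vectors):
    """Average a list of equal-length embedding vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def recommend(folder_vectors, candidates, top_k=2):
    """Return the ids of the top_k candidates closest to the folder centroid."""
    target = centroid(folder_vectors)
    ranked = sorted(
        candidates.items(),
        key=lambda item: cosine(target, item[1]),
        reverse=True,
    )
    return [paper_id for paper_id, _ in ranked[:top_k]]

# Hypothetical embeddings: two papers saved in a folder, three new candidates.
folder = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]]
candidates = {
    "paper_a": [0.85, 0.15, 0.05],
    "paper_b": [0.0, 0.1, 0.9],
    "paper_c": [0.5, 0.5, 0.0],
}

print(recommend(folder, candidates))  # ['paper_a', 'paper_c']
```

Averaging the folder into a single vector is only one possible design; a real recommender could instead compare candidates against each saved paper individually or learn from user feedback.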

Semantic Scholar also offers Semantic Reader, an augmented reader intended to make scientific reading more accessible and richly contextual. [12] Semantic Reader provides in-line citation cards, which show each cited paper with its automatically generated TLDR (short for Too Long, Didn't Read) summary as users read, and skimming highlights, which mark the key points of a paper so that users can digest it faster.

In contrast with Google Scholar and PubMed, Semantic Scholar is designed to highlight the most important and influential elements of a paper. [13] The AI technology is designed to identify hidden connections and links between research topics. [14] Like those search engines, Semantic Scholar also exploits graph structures, which include the Microsoft Academic Knowledge Graph, Springer Nature's SciGraph, and the Semantic Scholar Corpus (originally a corpus of 45 million papers in computer science, neuroscience and biomedicine). [15] [16]

Article identifier

Each paper hosted by Semantic Scholar is assigned a unique identifier called the Semantic Scholar Corpus ID (abbreviated S2CID). The following entry is an example:

Liu, Ying; Gayle, Albert A; Wilder-Smith, Annelies; Rocklöv, Joacim (March 2020). "The reproductive number of COVID-19 is higher compared to SARS coronavirus". Journal of Travel Medicine. 27 (2). doi:10.1093/jtm/taaa021. PMID 32052846. S2CID 211099356.
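As an illustration of how the identifiers in such an entry can be handled programmatically, the following sketch pulls the DOI, PMID, and S2CID out of the citation string with regular expressions. The helper names `extract_identifiers` and `corpus_id_lookup_url` are hypothetical. The lookup URL is an assumption: it follows the `CorpusId:` prefix form of the public Semantic Scholar Graph API, which is not described in this article and should be checked against the current API documentation. The sketch makes no network request.

```python
import re

CITATION = (
    'Liu, Ying; Gayle, Albert A; Wilder-Smith, Annelies; Rocklöv, Joacim '
    '(March 2020). "The reproductive number of COVID-19 is higher compared '
    'to SARS coronavirus". Journal of Travel Medicine. 27 (2). '
    'doi:10.1093/jtm/taaa021. PMID 32052846. S2CID 211099356.'
)

def extract_identifiers(citation):
    """Collect the DOI, PMID and S2CID found in a citation string."""
    ids = {}
    doi = re.search(r"doi:\s*(\S+)", citation)
    if doi:
        # Drop the sentence-ending period that follows the DOI.
        ids["doi"] = doi.group(1).rstrip(".")
    for label in ("PMID", "S2CID"):
        match = re.search(label + r"\s+(\d+)", citation)
        if match:
            ids[label.lower()] = int(match.group(1))
    return ids

def corpus_id_lookup_url(corpus_id, fields=("title", "externalIds")):
    """Build a lookup URL for a Corpus ID (assumed Graph API URL shape)."""
    base = "https://api.semanticscholar.org/graph/v1/paper"
    return f"{base}/CorpusId:{corpus_id}?fields={','.join(fields)}"

ids = extract_identifiers(CITATION)
print(ids)
print(corpus_id_lookup_url(ids["s2cid"]))
```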

Indexing

Semantic Scholar is free to use and, unlike similar search engines (e.g. Google Scholar), does not search for material that is behind a paywall. [5] [citation needed]

One study compared the index scope of Semantic Scholar with that of Google Scholar, and found that for the papers cited by secondary studies in computer science, the two indices had comparable coverage, each missing only a handful of the papers. [17]

Number of users and publications

As of January 2018, following a 2017 project that added biomedical papers and topic summaries, the Semantic Scholar corpus included more than 40 million papers from computer science and biomedicine. [18] In March 2018, Doug Raymond, who developed machine learning initiatives for the Amazon Alexa platform, was hired to lead the Semantic Scholar project. [19] As of August 2019, the number of paper metadata records (not the actual PDFs) had grown to more than 173 million [20] after the addition of the Microsoft Academic Graph records. [21] In 2020, a partnership between Semantic Scholar and the University of Chicago Press Journals made all articles published by the University of Chicago Press available in the Semantic Scholar corpus. [22] At the end of 2020, Semantic Scholar had indexed 190 million papers. [23] In 2020, Semantic Scholar reached seven million users per month. [7]

References

  1. Jones, Nicola (2015). "Artificial-intelligence institute launches free science search engine". Nature. doi:10.1038/nature.2015.18703. ISSN 1476-4687. S2CID 182440976.
  2. Eunjung Cha, Ariana (3 November 2015). "Paul Allen's AI research group unveils program that aims to shake up how we search scientific knowledge. Give it a try". The Washington Post. Archived from the original on 6 November 2019. Retrieved November 3, 2015.
  3. Hao, Karen (November 18, 2020). "An AI helps you summarize the latest in AI". MIT Technology Review. Retrieved 2021-02-16.
  4. "Semantic Scholar Research". research.semanticscholar.org. Retrieved 2021-11-22.
  5. Fricke, Suzanne (2018-01-12). "Semantic Scholar". Journal of the Medical Library Association. 106 (1): 145–147. doi:10.5195/jmla.2018.280. ISSN 1558-9439. PMC 5764585. S2CID 45802944.
  6. Matthews, David (1 September 2021). "Drowning in the literature? These smart software tools can help". Nature. Retrieved 5 September 2022. ...the publicly available corpus compiled by Semantic Scholar – a tool set up in 2015 by the Allen Institute for Artificial Intelligence in Seattle, Washington – amounting to around 200 million articles, including preprints.
  7. Grad, Peter (November 24, 2020). "AI tool summarizes lengthy papers in a sentence". Tech Xplore. Retrieved 2021-02-16.
  8. "Allen Institute's Semantic Scholar now searches across 175 million academic papers". VentureBeat. 2019-10-23. Retrieved 2021-02-16.
  9. Bohannon, John (11 November 2016). "A computer program just ranked the most influential brain scientists of the modern era". Science. doi:10.1126/science.aal0371. Archived from the original on 29 April 2020. Retrieved 12 November 2016.
  10. Christopher Clark; Santosh Divvala (2016). PDFFigures 2.0: Mining figures from research papers. Proceedings of the 16th ACM/IEEE-CS on Joint Conference on Digital Libraries - JCDL '16. Wikidata Q108172042.
  11. "Semantic Scholar | Frequently Asked Questions". Archived from the original on July 15, 2023.
  12. "Semantic Scholar | Semantic Reader". Semantic Scholar. Archived from the original on July 15, 2023.
  13. "Semantic Scholar". International Journal of Language and Literary Studies. Retrieved 2021-11-09.
  14. Baykoucheva, Svetla (2021). Driving Science Information Discovery in the Digital Age. Chandos Publishing. p. 91. ISBN 978-0-12-823724-3. OCLC 1241441806.
  15. Jose, Joemon M.; Yilmaz, Emine; Magalhães, João; Castells, Pablo; Ferro, Nicola; Silva, Mário J.; Martins, Flávio (2020). Advances in Information Retrieval: 42nd European Conference on IR Research, ECIR 2020, Lisbon, Portugal, April 14–17, 2020, Proceedings, Part I. Cham, Switzerland: Springer Nature. p. 254. ISBN 978-3-030-45438-8. OCLC 1164658107.
  16. Ammar, Waleed (2019). "Open Research Corpus". Semantic Scholar Lab Open Research Corpus. Archived from the original on 2019-03-29. Retrieved 2024-08-05.
  17. Hannousse, Abdelhakim (2021). "Searching relevant papers for software engineering secondary studies: Semantic Scholar coverage and identification role". IET Software. 15 (1): 126–146. doi:10.1049/sfw2.12011. ISSN 1751-8814. S2CID 234053002.
  18. "AI2 scales up Semantic Scholar search engine to encompass biomedical research". GeekWire. 2017-10-17. Archived from the original on 2018-01-19. Retrieved 2018-01-18.
  19. "Tech Moves: Allen Institute Hires Amazon Alexa Machine Learning Leader; Microsoft Chairman Takes on New Investor Role; and More". GeekWire. 2018-05-02. Archived from the original on 2018-05-10. Retrieved 2018-05-09.
  20. "Semantic Scholar". Semantic Scholar. Archived from the original on 11 August 2019. Retrieved 11 August 2019.
  21. "AI2 joins forces with Microsoft Research to upgrade search tools for scientific studies". GeekWire. 2018-12-05. Archived from the original on 2019-08-25. Retrieved 2019-08-25.
  22. "The University of Chicago Press joins more than 500 publishers working with Semantic Scholar to improve search and discoverability". RCNi Company Limited. Retrieved 2021-11-22.
  23. Dunn, Adriana (December 14, 2020). "Semantic Scholar Adds 25 Million Scientific Papers in 2020 Through New Publisher Partnerships" (PDF). Semantic Scholar. Retrieved November 22, 2021.