SciGraph

Type of site: Search engine
Created by: Springer Nature
Launched: March 2017

SciGraph was a search engine tool developed by Springer Nature; its former URL was https://scigraph.springernature.com/explorer. The technology, which was considered a Linked Open Data (LOD) platform, [1] collected information covering the research landscape, including research projects, publications, conferences, funding agencies, and others. [2] Key features of the platform included detailed semantic descriptions of how pieces of information relate to one another and a visualization of the scholarly domain.

Development

The development of SciGraph began with an initiative to create a platform that would host Springer Nature's entire publication archive, which covers texts published as early as 1815. [3] The number of these resources is reported to be about 13 million. [3] The technology behind the platform was built on earlier Springer Nature projects developed for the purpose of collecting information on the research landscape. [4] The first SciGraph data set was published in February 2017. [4] The platform was launched in March 2017 and was significantly expanded with the addition of publications from key partners. [5] The datasets span a broad range of topics, including computer science, medicine, life sciences, chemistry, engineering, and astronomy, among others. [6] The developers also planned to include citations, patents, and clinical trials in the future. [7]

Technology

SciGraph comprised 1.5 to 2 billion triples, where a triple takes the form "subject-predicate-object" and can link any subject or concept through a predicate (verb) to an object, expressing the type of relationship that exists between them. [8] Its graph structure has been used by other academic search engines such as Semantic Scholar. [9]
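The subject-predicate-object form can be illustrated with a minimal Python sketch. The identifiers and predicate names below (such as "article:123" and "hasAuthor") are hypothetical and invented for illustration; they are not taken from the SciGraph data model, and a production system would use an RDF store rather than an in-memory list.

```python
from typing import NamedTuple, Optional


class Triple(NamedTuple):
    """One statement: a subject linked to an object through a predicate."""
    subject: str
    predicate: str
    obj: str


# Hypothetical example data; none of these identifiers come from SciGraph.
TRIPLES = [
    Triple("article:123", "hasAuthor", "person:alice"),
    Triple("article:123", "publishedIn", "journal:example"),
    Triple("person:alice", "affiliatedWith", "org:example-university"),
    Triple("grant:42", "funded", "article:123"),
]


def match(triples, subject: Optional[str] = None,
          predicate: Optional[str] = None, obj: Optional[str] = None):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


def affiliations_of_authors(triples, article: str):
    """Follow two predicates in a row: article -> author -> organisation."""
    orgs = []
    for authorship in match(triples, subject=article, predicate="hasAuthor"):
        for aff in match(triples, subject=authorship.obj,
                         predicate="affiliatedWith"):
            orgs.append(aff.obj)
    return orgs


if __name__ == "__main__":
    # Every statement made about the article.
    print(match(TRIPLES, subject="article:123"))
    # Organisations reached by chaining two relationships.
    print(affiliations_of_authors(TRIPLES, "article:123"))
```

Because every statement has the same three-part shape, a single pattern-matching function is enough to ask different questions of the data, and chaining matches walks the graph from one entity to another.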

SciGraph collected data from Springer Nature and its partners in the scholarly domain, covering funders, research projects, conferences, affiliations, and publications. [10] The collected information serves as a rich semantic description of how information is related, and it also provides a visualization of the scholarly domain. [11] The platform has been considered the only large-scale dataset that reconciles authors' affiliations by disambiguating them and linking them to external authoritative datasets of institutions. [6]

References

  1. "Springer Nature SciGraph". EurekAlert!. Retrieved 2021-10-25.
  2. Rucci, Enzo (2020). Cloud Computing, Big Data & Emerging Topics: 8th Conference, JCC-BD&ET 2020, La Plata, Argentina, September 8-10, 2020, Proceedings. Cham, Switzerland: Springer Nature. p. 86. doi:10.1007/978-3-030-61218-4_6. ISBN 978-3-030-61217-7. ISSN 1865-0929. OCLC 1204142972.
  3. "Springer Nature Uses LOD to Create a Rich Database for Scientists to Work Together". Ontotext. Retrieved 2021-11-04.
  4. Hammond, Tony; Pasin, Michele; Theodoris, Evangelos (2017). Data integration and disintegration: Managing Springer Nature SciGraph with SHACL and OWL (PDF). ISWC (Posters, Demos & Industry Tracks). Kobe, Japan. ISSN 1613-0073. S2CID 45786582. Retrieved 2021-10-26.
  5. "SciGraph – Access". 22 December 2017. Retrieved 2021-10-25.
  6. González-Beltrán, Alejandra; Osborne, Francesco; Peroni, Silvio; Vahdati, Sahar (2018). Semantics, Analytics, Visualization: 3rd International Workshop, SAVE-SD 2017, Perth, Australia, April 3, 2017, and 4th International Workshop, SAVE-SD 2018, Lyon, France, April 24, 2018, Revised Selected Papers. Cham: Springer. p. 64. doi:10.1007/978-3-030-01379-0_5. ISBN 978-3-030-01378-3.
  7. Garcia-Silva, Andres; Gómez-Pérez, José Manuél (1 April 2018). Not Just About Size - A Study on the Role of Distributed Word Representations in the Analysis of Scientific Publications (PDF). DL4KGS@ESWC 2018. Heraklion, Greece. pp. 21–32. arXiv:1804.01772. Bibcode:2018arXiv180401772G. ISSN 1613-0073.
  8. Light, Ryan; Moody, James (2020). The Oxford Handbook of Social Networks. New York: Oxford University Press. p. 603. ISBN 978-0-19-025176-5.
  9. Jose, Joemon M.; Yilmaz, Emine; Magalhães, João; Castells, Pablo; Ferro, Nicola; Silva, Mário J.; Martins, Flávio (2020). Advances in Information Retrieval: 42nd European Conference on IR Research, ECIR 2020, Lisbon, Portugal, April 14–17, 2020, Proceedings, Part I. Cham, Switzerland: Springer Nature. p. 254. arXiv:1912.13080. doi:10.1007/978-3-030-45442-5_31. ISBN 978-3-030-45438-8.
  10. Bespalov, Anton; Michel, Martin C.; Steckler, Thomas (2020). Good Research Practice in Non-Clinical Pharmacology and Biomedicine. Handbook of Experimental Pharmacology. Vol. 257. Cham: Springer Nature. p. 343. doi:10.1007/164_2019_290. ISBN 978-3-030-33655-4. PMID 31691858. S2CID 207902492.
  11. Gayo, Jose Emilio Labra; Prud'hommeaux, Eric; Boneva, Iovka; Kontokostas, Dimitris (2018). "Applications". Validating RDF Data. Synthesis Lectures on Data, Semantics, and Knowledge. Morgan & Claypool Publishers. p. 212. doi:10.1007/978-3-031-79478-0_6. ISBN 978-1-68173-164-3. OCLC 1019932975.