Original author(s) | Cornell University Library |
---|---|
Stable release | 1.15.0 / 15 July 2024 |
Repository | github |
Written in | Java, Web Ontology Language |
License | Apache License |
Website | www |
VIVO is a web-based, open-source suite of computer software for managing data about researchers, scientists, and faculty members. VIVO uses Semantic Web techniques to represent people and their work. As of 2020, it is used by dozens of universities and the United States Department of Agriculture. [1]
The Cornell University Library originally created VIVO in 2003 as a "virtual life sciences community". [2] In 2009, the National Institutes of Health awarded a $12.2 million grant to the University of Florida, Cornell University, Indiana University, Ponce School of Medicine, The Scripps Research Institute, Washington University in St. Louis, and Weill Cornell Medical College to expand the tool for use outside of Cornell. [3]
VIVO can harvest publication data from PubMed, CSV files, and relational databases, or through the OAI-PMH protocol. It then uses a semi-automated process to match publications to researchers. [4] It also harvests information about researchers from human resources systems and student information systems. [5]
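The source does not describe how the matching works internally, so the following Python sketch only illustrates what a semi-automated match could look like in general. It is not VIVO's algorithm: the function names, the "last name plus first initial" key, and the sample data are all assumptions made for illustration. Unambiguous matches are assigned automatically, and ambiguous ones are set aside for a person to review.

```python
from collections import defaultdict


def name_key(full_name):
    """Reduce 'First [Middle] Last' to a (last name, first initial) key.

    Hypothetical normalization rule, chosen only for this illustration.
    """
    parts = full_name.replace(".", " ").split()
    return (parts[-1].lower(), parts[0][0].lower())


def match_publications(researchers, publications):
    """Split author matches into automatic ones and ones needing review.

    researchers:  list of full names, e.g. "Ada Lovelace"
    publications: list of (title, [author names]) tuples
    """
    # Index researchers by their normalized name key.
    index = defaultdict(list)
    for person in researchers:
        index[name_key(person)].append(person)

    matched, needs_review = [], []
    for title, authors in publications:
        for author in authors:
            candidates = index.get(name_key(author), [])
            if len(candidates) == 1:
                # Exactly one researcher fits: assign automatically.
                matched.append((title, candidates[0]))
            elif len(candidates) > 1:
                # Several researchers fit: leave the decision to a human.
                needs_review.append((title, author, candidates))
            # No candidates: the author is not a known researcher; skip.
    return matched, needs_review


# Made-up sample data.
researchers = ["Ada Lovelace", "Alan Turing", "Alice Turing"]
publications = [
    ("Notes on the Analytical Engine", ["A. Lovelace"]),
    ("On Computable Numbers", ["A. Turing"]),
]
matched, needs_review = match_publications(researchers, publications)
```

Here the first publication is matched directly, while "A. Turing" fits two researchers and ends up in the review queue. A real system would likely use stronger signals than names alone, such as affiliations or persistent identifiers.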
The VIVO ontology incorporates elements of several established ontologies, including Dublin Core, Basic Formal Ontology, Bibliographic Ontology, FOAF, and SKOS. The ontology can be used to describe several roles of faculty members, including research, teaching, and service. [6]
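As a rough illustration of how vocabularies such as FOAF and Dublin Core can describe a person and their work, the Python sketch below builds a few RDF statements in N-Triples syntax. It is a simplified example, not the VIVO ontology's actual modeling: the `example.org` IRIs, the helper names, and the choice of properties (`foaf:name`, `dcterms:title`, `dcterms:creator`) are assumptions made for illustration.

```python
# Namespace IRIs of the published vocabularies used in this sketch.
FOAF = "http://xmlns.com/foaf/0.1/"
DCTERMS = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def iri(value):
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(text):
    """Quote a plain string literal, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def triple(subject, predicate, obj):
    """Format one subject-predicate-object statement as an N-Triples line."""
    return f"{subject} {predicate} {obj} ."


def describe_researcher(person_iri, name, paper_iri, paper_title):
    """Return N-Triples lines linking a person to one publication."""
    return [
        triple(iri(person_iri), iri(RDF_TYPE), iri(FOAF + "Person")),
        triple(iri(person_iri), iri(FOAF + "name"), literal(name)),
        triple(iri(paper_iri), iri(DCTERMS + "title"), literal(paper_title)),
        triple(iri(paper_iri), iri(DCTERMS + "creator"), iri(person_iri)),
    ]


# Hypothetical identifiers under example.org.
lines = describe_researcher(
    "http://example.org/person/1",
    "Ada Lovelace",
    "http://example.org/paper/1",
    "Notes on the Analytical Engine",
)
print("\n".join(lines))
```

Because each statement reuses terms from shared vocabularies, other Semantic Web tools can interpret the data without knowing anything about the system that produced it.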
The Dutch institute Data Archiving and Networked Services (DANS) and Indiana University worked to develop the ontology so that it supports bilingual modeling of researchers. [7]