Biositemap

A Biositemap is a way for a biomedical research institution or organisation to show how biological information is distributed throughout its Information Technology systems and networks. This information may be shared with other organisations and researchers.

The Biositemap enables web browsers, crawlers and robots to easily access and process the information for use in other systems, media and computational formats. The Biositemaps protocol provides clues to Biositemap web harvesters, allowing them to find resources and content across all interlinked Biositemap files. This means that human or machine users can access relevant information on any topic across all participating organisations and bring it into their own systems for assimilation or analysis.

File framework

[Figure: iTools representation of a biositemap]

The information is normally stored in a biositemap.rdf or biositemap.xml file, which lists information about the data, software tools, materials and services provided or held by the organisation. The information is presented in metafields and can be created online through tools such as the Biositemaps online editor. [1]

The format blends ideas from sitemaps and RSS feeds and is built on the Information Model (IM) and the Biomedical Resource Ontology (BRO). The IM defines the data held in the metafields, while the BRO controls the terminology used in the resource_type field. The BRO is critical in enabling other organisations and third parties to search the data and refine those searches.
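
As a rough illustration only, a single resource entry in a biositemap.rdf file might look like the sketch below. The property names and namespace URIs here are assumptions for illustration, not the normative IM field list; consult the Biositemaps documentation for the actual vocabulary.

```xml
<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:bro="http://example.org/bro#">
  <!-- One resource described by IM-style metafields; the resource_type
       value would be drawn from the Biomedical Resource Ontology (BRO) -->
  <rdf:Description rdf:about="http://example.org/tools/aligner">
    <dc:title>Example Sequence Aligner</dc:title>
    <dc:description>Aligns nucleotide sequences</dc:description>
    <bro:resource_type>Software</bro:resource_type>
  </rdf:Description>
</rdf:RDF>
```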

Data formats

The Biositemaps Protocol [2] allows scientists, engineers, centers and institutions engaged in modeling, software tool development and analysis of biomedical and informatics data to broadcast and disseminate information about their latest computational biology resources (data, software tools and web services). The biositemap concept is based on ideas from Efficient, Automated Web Resource Harvesting [3] and Crawler-friendly Web Servers, [4] and it integrates the features of sitemaps and RSS feeds into a decentralized mechanism for computational biologists and bioinformaticians to openly broadcast and retrieve meta-data about biomedical resources.

These site-, institution- or investigator-specific biositemap descriptions are published online in RDF format and are searched, parsed, monitored and interpreted by web search engines, by web applications specific to biositemaps and ontologies, and by other applications interested in discovering updated or novel resources for bioinformatics and biomedical research. The biositemap mechanism separates the providers of biomedical resources (investigators or institutions) from the consumers of resource content (researchers, clinicians, news media, funding agencies, educational and research initiatives).

A Biositemap is an RDF file that lists the biomedical and bioinformatics resources for a specific research group or consortium. It allows developers of biomedical resources to describe the functionality and usability of each of their software tools, databases or web-services. [2] [5]

Biositemaps supplement, rather than replace, the existing frameworks for dissemination of data, tools and services. Using a biositemap does not guarantee that resources will be included in search indexes, nor does it influence the way tools are ranked or perceived by the community. What the Biositemaps protocol does is provide clues, information and directives to Biositemap web harvesters about the existence and content of biomedical resources at different sites.
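
To make the harvesting step concrete, the following is a minimal sketch of how a consumer might extract resource names and types from a biositemap RDF/XML file, using only the Python standard library. The sample document and the dc:title / bro:resource_type property names are illustrative assumptions, not the official Information Model.

```python
# Hypothetical harvester sketch: extract (title, resource_type) pairs
# from a biositemap-style RDF/XML document using the standard library.
import xml.etree.ElementTree as ET

SAMPLE = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:bro="http://example.org/bro#">
  <rdf:Description rdf:about="http://example.org/tools/aligner">
    <dc:title>Example Sequence Aligner</dc:title>
    <bro:resource_type>Software</bro:resource_type>
  </rdf:Description>
</rdf:RDF>"""

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "bro": "http://example.org/bro#",
}

def harvest(xml_text):
    """Return (title, resource_type) pairs for each described resource."""
    root = ET.fromstring(xml_text)
    results = []
    for desc in root.findall("rdf:Description", NS):
        title = desc.findtext("dc:title", default="", namespaces=NS)
        rtype = desc.findtext("bro:resource_type", default="", namespaces=NS)
        results.append((title, rtype))
    return results

print(harvest(SAMPLE))  # [('Example Sequence Aligner', 'Software')]
```

A production harvester would instead use a full RDF library so that any valid RDF serialization (not just this exact XML shape) can be consumed.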

Biositemap Information Model

The Biositemap protocol relies on an extensible information model that includes specific properties [6] commonly used and necessary for characterizing biomedical resources.

Up-to-date documentation on the information model is available at the Biositemaps website.

Related Research Articles

Bioinformatics: computational analysis of large, complex sets of biological data

Bioinformatics is an interdisciplinary field that develops methods and software tools for understanding biological data, in particular when the data sets are large and complex. It combines biology, computer science, information engineering, mathematics and statistics to analyze and interpret biological data, and has been used for in silico analyses of biological queries using mathematical and statistical techniques.

The Semantic Web is an extension of the World Wide Web through standards set by the World Wide Web Consortium (W3C). The goal of the Semantic Web is to make Internet data machine-readable.

In computer science and information science, an ontology encompasses a representation, formal naming and definition of the categories, properties and relations between the concepts, data and entities that substantiate one, many, or all domains of discourse. More simply, an ontology is a way of showing the properties of a subject area and how they are related, by defining a set of concepts and categories that represent the subject.

The Resource Description Framework (RDF) is a family of World Wide Web Consortium (W3C) specifications originally designed as a metadata data model. It has come to be used as a general method for conceptual description or modeling of information that is implemented in web resources, using a variety of syntax notations and data serialization formats. It is also used in knowledge management applications.

Web annotation refers to

  1. online annotations of web resources such as web pages or parts of them, and
  2. a set of W3C standards developed for this purpose.

The Gene Ontology (GO) is a major bioinformatics initiative to unify the representation of gene and gene product attributes across all species. More specifically, the project aims to: 1) maintain and develop its controlled vocabulary of gene and gene product attributes; 2) annotate genes and gene products, and assimilate and disseminate annotation data; and 3) provide tools for easy access to all aspects of the data provided by the project, and to enable functional interpretation of experimental data using the GO, for example via enrichment analysis. GO is part of a larger classification effort, the Open Biomedical Ontologies, being one of the Initial Candidate Members of the OBO Foundry.

The Open Biological and Biomedical Ontologies (OBO) Foundry is a group of people dedicated to building and maintaining ontologies related to the life sciences. The OBO Foundry establishes a set of principles for ontology development for creating a suite of interoperable reference ontologies in the biomedical domain. Currently, there are more than a hundred ontologies that follow the OBO Foundry principles.

Expasy is an online bioinformatics resource operated by the SIB Swiss Institute of Bioinformatics. It is an extensible and integrative portal which provides access to over 160 databases and software tools and supports a range of life science and clinical research areas, from genomics, proteomics and structural biology, to evolution and phylogeny, systems biology and medical chemistry. The individual resources are hosted in a decentralised way by different groups of the SIB Swiss Institute of Bioinformatics and partner institutions.

Semantic publishing on the Web, or semantic web publishing, refers to publishing information on the web as documents accompanied by semantic markup. Semantic publication provides a way for computers to understand the structure and even the meaning of the published information, making information search and data integration more efficient.

Ontotext is a Bulgarian software company headquartered in Sofia. It is the semantic technology branch of Sirma Group. Its main domain of activity is the development of software based on the Semantic Web languages and standards, in particular RDF, OWL and SPARQL. Ontotext is best known for the Ontotext GraphDB semantic graph database engine. Another major business line is the development of enterprise knowledge management and analytics systems that involve big knowledge graphs. Those systems are developed on top of the Ontotext Platform that builds on top of GraphDB capabilities for text mining using big knowledge graphs.

Robert David Stevens

Robert David Stevens is a professor of bio-health informatics and Head of the Department of Computer Science at the University of Manchester.

The National Centers for Biomedical Computing (NCBCs) are part of the U.S. National Institutes of Health plan to develop and implement the core of a universal computing infrastructure that is urgently needed to speed progress in biomedical research. Their mission is to create innovative software programs and other tools that will enable the biomedical community to integrate, analyze, model, simulate, and share data on human health and disease.

The National Centre for Text Mining (NaCTeM) is a publicly funded text mining (TM) centre. It was established to provide support, advice, and information on TM technologies and to disseminate information from the larger TM community, while also providing tailored services and tools in response to the requirements of the United Kingdom academic community.

iTools Resourceome

iTools is a distributed infrastructure for the management, discovery, comparison and integration of computational biology resources. iTools employs Biositemap technology to retrieve and serve meta-data about diverse bioinformatics data services, tools and web services. iTools is developed by the National Centers for Biomedical Computing as part of the NIH Roadmap Initiative.

Ontology engineering: field which studies the methods and methodologies for building ontologies

In computer science, information science and systems engineering, ontology engineering is a field which studies the methods and methodologies for building ontologies, which are formal representations of a set of concepts within a domain and the relationships between those concepts. In a broader sense, this field also includes a knowledge construction of the domain using formal ontology representations such as OWL/RDF. A large-scale representation of abstract concepts such as actions, time, physical objects and beliefs would be an example of ontological engineering. Ontology engineering is one of the areas of applied ontology, and can be seen as an application of philosophical ontology. Core ideas and objectives of ontology engineering are also central in conceptual modeling.

Identifiers.org is a project providing stable and perennial identifiers for data records used in the Life Sciences. The identifiers are provided in the form of Uniform Resource Identifiers (URIs). Identifiers.org is also a resolving system, that relies on collections listed in the MIRIAM Registry to provide direct access to different instances of the identified records.

The Open Semantic Framework (OSF) is an integrated software stack using semantic technologies for knowledge management. It has a layered architecture that combines existing open source software with additional open source components developed specifically to provide a complete Web application framework. OSF is made available under the Apache 2 license.

In bioinformatics, a Gene Disease Database is a systematized collection of data, typically structured to model aspects of reality, in a way to comprehend the underlying mechanisms of complex diseases, by understanding multiple composite interactions between phenotype-genotype relationships and gene-disease mechanisms. Gene Disease Databases integrate human gene-disease associations from various expert curated databases and text mining derived associations including Mendelian, complex and environmental diseases.

DisGeNET is a discovery platform designed to address a variety of questions concerning the genetic underpinning of human diseases. DisGeNET is one of the largest and comprehensive repositories of human gene-disease associations (GDAs) currently available. It also offers a set of bioinformatic tools to facilitate the analysis of these data by different user profiles. It is maintained by the Integrative Biomedical Informatics (IBI) Group, of the (GRIB)-IMIM/UPF, based at the Barcelona Biomedical Research Park (PRBB), Barcelona, Spain.

Biocuration

Biocuration is the field of life sciences research dedicated to translating and integrating biomedical knowledge from scientific articles to interoperable databases. The biocuration of biomedical knowledge is made possible by the cooperative work of biocurators, software developers and bioinformaticians.

References

  1. Biositemaps online editor. Archived July 30, 2010, at the Wayback Machine.
  2. Dinov ID, Rubin D, Lorensen W, et al. (2008). "iTools: A Framework for Classification, Categorization and Integration of Computational Biology Resources". PLOS ONE. 3 (5): e2265. Bibcode:2008PLoSO...3.2265D. doi:10.1371/journal.pone.0002265. PMC 2386255. PMID 18509477.
  3. M.L. Nelson; J.A. Smith; del Campo; H. Van de Sompel; X. Liu (2006). "Efficient, Automated Web Resource Harvesting" (PDF). WIDM'06.
  4. Brandman O; Cho J; Garcia-Molina H; Shivakumar N (2000). "Crawler-friendly Web Servers". ACM SIGMETRICS Performance Evaluation Review. 28 (2): 9–14. CiteSeerX 10.1.1.34.7957. doi:10.1145/362883.362894. S2CID 5732912.
  5. Cannata N, Merelli E, Altman RB (December 2005). "Time to organize the bioinformatics resourceome". PLOS Comput. Biol. 1 (7): e76. Bibcode:2005PLSCB...1...76C. doi:10.1371/journal.pcbi.0010076. PMC 1323464. PMID 16738704.
  6. Chen YB, Chattopadhyay A, Bergen P, Gadd C, Tannery N (January 2007). "The Online Bioinformatics Resources Collection at the University of Pittsburgh Health Sciences Library System—a one-stop gateway to online bioinformatics databases and software tools". Nucleic Acids Res. 35 (Database issue): D780–5. doi:10.1093/nar/gkl781. PMC 1669712. PMID 17108360.