Global Biodata Coalition

The Global Biodata Coalition is an organization that promotes biocuration and fosters support among research funders for the sustainability of biological data resources. [1] [2] [3]

Global Core Biodata Resources (GCBRs)

The organization maintains a list of resources, representing "critical components for ensuring the reproducibility and integrity of life sciences research." [4]

As of 2023, the list includes the following organizations: [5]

Related Research Articles

Bioinformatics: Computational analysis of large, complex sets of biological data

Bioinformatics is an interdisciplinary field of science that develops methods and software tools for understanding biological data, especially when the data sets are large and complex. Bioinformatics uses biology, chemistry, physics, computer science, computer programming, information engineering, mathematics and statistics to analyze and interpret biological data. The subsequent process of analyzing and interpreting data is referred to as computational biology.

National Center for Biotechnology Information: Database branch of the US National Library of Medicine

The National Center for Biotechnology Information (NCBI) is part of the United States National Library of Medicine (NLM), a branch of the National Institutes of Health (NIH). It is approved and funded by the government of the United States. The NCBI is located in Bethesda, Maryland, and was founded in 1988 through legislation sponsored by US Congressman Claude Pepper.

UniProt: Database of protein sequences and functional information

UniProt is a freely accessible database of protein sequence and functional information, many entries being derived from genome sequencing projects. It contains a large amount of information about the biological function of proteins derived from the research literature. It is maintained by the UniProt consortium, which consists of several European bioinformatics organisations and a foundation from Washington, DC, USA.

Michael Ashburner: English biologist (1942–2023)

Michael Ashburner was an English biologist and Professor in the Department of Genetics at the University of Cambridge. He was also a co-founder and former joint head of the European Bioinformatics Institute (EBI) of the European Molecular Biology Laboratory (EMBL) and a Fellow of Churchill College, Cambridge.

BRENDA is an information system representing one of the most comprehensive enzyme repositories. It is an electronic resource that comprises molecular and biochemical information on enzymes that have been classified by the IUBMB. Every classified enzyme is characterized with respect to its catalyzed biochemical reaction. Kinetic properties of the corresponding reactants are described in detail. BRENDA contains enzyme-specific data manually extracted from primary scientific literature and additional data derived from automatic information retrieval methods such as text mining. It provides a web-based user interface that allows convenient and sophisticated access to the data.

The European Bioinformatics Institute (EMBL-EBI) is an intergovernmental organization (IGO) which, as part of the European Molecular Biology Laboratory (EMBL) family, focuses on research and services in bioinformatics. It is located on the Wellcome Genome Campus in Hinxton near Cambridge, and employs over 600 full-time equivalent (FTE) staff. Institute leaders such as Rolf Apweiler, Alex Bateman, Ewan Birney, and Guy Cochrane, an adviser on the National Genomics Data Center Scientific Advisory Board, serve as part of the international research network of the BIG Data Center at the Beijing Institute of Genomics.

ENCODE: Research consortium investigating functional elements in human and model organism DNA

The Encyclopedia of DNA Elements (ENCODE) is a public research project which aims "to build a comprehensive parts list of functional elements in the human genome."

The Rat Genome Database (RGD) is a database of rat genomics, genetics, physiology and functional data, as well as data for comparative genomics between rat, human and mouse. RGD is responsible for attaching biological information to the rat genome via structured vocabulary, or ontology, annotations assigned to genes and quantitative trait loci (QTL), and for consolidating rat strain data and making it available to the research community. RGD is also developing a suite of tools for mining and analyzing genomic, physiologic and functional data for the rat, and comparative data for rat, mouse, human, and five other species.

Chemical Entities of Biological Interest, also known as ChEBI, is a chemical database and ontology of molecular entities focused on 'small' chemical compounds. It is part of the Open Biomedical Ontologies (OBO) effort at the European Bioinformatics Institute (EBI). The term "molecular entity" refers to any "constitutionally or isotopically distinct atom, molecule, ion, ion pair, radical, radical ion, complex, conformer, etc., identifiable as a separately distinguishable entity". The molecular entities in question are either products of nature or synthetic products which have potential bioactivity. Molecules directly encoded by the genome, such as nucleic acids, proteins and peptides derived from proteins by proteolytic cleavage, are not as a rule included in ChEBI.

Reactome is a free online database of biological pathways. It is manually curated and authored by PhD-level biologists, in collaboration with Reactome editorial staff. The content is cross-referenced to many bioinformatics databases. The rationale behind Reactome is to visually represent biological pathways in full mechanistic detail, while making the source data available in a computationally accessible format.

FlyBase is an online bioinformatics database and the primary repository of genetic and molecular data for the insect family Drosophilidae. For the most extensively studied species and model organism, Drosophila melanogaster, a wide range of data are presented in different formats.

The Reference Sequence (RefSeq) database is an open access, annotated and curated collection of publicly available nucleotide sequences and their protein products. RefSeq was introduced in 2000. This database is built by the National Center for Biotechnology Information (NCBI) and, unlike GenBank, provides only a single record for each natural biological molecule for major organisms ranging from viruses to bacteria to eukaryotes.

Xenbase is a Model Organism Database (MOD), providing informatics resources, as well as genomic and biological data on Xenopus frogs. Xenbase has been available since 1999, and covers both X. laevis and X. tropicalis Xenopus varieties. As of 2013 all of its services are running on virtual machines in a private cloud environment, making it one of the first MODs to do so. In addition to hosting genomics data and tools, Xenbase supports the Xenopus research community through profiles for researchers and laboratories, and job and events postings.

The IUPHAR/BPS Guide to PHARMACOLOGY is an open-access website, acting as a portal to information on the biological targets of licensed drugs and other small molecules. The Guide to PHARMACOLOGY is developed as a joint venture between the International Union of Basic and Clinical Pharmacology (IUPHAR) and the British Pharmacological Society (BPS). It replaces and expands upon the original 2009 IUPHAR Database. The Guide to PHARMACOLOGY aims to provide a concise overview of all pharmacological targets, accessible to all members of the scientific and clinical communities and the interested public, with links to details on a selected set of targets. The information featured includes pharmacological data, target and gene nomenclature, and curated chemical information for ligands. Overviews and commentaries on each target family are included, with links to key references.

Gene set enrichment analysis: Bioinformatics method

Gene set enrichment analysis (GSEA) (also called functional enrichment analysis or pathway enrichment analysis) is a method to identify classes of genes or proteins that are over-represented in a large set of genes or proteins, and may have an association with different phenotypes (e.g. different organism growth patterns or diseases). The method uses statistical approaches to identify significantly enriched or depleted groups of genes. Transcriptomics technologies and proteomics results often identify thousands of genes, which are used for the analysis.
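As a rough illustration of the kind of statistical test involved, the sketch below computes a one-sided hypergeometric p-value for over-representation of a gene set in a list of genes of interest. This is the simple over-representation test, not the rank-based running-sum statistic used by the GSEA software itself; the function name, parameter names, and the numbers in the usage example are hypothetical and chosen only for illustration.

```python
from math import comb


def hypergeom_enrichment_p(population: int, set_size: int,
                           sample_size: int, overlap: int) -> float:
    """One-sided over-representation p-value (illustrative helper, not a library API).

    population  -- number of genes in the background
    set_size    -- number of background genes that belong to the gene set
    sample_size -- number of genes in the list of interest
    overlap     -- number of genes in the list that belong to the gene set

    Returns P(X >= overlap) for X ~ Hypergeometric(population, set_size, sample_size).
    """
    # The overlap can never exceed either the gene set or the sampled list.
    upper = min(set_size, sample_size)
    # Number of ways to draw the list of interest from the background.
    total = comb(population, sample_size)
    # Sum the probability mass of every outcome at least as extreme as observed.
    tail = sum(
        comb(set_size, k) * comb(population - set_size, sample_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical numbers: 20,000 background genes, a 100-gene set,
    # 500 genes of interest, 10 of which fall in the set.
    p = hypergeom_enrichment_p(20000, 100, 500, 10)
    print(f"enrichment p-value: {p:.3g}")
```

In practice many gene sets are tested at once, so the resulting p-values would normally be corrected for multiple testing before any set is called enriched.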

In bioinformatics, a gene disease database is a systematized collection of data, typically structured to model aspects of reality, that helps researchers understand the mechanisms underlying complex diseases by capturing the interactions between phenotype-genotype relationships and gene-disease mechanisms. Gene disease databases integrate human gene-disease associations from various expert-curated databases and from text mining, covering Mendelian, complex and environmental diseases.

Model organism databases (MODs) are biological databases, or knowledgebases, dedicated to the provision of in-depth biological data for intensively studied model organisms. MODs allow researchers to easily find background information on large sets of genes, plan experiments efficiently, combine their data with existing knowledge, and construct novel hypotheses. They allow users to analyse results and interpret datasets, and the data they generate are increasingly used to describe less well studied species. Where possible, MODs share common approaches to collect and represent biological information. For example, all MODs use the Gene Ontology (GO) to describe functions, processes and cellular locations of specific gene products. Projects also exist to enable software sharing for curation, visualization and querying between different MODs. Organismal diversity and varying user requirements, however, mean that MODs are often required to customize the capture, display, and provision of data.

The Monarch Initiative is a large-scale bioinformatics web resource focused on leveraging existing biomedical knowledge to connect genotypes with phenotypes in an effort to aid research that combats genetic diseases. Monarch does this by integrating multi-species genotype, phenotype, genetic variant and disease knowledge from various existing biomedical data resources into a centralized and structured database. While this integration has traditionally been done manually by basic researchers and clinicians on a case-by-case basis, the Monarch Initiative provides an aggregated and structured collection of data and tools that make biomedical knowledge exploration more efficient and effective.

Biocuration is the field of life sciences dedicated to organizing biomedical data, information and knowledge into structured formats, such as spreadsheets, tables and knowledge graphs. The biocuration of biomedical knowledge is made possible by the cooperative work of biocurators, software developers and bioinformaticians and is at the base of the work of biological databases.

References

  1. Cook, Chuck; Cochrane, Guy (2023-09-08). "The Global Biodata Coalition: Towards a sustainable biodata infrastructure". Biodiversity Information Science and Standards. 7. doi:10.3897/biss.7.112303. ISSN 2535-0897. Archived from the original on 2023-12-11. Retrieved 2023-12-11.
  2. "Global Biodata Coalition: White Paper on Open Data Strategies". RDA. 2023-11-27. Archived from the original on 2023-12-11. Retrieved 2023-12-11.
  3. "Global Biodata Coalition coordinates worldwide funding of data resources". Genome.gov. Archived from the original on 2023-12-11. Retrieved 2023-12-11.
  4. "GBIF named a Global Core Biodata Resource". www.gbif.org. Archived from the original on 2023-12-11. Retrieved 2023-12-11.
  5. "List of Current Global Core Biodata Resources". Global Biodata Coalition. Archived from the original on 2023-12-11. Retrieved 2023-12-11.