GigaDB

Type of site: Disciplinary repository
Available in: English
Owner: GigaScience/China National GeneBank
URL: gigadb.org
Launched: 2011
Current status: Active
Content license: CC0

GigaDB (GigaScience DataBase) is a disciplinary repository launched in 2011 [1] with the aim of ensuring long-term access to massive multidimensional datasets from life science and biomedical science studies. The datasets are diverse and include genomic, transcriptomic, and imaging data. The datasets are curated by GigaDB biocurators who are employed by BGI and China National GeneBank.

At its inception, GigaDB was designed as the supporting archive for large-scale research data submitted to the GigaScience Press data journals GigaScience and Gigabyte, whose focus is on ensuring the reproducibility and reusability of biological and biomedical research. The scope of GigaDB has since broadened to include computational research objects such as synthetic data, software and workflows. [2] The database uses sample attributes and standards approved by the Genomic Standards Consortium (GSC), and collaborates with the GSC to ensure that data are comprehensive and discoverable. [3] A dataset hosted in GigaDB is defined as a group of files and metadata that support a specific article or study. Each published GigaDB dataset is assigned a DataCite digital object identifier, and the data are indexed and discoverable in NCBI Datamed and the Clarivate Analytics Data Citation Index. GigaDB has also collaborated with Repositive to boost the discoverability of its human datasets. [4]
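The dataset definition above (a group of files plus metadata, identified by a DOI) can be pictured as a small data structure. The following is only an illustrative sketch, not GigaDB's actual schema or API: the class names, field names and the example DOI are hypothetical, and the only external convention relied on is that a DOI can be resolved by appending it to `https://doi.org/`.

```python
from dataclasses import dataclass, field
from typing import Dict, List
from urllib.parse import quote


@dataclass
class DataFile:
    """One file belonging to a dataset (hypothetical fields)."""
    name: str
    size_bytes: int


@dataclass
class Dataset:
    """A group of files and metadata supporting one article or study."""
    doi: str                      # DOI assigned on publication
    title: str
    files: List[DataFile] = field(default_factory=list)
    # Sample attributes as key/value pairs; real attribute names
    # would follow the relevant community standard.
    sample_attributes: Dict[str, str] = field(default_factory=dict)

    def doi_url(self) -> str:
        """Return the resolver URL for this dataset's DOI."""
        return "https://doi.org/" + quote(self.doi, safe="/")

    def total_size_bytes(self) -> int:
        """Sum the sizes of all files in the dataset."""
        return sum(f.size_bytes for f in self.files)


# Placeholder values for illustration only; this is not a real dataset.
example = Dataset(
    doi="10.1234/example-0001",
    title="Example genome assembly",
    files=[DataFile("reads.fastq", 10), DataFile("image.tif", 5)],
)
print(example.doi_url())
print(example.total_size_bytes())
```

Keeping the DOI on the dataset rather than on individual files mirrors the description above, in which the identifier is assigned to the published dataset as a whole.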

Related Research Articles

Biological database

Biological databases are libraries of biological sciences, collected from scientific experiments, published literature, high-throughput experiment technology, and computational analysis. They contain information from research areas including genomics, proteomics, metabolomics, microarray gene expression, and phylogenetics. Information contained in biological databases includes gene function, structure, localization, clinical effects of mutations as well as similarities of biological sequences and structures.

Sequence homology: Shared ancestry between DNA, RNA or protein sequences

Sequence homology is the biological homology between DNA, RNA, or protein sequences, defined in terms of shared ancestry in the evolutionary history of life. Two segments of DNA can have shared ancestry because of three phenomena: either a speciation event (orthologs), or a duplication event (paralogs), or else a horizontal gene transfer event (xenologs).

Ensembl genome database project: Scientific project at the European Bioinformatics Institute

Ensembl genome database project is a scientific project at the European Bioinformatics Institute, which provides a centralized resource for geneticists, molecular biologists and other researchers studying the genomes of our own species and other vertebrates and model organisms. Ensembl is one of several well known genome browsers for the retrieval of genomic information.

Computational genomics refers to the use of computational and statistical analysis to decipher biology from genome sequences and related data, including both DNA and RNA sequence as well as other "post-genomic" data. Because it is combined with computational and statistical approaches to understanding gene function and with statistical association analysis, the field is also often referred to as computational and statistical genetics/genomics. As such, computational genomics may be regarded as a subset of bioinformatics and computational biology, but with a focus on using whole genomes to understand the principles of how the DNA of a species controls its biology at the molecular level and beyond. With the current abundance of massive biological datasets, computational studies have become one of the most important means of biological discovery.

Biomedical text mining refers to the methods and study of how text mining may be applied to texts and literature of the biomedical domain. As a field of research, biomedical text mining incorporates ideas from natural language processing, bioinformatics, medical informatics and computational linguistics. The strategies in this field have been applied to the biomedical literature available through services such as PubMed.

The Saccharomyces Genome Database (SGD) is a scientific database of the molecular biology and genetics of the yeast Saccharomyces cerevisiae, which is commonly known as baker's or budding yeast. Further information is located at the Yeastract curated repository.

Generic Model Organism Database

The Generic Model Organism Database (GMOD) project provides biological research communities with a toolkit of open-source software components for visualizing, annotating, managing, and storing biological data. The GMOD project is funded by the United States National Institutes of Health, National Science Foundation and the USDA Agricultural Research Service.

Mark Bender Gerstein is an American scientist working in bioinformatics and data science. As of 2009, he is co-director of the Yale Computational Biology and Bioinformatics program.

Galaxy (computational biology)

Galaxy is a scientific workflow, data integration, and data and analysis persistence and publishing platform that aims to make computational biology accessible to research scientists who do not have computer programming or systems administration experience. Although it was initially developed for genomics research, it is largely domain agnostic and is now used as a general bioinformatics workflow management system.

Genomic Standards Consortium

The Genomic Standards Consortium (GSC) is an initiative working towards richer descriptions of our collection of genomes, metagenomes and marker genes. Established in September 2005, this international community includes representatives from a range of major sequencing and bioinformatics centres and research institutions. The goal of the GSC is to promote mechanisms for standardizing the description of (meta)genomes, including the exchange and integration of (meta)genomic data. The number and pace of genomic and metagenomic sequencing projects will only increase as the use of ultra-high-throughput methods becomes commonplace, and standards are vital to scientific progress and data sharing.

Enhanced publications or enhanced ebooks are a form of electronic publishing for the dissemination and sharing of research outcomes, whose first formal definition can be traced back to 2009. Like many forms of digital publication, they typically feature a unique identifier and descriptive metadata. Unlike traditional digital publications, enhanced publications are often tailored to serve specific scientific domains and generally consist of a set of interconnected parts corresponding to research assets of several kinds and to textual descriptions of the research. The nature and format of these parts, and of the relationships between them, depend on the application domain and may vary widely from case to case.

In bioinformatics, a gene disease database is a systematized collection of data, typically structured to model aspects of reality, that helps researchers understand the underlying mechanisms of complex diseases through the interactions between phenotype-genotype relationships and gene-disease mechanisms. Gene disease databases integrate human gene-disease associations from various expert-curated databases and from text-mining-derived associations, covering Mendelian, complex and environmental diseases.

GigaScience: Academic journal

GigaScience is a peer-reviewed scientific journal that was established in 2012. It covers research and large datasets that result from work in the biomedical and life sciences. The editor-in-chief is Scott Edmunds. Originally, the journal was co-published by BioMed Central and the Beijing Genomics Institute (BGI). In 2016, it left BioMed Central to form a new partnership between the GigaScience Press department of BGI and Oxford University Press. In 2018, GigaScience won the Association of American Publishers' PROSE Award for Innovation in journal publishing in the multidisciplinary category.

Julio Collado-Vides is a Guatemalan scientist and Professor of Computational Genomics at the National Autonomous University of Mexico. His research focuses on genomics and bioinformatics.

The Plant Genomics and Phenomics Research Data Repository (PGP) is a data publication infrastructure for comprehensively publishing multi-domain plant research data. It is hosted at the Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) in Gatersleben, Germany. The repository hosts DOI-citable datasets that are not being published in public repositories because of their volume or data scope. PGP enables the publication of gigabyte-scale datasets and is registered as a research data repository at FAIRSharing.org, re3data.org and OpenAIRE as a valid EU Horizon 2020 open data archive. These features, together with the programmatic interface and the support for standard metadata formats, enable PGP to fulfil the FAIR data principles: findable, accessible, interoperable and reusable. The PGP repository was created using the e!DAL software infrastructure and applies an on-premises approach to "bring the infrastructure to the data" (I2D).

Biocuration is the field of life sciences dedicated to organizing biomedical data, information and knowledge into structured formats, such as spreadsheets, tables and knowledge graphs. The biocuration of biomedical knowledge is made possible by the cooperative work of biocurators, software developers and bioinformaticians and is at the base of the work of biological databases.

Gigabyte (journal): Academic journal


GigaByte is a peer-reviewed open-science journal published by GigaScience Press since 2020. It publishes short, focused, data-driven articles describing and sharing open research datasets and software. It uses an exclusively XML-based publishing system that automates the production process, making it simple to change views and languages and to embed interactive content. In 2022 it won the ALPSP Award for Innovation in Publishing.

References

  1. Sneddon TP, Li P, Edmunds SC (July 2012). "GigaDB: announcing the GigaScience database". GigaScience. 1 (1): 11. doi:10.1186/2047-217X-1-11. PMC 3626507. PMID 23587345.
  2. Xiao SZ, Armit C, Edmunds S, Goodman L, Li P, Tuli MA, Hunter CI (January 2019). "Increased interactivity and improvements to the GigaScience database, GigaDB". Database. 2019: baz016. doi:10.1093/database/baz016. PMC 6376146. PMID 30753480.
  3. Armit C, Tuli MA, Hunter CI (June 2022). "A Decade of GigaScience: GigaDB and the Open Data Movement". GigaScience. 11. doi:10.1093/gigascience/giac053. PMC 9197680. PMID 35701374.
  4. "Repositive developing premium gene data collaboration platform for drug researchers". biopharma-reporter.com. 2 February 2016. Retrieved 2019-07-01.