GDB Human Genome Database

Content
Data types captured: Genetic mapping data
Organisms: Homo sapiens
Primary citation: PMID 2041809
Release date: 1989

The GDB Human Genome Database was a community-curated collection of human genomic data. It was a key database of the Human Genome Project [1] [2] and was in service from 1989 to 2008.


History

In 1989, the Howard Hughes Medical Institute provided funding to establish a central repository for human genetic mapping data. This project led to the creation of the GDB Human Genome Database in September 1990. [3] [4] To ensure high data quality, records in GDB were curated by human genetics specialists, including the HUGO Gene Nomenclature Committee. [5]

Established under the leadership of Peter Pearson and Dick Lucier, [6] GDB received financial support from the US Department of Energy and the National Institutes of Health. [3] Based at the Johns Hopkins University School of Medicine, GDB became a source of high-quality mapping data, which were made available both online and through numerous printed publications. [citation needed] The project was also supported internationally by the EU, Japan, and other countries.

GDB had several directors during its existence: Peter Pearson, David T. Kingsbury, Stanley Letovsky, Peter Li, and A. Jamie Cuticchia. [citation needed]

In 1998, US Department of Energy funds previously allocated to GDB were redirected because of a shift in emphasis within the Human Genome Project. [7] However, that same year, A. Jamie Cuticchia obtained funding from Canadian public and private sources to continue GDB's operations. While data curation continued at Johns Hopkins, GDB's central operations moved to The Hospital for Sick Children (HSC) in Toronto, Ontario, Canada. [8] In November 2001, the HSC fired Cuticchia over a dispute about the GDB website domain name. [7]

In 2003, RTI International became the new host for GDB, which continued to be maintained as a public resource. [9] GDB was closed in 2008 after control of the project reverted to Johns Hopkins. [10]

References

  1. "Human Genome News, September-December 1995: 7(3-4):15". web.ornl.gov.
  2. Guyer, M. S.; Collins, F. S. (21 November 1995). "How is the Human Genome Project doing, and what have we learned so far?". Proceedings of the National Academy of Sciences. 92 (24): 10841–10848. Bibcode:1995PNAS...9210841G. doi:10.1073/pnas.92.24.10841. PMC 40527. PMID 7479895.
  3. Cuticchia, A. Jamie; Fasman, Kenneth H.; Kingsbury, David T.; Robbins, Robert J.; Pearson, Peter L. (1993). "The GDB™ human genome data base anno 1993". Nucleic Acids Research. 21 (13): 3003–3006. doi:10.1093/nar/21.13.3003. PMC 309725. PMID 8332522.
  4. Cuticchia, A. J. (27 December 1999). "Future vision of the GDB human genome database". Human Mutation. 15 (1): 62–67. doi:10.1002/(SICI)1098-1004(200001)15:1<62::AID-HUMU13>3.0.CO;2-R. PMID 10612824. S2CID 25606440.
  5. Letovsky, S. (1 January 1998). "GDB: the Human Genome Database". Nucleic Acids Research. 26 (1): 94–99. doi:10.1093/nar/26.1.94. PMC 147203. PMID 9399808.
  6. Pearson, P. L. (25 April 1991). "The genome data base (GDB)--a human gene mapping repository". Nucleic Acids Research. 19 (suppl): 2237–2239. doi:10.1093/nar/19.suppl.2237. PMC 331357. PMID 2041809.
  7. Bonetta, Laura (November 2001). "Sackings leave gene database floundering". Nature. 414 (6862): 384. Bibcode:2001Natur.414..384B. doi:10.1038/35106703. PMID 11719765.
  8. "Human Genome News Vol. 10, No. 1-2, February 1999". web.ornl.gov. Retrieved 3 September 2020.
  9. Seewald, A. K. (2004). "Ranking for BioMinT: investigating performance, local search and homonymy recognition". Proceedings of the Symposium on Knowledge Exploration in Life Science Informatics (KELSI 2004). CiteSeerX 10.1.1.117.8840. doi:10.1007/978-3-540-30478-4_10.
  10. Galperin, M. Y.; Cochrane, G. R. (1 January 2009). "Nucleic Acids Research annual Database Issue and the NAR online Molecular Biology Database Collection in 2009". Nucleic Acids Research. 37 (Database): D1–D4. doi:10.1093/nar/gkn942. PMC 2686608. PMID 19033364.