CyberCell

The CyberCell database
Content
Description	A database providing quantitative genomic, proteomic, and metabolomic data of E. coli.
Data types captured	Gene and protein data; functional or ontological information; gene position and protein location; protein, metabolite and RNA expression levels; protein interaction and protein stoichiometry information; enzyme rate constants; metabolite structures, reactions and pathways; lists of cofactors and ligands; other molecular data.
Contact
Research center	University of Alberta
Laboratory	David S. Wishart
Primary citation
Access
Website	http://ccdb.wishartlab.com/CCDB/
Miscellaneous
Data release frequency	Last updated in 2006

Last updated August 31, 2023

The CyberCell Database (CCDB) is a freely available, web-accessible database that provides quantitative genomic, proteomic, and metabolomic data on Escherichia coli.^[1] Escherichia coli (strain K12) is perhaps the best-studied bacterium on the planet and has been the organism of choice for several international efforts in cell simulation. These cell simulation efforts require up-to-date, web-accessible resources that provide comprehensive, non-redundant, and quantitative data on this bacterium. The intent of CCDB is to facilitate the collection, revision, coordination and storage of the key information required for in silico E. coli simulation.^[1]

Content

The CCDB contains four browsable databases providing gene/protein information (CCDB), 3D protein structure data (CC3D), tRNA and rRNA information (CCRD), and metabolite data (CCMD), respectively. The data has been collected or generated using various sources and tools, including textbooks, published scientific articles, electronic databases, in-house software and web-based programs. Each database exists as a re-formattable, easily browsed synoptic table that allows users to scroll casually through its contents. Detailed information about each gene, protein, RNA, 3D structure or metabolite may be obtained by clicking on the ‘ColiCard’ in the left column. Every card contains more than 60 fields covering the sequence, function or structure of a given molecule, as well as hyperlinks to other sources of information such as EcoGene^[2] and EcoCyc,^[3] scientific abstracts, and interactive applets to view structures or chromosomal maps. One of the more attractive characteristics of CCDB is its support for database searching and sorting. It offers utilities for local BLAST searches, Boolean text searches, chemical structure searches, and relational database extraction. The results of the latter can be exported in various formats, including HTML, Excel and a circular chromosome applet view. One of the most popular features of the CCDB is its “E. coli Statistics” pages (available through the “Stats” link at the top of the CCDB home page). This link provides a rich source of information about the content, dimensions and physical-chemical characteristics of the E. coli cell.

Community annotation

To facilitate the correction and submission of data, the CyberCell database allows users to electronically edit or update ‘ColiCards’. These modifications are permanently added only after they have been reviewed and accepted by an archivist. In addition to modifications by users, CCDB also performs automated self-updating operations on a regular basis, which keeps the database up to date.
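The review step described above can be illustrated with a minimal sketch. It does not reflect CCDB's actual implementation, which the source does not describe; the class names (`Card`, `PendingEdit`, `ModeratedStore`), the method names and the field names are all hypothetical. The sketch only shows the general pattern of holding user edits in a queue until a reviewer accepts or rejects them.

```python
from dataclasses import dataclass, field


@dataclass
class Card:
    """Hypothetical stand-in for a ColiCard: a named record of field values."""
    name: str
    fields: dict = field(default_factory=dict)


@dataclass
class PendingEdit:
    """A user-submitted change that has not yet been reviewed."""
    card_name: str
    field_name: str
    new_value: str
    submitter: str


class ModeratedStore:
    """Holds cards plus a queue of edits awaiting an archivist's decision."""

    def __init__(self):
        self.cards = {}
        self.pending = []

    def add_card(self, card):
        self.cards[card.name] = card

    def submit_edit(self, edit):
        # Submitting never changes the card; it only queues the edit.
        if edit.card_name not in self.cards:
            raise KeyError(f"unknown card: {edit.card_name}")
        self.pending.append(edit)

    def review(self, edit, accept):
        # The edit leaves the queue either way; only accepted edits are applied.
        self.pending.remove(edit)
        if accept:
            self.cards[edit.card_name].fields[edit.field_name] = edit.new_value


# Placeholder data, not taken from CCDB.
store = ModeratedStore()
store.add_card(Card("exampleGene", {"note": "original text"}))
edit = PendingEdit("exampleGene", "note", "corrected text", "some_user")
store.submit_edit(edit)
value_before_review = store.cards["exampleGene"].fields["note"]
store.review(edit, accept=True)
value_after_review = store.cards["exampleGene"].fields["note"]
```

In this sketch the card is unchanged between submission and review, which mirrors the statement that modifications become permanent only after acceptance.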

Scope and Access

All data in the CCDB is non-proprietary or is derived from a non-proprietary source. It is freely accessible and available to anyone. In addition, nearly every data item is fully traceable and explicitly referenced to the original source. CCDB data is available through a public web interface and downloads.

Related Research Articles

Biological databases are libraries of biological sciences, collected from scientific experiments, published literature, high-throughput experiment technology, and computational analysis. They contain information from research areas including genomics, proteomics, metabolomics, microarray gene expression, and phylogenetics. Information contained in biological databases includes gene function, structure, localization, clinical effects of mutations as well as similarities of biological sequences and structures.

The metabolome refers to the complete set of small-molecule chemicals found within a biological sample. The biological sample can be a cell, a cellular organelle, an organ, a tissue, a tissue extract, a biofluid or an entire organism. The small molecule chemicals found in a given metabolome may include both endogenous metabolites that are naturally produced by an organism as well as exogenous chemicals that are not naturally produced by an organism.

Metabolic network modelling, also known as metabolic network reconstruction or metabolic pathway analysis, allows for an in-depth insight into the molecular mechanisms of a particular organism. In particular, these models correlate the genome with molecular physiology. A reconstruction breaks down metabolic pathways into their respective reactions and enzymes, and analyzes them within the perspective of the entire network. In simplified terms, a reconstruction collects all of the relevant metabolic information of an organism and compiles it in a mathematical model. Validation and analysis of reconstructions can allow identification of key features of metabolism such as growth yield, resource distribution, network robustness, and gene essentiality. This knowledge can then be applied to create novel biotechnology.

The DrugBank database is a comprehensive, freely accessible, online database containing information on drugs and drug targets created and maintained by the University of Alberta and The Metabolomics Innovation Centre located in Alberta, Canada. As both a bioinformatics and a cheminformatics resource, DrugBank combines detailed drug data with comprehensive drug target information. DrugBank has used content from Wikipedia; Wikipedia also often links to DrugBank, posing potential circular reporting issues.

A colicin is a type of bacteriocin produced by and toxic to some strains of Escherichia coli. Colicins are released into the environment to reduce competition from other bacterial strains. Colicins bind to outer membrane receptors, using them to translocate to the cytoplasm or cytoplasmic membrane, where they exert their cytotoxic effect, including depolarisation of the cytoplasmic membrane, DNase activity, RNase activity, or inhibition of murein synthesis.

In bioinformatics, EcoCyc is a biological database for the bacterium Escherichia coli K-12. The EcoCyc project performs literature-based curation of the E. coli genome, and of E. coli transcriptional regulation, transporters, and metabolic pathways. EcoCyc contains written summaries of E. coli genes, distilled from over 36,000 scientific articles. EcoCyc is also a description of the genome and cellular networks of E. coli that supports scientists in carrying out computational analyses.

The BioCyc database collection is an assortment of organism-specific Pathway/Genome Databases (PGDBs) that provide reference to genome and metabolic pathway information for thousands of organisms. As of July 2023, there were over 20,040 databases within BioCyc. SRI International, based in Menlo Park, California, maintains the BioCyc database family.

Therapeutic Target Database (TTD) is a pharmaceutical and medical repository constructed by the Innovative Drug Research and Bioinformatics Group (IDRB) at Zhejiang University, China and the Bioinformatics and Drug Design Group at the National University of Singapore. It provides information about known and explored therapeutic protein and nucleic acid targets, the targeted disease, pathway information and the corresponding drugs directed at each of these targets. It also includes detailed knowledge about target function, sequence, 3D structure, ligand binding properties, enzyme nomenclature and drug structure, therapeutic class, and clinical development status. TTD is freely accessible without any login requirement at https://idrblab.org/ttd/.

Bacterial small RNAs (bsRNA) are small RNAs produced by bacteria; they are 50- to 500-nucleotide non-coding RNA molecules, highly structured and containing several stem-loops. Numerous sRNAs have been identified using both computational analysis and laboratory-based techniques such as Northern blotting, microarrays and RNA-Seq in a number of bacterial species including Escherichia coli, the model pathogen Salmonella, the nitrogen-fixing alphaproteobacterium Sinorhizobium meliloti, marine cyanobacteria, Francisella tularensis, Streptococcus pyogenes, the pathogen Staphylococcus aureus, and the plant pathogen Xanthomonas oryzae pathovar oryzae. Bacterial sRNAs affect how genes are expressed within bacterial cells via interaction with mRNA or protein, and thus can affect a variety of bacterial functions like metabolism, virulence, environmental stress response, and structure.

The TisB-IstR toxin-antitoxin system is the first known toxin-antitoxin system which is induced by the SOS response in response to DNA damage.

Mycobacterium tuberculosis contains at least nine small RNA families in its genome. The small RNA (sRNA) families were identified through RNomics – the direct analysis of RNA molecules isolated from cultures of Mycobacterium tuberculosis. The sRNAs were characterised through RACE mapping and Northern blot experiments. Secondary structures of the sRNAs were predicted using Mfold.

The Cervical Cancer gene DataBase (CCDB) is a database of genes involved in cervical carcinogenesis. It is the first manually curated database of cervical cancer genes. The database serves as a resource for clinicians and researchers to examine basic as well as advanced information about the genes involved in the development of cervical cancer. A total of 537 genes have been cataloged in the CCDB. The genes have been cataloged based on polymorphism, methylation, gene amplification, and changes in gene expression. Investigators have compared normal cervical cells with malignant cervical cells to study the differences in gene expression that result in cervical cancer. Of the 500,000 women who have succumbed to cervical cancer, most are from developing countries or from low socioeconomic groups in developed countries. The CCDB is designed to present information that will support novel therapeutic treatments for a leading cause of cancer among women.

RegulonDB is a database of the regulatory network of gene expression in Escherichia coli K-12. RegulonDB also models the organization of the genes in transcription units, operons and regulons. A total of 120 sRNAs with 231 total interactions which all together regulate 192 genes are also included. RegulonDB was founded in 1998 and also contributes data to the EcoCyc database.

Escherichia coli contains a number of small RNAs located in intergenic regions of its genome. The presence of at least 55 of these has been verified experimentally. 275 potential sRNA-encoding loci were identified computationally using the QRNA program. These loci will include false positives, so the number of sRNA genes in E. coli is likely to be less than 275. A computational screen based on promoter sequences recognised by the sigma factor sigma 70 and on Rho-independent terminators predicted 24 putative sRNA genes; 14 of these were verified experimentally by northern blotting. The experimentally verified sRNAs included the well-characterised sRNAs RprA and RyhB. Many of the sRNAs identified in this screen, including RprA, RyhB, SraB and SraL, are only expressed in the stationary phase of bacterial cell growth. A screen for sRNA genes based on homology to Salmonella and Klebsiella identified 59 candidate sRNA genes. From this set of candidate genes, microarray analysis and northern blotting confirmed the existence of 17 previously undescribed sRNAs, many of which bind to the chaperone protein Hfq and regulate the translation of RpoS. UptR sRNA transcribed from the uptR gene is implicated in suppressing extracytoplasmic toxicity by reducing the amount of membrane-bound toxic hybrid protein.

The Human Metabolome Database (HMDB) is a comprehensive, high-quality, freely accessible, online database of small molecule metabolites found in the human body. It was created by the Human Metabolome Project funded by Genome Canada and is one of the first dedicated metabolomics databases. The HMDB facilitates human metabolomics research, including the identification and characterization of human metabolites using NMR spectroscopy, GC-MS spectrometry and LC/MS spectrometry. To aid in this discovery process, the HMDB contains three kinds of data: 1) chemical data, 2) clinical data, and 3) molecular biology/biochemistry data (Fig. 1–3). The chemical data includes 41,514 metabolite structures with detailed descriptions along with nearly 10,000 NMR, GC-MS and LC/MS spectra.

The Yeast Metabolome Database (YMDB) is a comprehensive, high-quality, freely accessible, online database of small molecule metabolites found in or produced by Saccharomyces cerevisiae. The YMDB was designed to facilitate yeast metabolomics research, specifically in the areas of general fermentation as well as wine, beer and fermented food analysis. YMDB supports the identification and characterization of yeast metabolites using NMR spectroscopy, GC-MS spectrometry and Liquid chromatography–mass spectrometry. The YMDB contains two kinds of data: 1) chemical data and 2) molecular biology/biochemistry data. The chemical data includes 2027 metabolite structures with detailed metabolite descriptions along with nearly 4000 NMR, GC-MS and LC/MS spectra.

The E. coli Metabolome Database (ECMDB) is a comprehensive, high-quality, freely accessible, online database of small molecule metabolites found in or produced by Escherichia coli. Escherichia coli is perhaps the best-studied bacterium on earth and has served as the "model microbe" in microbiology research for more than 60 years. The ECMDB is essentially an E. coli "omics" encyclopedia containing detailed data on E. coli's genome, proteome and metabolome. ECMDB is part of a suite of organism-specific metabolomics databases that includes DrugBank, HMDB, YMDB and SMPDB. As a metabolomics resource, the ECMDB is designed to facilitate research in the areas of gut/microbiome metabolomics and environmental metabolomics. The ECMDB contains two kinds of data: 1) chemical data and 2) molecular biology and/or biochemical data. The chemical data includes more than 2700 metabolite structures with detailed metabolite descriptions along with nearly 5000 NMR, GC-MS and LC-MS spectra corresponding to these metabolites. The biochemical data includes nearly 1600 protein sequences and more than 3100 biochemical reactions that are linked to these metabolite entries. Each metabolite entry in the ECMDB contains more than 80 data fields, with approximately 65% of the information devoted to chemical data and the other 35% devoted to enzymatic or biochemical data. Many data fields are hyperlinked to other databases. The ECMDB also has a variety of structure and pathway viewing applets. The ECMDB database offers a number of text, sequence, spectral, chemical structure and relational query searches. These are described in more detail below.

Monica Riley was an American scientist who contributed to the discovery of messenger RNA in her Ph.D. work with Arthur Pardee, and was later a pioneer in the exploration and computer representation of the Escherichia coli genome.

Julio Collado-Vides is a Guatemalan scientist and Professor of Computational Genomics at the National Autonomous University of Mexico. His research focuses on genomics and bioinformatics.

References

1. Sundararaj, S; Guo A; Habibi-Nazhad B; Rouani M; Stothard P; Ellison M; Wishart DS (2004). "The CyberCell Database (CCDB): a comprehensive, self-updating, relational database to coordinate and facilitate in silico modeling of Escherichia coli". Nucleic Acids Res. 32 (Database issue): D293-5. doi:10.1093/nar/gkh108. PMC 308842. PMID 14681416.
2. Rudd, K.E. (2000). "EcoGene: a genome sequence database for Escherichia coli K-12". Nucleic Acids Res. 28 (1): 60–64. doi:10.1093/nar/28.1.60. PMC 102481. PMID 10592181.
3. Karp, PD; Riley M; Saier M; Paulsen IT; Collado-Vides J; Paley SM; Pellegrini-Toole A; Bonavides C; Gama-Castro S. (2002). "The EcoCyc Database". Nucleic Acids Res. 30 (1): 56–8. doi:10.1093/nar/30.1.56. PMC 99147. PMID 11752253.

