BioCyc database collection

BioCyc
Content
Description: Tools for navigating, visualizing, and analyzing the underlying databases, and for analyzing omics data
Contact
Research center: SRI International
Authors: Peter Karp et al.
Release date: 1997
Access
Website: biocyc.org

The BioCyc database collection is an assortment of organism-specific Pathway/Genome Databases (PGDBs) that provide reference information on the genomes and metabolic pathways of thousands of organisms. [1] As of July 2023, BioCyc contained more than 20,040 databases. [2] SRI International, [3] based in Menlo Park, California, maintains the BioCyc database family.


Categories of Databases

Based on the amount of manual curation they have received, the databases in the BioCyc family are divided into three tiers:

Tier 1: Databases that have received at least one year of literature-based manual curation. Currently there are seven databases in Tier 1. Of these, MetaCyc is a major database that contains almost 2,500 metabolic pathways from many organisms. [1] [4] Another important Tier 1 database is HumanCyc, which contains around 300 metabolic pathways found in humans. [5] The remaining five databases are EcoCyc (E. coli), [6] AraCyc (Arabidopsis thaliana), YeastCyc (Saccharomyces cerevisiae), LeishCyc (Leishmania major Friedlin) and TrypanoCyc (Trypanosoma brucei).

Tier 2: Databases that were computationally predicted and then received moderate manual curation (most with 1–4 months of curation). Tier 2 databases are available for manual curation by scientists interested in a particular organism. Tier 2 currently contains 43 organism databases.

Tier 3: Databases that were computationally predicted by PathoLogic and have received no manual curation. As with Tier 2, Tier 3 databases are available for curation by interested scientists.
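As a rough illustration, the tier definitions above can be read as a threshold on curation effort. The sketch below assumes a hypothetical `months_manual_curation` field for each database; BioCyc does not necessarily assign tiers by this exact rule (MetaCyc, for instance, is a multi-organism database), so treat this as a simplified reading of the text rather than BioCyc's actual procedure.

```python
from dataclasses import dataclass


@dataclass
class PGDB:
    """Hypothetical record for one Pathway/Genome Database."""
    name: str
    months_manual_curation: float  # assumed field: literature-based curation effort


def curation_tier(db: PGDB) -> int:
    """Map curation effort to a tier (simplified reading of the definitions).

    Tier 1: at least 12 months of manual curation.
    Tier 2: some (moderate) manual curation.
    Tier 3: computationally predicted only, no manual curation.
    """
    if db.months_manual_curation >= 12:
        return 1
    if db.months_manual_curation > 0:
        return 2
    return 3


# Placeholder database names, not real BioCyc databases.
for db in [PGDB("ExampleCycA", 24), PGDB("ExampleCycB", 3), PGDB("ExampleCycC", 0)]:
    print(db.name, curation_tier(db))
```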

Software tools

The BioCyc website contains a variety of software tools for searching, visualizing, comparing, and analyzing genome and pathway information. It includes a genome browser as well as browsers for metabolic and regulatory networks. The website also includes tools for painting large-scale ("omics") datasets onto metabolic and regulatory networks, and onto the genome.
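The idea behind "painting" omics data can be sketched roughly as follows: each measured value is attached to the network elements its gene maps to, and the combined value is then bucketed into a display colour. The gene-to-reaction table, identifiers, thresholds, and colour names below are all hypothetical; the BioCyc tools have their own input formats and colour schemes, which this sketch does not reproduce.

```python
from statistics import mean

# Hypothetical gene -> reaction associations; the identifiers are placeholders,
# not real BioCyc identifiers.
GENE_TO_REACTIONS = {
    "geneA": ["RXN-1"],
    "geneB": ["RXN-1", "RXN-2"],
    "geneC": ["RXN-3"],
}


def paint_reactions(values, gene_to_reactions, low=-1.0, high=1.0):
    """Colour each reaction by the mean omics value of the genes mapped to it.

    values: dict of gene -> measurement (for example a log2 fold change).
    Returns a dict of reaction -> colour bin.
    """
    per_reaction = {}
    for gene, value in values.items():
        for rxn in gene_to_reactions.get(gene, []):
            per_reaction.setdefault(rxn, []).append(value)

    colours = {}
    for rxn, vals in per_reaction.items():
        m = mean(vals)
        if m <= low:
            colours[rxn] = "blue"   # strongly decreased
        elif m >= high:
            colours[rxn] = "red"    # strongly increased
        else:
            colours[rxn] = "grey"   # little change
    return colours


print(paint_reactions({"geneA": 2.0, "geneB": 0.5, "geneC": -1.5}, GENE_TO_REACTIONS))
# {'RXN-1': 'red', 'RXN-2': 'grey', 'RXN-3': 'blue'}
```

Averaging is only one possible way to combine several genes per reaction; taking the maximum or showing each gene separately would be equally reasonable choices.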

Use in Research

Because the BioCyc database family comprises a long list of organism-specific databases, with data describing several levels of a living system, it has been used in a wide variety of research contexts. Two studies are highlighted here that illustrate different kinds of use: one at the genome scale, and the other identifying specific SNPs (single-nucleotide polymorphisms) within a genome.

AlgaGEM

AlgaGEM is a genome-scale metabolic network model of a compartmentalized algal cell, developed by Gomes de Oliveira Dal’Molin et al. [7] from the Chlamydomonas reinhardtii genome. It has 866 unique ORFs, 1,862 metabolites, 2,499 gene-enzyme-reaction association entries, and 1,725 unique reactions. MetaCyc was one of the pathway databases used for the reconstruction.

SNPs

The study by Shimul Chowdhury et al. [8] showed that the associations between maternal SNPs and metabolites involved in the homocysteine, folate, and transsulfuration pathways differed between cases with congenital heart defects (CHDs) and controls. The study used HumanCyc to select candidate genes and SNPs.

Related Research Articles

Chlamydomonas reinhardtii: species of alga

Chlamydomonas reinhardtii is a single-cell green alga about 10 micrometres in diameter that swims with two flagella. It has a cell wall made of hydroxyproline-rich glycoproteins, a large cup-shaped chloroplast, a large pyrenoid, and an eyespot that senses light.

Metabolome

The metabolome refers to the complete set of small-molecule chemicals found within a biological sample. The biological sample can be a cell, a cellular organelle, an organ, a tissue, a tissue extract, a biofluid, or an entire organism. The small-molecule chemicals found in a given metabolome may include both endogenous metabolites that are naturally produced by an organism and exogenous chemicals that are not.

Metabolic network: set of biological pathways

A metabolic network is the complete set of metabolic and physical processes that determine the physiological and biochemical properties of a cell. As such, these networks comprise the chemical reactions of metabolism, the metabolic pathways, as well as the regulatory interactions that guide these reactions.
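One simple way to reason over such a network is to treat each reaction as a set of substrates and a set of products and ask which metabolites can be produced from a given starting set, an approach known as network expansion. The sketch below uses a hand-written, simplified toy reaction list (protons and other details omitted), not data exported from BioCyc.

```python
# Hypothetical toy reactions: name -> (set of substrates, set of products).
REACTIONS = {
    "R1": ({"glucose", "ATP"}, {"G6P", "ADP"}),
    "R2": ({"G6P"}, {"F6P"}),
    "R3": ({"F6P", "ATP"}, {"F16BP", "ADP"}),
    "R4": ({"X"}, {"Y"}),  # never fires: X is not reachable from the seeds
}


def network_scope(seeds, reactions):
    """Return every metabolite reachable from the seed set (network expansion).

    A reaction fires once all of its substrates are in the current scope;
    its products are then added. Repeat until nothing new is added.
    """
    scope = set(seeds)
    changed = True
    while changed:
        changed = False
        for substrates, products in reactions.values():
            if substrates <= scope and not products <= scope:
                scope |= products
                changed = True
    return scope


print(sorted(network_scope({"glucose", "ATP"}, REACTIONS)))
# ['ADP', 'ATP', 'F16BP', 'F6P', 'G6P', 'glucose']
```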

Metabolic network modelling: form of biological modelling

Metabolic network modelling, also known as metabolic network reconstruction or metabolic pathway analysis, allows in-depth insight into the molecular mechanisms of a particular organism. In particular, these models correlate the genome with molecular physiology. A reconstruction breaks down metabolic pathways into their respective reactions and enzymes and analyzes them from the perspective of the entire network. In simplified terms, a reconstruction collects all of the relevant metabolic information of an organism and compiles it into a mathematical model. Validation and analysis of reconstructions can allow identification of key features of metabolism such as growth yield, resource distribution, network robustness, and gene essentiality. This knowledge can then be applied to create novel biotechnology.
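Constraint-based reconstructions commonly encode the network as a stoichiometric matrix S (rows are metabolites, columns are reactions) and assume that internal metabolites are at steady state, S·v = 0, for a flux vector v; flux balance analysis then optimizes an objective such as growth subject to this constraint and to flux bounds. The sketch below only checks the steady-state condition on a hypothetical three-reaction toy network (uptake of A, conversion of A to B, secretion of B); it does not perform the optimization step.

```python
# Rows: internal metabolites A, B. Columns: reactions v_in, v_AB, v_out.
# Hypothetical toy network: uptake -> A, A -> B, B -> secretion.
S = [
    [1, -1, 0],  # A: produced by uptake, consumed by A -> B
    [0, 1, -1],  # B: produced by A -> B, consumed by secretion
]


def mass_balance(S, v):
    """Return S·v, the net production rate of each internal metabolite."""
    return [sum(s_ij * v_j for s_ij, v_j in zip(row, v)) for row in S]


def is_steady_state(S, v, tol=1e-9):
    """True if no internal metabolite accumulates or is depleted."""
    return all(abs(x) <= tol for x in mass_balance(S, v))


print(is_steady_state(S, [2.0, 2.0, 2.0]))  # True
print(mass_balance(S, [2.0, 1.0, 1.0]))     # [1.0, 0.0]: A accumulates
```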

KEGG: collection of bioinformatics databases

KEGG is a collection of databases dealing with genomes, biological pathways, diseases, drugs, and chemical substances. KEGG is utilized for bioinformatics research and education, including data analysis in genomics, metagenomics, metabolomics and other omics studies, modeling and simulation in systems biology, and translational research in drug development.

Biohydrogen: hydrogen that is produced biologically

Biohydrogen is H2 that is produced biologically. Interest in this technology is high because H2 is a clean fuel and can be readily produced from certain kinds of biomass, including biological waste. Furthermore, some photosynthetic microorganisms can produce H2 directly by splitting water, using light as the energy source.
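Written as overall reactions, light-driven water splitting supplies protons and electrons that a hydrogenase can combine into H2. The scheme below is a simplified textbook summary that omits the intermediate electron carriers.

```latex
\begin{aligned}
2\,\mathrm{H_2O} &\xrightarrow{h\nu} \mathrm{O_2} + 4\,\mathrm{H^+} + 4\,e^- \\
4\,\mathrm{H^+} + 4\,e^- &\xrightarrow{\text{hydrogenase}} 2\,\mathrm{H_2} \\
\text{net: } 2\,\mathrm{H_2O} &\longrightarrow 2\,\mathrm{H_2} + \mathrm{O_2}
\end{aligned}
```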

The hisB gene, found in the enterobacteria, in Campylobacter jejuni, and in Xylella/Xanthomonas, encodes the bifunctional imidazoleglycerol-phosphate dehydratase/histidinol-phosphatase, an enzyme that catalyzes two steps in histidine biosynthesis.
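For reference, the two reactions catalyzed by the bifunctional HisB enzyme are shown below using common substrate names (protons omitted). They are not consecutive steps of the pathway: the histidinol-phosphate aminotransferase reaction, which converts imidazole acetol phosphate to L-histidinol phosphate, lies between them.

```latex
\begin{aligned}
&\text{dehydratase:} && \text{imidazoleglycerol phosphate} \rightarrow \text{imidazole acetol phosphate} + \mathrm{H_2O} \\
&\text{phosphatase:} && \text{L-histidinol phosphate} + \mathrm{H_2O} \rightarrow \text{L-histidinol} + \mathrm{P_i}
\end{aligned}
```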

In bioinformatics, EcoCyc is a biological database for the bacterium Escherichia coli K-12. The EcoCyc project performs literature-based curation of the E. coli genome and of E. coli transcriptional regulation, transporters, and metabolic pathways. EcoCyc contains written summaries of E. coli genes, distilled from more than 36,000 scientific articles. EcoCyc also describes the genome and cellular networks of E. coli in a form that supports computational analyses by scientists.

The MetaCyc database is one of the largest databases of metabolic pathways and enzymes currently available. Its data are manually curated from the scientific literature and cover all domains of life. MetaCyc has extensive information about chemical compounds, reactions, metabolic pathways, and enzymes. The data have been curated from more than 58,000 publications.

Aldolase A deficiency: medical condition

Aldolase A deficiency is an autosomal recessive metabolic disorder resulting in a deficiency of the enzyme aldolase A; the enzyme is found predominantly in red blood cells and muscle tissue. The deficiency may lead to hemolytic anaemia as well as myopathy associated with exercise intolerance and rhabdomyolysis in some cases.
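For context, aldolase A is a glycolytic enzyme; the reaction it catalyzes is the reversible cleavage of fructose 1,6-bisphosphate into two triose phosphates:

```latex
\text{fructose 1,6-bisphosphate} \rightleftharpoons \text{dihydroxyacetone phosphate} + \text{glyceraldehyde 3-phosphate}
```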

Phosphoribosylamine: chemical compound

Phosphoribosylamine (PRA) is a biochemical intermediate in the formation of purine nucleotides via inosine-5′-monophosphate, and hence is a building block for DNA and RNA. The vitamins thiamine and cobalamin also contain fragments derived from PRA.
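PRA is formed in the first committed step of de novo purine synthesis, catalyzed by amidophosphoribosyltransferase, which transfers the amide nitrogen of glutamine to PRPP (5-phospho-α-D-ribose 1-diphosphate). Protons are omitted:

```latex
\text{PRPP} + \text{L-glutamine} + \mathrm{H_2O} \rightarrow \text{PRA} + \text{L-glutamate} + \mathrm{PP_i}
```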

Nitrite oxidoreductase (NXR) is an enzyme involved in nitrification. It catalyzes the last step in the process of aerobic ammonia oxidation, which is carried out by two groups of nitrifying bacteria: ammonia oxidizers such as Nitrosospira, Nitrosomonas, and Nitrosococcus convert ammonia to nitrite, while nitrite oxidizers such as Nitrobacter and Nitrospira oxidize nitrite to nitrate. NXR is responsible for producing almost all nitrate found in nature.
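The two stages of nitrification described above can be summarized by their overall reactions. The NXR-catalyzed oxidation is also shown as a half-reaction, in which the additional oxygen atom of nitrate comes from water. These are standard overall stoichiometries, not enzyme-level mechanisms.

```latex
\begin{aligned}
&\text{ammonia oxidizers:} && \mathrm{NH_3} + \tfrac{3}{2}\,\mathrm{O_2} \rightarrow \mathrm{NO_2^-} + \mathrm{H^+} + \mathrm{H_2O} \\
&\text{nitrite oxidizers:} && \mathrm{NO_2^-} + \tfrac{1}{2}\,\mathrm{O_2} \rightarrow \mathrm{NO_3^-} \\
&\text{NXR half-reaction:} && \mathrm{NO_2^-} + \mathrm{H_2O} \rightarrow \mathrm{NO_3^-} + 2\,\mathrm{H^+} + 2\,e^-
\end{aligned}
```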

5-Aminoimidazole ribotide: chemical compound

5′-Phosphoribosyl-5-aminoimidazole (AIR) is a biochemical intermediate in the formation of purine nucleotides via inosine-5′-monophosphate, and hence is a building block for DNA and RNA. The vitamins thiamine and cobalamin also contain fragments derived from AIR. It is an intermediate in the adenine pathway and is synthesized from 5′-phosphoribosylformylglycinamidine by AIR synthetase.
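The AIR synthetase step mentioned above is an ATP-dependent ring closure of FGAM (5′-phosphoribosylformylglycinamidine) to AIR, written here without protons:

```latex
\text{FGAM} + \text{ATP} \rightarrow \text{AIR} + \text{ADP} + \mathrm{P_i}
```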

5′-Phosphoribosylformylglycinamidine: chemical compound

5′-Phosphoribosylformylglycinamidine (FGAM) is a biochemical intermediate in the formation of purine nucleotides via inosine-5′-monophosphate, and hence is a building block for DNA and RNA. The vitamins thiamine and cobalamin also contain fragments derived from FGAM.

Protochlorophyllide: chemical compound

Protochlorophyllide, or monovinyl protochlorophyllide, is an intermediate in the biosynthesis of chlorophyll a. It lacks the phytol side chain of chlorophyll and the reduced pyrrole in ring D. Protochlorophyllide is highly fluorescent; mutants that accumulate it glow red when irradiated with blue light. In angiosperms, the later steps that convert protochlorophyllide to chlorophyll are light-dependent, so these plants are pale (chlorotic) when grown in the dark. Gymnosperms, algae, and photosynthetic bacteria also have a second, light-independent enzyme and therefore turn green even when grown in the dark.
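The light-dependent step referred to above is the reduction of the ring D double bond by NADPH-protochlorophyllide oxidoreductase (POR), which requires light to proceed:

```latex
\text{protochlorophyllide} + \text{NADPH} + \mathrm{H^+} \xrightarrow{h\nu,\ \text{POR}} \text{chlorophyllide } a + \text{NADP}^+
```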

Methanocaldococcus sp. FS406-22 is an anaerobic, piezophilic, diazotrophic, hyperthermophilic marine archaeon in the genus Methanocaldococcus. This strain is notable for fixing nitrogen at the highest temperature recorded to date for any nitrogen fixer. The 16S rRNA gene of Methanocaldococcus sp. FS406-22 is almost 100% similar to that of Methanocaldococcus jannaschii, which does not fix nitrogen.

Chlorophyllide: chemical compound

Chlorophyllide a and chlorophyllide b are the biosynthetic precursors of chlorophyll a and chlorophyll b, respectively. Their propionic acid groups are converted to phytyl esters by the enzyme chlorophyll synthase in the final step of the pathway. Thus the main interest in these compounds has been in the study of chlorophyll biosynthesis in plants, algae, and cyanobacteria. Chlorophyllide a is also an intermediate in the biosynthesis of bacteriochlorophylls.
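The chlorophyll synthase step described above esterifies the propionic acid side chain of chlorophyllide a with phytyl diphosphate:

```latex
\text{chlorophyllide } a + \text{phytyl diphosphate} \rightarrow \text{chlorophyll } a + \mathrm{PP_i}
```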

Peter D. Karp is director of the Bioinformatics Research Group at SRI International in Menlo Park, California. Karp leads the development of the BioCyc database collection. BioCyc databases combine genome, metabolic pathway, and regulatory information for thousands of organisms.

SoyBase is a database created by the United States Department of Agriculture. It contains genetic information about soybeans. It includes genetic maps, information about Mendelian genetics and molecular data regarding genes and sequences. It was started in 1990 and is freely available to individuals and organizations worldwide.

References

  1. Caspi, R.; Altman, T.; Dreher, K.; Fulcher, C. A.; Subhraveti, P.; Keseler, I. M.; Kothari, A.; Krummenacker, M.; Latendresse, M.; Mueller, L. A.; Ong, Q.; Paley, S.; Pujar, A.; Shearer, A. G.; Travers, M.; Weerasinghe, D.; Zhang, P.; Karp, P. D. (2011). "The MetaCyc database of metabolic pathways and enzymes and the BioCyc collection of pathway/genome databases". Nucleic Acids Research. 40 (Database issue): D742–53. doi:10.1093/nar/gkr1014. PMC 3245006. PMID 22102576.
  2. "BioCyc Pathway/Genome Database Collection". biocyc.org. Retrieved 2023-07-31.
  3. SRI International home page.
  4. Karp, Peter D.; Caspi, Ron (2011). "A survey of metabolic databases emphasizing the MetaCyc family". Archives of Toxicology. 85 (9): 1015–33. doi:10.1007/s00204-011-0705-2. PMC 3352032. PMID 21523460.
  5. Romero, Pedro; Wagg, Jonathan; Green, Michelle L; Kaiser, Dale; Krummenacker, Markus; Karp, Peter D (2004). "Computational prediction of human metabolic pathways from the complete human genome". Genome Biology. 6 (1): R2. doi:10.1186/gb-2004-6-1-r2. PMC 549063. PMID 15642094.
  6. Keseler, I. M.; Collado-Vides, J.; Santos-Zavaleta, A.; Peralta-Gil, M.; Gama-Castro, S.; Muniz-Rascado, L.; Bonavides-Martinez, C.; Paley, S.; Krummenacker, M.; Altman, T.; Kaipa, P.; Spaulding, A.; Pacheco, J.; Latendresse, M.; Fulcher, C.; Sarker, M.; Shearer, A. G.; MacKie, A.; Paulsen, I.; Gunsalus, R. P.; Karp, P. D. (2010). "EcoCyc: A comprehensive database of Escherichia coli biology". Nucleic Acids Research. 39 (Database issue): D583–90. doi:10.1093/nar/gkq1143. PMC 3013716. PMID 21097882.
  7. Dal'Molin, C. G.; Quek, L. E.; Palfreyman, R. W.; Nielsen, L. K. (2011). "AlgaGEM--a genome-scale metabolic reconstruction of algae based on the Chlamydomonas reinhardtii genome". BMC Genomics. 12 (Suppl 4): S5. doi:10.1186/1471-2164-12-S4-S5. PMC 3287588. PMID 22369158.
  8. Chowdhury, S; Hobbs, C. A.; MacLeod, S. L.; Cleves, M. A.; Melnyk, S; James, S. J.; Hu, P; Erickson, S. W. (2012). "Associations between maternal genotypes and metabolites implicated in congenital heart defects". Molecular Genetics and Metabolism. 107 (3): 596–604. doi:10.1016/j.ymgme.2012.09.022. PMC 3523122. PMID 23059056.