BacMap

Content
Description: A database of annotated bacterial genomes and their chromosome/genome maps
Data types captured: Gene sequence data, protein sequence data, general gene and protein annotation, gene positions, general genome/proteome statistics, taxonomic and phenotypic information, bacterial chromosome maps (images)
Contact
Research center: University of Alberta
Laboratory: Dr. David Wishart
Primary citation: [1] [2]
Access
Website: http://wishart.biology.ualberta.ca/BacMap/ (version 1.0); http://bacmap.wishartlab.com/ (version 2.0)
Miscellaneous
Data release frequency: Updated every 2-3 months
Curation policy: Manually curated

BacMap is a freely available, web-accessible database containing fully annotated, fully zoomable and fully searchable chromosome maps from more than 2500 prokaryotic (archaebacterial and eubacterial) species. [1] BacMap was originally developed in 2005 to address the challenges of viewing and navigating the growing number of bacterial genomes being generated through large-scale sequencing efforts. Since it was first introduced, the number of bacterial genomes in BacMap has grown more than 15-fold. In essence, BacMap functions as an online visual atlas of microbial genomes. All of the genome annotations in BacMap were generated through the BASys genome annotation system. [3] BASys is a widely used microbial annotation infrastructure that performs comprehensive bioinformatic analyses on raw (or labeled) bacterial genome sequence data. All of the genome (chromosome) maps in BacMap were constructed using the program CGView. [4] CGView is a popular visualization program for generating interactive, web-compatible circular chromosome maps (Fig. 1). Each chromosome map in BacMap is extensively hyperlinked, and each chromosome image can be interactively navigated, expanded and rotated using navigation buttons or hyperlinks. All identified genes in a BacMap chromosome map are colored according to coding direction and, when the map is sufficiently zoomed in, gene labels become visible. Each gene label on a BacMap genome map is also hyperlinked to a 'gene card' (Fig. 2). The gene cards provide detailed information about the corresponding DNA and protein sequences. Each genome map in BacMap is searchable via BLAST and via a gene name/synonym search.
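The geometry behind a circular chromosome map with strand-colored genes can be illustrated with a short sketch. This is not CGView's or BacMap's code: the `Gene` class, the function names and the two colors are hypothetical, and the convention that position 1 sits at the top of the map with angles increasing clockwise is an assumption made for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    start: int   # 1-based start position on the chromosome
    end: int     # 1-based end position (inclusive)
    strand: int  # +1 for forward coding direction, -1 for reverse


# Hypothetical color scheme: one color per coding direction.
STRAND_COLORS = {1: "red", -1: "blue"}


def position_to_angle(position: int, genome_length: int) -> float:
    """Map a 1-based base position to degrees clockwise from the top of the map."""
    return 360.0 * (position - 1) / genome_length


def gene_arc(gene: Gene, genome_length: int) -> tuple:
    """Return (start angle, end angle, color) for drawing one gene as an arc."""
    return (
        position_to_angle(gene.start, genome_length),
        position_to_angle(gene.end, genome_length),
        STRAND_COLORS[gene.strand],
    )


def angle_to_xy(angle_deg: float, radius: float) -> tuple:
    """Convert a clockwise-from-top angle to x/y coordinates on a circle."""
    theta = math.radians(angle_deg)
    return (radius * math.sin(theta), radius * math.cos(theta))
```

Zooming and rotating then amount to changing the radius and adding an offset to every angle before drawing, which is why a single annotated coordinate table can back many different views of the same chromosome.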

Chromosome: DNA molecule containing genetic material of a cell

A chromosome is a deoxyribonucleic acid (DNA) molecule with part or all of the genetic material (genome) of an organism. Most eukaryotic chromosomes include packaging proteins which, aided by chaperone proteins, bind to and condense the DNA molecule to prevent it from becoming an unmanageable tangle.

Genome: the entirety of an organism's hereditary information. The genome of an organism (encoded by the genomic DNA) is the biological information of heredity that is passed from one generation to the next and is transcribed to produce various RNAs.

In the fields of molecular biology and genetics, a genome is the genetic material of an organism. It consists of DNA. The genome includes both the genes and the noncoding DNA, as well as mitochondrial DNA and chloroplast DNA. The study of the genome is called genomics.

BASys is a freely available web server that can be used to perform automated, comprehensive annotation of bacterial genomes. With the advent of next-generation DNA sequencing, it is now possible to sequence the complete genome of a bacterium within a single day. This has led to an explosion in the number of fully sequenced microbes: as of 2013, there were more than 2700 fully sequenced bacterial genomes deposited with GenBank. However, a continuing challenge in microbial genomics is finding the resources or tools for annotating the large number of newly sequenced genomes. BASys was developed in 2005 in anticipation of these needs and was the world’s first publicly accessible microbial genome annotation web server. Because of its widespread popularity, the BASys server was updated in 2011 through the addition of multiple server nodes to handle the large number of queries it was receiving.

Because of the growing interest in metagenomics and large-scale bacterial genome analysis, BacMap was extensively updated in 2012. [2] With the latest update, all of BacMap’s bacterial genome maps now have separate prophage genome maps as well as separate tRNA and rRNA maps. Each bacterial chromosome entry in BacMap now contains graphs and tables on a variety of gene and protein statistics. All of the bacterial species listed in BacMap now have bacterial 'biography' cards, with corresponding information on the microbe’s taxonomy, phenotypic traits, other descriptions and electron microscopy or other high-resolution images of the microbe itself. BacMap also has a number of updated data browsing and text searching tools that allow filtering, sorting and more facile display of the chromosome maps and their contents.

Prophage

A prophage is a bacteriophage genome inserted and integrated into the circular bacterial DNA chromosome or existing as an extrachromosomal plasmid. This is a latent form of a phage, in which the viral genes are present in the bacterium without causing disruption of the bacterial cell. "Pro" means "before", so a prophage is the stage of a virus in which its genome is inserted into the host DNA, before the virus takes its active form inside the host.

Phenotypic trait: inherited biological feature

A phenotypic trait, simply trait, or character state is a distinct variant of a phenotypic characteristic of an organism; it may be either inherited or determined environmentally, but typically occurs as a combination of the two. For example, eye color is a character of an organism, while blue, brown and hazel are traits.

Scope and Access

All data in BacMap is non-proprietary or is derived from a non-proprietary source. It is freely accessible and available to anyone. In addition, nearly every data item is fully traceable and explicitly referenced to the original source. BacMap data is available through a public web interface and downloads.

See also

The nucleoid is an irregularly shaped region within the cell of a prokaryote that contains all or most of the genetic material, called the genophore. In contrast to the nucleus of a eukaryotic cell, it is not surrounded by a nuclear membrane. The genome of prokaryotic organisms is generally a circular, double-stranded piece of DNA, of which multiple copies may exist at any time. The length of a genome varies widely, but is generally at least a few million base pairs. As in all cellular organisms, the length of the DNA molecules of bacterial and archaeal chromosomes is very large compared to the dimensions of the cell, and the genomic DNA molecules must be compacted to fit.

Circular bacterial chromosome

A circular bacterial chromosome is a bacterial chromosome in the form of a molecule of circular DNA. Unlike the linear DNA of most eukaryotes, typical bacterial chromosomes are circular.

Functional genomics

Functional genomics is a field of molecular biology that attempts to make use of the vast wealth of data given by genomic and transcriptomic projects to describe gene functions and interactions. Unlike structural genomics, functional genomics focuses on the dynamic aspects such as gene transcription, translation, regulation of gene expression and protein–protein interactions, as opposed to the static aspects of the genomic information such as DNA sequence or structures. Functional genomics attempts to answer questions about the function of DNA at the levels of genes, RNA transcripts, and protein products. A key characteristic of functional genomics studies is their genome-wide approach to these questions, generally involving high-throughput methods rather than a more traditional “gene-by-gene” approach.

Related Research Articles

In genetics, an expressed sequence tag (EST) is a short sub-sequence of a cDNA sequence. ESTs may be used to identify gene transcripts, and are instrumental in gene discovery and in gene-sequence determination. The identification of ESTs has proceeded rapidly, with approximately 74.2 million ESTs now available in public databases.

The DrugBank database is a comprehensive, freely accessible, online database containing information on drugs and drug targets. As both a bioinformatics and a cheminformatics resource, DrugBank combines detailed drug data with comprehensive drug target information. DrugBank draws on a substantial amount of content from Wikipedia, and Wikipedia in turn often links to DrugBank.

KEGG: biological database

KEGG is a collection of databases dealing with genomes, biological pathways, diseases, drugs, and chemical substances. KEGG is utilized for bioinformatics research and education, including data analysis in genomics, metagenomics, metabolomics and other omics studies, modeling and simulation in systems biology, and translational research in drug development.

The Saccharomyces Genome Database (SGD) is a scientific database of the molecular biology and genetics of the yeast Saccharomyces cerevisiae, which is commonly known as baker's or budding yeast.

MicrobesOnline

MicrobesOnline is a publicly and freely accessible website that hosts multiple comparative genomic tools for comparing microbial species at the genomic, transcriptomic and functional levels. MicrobesOnline was developed by the Virtual Institute for Microbial Stress and Survival, which is based at the Lawrence Berkeley National Laboratory in Berkeley, California. The site was launched in 2005, with regular updates until 2011.

GeneMark is a generic name for a family of ab initio gene prediction programs developed at the Georgia Institute of Technology in Atlanta. Developed in 1993, the original GeneMark was used in 1995 as a primary gene prediction tool for annotation of the first completely sequenced bacterial genome, that of Haemophilus influenzae, and in 1996 for the first archaeal genome, that of Methanococcus jannaschii. The algorithm introduced inhomogeneous three-periodic Markov chain models of protein-coding DNA sequence, which became standard in gene prediction, as well as a Bayesian approach to predicting genes in both DNA strands simultaneously. Species-specific parameters of the models were estimated from training sets of sequences of known type. The major step of the algorithm computes, for a given DNA fragment, the posterior probabilities of its being "protein-coding" in each of six possible reading frames or being "non-coding". The original GeneMark is an HMM-like algorithm; it can be viewed as an approximation to the posterior decoding algorithm known from HMM theory, applied to an appropriately defined HMM.
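The Bayesian step described above, which assigns a posterior probability to each of seven hypotheses (six coding frames plus non-coding), can be sketched in a few lines. This is an illustration of Bayes' rule only, not GeneMark's implementation: in the real program the likelihoods come from trained Markov chain models, whereas here the function name, the hypothesis labels and the inputs are assumptions supplied by the caller.

```python
import math

# Six reading frames (three per strand) plus the non-coding hypothesis.
HYPOTHESES = ["+1", "+2", "+3", "-1", "-2", "-3", "non-coding"]


def frame_posteriors(log_likelihoods: dict, priors: dict) -> dict:
    """Apply Bayes' rule: P(h | fragment) is proportional to P(fragment | h) * P(h).

    log_likelihoods maps each hypothesis to log P(fragment | hypothesis);
    priors maps each hypothesis to its prior probability.
    """
    log_joint = {
        h: log_likelihoods[h] + math.log(priors[h]) for h in HYPOTHESES
    }
    # Subtract the maximum before exponentiating to avoid numerical underflow.
    peak = max(log_joint.values())
    weights = {h: math.exp(value - peak) for h, value in log_joint.items()}
    total = sum(weights.values())
    return {h: weight / total for h, weight in weights.items()}


# Usage with made-up numbers: frame +2 explains the fragment best.
example_ll = {h: -120.0 for h in HYPOTHESES}
example_ll["+2"] = -110.0
uniform = {h: 1.0 / len(HYPOTHESES) for h in HYPOTHESES}
posteriors = frame_posteriors(example_ll, uniform)
best = max(posteriors, key=posteriors.get)
```

Working in log space matters here because the likelihood of even a short DNA fragment under a Markov model is a product of many small probabilities.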

The Vertebrate Genome Annotation (VEGA) database is a biological database dedicated to assisting researchers in locating specific areas of the genome and annotating genes or regions of vertebrate genomes. The VEGA browser is based on Ensembl web code and infrastructure and provides a public curation of known vertebrate genes for the scientific community. The VEGA website is updated frequently to maintain the most current information about vertebrate genomes and attempts to present consistently high-quality annotation of all its published vertebrate genomes or genome regions. VEGA was developed by the Wellcome Trust Sanger Institute and is in close association with other annotation databases, such as ZFIN, the Havana Group and GenBank. Manual annotation is currently more accurate at identifying splice variants, pseudogenes, polyadenylation features, non-coding regions and complex gene arrangements than automated methods.

PlasMapper is a freely available web server that automatically generates and annotates high-quality circular plasmid maps. It is a particularly useful online service for molecular biologists wishing to generate plasmid maps without having to purchase or maintain expensive commercial software. PlasMapper accepts plasmid/vector DNA sequence as input and uses sequence pattern matching and BLAST sequence alignment to automatically identify and label common promoters, terminators, cloning sites, restriction sites, reporter genes, affinity tags, selectable marker genes, origins of replication and open reading frames. PlasMapper then reformats and presents the identified features both in a simple textual form and as a high-resolution, multicolored image.
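The pattern-matching half of this approach can be illustrated with a minimal sketch that locates a restriction site on a circular sequence. It is not PlasMapper's code; the function name is hypothetical, only exact matches on the given strand are reported, and the BLAST-based feature detection is not covered. The example motif GAATTC is the EcoRI recognition site.

```python
def find_sites_circular(sequence: str, motif: str) -> list:
    """Return 1-based start positions of exact motif matches on a circular sequence.

    Matches that span the junction between the last and first base are
    found by appending the first len(motif) - 1 bases to the end.
    """
    seq = sequence.upper()
    pattern = motif.upper()
    if not pattern or len(pattern) > len(seq):
        return []
    extended = seq + seq[: len(pattern) - 1]
    positions = []
    start = extended.find(pattern)
    while start != -1:
        positions.append(start + 1)  # convert 0-based index to 1-based position
        start = extended.find(pattern, start + 1)  # allow overlapping matches
    return positions


# Usage: one EcoRI site in the middle and one wrapping around the origin.
plasmid = "TTCAAAGAATTCAAAGAA"
ecori_sites = find_sites_circular(plasmid, "GAATTC")
```

Treating the sequence as circular is the point of the exercise: a linear search over the same string would miss any site that straddles the arbitrary position chosen as base 1.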

PDBsum is a database that provides an overview of the contents of each 3D macromolecular structure deposited in the Protein Data Bank. The original version of the database was developed around 1995 by Roman Laskowski and collaborators at University College London. As of 2014, PDBsum is maintained by Laskowski and collaborators in the laboratory of Janet Thornton at the European Bioinformatics Institute (EBI).

Human Metabolome Database: database of human metabolites

The Human Metabolome Database (HMDB) is a comprehensive, high-quality, freely accessible, online database of small molecule metabolites found in the human body. It was created by the Human Metabolome Project, funded by Genome Canada. One of the first dedicated metabolomics databases, the HMDB facilitates human metabolomics research, including the identification and characterization of human metabolites using NMR spectroscopy, GC-MS spectrometry and LC/MS spectrometry. To aid in this discovery process, the HMDB contains three kinds of data: 1) chemical data, 2) clinical data, and 3) molecular biology/biochemistry data. The chemical data includes 41,514 metabolite structures with detailed descriptions along with nearly 10,000 NMR, GC-MS and LC/MS spectra.

Toxin and Toxin-Target Database: online database of compounds toxic to humans

The Toxin and Toxin-Target Database (T3DB), also known as the Toxic Exposome Database, is a freely accessible online database of common substances that are toxic to humans, along with their protein, DNA or organ targets. The database currently houses nearly 3,700 toxic compounds or poisons described by nearly 42,000 synonyms. This list includes various groups of toxins, including common pollutants, pesticides, drugs, food toxins, household and industrial/workplace toxins, cigarette toxins, and uremic toxins. These toxic substances are linked to 2,086 corresponding protein/DNA target records. In total there are 42,433 toxic substance-toxin target associations. Each toxic compound record (ToxCard) in T3DB contains nearly 100 data fields and holds information such as chemical properties and descriptors, mechanisms of action, toxicity or lethal dose values, molecular and cellular interactions, medical information, NMR and MS spectra, and up- and down-regulated genes. This information has been extracted from over 18,000 sources, which include other databases, government documents, books, and scientific literature.

The Small Molecule Pathway Database (SMPDB) is a comprehensive, high-quality, freely accessible, online database containing more than 600 small molecule pathways found in humans. SMPDB is designed specifically to support pathway elucidation and pathway discovery in metabolomics, transcriptomics, proteomics and systems biology. It is able to do so, in part, by providing colorful, detailed, fully searchable, hyperlinked diagrams of five types of small molecule pathways: 1) general human metabolic pathways; 2) human metabolic disease pathways; 3) human metabolite signaling pathways; 4) drug-action pathways and 5) drug metabolism pathways. SMPDB pathways may be navigated, viewed and zoomed interactively using a Google Maps-like interface. All SMPDB pathways include information on the relevant organs, subcellular compartments, protein cofactors, protein locations, metabolite locations, chemical structures and protein quaternary structures. Each small molecule in SMPDB is hyperlinked to detailed descriptions contained in the HMDB or DrugBank, and each protein or enzyme complex is hyperlinked to UniProt. Additionally, all SMPDB pathways are accompanied by detailed descriptions and references, providing an overview of the pathway, condition or processes depicted in each diagram. Users can browse the SMPDB or search its contents by text searching, sequence searching, or chemical structure searching. More powerful queries are also possible, including searching with lists of gene or protein names, drug names, metabolite names, GenBank IDs, Swiss-Prot IDs, or Agilent or Affymetrix microarray IDs. These queries produce lists of matching pathways and highlight the matching molecules on each of the pathway diagrams. Gene, metabolite and protein concentration data can also be visualized through SMPDB's mapping interface.

MetaboAnalyst is a set of online tools for metabolomic data analysis and interpretation, created by members of the Wishart Research Group at the University of Alberta. It was first released in May 2009 and version 2.0 was released in January 2012. MetaboAnalyst provides a variety of analysis methods that have been tailored for metabolomic data. These methods include metabolomic data processing, normalization, multivariate statistical analysis, and data annotation. The current version is focused on biomarker discovery and classification.

The CyberCell Database (CCDB) is a freely available, web-accessible database that provides quantitative genomic, proteomic and metabolomic data on Escherichia coli. Escherichia coli is perhaps the best-studied bacterium on the planet and has been the organism of choice for several international efforts in cell simulation. These cell simulation efforts require up-to-date web-accessible resources that provide comprehensive, non-redundant, and quantitative data on this bacterium. The intent of CCDB is to facilitate the collection, revision, coordination and storage of the key information required for in silico E. coli simulation.

CGView is a freely available, downloadable Java software program, applet and API for generating colorful, zoomable, hyperlinked, richly annotated images of circular genomes such as bacterial chromosomes, mitochondrial DNA and plasmids. It is commonly used in bacterial sequence annotation pipelines to generate visual output suitable for the web. It has also been used in a variety of popular web servers and databases, such as BacMap.

METAGENassist is a freely available web server for comparative metagenomic analysis. Comparative metagenomic studies involve the large-scale comparison of genomic or taxonomic census data from bacterial samples across different environments. Historically this has required a sound knowledge of statistics, computer programming, genetics and microbiology. As a result, only a small number of researchers are routinely able to perform comparative metagenomic studies. To circumvent these limitations, METAGENassist was developed to allow metagenomic analyses to be performed by non-specialists, easily and intuitively over the web. METAGENassist is particularly notable for its rich graphical output and its extensive database of bacterial phenotypic information.

BacDive: scientific database for bacteria

BacDive is a bacterial metadatabase that provides strain-linked information about bacterial and archaeal biodiversity.

References

  1. Stothard P, Van Domselaar G, Shrivastava S, Guo A, O'Neill B, Cruz J, Ellison M, Wishart DS (2005). "BacMap: an interactive picture atlas of annotated bacterial genomes". Nucleic Acids Res. 33 (Database issue): D317–20. doi:10.1093/nar/gki075. PMC 540029. PMID 15608206.
  2. Cruz J, Liu Y, Liang Y, Zhou Y, Wilson M, Dennis JJ, Stothard P, Van Domselaar G, Wishart DS (2012). "BacMap: an up-to-date electronic atlas of annotated bacterial genomes". Nucleic Acids Res. 40 (Database issue): D599–604. doi:10.1093/nar/gkr1105. PMC 3245156. PMID 22135301.
  3. Van Domselaar GH, Stothard P, Shrivastava S, Cruz JA, Guo A, Dong X, Lu P, Szafron D, Greiner R, Wishart DS (2005). "BASys: a web server for automated bacterial genome annotation". Nucleic Acids Res. 33 (Web Server issue): W455–9. doi:10.1093/nar/gki593. PMC 1160269. PMID 15980511.
  4. Stothard P, Wishart DS (2005). "Circular genome visualization and exploration using CGView". Bioinformatics. 21 (4): 537–9. doi:10.1093/bioinformatics/bti054. PMID 15479716.