CGView

Content
Description: For visualizing circular genomes
Data types captured
Data input: Genomic sequences with annotations in XML, tab-delimited, or NCBI ptt format.
Data output: Static or interactive images of genomic maps.
Contact
Research center: University of Alberta
Laboratory: Dr. Paul Stothard & Dr. David Wishart
Primary citation: [1]
Release date: 2004
Access
Website: wishart.biology.ualberta.ca/cgview/xml_overview.html
Miscellaneous
Data release frequency: Last updated on 2012

CGView (Circular Genome Viewer) is a freely available downloadable Java software program, applet and API (application programming interface) for generating colorful, zoomable, hyperlinked, richly annotated images of circular genomes such as bacterial chromosomes, mitochondrial DNA and plasmids. [1] [2] [3] It is commonly used in bacterial sequence annotation pipelines to generate visual output suitable for the web. It has also been used in a variety of popular web servers (the CGView webserver, PlasMapper, BASys) and databases (BacMap).
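Circular maps of this kind work by placing each base of the genome at a point on a circle. The Python sketch below shows one common way to compute that placement. It is not taken from CGView's source code. The function name `base_to_xy` is hypothetical, and so is the orientation convention: position 1 at 12 o'clock, positions increasing clockwise, screen y-axis pointing down.

```python
import math


def base_to_xy(position, sequence_length, radius, center=(0.0, 0.0)):
    """Map a 1-based base position onto a circle of the given radius.

    Assumed convention (illustrative, not CGView's documented behavior):
    position 1 sits at 12 o'clock and positions increase clockwise, in
    screen coordinates where y grows downward.
    """
    if not 1 <= position <= sequence_length:
        raise ValueError("position out of range")
    # Fraction of the way around the genome, converted to radians.
    angle = 2 * math.pi * (position - 1) / sequence_length
    cx, cy = center
    # sin/cos are swapped relative to the usual math convention so that
    # angle 0 points straight up and increasing angles move clockwise.
    x = cx + radius * math.sin(angle)
    y = cy - radius * math.cos(angle)
    return x, y


print(base_to_xy(251, 1000, 100))
```

For example, take a 1,000 bp map of radius 100 centered at the origin. Base 251 lies a quarter of the way around the genome, so it falls at the 3 o'clock position, approximately (100, 0). A feature's arc can be drawn between the angles of its start and stop positions.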


Overview

More than 4,000 bacterial genomes and thousands of plasmid genomes have been sequenced thanks to advances in DNA sequencing technology. CGView was developed to address the specialized needs of visualizing and annotating circular genomes, such as bacterial, plasmid, chloroplast and mitochondrial DNA sequences. Once installed, the CGView program accepts feature data and rendering information in several file formats: an XML file, a tab-delimited file, or an NCBI ptt file. CGView then converts the input into a graphical map in PNG, JPG or SVG format, which can include labels, titles, legends and footnotes. The output can be a static, interactive or poster-sized image, suitable for printing or for embedding in web pages.
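To illustrate the XML input route, the following Python sketch builds a minimal CGView-style XML document from a list of features using the standard library's `xml.etree.ElementTree`. The element and attribute names are assumed to follow the CGView XML format and should be checked against the official XML documentation before use. These include `cgview`, `featureSlot`, `feature`, `featureRange`, `sequenceLength`, `strand` and `decoration`. The helper `build_cgview_xml`, its tuple-based input and the chosen colors are hypothetical.

```python
import xml.etree.ElementTree as ET


# Hypothetical helper: element and attribute names are assumed to follow the
# CGView XML format; verify them against the official CGView documentation.
def build_cgview_xml(sequence_length, features, title="Example map"):
    """Return a CGView-style XML string.

    features: iterable of (label, start, stop, strand) tuples using 1-based
    coordinates and strand "+" or "-" (an input layout chosen for this sketch).
    """
    root = ET.Element("cgview", {
        "sequenceLength": str(sequence_length),
        "title": title,
        "width": "600",
        "height": "600",
    })
    # One slot per strand, so forward and reverse features sit on separate rings.
    slots = {
        "+": ET.SubElement(root, "featureSlot", {"strand": "direct"}),
        "-": ET.SubElement(root, "featureSlot", {"strand": "reverse"}),
    }
    # Arrow direction and color indicate the coding strand.
    style = {
        "+": ("clockwise-arrow", "rgb(0,0,153)"),
        "-": ("counterclockwise-arrow", "rgb(153,0,0)"),
    }
    for label, start, stop, strand in features:
        if strand not in style:
            raise ValueError(f"{label}: strand must be '+' or '-'")
        if not (1 <= start <= sequence_length and 1 <= stop <= sequence_length):
            raise ValueError(f"{label}: coordinates outside 1..{sequence_length}")
        decoration, color = style[strand]
        feature = ET.SubElement(slots[strand], "feature", {
            "label": label,
            "decoration": decoration,
            "color": color,
        })
        ET.SubElement(feature, "featureRange", {"start": str(start), "stop": str(stop)})
    return ET.tostring(root, encoding="unicode")


# Two features on a 5,000 bp plasmid, one on each strand.
print(build_cgview_xml(5000, [("geneA", 100, 900, "+"), ("geneB", 1200, 2100, "-")]))
```

Generating the XML programmatically, rather than writing it by hand, makes it easy to feed annotation output from a pipeline straight into the viewer. The coordinate check catches features that fall outside the declared sequence length before rendering.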


Technology and Accessibility

CGView is written in the Java programming language. It is available as a downloadable Java application, as an applet and as an API. The applet can be used to embed interactive maps in web pages, and the API can be used to incorporate CGView into other Java applications. A web-based CGView Server for comparative genomics has also been developed. [2]

References

  1. Stothard, P; Wishart, DS (2005). "Circular genome visualization and exploration using CGView". Bioinformatics. 21 (4): 537–9. doi:10.1093/bioinformatics/bti054. PMID 15479716.
  2. Grant, JR; Stothard, P (2008). "The CGView Server: a comparative genomics tool for circular genomes". Nucleic Acids Res. 36 (Web Server issue): W181–4. doi:10.1093/nar/gkn179. PMC 2447734. PMID 18411202.
  3. Grant, JR; Arantes, AS; Stothard, P (2012). "Comparing thousands of circular genomes using the CGView Comparison Tool". BMC Genomics. 13: 202. doi:10.1186/1471-2164-13-202. PMC 3469350. PMID 22621371.