Viral Bioinformatics Resource Center

The Viral Bioinformatics Resource Center (VBRC) is an online resource providing access to a database of curated viral genomes and a variety of tools for bioinformatic genome analysis. [1] This resource was one of eight BRCs (Bioinformatics Resource Centers) funded by NIAID with the goal of promoting research against emerging and re-emerging pathogens, particularly those seen as potential bioterrorism threats. The VBRC is now supported by Dr. Chris Upton [2] at the University of Victoria.

The curated VBRC database contains all publicly available genomic sequences for poxviruses and African swine fever viruses (ASFV). Unlike most other genomic databases, it groups all annotated genes into ortholog groups (i.e. protein families) based on pre-run BLASTP sequence similarity searches.

The curated database is accessed through VOCs (Viral Orthologous Clusters), a downloadable Java-based user interface, and acts as the central information source for the other programs of the VBRC workbench. These programs serve a variety of bioinformatic analysis functions (whole-genome or subgenome alignments, genome display, and several types of gene/protein sequence analysis). Most of these tools also accept user-supplied input.

Virus families covered in the VBRC database

The VBRC covers the following viruses:

Organization of the VBRC database

The VBRC database stores viral bioinformatic data on three levels:

  1. Whole genomes. This level contains information about the virus species or isolate and its entire genomic sequence.
  2. Annotated genes. This level contains all the predicted ORFs (open reading frames) in a particular virus genome, together with their DNA and (translated) protein sequences.
  3. Ortholog groups (families). This level is a distinguishing feature of the VBRC database. Each annotated gene, after it has been entered into the database, is subjected to BLASTP searching against all other genes already in the database. [3] Based on the search results, it is either assigned to a pre-existing ortholog group or placed in a newly created ortholog group of its own. The goal of this level is to "allow for quick comparison of similar genes across a given virus family." [4]
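The three levels and the assignment step described above can be sketched roughly as follows. This is an illustration only, not the VBRC implementation: the class names, the `threshold` value, and the `similarity` function are all assumptions, and `similarity` uses Python's `difflib` as a stand-in for a real BLASTP search.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher


@dataclass
class Gene:
    """Level 2: an annotated gene (hypothetical, minimal fields)."""
    gene_id: str
    genome: str   # Level 1: name of the genome the gene belongs to
    protein: str  # translated protein sequence


@dataclass
class OrthologGroup:
    """Level 3: a family of similar genes."""
    group_id: int
    members: list = field(default_factory=list)


def similarity(a: str, b: str) -> float:
    """Stand-in for a BLASTP comparison: matching-residue ratio in [0, 1]."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def assign_to_group(gene, groups, threshold=0.8):
    """Add the gene to the best-matching existing group, or start a new one.

    The threshold is an arbitrary illustrative cut-off, not a VBRC setting.
    """
    best_group, best_score = None, 0.0
    for group in groups:
        score = max(similarity(gene.protein, m.protein) for m in group.members)
        if score > best_score:
            best_group, best_score = group, score
    if best_group is None or best_score < threshold:
        best_group = OrthologGroup(group_id=len(groups) + 1)
        groups.append(best_group)
    best_group.members.append(gene)
    return best_group


groups = []
for g in (
    Gene("A1", "virusA", "MKTAYIAKQRQISFVKSHFSRQ"),
    Gene("B1", "virusB", "MKTAYIAKQRQISFVKSHFSRE"),  # one substitution vs A1
    Gene("A2", "virusA", "GGGGPPPPLLLLWWWWCCCCDD"),  # unrelated sequence
):
    assign_to_group(g, groups)
```

In this toy run the first two genes end up in one group and the third starts a group of its own, which mirrors the "assign or create" behaviour described for the database.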

Central tools provided by VBRC

VBRC provides researchers with a wide variety of database-linked tools. Of these, the central four programs are VOCs, VGO, BBB, and JDotter.

  1. VOCs (Viral Orthologous Clusters)
    VOCs is the main database access interface. Users can search the available data by a number of criteria related to genome, gene, or ortholog group characteristics. Search results are displayed in table format; from here the user may obtain further information about a particular database entry, or launch a VOCs-linked tool (see below) for analysis of selected data. Additional analysis tools such as BLAST searches, genome maps, genome or gene alignment, phylogenetic trees, etc. are provided. [5]
  2. VGO (Viral Genome Organizer)
    VGO is a Java-based interface used for viewing and searching viral genome sequences. [6] Together with a graphical representation of the selected VBRC (or user-supplied) genome, the program displays information relevant to a genome of interest, including its genes, ORFs and start/stop codons. Tools are provided that allow the user to perform regular expression, fuzzy motif, and masslist searches. VGO can also be used to identify related genes across multiple sequences.
  3. BBB (Base-By-Base)
    Base-By-Base is a platform-independent (Java-based), whole-genome pairwise and multiple alignment editor. [7] [8] [9] The program highlights differences between consecutive pairs of sequences within an alignment, thus allowing the user to survey a large alignment at a single-residue level. Annotations from the VBRC database or user-supplied files are displayed alongside each sequence.
    Although Base-By-Base was intended as an editor and viewer for alignments of highly similar sequences, it also generates multiple alignments using Clustal Omega, T-Coffee and MUSCLE. Edit functions are provided to allow users to fine-tune such alignments manually; users may also annotate genomes with comments or primer sequences.
  4. JDotter
    JDotter is a Java-based user interface providing VBRC-linked access to the Linux version of Dotter. JDotter can both access pre-processed dotplots of the genome and gene (DNA or protein) sequences available in the VBRC database, and take user input for generation of new dotplots. JDotter also interfaces with the curated database or the user-supplied file to display supplementary feature data such as gene annotations. [10]
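Two of the ideas in the list above, highlighting differences between consecutive sequences in an alignment (Base-By-Base) and building a dotplot (JDotter), can be sketched in a few lines. The function names and the `window`/`min_matches` parameters are hypothetical, and the dotplot uses a simple windowed identity count rather than Dotter's actual scoring scheme.

```python
def consecutive_differences(alignment):
    """For each adjacent pair of aligned rows, list the columns that differ."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    return [
        [i for i, (a, b) in enumerate(zip(upper, lower)) if a != b]
        for upper, lower in zip(alignment, alignment[1:])
    ]


def dot_matrix(seq_x, seq_y, window=3, min_matches=3):
    """Return the (x, y) window starts where at least min_matches residues agree.

    Simplified stand-in for a dotplot: no substitution matrix, no grey-scale.
    """
    dots = set()
    for x in range(len(seq_x) - window + 1):
        for y in range(len(seq_y) - window + 1):
            matches = sum(seq_x[x + k] == seq_y[y + k] for k in range(window))
            if matches >= min_matches:
                dots.add((x, y))
    return dots


# Toy alignment: row 1 vs row 2 differ at column 4, row 2 vs row 3 at column 2.
diffs = consecutive_differences(["ACGT-A", "ACGTTA", "ACCTTA"])

# Self-comparison: every exact 3-mer match lies on the main diagonal here.
dots = dot_matrix("ACGTAC", "ACGTAC")
```

Plotting the returned coordinates (for example as a scatter plot) would give the familiar diagonal lines of a dotplot; that step is left out to keep the sketch dependency-free.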

Other tools provided by VBRC

VBRC provides a number of additional Java-based analysis tools on its website. Each tool in this category is designed to perform a very specific task (e.g. regular expression searches, DNA skew plotting). Although the tools can be run as stand-alone programs with user-supplied input, most are more useful when launched from the central VOCs application with VBRC-supplied data.
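As an illustration of the kind of narrowly scoped task mentioned above, the following sketch computes one common form of DNA skew, GC skew, defined as (G - C) / (G + C) over non-overlapping windows. It is an assumption that this is the sort of skew intended; the function name and window size are hypothetical and the code is not taken from any VBRC tool.

```python
from itertools import accumulate


def gc_skew(sequence, window=1000):
    """Return (G - C) / (G + C) for each full, non-overlapping window."""
    seq = sequence.upper()
    skews = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        # Windows without any G or C get a skew of 0.0 by convention here.
        skews.append((g - c) / (g + c) if g + c else 0.0)
    return skews


# Tiny example with a window of 4 bases: "GGGC", "AAAA", "CCCG".
skews = gc_skew("GGGCAAAACCCG", window=4)

# Cumulative skew is what is usually plotted along the genome.
cumulative = list(accumulate(skews))
```

The per-window values or their running sum can then be handed to any plotting library to draw the skew along the genome.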

These additional tools are as follows:

References

  1. "VBRC". Viral Bioinformatics Resource Center. Dr. Chris Upton.
  2. Upton, Chris. "Professor of Biochemistry and Microbiology".
  3. Upton, C.; Slack, S; Hunter, AL; Ehlers, A; Roper, RL (Jul 2003). "Poxvirus Orthologous Clusters: toward Defining the Minimum Essential Poxvirus Genome". Journal of Virology. 77 (13): 7590–600. doi:10.1128/JVI.77.13.7590-7600.2003. ISSN 0022-538X. PMC 164831. PMID 12805459.
  4. Upton, Chris (4 July 2008). "Bioinformatics Tools and their Applications in Virology". Retrieved 4 September 2009.
  5. Ehlers, A.; Osborne, J.; Slack, S.; Roper, R. L.; Upton, C. (2002). "Poxvirus Orthologous Clusters (POCs)". Bioinformatics. 18 (11): 1544–5. doi:10.1093/bioinformatics/18.11.1544. PMID 12424130.
  6. Upton, C; Hogg, D; Perrin, D; Boone, M; Harris, NL (Sep 2000). "Viral genome organizer: a system for analyzing complete viral genomes". Virus Research. 70 (1–2): 55–64. doi:10.1016/S0168-1702(00)00210-0. ISSN 0168-1702. PMID 11074125.
  7. Brodie, Ryan; Smith, AJ; Roper, RL; Tcherepanov, V; Upton, C (Jul 2004). "Base-By-Base: Single nucleotide-level analysis of whole viral genome alignments". BMC Bioinformatics. 5: 96. doi:10.1186/1471-2105-5-96. PMC 481056. PMID 15253776.
  8. Shin-Lin Tu; Jeannette P. Staheli; Colum McClay; Kathleen McLeod; Timothy M. Rose; Chris Upton (2018). "Base-By-Base Version 3: New Comparative Tools for Large Virus Genomes". Viruses. 10 (11): 637. doi:10.3390/v10110637. PMC 6265842. PMID 30445717.
  9. Hillary, William; Lin, Song-Han; Upton, Chris (2011). "Base-By-Base version 2: single nucleotide-level analysis of whole viral genome alignments". Microbial Informatics and Experimentation. 1 (1): 2. doi:10.1186/2042-5783-1-2. PMC 3348662. PMID 22587754.
  10. Brodie, R.; Roper, RL; Upton, C (Jan 2004). "JDotter: a Java interface to multiple dotplots generated by dotter". Bioinformatics. 20 (2): 279–81. doi:10.1093/bioinformatics/btg406. ISSN 1367-4803. PMID 14734323.
  11. Thomas, Jamie M; Horspool, D; Brown, G; Tcherepanov, V; Upton, C (Jan 2007). "GraphDNA: a Java program for graphical display of DNA composition analyses". BMC Bioinformatics. 8: 21. doi:10.1186/1471-2105-8-21. PMC 1783863. PMID 17244370.
  12. Tcherepanov, Vasily; Ehlers, A; Upton, C (Jun 2006). "Genome Annotation Transfer Utility (GATU): rapid annotation of viral genomes using a closely related reference genome". BMC Genomics. 7: 150. doi:10.1186/1471-2164-7-150. PMC 1534038. PMID 16772042.

Further reading

  1. Da Silva, Melissa; Upton, Chris (2012). Vaccinia Virus and Poxvirology. Methods in Molecular Biology. Vol. 890. Melissa Da Silva and Chris Upton. pp. 233–258. doi:10.1007/978-1-61779-876-4_14. ISBN 978-1-61779-875-7. PMID 22688771.
  2. Ghedin, Elodie; Upton, Chris (2011). "It's a small world after all: viral genomics and the global dominance of viruses". Current Opinion in Virology. 1 (4): 280–281. doi:10.1016/j.coviro.2011.08.001. PMID 22440784.
  3. Amgarten, Deyvid; Upton, Chris (2018). Comparative Genomics. Methods in Molecular Biology. Vol. 1704. Deyvid Amgarten and Chris Upton. pp. 401–417. doi:10.1007/978-1-4939-7463-4_15. ISBN 978-1-4939-7461-0. PMID 29277875.