UTOPIA (bioinformatics tools)

UTOPIA
Developer(s): Steve Pettifer, Terri Attwood, David Parry-Smith, D. N. Perkins, A. W. Payne, A. D. Michie, Phillip W. Lord, J. N. Selley, Phil McDermott, James Marsh, James Sinnott, Dave Thorne, Benjamin Blundell
Written in: C++, Python
Operating system: Linux, Mac and Windows
Website: utopia.cs.manchester.ac.uk

UTOPIA (User-friendly Tools for Operating Informatics Applications) is a suite of free tools for visualising and analysing bioinformatics data. Based on an ontology-driven data model, it contains applications for viewing and aligning protein sequences, for rendering complex molecular structures in 3D, and for finding and using resources such as web services and data objects. [1] [2] [3] [4] It has two major components: the Utopia Protein Analysis suite and Utopia Documents.

Utopia Protein Analysis suite

The Utopia Protein Analysis suite is a collection of interactive tools for analysing protein sequence and protein structure. Its front end consists of user-friendly, responsive visualisation applications; behind the scenes, a sophisticated model allows these applications to work together and hides much of the tedious work of dealing with file formats and web services. [1]

Utopia Documents

Utopia Documents is a PDF reader for the scientific literature that combines the convenience and reliability of the Portable Document Format (PDF) with the flexibility and power of the web. [3] [5] [6]

History

Between 2003 and 2005, work on UTOPIA was funded through the e-Science North West Centre, based at the University of Manchester, by the Engineering and Physical Sciences Research Council, the UK Department of Trade and Industry, and the European Molecular Biology Network (EMBnet). Since 2005, work has continued under the EMBRACE European Network of Excellence.

UTOPIA's CINEMA (Colour INteractive Editor for Multiple Alignments), a tool for sequence alignment, is the latest incarnation of software originally developed at the University of Leeds to aid the analysis of G protein-coupled receptors (GPCRs). [7] SOMAP [8] (Screen Oriented Multiple Alignment Procedure) was developed in the late 1980s on the VMS operating system. It ran on a monochrome, text-based VT100 video terminal and featured context-sensitive help and pull-down menus some time before these were standard operating system features.

SOMAP was followed by a Unix tool called VISTAS [9] (VIsualizing STructures And Sequences), which could render 3D molecular structures and generate plots and statistical representations of sequence properties.

The first tool under the CINEMA [10] banner, developed at the University of Manchester, was a Java-based applet launched via web pages; it is still available but is no longer maintained. A standalone Java version, called CINEMA-MX, [11] was also released but is no longer readily available.

A C++ version of CINEMA, called CINEMA5, was developed early on as part of the UTOPIA project and was released as a stand-alone sequence alignment application. It has since been replaced by a version of the tool integrated with UTOPIA's other visualisation applications, and its name has reverted simply to CINEMA.

References

  1. Pettifer, S. R.; Sinnott, J. R.; Attwood, T. K. (2004). "UTOPIA—User-Friendly Tools for Operating Informatics Applications". Comparative and Functional Genomics. 5 (1): 56–60. doi:10.1002/cfg.359. PMC 2447318. PMID 18629035.
  2. McDermott, P.; Sinnott, J.; Thorne, D.; Pettifer, S.; Attwood, T. (2006). "An Architecture for Visualisation and Interactive Analysis of Proteins". Fourth International Conference on Coordinated & Multiple Views in Exploratory Visualization (CMV'06). p. 55. doi:10.1109/CMV.2006.3. ISBN 978-0-7695-2605-8.
  3. Attwood, T. K.; Kell, D. B.; McDermott, P.; Marsh, J.; Pettifer, S. R.; Thorne, D. (2009). "Calling International Rescue: Knowledge lost in literature and data landslide!". Biochemical Journal. 424 (3): 317–333. doi:10.1042/BJ20091474. PMC 2805925. PMID 19929850.
  4. Pettifer, S.; Thorne, D.; McDermott, P.; Marsh, J.; Villéger, A.; Kell, D. B.; Attwood, T. K. (2009). "Visualising biological data: A semantic approach to tool and database integration". BMC Bioinformatics. 10: S19. doi:10.1186/1471-2105-10-S6-S19. PMC 2697642. PMID 19534744.
  5. Attwood, T. K.; Kell, D. B.; McDermott, P.; Marsh, J.; Pettifer, S. R.; Thorne, D. (2010). "Utopia documents: Linking scholarly literature with research data". Bioinformatics. 26 (18): i568–i574. doi:10.1093/bioinformatics/btq383. PMC 2935404. PMID 20823323.
  6. Pettifer, S.; McDermott, P.; Marsh, J.; Thorne, D.; Villeger, A.; Attwood, T. K. (2011). "Ceci n'est pas un hamburger: Modelling and representing the scholarly article". Learned Publishing. 24 (3): 207. doi:10.1087/20110309.
  7. Vroling, B.; Thorne, D.; McDermott, P.; Attwood, T. K.; Vriend, G.; Pettifer, S. (2011). "Integrating GPCR-specific information with full text articles". BMC Bioinformatics. 12: 362. doi:10.1186/1471-2105-12-362. PMC 3179973. PMID 21910883.
  8. Parry-Smith, D. J.; Attwood, T. K. (1991). "SOMAP: A novel interactive approach to multiple protein sequences alignment". Bioinformatics. 7 (2): 233. doi:10.1093/bioinformatics/7.2.233. PMID 2059849.
  9. Perkins, D. N.; Attwood, T. K. (1995). "VISTAS: A package for VIsualizing STructures and sequences of proteins". Journal of Molecular Graphics. 13 (1): 73–75, 62. doi:10.1016/0263-7855(94)00013-I. PMID 7794837.
  10. Parry-Smith, D. J.; Payne, A. W. R.; Michie, A. D.; Attwood, T. K. (1998). "CINEMA—a novel Colour INteractive Editor for Multiple Alignments". Gene. 221 (1): GC57–GC63. doi:10.1016/S0378-1119(97)00650-1. PMID 9852962.
  11. Lord, P. W.; Selley, J. N.; Attwood, T. K. (2002). "CINEMA-MX: A modular multiple alignment editor". Bioinformatics. 18 (10): 1402–1403. doi:10.1093/bioinformatics/18.10.1402. PMID 12376388.