Sequerome

Developer(s)	Bennett N.F., Ganesan N., Georgetown University
Stable release	NA
Operating system	Linux, Mac, MS-Windows
Type	Bioinformatics - Sequence profiling tool
Licence	Freeware
Website	www.sequerome.org

Last updated December 12, 2023

Sequerome is a web-based sequence profiling tool that integrates the results of a BLAST sequence-alignment report with external research tools and servers that perform advanced sequence manipulations, and that allows the user to record the steps of such an analysis. Written in Java, it acts as a front-end to BLAST queries and provides simplified access to web-distributed resources for protein and nucleic acid analysis.
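
Sequerome's internal request handling is not documented here. As a hedged sketch of what a BLAST front-end does, the Python below builds, without sending, requests for NCBI's public BLAST URL API (the `CMD=Put` submit and `CMD=Get` retrieve calls). The function names and the blastn/nt defaults are assumptions, not Sequerome's settings.

```python
from urllib.parse import urlencode

# Public endpoint of NCBI's BLAST URL API.
NCBI_BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def build_blast_put_url(query: str, program: str = "blastn", database: str = "nt") -> str:
    """Build a CMD=Put request that would submit `query` as a new search.

    The defaults stand in for "standard parameters"; they are assumptions,
    not Sequerome's documented settings.
    """
    params = {"CMD": "Put", "PROGRAM": program, "DATABASE": database, "QUERY": query}
    return NCBI_BLAST_URL + "?" + urlencode(params)


def build_blast_get_url(rid: str, format_type: str = "Text") -> str:
    """Build a CMD=Get request that would fetch the report for request ID `rid`."""
    params = {"CMD": "Get", "RID": rid, "FORMAT_TYPE": format_type}
    return NCBI_BLAST_URL + "?" + urlencode(params)


if __name__ == "__main__":
    # Nothing is sent over the network; the URLs are only printed.
    print(build_blast_put_url("ATGGCCATTGTAATGGGCCGC"))
    print(build_blast_get_url("EXAMPLE_RID"))
```

A front-end would submit the Put request, read the request ID (RID) that NCBI returns, then poll with the Get request until the report is ready, following NCBI's usage guidelines for automated queries.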

Description

Sequerome has the following features: profiling of BLAST sequence-alignment reports by linking the results page to a panel of third-party services; tabbed browsing that lets the user return to earlier operations; access to third-party services for customized sequence manipulations; one-box, any-format sequence input, with alternative input options including visits to third-party sites; cached storage and retrieval of input sequences; a three-pane browsing environment that allows simultaneous input and analysis of multiple sequences; and archival options, shown on top of each icon, for the results from each pane.

The application can be accessed directly through a web browser. The home page shows three panes: the Query pane, the Results pane and the Search History pane. The user may resize these panes and perform parallel actions in any of them. Within a single browser window it is possible to run parallel BLAST searches on different sequences, and to analyze them or view the restriction digests for each record of a BLAST result. Sanjeev Dappa, a Tamil researcher, criticised Sequerome but, when questioned, provided no further details.
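
A restriction digest of one record can be sketched as a search for an enzyme's recognition site followed by a cut at a fixed offset. This is illustrative only: it assumes an exact, palindromic site (EcoRI, G^AATTC, is the example), linear DNA and top-strand cut positions, and it does not consult REBASE. `find_sites` and `digest` are hypothetical names.

```python
def find_sites(seq: str, site: str, cut_offset: int) -> list[int]:
    """Return 0-based top-strand cut positions for every exact match of `site`.

    `cut_offset` is how many bases into the site the enzyme cuts
    (1 for EcoRI, whose site is G^AATTC).
    """
    seq, site = seq.upper(), site.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)  # also finds overlapping sites
    return cuts


def digest(seq: str, site: str, cut_offset: int) -> list[str]:
    """Split a linear sequence into fragments at every cut position."""
    fragments, previous = [], 0
    for cut in find_sites(seq, site, cut_offset):
        fragments.append(seq[previous:cut])
        previous = cut
    fragments.append(seq[previous:])
    return fragments


if __name__ == "__main__":
    fragments = digest("GAATTCAAGAATTC", "GAATTC", 1)
    print(fragments)                    # ['G', 'AATTCAAG', 'AATTC']
    print([len(f) for f in fragments])  # fragment sizes a digest view might list
```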

Query Pane

A browser session can be started without the user having to answer many questions at the outset. The user simply pastes a sequence into the Query pane and can BLAST it right away under standard parameters. Experienced users can perform further operations under the Advanced options. These include selecting specific databases to BLAST against, uploading FASTA files stored on the user's computer, retrieving sequences by NCBI ID, and visiting any user-defined URL to drag and drop sequences. Alternatively, the user can perform a variety of other actions, including sequence manipulation, analysis and alignment, using existing tools available on the web. The one-box input accepts sequences in any format (FASTA, with or without spaces or numbers, and so on). Alerts warn the user about a wrong choice of sequence type (DNA/RNA/Protein). Results obtained from a sequence manipulation, e.g. translation, can be carried forward into further BLAST analysis while the history of the earlier search is preserved.
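
The one-box input and the DNA/RNA/Protein alert can be sketched as follows. This is a hypothetical illustration, not Sequerome's code: `normalize_sequence`, `guess_type` and `check_choice` are made-up names, and the 90% nucleotide-letter threshold used to tell nucleotides from protein is an assumption.

```python
import re
from typing import Optional


def normalize_sequence(raw: str) -> str:
    """Turn pasted text (FASTA or plain, with or without spaces/numbers) into one
    uppercase residue string."""
    body = "".join(line for line in raw.splitlines() if not line.lstrip().startswith(">"))
    # Keep letters and '*' (stop); drop digits, whitespace and punctuation.
    return re.sub(r"[^A-Za-z*]", "", body).upper()


def guess_type(seq: str, threshold: float = 0.9) -> str:
    """Rough guess of 'DNA', 'RNA' or 'Protein' from the nucleotide-letter fraction."""
    if not seq:
        raise ValueError("empty sequence")
    nucleotide_letters = sum(seq.count(c) for c in "ACGTUN")
    if nucleotide_letters / len(seq) < threshold:
        return "Protein"
    return "RNA" if "U" in seq and "T" not in seq else "DNA"


def check_choice(seq: str, chosen: str) -> Optional[str]:
    """Return a warning if the selected type disagrees with the guess, else None."""
    guessed = guess_type(seq)
    if guessed != chosen:
        return f"Input looks like {guessed}, but {chosen} was selected."
    return None


if __name__ == "__main__":
    pasted = ">my_seq\n1 atggcc attgta\n13 atgggc\n"
    seq = normalize_sequence(pasted)
    print(seq)                           # ATGGCCATTGTAATGGGC
    print(check_choice(seq, "Protein"))  # Input looks like DNA, but Protein was selected.
```

A composition heuristic like this can misjudge very short peptides made mostly of A, C, G and T, which is one reason a real tool would let the user override the warning.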

Results Pane

Sequerome queries the input sequence directly against a variety of databases and tools (both popular public-domain resources and privately hosted services), including BLAST, the Protein Data Bank (PDB), REBASE and others, and generates outputs that are intuitive and easy to understand. Access to various analysis tools (including a 3D structure viewer for a PDB ID) is provided through separate command buttons, so that every record from a BLAST report can be analyzed before a final selection is made. For results from a protein BLAST, PDB IDs are displayed prominently next to the relevant BLAST records, so that the structure of a matching molecule can be viewed directly (with a locally installed molecular structure viewer, e.g. Cn3D, PyMOL or RasMol). Once the BLAST report is displayed in the Results pane, the user can directly analyze any of the BLAST hits using a series of command buttons linked to the respective servers and sites. Most results from third-party servers, e.g. ORF prediction and ProtParam, can be viewed directly in the Results pane without opening additional browser windows.
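
How Sequerome detects PDB entries is not described. One plausible approach, assuming hit descriptions carry NCBI-style identifiers such as `pdb|1ABC|A` (database tag, four-character PDB ID, chain), is a regular-expression scan. The function names and the sample descriptions are made up.

```python
import re

# Assumed NCBI-style identifier for PDB entries: "pdb|<4-char ID>|<chain>".
PDB_TAG = re.compile(r"pdb\|([1-9][A-Za-z0-9]{3})\|([A-Za-z0-9]*)")


def pdb_ids(defline: str) -> list[tuple[str, str]]:
    """Return (PDB ID, chain) pairs found in one BLAST hit description."""
    return [(pdb_id.upper(), chain) for pdb_id, chain in PDB_TAG.findall(defline)]


def annotate_hits(deflines: list[str]) -> list[str]:
    """Prefix each hit that has a known structure with its PDB IDs."""
    annotated = []
    for line in deflines:
        ids = pdb_ids(line)
        prefix = "[" + ", ".join(f"{p}:{c}" for p, c in ids) + "] " if ids else ""
        annotated.append(prefix + line)
    return annotated


if __name__ == "__main__":
    hits = [  # made-up hit descriptions
        "pdb|1abc|B Chain B, hypothetical protein",
        "ref|XP_000001.1| hypothetical protein without a structure",
    ]
    for line in annotate_hits(hits):
        print(line)
```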

Search History Pane

A key requirement when profiling input sequence data is the ability to store, retrieve, and effectively combine and reuse earlier inputs. This is further enhanced when each operation performed can also be retrieved. The bottom-right pane of the browser provides this, while also storing all previously entered input sequences, so the browser supports tabbed browsing. For each icon linking to a stored result, the user can archive the result through print, save and mail options, which appear as small colored pictures on top of the icon.
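
The store, retrieve and reuse behaviour of the Search History pane can be modelled with a small in-memory structure in which every input and every derived result gets an ID and remembers what it was derived from. This is a sketch under assumptions: the class, field and operation names are hypothetical, and the database storage implied by Sequerome's JDBC tier is left out.

```python
from dataclasses import dataclass
from itertools import count
from typing import Optional


@dataclass
class HistoryEntry:
    entry_id: int
    operation: str             # illustrative labels such as "input", "translate", "blastp"
    sequence: str
    parent_id: Optional[int]   # entry this one was derived from, if any
    result: str = ""           # stored output, to be re-opened, printed, saved or mailed


class SearchHistory:
    """In-memory history: every input and every derived result gets an ID."""

    def __init__(self) -> None:
        self._entries: dict[int, HistoryEntry] = {}
        self._ids = count(1)

    def record(self, operation: str, sequence: str,
               parent_id: Optional[int] = None, result: str = "") -> int:
        entry = HistoryEntry(next(self._ids), operation, sequence, parent_id, result)
        self._entries[entry.entry_id] = entry
        return entry.entry_id

    def get(self, entry_id: int) -> HistoryEntry:
        return self._entries[entry_id]

    def lineage(self, entry_id: int) -> list[str]:
        """Operations that led to `entry_id`, oldest first."""
        steps = []
        current: Optional[int] = entry_id
        while current is not None:
            entry = self._entries[current]
            steps.append(entry.operation)
            current = entry.parent_id
        return steps[::-1]

    def inputs(self) -> list[str]:
        """All sequences the user entered directly, in entry order."""
        return [e.sequence for e in self._entries.values() if e.operation == "input"]


if __name__ == "__main__":
    history = SearchHistory()
    dna = history.record("input", "ATGGCC")
    protein = history.record("translate", "MA", parent_id=dna)
    search = history.record("blastp", "MA", parent_id=protein)
    print(history.lineage(search))  # ['input', 'translate', 'blastp']
```

Keeping a parent link on each entry is what lets a translated sequence be re-BLASTed while the earlier search stays retrievable.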

Implementation

Sequerome has a three-tier architecture that uses Java Servlet and JavaServer Pages (JSP) technologies with Java Database Connectivity (JDBC), making it both server- and platform-independent. Sequerome is compatible with essentially all Java-enabled graphical browsers, although it works best with Internet Explorer, and it can be run on most operating systems equipped with a Java Virtual Machine (JVM) and the Jakarta Tomcat server. End users must download plugins for viewing the structures of molecules from the Protein Data Bank (e.g. PyMOL, Cn3D, RasMol, SwissPDB).

Further directions

The "post-genomics" era has given rise to a range of web-based tools and software that compile, organize and deliver large amounts of primary sequence information, together with protein structures, gene annotations and sequence alignments, and that perform other common bioinformatics tasks. A simple web search returns many such services and software tools.

Related Research Articles

In bioinformatics, a sequence alignment is a way of arranging the sequences of DNA, RNA, or protein to identify regions of similarity that may be a consequence of functional, structural, or evolutionary relationships between the sequences. Aligned sequences of nucleotide or amino acid residues are typically represented as rows within a matrix. Gaps are inserted between the residues so that identical or similar characters are aligned in successive columns. Sequence alignments are also used for non-biological sequences, such as calculating the distance cost between strings in a natural language or in financial data.
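
Global alignment of two sequences is classically computed by dynamic programming (the Needleman-Wunsch method). The sketch below uses an assumed toy scoring scheme (match +1, mismatch -1, linear gap -1) rather than a substitution matrix, and it returns one optimal alignment when several tie.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Global alignment with a linear gap penalty.

    Returns (score, aligned_a, aligned_b), with '-' marking inserted gaps.
    """
    def pair(x: str, y: str) -> int:
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,   # gap in b
                              score[i][j - 1] + gap)   # gap in a

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    total, top, bottom = needleman_wunsch("ACGT", "AGT")
    print(total)   # 2
    print(top)     # ACGT
    print(bottom)  # A-GT
```

Practical aligners replace the toy scores with substitution matrices and affine gap penalties, but the table-filling and traceback structure stays the same.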

The National Center for Biotechnology Information (NCBI) is part of the United States National Library of Medicine (NLM), a branch of the National Institutes of Health (NIH). It is approved and funded by the government of the United States. The NCBI is located in Bethesda, Maryland, and was founded in 1988 through legislation sponsored by US Congressman Claude Pepper.

In bioinformatics, BLAST is an algorithm and program for comparing primary biological sequence information, such as the amino-acid sequences of proteins or the nucleotides of DNA and/or RNA sequences. A BLAST search enables a researcher to compare a query protein or nucleotide sequence with a library or database of sequences, and to identify database sequences that resemble the query sequence above a certain threshold. For example, following the discovery of a previously unknown gene in the mouse, a scientist will typically perform a BLAST search of the human genome to see if humans carry a similar gene; BLAST will identify sequences in the human genome that resemble the mouse gene based on similarity of sequence.
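
The seed-and-extend idea behind BLAST can be sketched in a few lines. This is a deliberate simplification, not NCBI's implementation: real BLAST uses neighbourhood words scored with a substitution matrix above a threshold, a two-hit trigger, gapped extension and E-value statistics. Here seeds are exact words of an assumed length `w`, and extension is ungapped with an assumed match/mismatch score and X-drop cutoff.

```python
from collections import defaultdict


def seed_hits(query: str, subject: str, w: int = 4) -> list[tuple[int, int]]:
    """Return (query_pos, subject_pos) pairs where a length-w word matches exactly."""
    words = defaultdict(list)
    for i in range(len(query) - w + 1):
        words[query[i:i + w]].append(i)
    hits = []
    for j in range(len(subject) - w + 1):
        for i in words.get(subject[j:j + w], []):
            hits.append((i, j))
    return hits


def extend_hit(query: str, subject: str, i: int, j: int, w: int,
               match: int = 1, mismatch: int = -2, x_drop: int = 3) -> tuple[int, int, int]:
    """Extend a seed without gaps in both directions, stopping once the running
    score falls more than x_drop below the best seen.

    Returns (score, query_start, query_end) of the best ungapped segment.
    """
    def best_extension(step: int) -> tuple[int, int]:
        # step = +1 walks right from the seed's end, -1 walks left from its start.
        qi, sj = (i + w, j + w) if step == 1 else (i - 1, j - 1)
        best, best_len, running, length = 0, 0, 0, 0
        while 0 <= qi < len(query) and 0 <= sj < len(subject):
            running += match if query[qi] == subject[sj] else mismatch
            length += 1
            if running > best:
                best, best_len = running, length
            elif best - running > x_drop:
                break
            qi += step
            sj += step
        return best, best_len

    right, right_len = best_extension(+1)
    left, left_len = best_extension(-1)
    return w * match + left + right, i - left_len, i + w + right_len


if __name__ == "__main__":
    q, s = "AAAACGTACGTTTT", "GGGGCGTACGTCCC"
    print(extend_hit(q, s, 4, 4, 4))  # (7, 4, 11): query[4:11] is CGTACGT
```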

Structural alignment attempts to establish homology between two or more polymer structures based on their shape and three-dimensional conformation. This process is usually applied to protein tertiary structures but can also be used for large RNA molecules. In contrast to simple structural superposition, where at least some equivalent residues of the two structures are known, structural alignment requires no a priori knowledge of equivalent positions. Structural alignment is a valuable tool for the comparison of proteins with low sequence similarity, where evolutionary relationships between proteins cannot be easily detected by standard sequence alignment techniques. Structural alignment can therefore be used to imply evolutionary relationships between proteins that share very little common sequence. However, caution should be exercised when using the results as evidence for shared evolutionary ancestry, because of the possible confounding effects of convergent evolution, by which multiple unrelated amino acid sequences converge on a common tertiary structure.

BioJava is an open-source software project dedicated to providing Java tools for processing biological data. BioJava is a set of library functions, written in the programming language Java, for manipulating sequences and protein structures, together with file parsers, Common Object Request Broker Architecture (CORBA) interoperability, Distributed Annotation System (DAS) support, access to AceDB, dynamic programming, and simple statistical routines. BioJava supports a wide range of data, from DNA and protein sequences to 3D protein structures. The BioJava libraries are useful for automating many routine bioinformatics tasks, such as parsing a Protein Data Bank (PDB) file or interacting with Jmol. This application programming interface (API) provides various file parsers, data models and algorithms to facilitate working with the standard data formats and enables rapid application development and analysis.

FASTA is a DNA and protein sequence alignment software package first described by David J. Lipman and William R. Pearson in 1985. Its legacy is the FASTA format which is now ubiquitous in bioinformatics.
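
A minimal reader for the FASTA format (a `>` header line followed by one or more sequence lines) is shown below. `read_fasta` is a hypothetical helper; it assumes plain headers and skips blank lines and any text before the first header.

```python
import io
from typing import Iterable, Iterator


def read_fasta(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs; the sequence lines of a record are joined."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:  # text before the first header is ignored
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


if __name__ == "__main__":
    text = ">seq1 example DNA\nACGTACGT\nGGCC\n>seq2 example protein\nMKVL\n"
    for name, seq in read_fasta(io.StringIO(text)):
        print(name, len(seq))
```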

In molecular biology, open reading frames (ORFs) are defined as spans of DNA sequence between the start and stop codons. Usually, this is considered within a studied region of a prokaryotic DNA sequence, where only one of the six possible reading frames will be "open". Such an ORF may contain a start codon and by definition cannot extend beyond a stop codon. That start codon indicates where translation may start. The transcription termination site is located after the ORF, beyond the translation stop codon. If transcription were to cease before the stop codon, an incomplete protein would be made during translation.
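
The definition above translates directly into a six-frame scan: translate each of the three frames on both strands and report every stretch that starts at ATG and ends at a stop codon. This sketch assumes the standard genetic code, takes the first ATG in a frame as the ORF start, ignores ORFs that run off the end of the sequence, and uses an arbitrary `min_codons` cutoff; all function names are hypothetical.

```python
# Standard genetic code, built from the usual TCAG-ordered 64-letter string.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(seq: str) -> str:
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq: str) -> str:
    """Translate complete codons; anything not in the table (e.g. N) becomes 'X'."""
    return "".join(CODON_TABLE.get(seq[p:p + 3], "X") for p in range(0, len(seq) - 2, 3))


def find_orfs(seq: str, min_codons: int = 2) -> list[tuple[str, int, str]]:
    """Return (strand, frame, protein) for every ATG...stop ORF in the six frames.

    The frame is an offset into the forward sequence ('+') or into its reverse
    complement ('-'). ORFs without a stop codon before the end are skipped.
    """
    seq = seq.upper()
    orfs = []
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            protein = translate(strand_seq[frame:])
            start = None
            for pos, aa in enumerate(protein):
                if aa == "M" and start is None:
                    start = pos  # first ATG opens the ORF
                elif aa == "*" and start is not None:
                    if pos - start >= min_codons:
                        orfs.append((strand, frame, protein[start:pos]))
                    start = None
    return orfs


if __name__ == "__main__":
    print(find_orfs("CCATGAAATTTTAGCC"))  # [('+', 2, 'MKF')]
```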

A sequence profiling tool in bioinformatics is a type of software that presents information related to a genetic sequence, gene name, or keyword input. Such tools generally take a query such as a DNA, RNA, or protein sequence or ‘keyword’ and search one or more databases for information related to that sequence. Summaries and aggregate results are provided in a standardized format, describing information that would otherwise have required visits to many smaller sites or direct literature searches to compile. Many sequence profiling tools are software portals or gateways that simplify the process of finding information about a query in the large and growing number of bioinformatics databases. These tools are accessed either through the web or as locally downloadable executables.

The European Bioinformatics Institute (EMBL-EBI) is an intergovernmental organization (IGO) which, as part of the European Molecular Biology Laboratory (EMBL) family, focuses on research and services in bioinformatics. It is located on the Wellcome Genome Campus in Hinxton near Cambridge, and employs over 600 full-time equivalent (FTE) staff. Institute leaders such as Rolf Apweiler, Alex Bateman, Ewan Birney, and Guy Cochrane, an adviser on the National Genomics Data Center Scientific Advisory Board, serve as part of the international research network of the BIG Data Center at the Beijing Institute of Genomics.

The completion of the sequencing of the human genome in the early 2000s was a turning point in genomics research. Scientists have since conducted a series of studies into the activities of genes and of the genome as a whole. The human genome contains around 3 billion nucleotide base pairs, and the huge quantity of data created necessitates accessible tools to explore and interpret this information in order to investigate the genetic basis of disease, evolution, and biological processes. The field of genomics has continued to grow, with new sequencing technologies and computational tools making it easier to study the genome.

BLAT is a pairwise sequence alignment algorithm that was developed by Jim Kent at the University of California Santa Cruz (UCSC) in the early 2000s to assist in the assembly and annotation of the human genome. It was designed primarily to decrease the time needed to align millions of mouse genomic reads and expressed sequence tags against the human genome sequence. The alignment tools of the time were not capable of performing these operations in a manner that would allow a regular update of the human genome assembly. Compared to pre-existing tools, BLAT was ~500 times faster at performing mRNA/DNA alignments and ~50 times faster at protein/protein alignments.
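
Much of BLAT's speed comes from indexing the target genome rather than the query: it indexes non-overlapping k-mers of the target and looks up every overlapping k-mer of the query against that index. The sketch below shows only that lookup step and a per-diagonal hit count; the tile size, the absence of filtering for over-represented k-mers and the function names are simplifying assumptions.

```python
from collections import defaultdict


def build_index(genome: str, k: int) -> dict[str, list[int]]:
    """Index the target's non-overlapping k-mers (starting at 0, k, 2k, ...)."""
    index: dict[str, list[int]] = defaultdict(list)
    for pos in range(0, len(genome) - k + 1, k):
        index[genome[pos:pos + k]].append(pos)
    return index


def lookup(query: str, index: dict[str, list[int]], k: int) -> list[tuple[int, int]]:
    """Look up every overlapping k-mer of the query; return (query_pos, genome_pos) hits."""
    hits = []
    for i in range(len(query) - k + 1):
        for g in index.get(query[i:i + k], []):
            hits.append((i, g))
    return hits


def diagonals(hits: list[tuple[int, int]]) -> dict[int, int]:
    """Count hits per diagonal (genome_pos - query_pos); clusters suggest an alignment."""
    counts: dict[int, int] = defaultdict(int)
    for i, g in hits:
        counts[g - i] += 1
    return dict(counts)


if __name__ == "__main__":
    index = build_index("AAAACCCCGGGGTTTT", 4)
    hits = lookup("CCCCGGGG", index, 4)
    print(hits)             # [(0, 4), (4, 8)]
    print(diagonals(hits))  # {4: 2}
```

Because every query offset is tried, any exact match of at least 2k - 1 bases is guaranteed to contain a whole indexed tile, while the index itself stays k times smaller than an index of all overlapping k-mers.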

The Viral Bioinformatics Resource Center (VBRC) is an online resource providing access to a database of curated viral genomes and a variety of tools for bioinformatic genome analysis. This resource was one of eight BRCs funded by NIAID with the goal of promoting research against emerging and re-emerging pathogens, particularly those seen as potential bioterrorism threats. The VBRC is now supported by Dr. Chris Upton at the University of Victoria.

UGENE is computer software for bioinformatics. It works on personal computer operating systems such as Windows, macOS, or Linux. It is released as free and open-source software, under the GNU General Public License (GPL) version 2.

HMMER is a free and commonly used software package for sequence analysis written by Sean Eddy. Its general usage is to identify homologous protein or nucleotide sequences, and to perform sequence alignments. It detects homology by comparing a profile-HMM to either a single sequence or a database of sequences. Sequences that score significantly better against the profile-HMM than against a null model are considered to be homologous to the sequences that were used to construct the profile-HMM. Profile-HMMs are constructed from a multiple sequence alignment in the HMMER package using the hmmbuild program. The profile-HMM implementation used in the HMMER software was based on the work of Krogh and colleagues. HMMER is a console utility ported to every major operating system, including different versions of Linux, Windows, and macOS.
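
Scoring "significantly better against the profile than against a null model" refers to a log-odds score. The sketch below illustrates only that idea with a gapless, position-specific profile (essentially a position weight matrix). It is not a profile HMM: it omits HMMER's insert and delete states and its score statistics, and the pseudocount, the uniform background and the function names are assumptions.

```python
import math


def build_profile(alignment: list[str], alphabet: str = "ACGT",
                  pseudocount: float = 1.0) -> list[dict[str, float]]:
    """Per-column residue probabilities from a gapless alignment, with pseudocounts."""
    profile = []
    for pos in range(len(alignment[0])):
        column = [s[pos] for s in alignment]
        total = len(column) + pseudocount * len(alphabet)
        profile.append({r: (column.count(r) + pseudocount) / total for r in alphabet})
    return profile


def log_odds_score(seq: str, profile: list[dict[str, float]],
                   background: dict[str, float]) -> float:
    """Sum of log2(P(residue | profile column) / P(residue | null model)), in bits."""
    if len(seq) != len(profile):
        raise ValueError("this gapless sketch needs len(seq) == len(profile)")
    return sum(math.log2(col[r] / background[r]) for r, col in zip(seq, profile))


if __name__ == "__main__":
    profile = build_profile(["ACGT", "ACGT", "ACGA"])
    uniform = {r: 0.25 for r in "ACGT"}
    print(log_odds_score("ACGT", profile, uniform))  # positive: fits the profile
    print(log_odds_score("TTTT", profile, uniform))  # negative: the null model fits better
```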

CS-BLAST (Context-Specific BLAST) is a tool that extends BLAST for searching protein sequences, using context-specific mutation probabilities. More specifically, CS-BLAST derives context-specific amino-acid similarities for each query sequence from short windows on that sequence. Using CS-BLAST doubles sensitivity and significantly improves alignment quality without a loss of speed in comparison to BLAST. CSI-BLAST is the context-specific analog of PSI-BLAST, which computes the mutation profile with substitution probabilities and mixes it with the query profile. Both programs are available as web servers and for free download.

Phyre and Phyre2 are free web-based services for protein structure prediction. Phyre is among the most popular methods for protein structure prediction having been cited over 1500 times. Like other remote homology recognition techniques, it is able to regularly generate reliable protein models when other widely used methods such as PSI-BLAST cannot. Phyre2 has been designed to ensure a user-friendly interface for users inexpert in protein structure prediction methods. Its development is funded by the Biotechnology and Biological Sciences Research Council.

The HH-suite is an open-source software package for sensitive protein sequence searching. It contains programs that can search for similar protein sequences in protein sequence databases. Sequence searches are a standard tool in modern biology with which the function of unknown proteins can be inferred from the functions of proteins with similar sequences. HHsearch and HHblits are the two main programs in the package and the entry points to its search functions; HHblits is the faster, iterative variant. HHpred is an online server for protein structure prediction that uses homology information from HH-suite.

Jpred v.4 is the latest version of the JPred Protein Secondary Structure Prediction Server, which has existed in different versions since 1998 and provides predictions by the JNet algorithm, one of the most accurate methods for secondary structure prediction.

SuperPose is a freely available web server designed to perform both pairwise and multiple protein structure superpositions. The “Structural superposition” term refers to the rotations and translations performed on one structure to make it match or align with another structure or structures. Structural superposition can be quantified either in terms of similarity or difference measures. The optimal superposition is the one in which the similarity measure is maximized or the difference measure is minimized. The “SuperPose” web server uses “RMSD” or Root-Mean-Square Deviation as a difference measure to find the optimal pairwise or multiple protein structure superposition. After an initial sequence and secondary structure alignment, SuperPose generates a Difference Distance (DD) matrix from the equivalent C-alpha atoms of two molecules. The sequence/structure alignment and DD matrix analysis information is then fed into a modified quaternion eigenvalue algorithm to rapidly perform the structural superposition and calculate the RMSD between aligned regions of two macromolecules.
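
Two of the quantities named above are easy to compute once equivalent C-alpha atoms are known: the difference distance (DD) matrix, taken here as the element-wise absolute difference of the two intramolecular distance matrices, and the RMSD. The quaternion eigenvalue superposition itself is not reproduced; `rmsd` below assumes the coordinates have already been superposed, and the function names are hypothetical.

```python
import math

Point = tuple[float, float, float]


def distance_matrix(coords: list[Point]) -> list[list[float]]:
    return [[math.dist(p, q) for q in coords] for p in coords]


def difference_distance(a: list[Point], b: list[Point]) -> list[list[float]]:
    """DD[i][j] = |d_A(i, j) - d_B(i, j)| over equivalent C-alpha atoms.

    Built from internal distances only, so it needs no prior superposition.
    """
    if len(a) != len(b):
        raise ValueError("structures must have the same number of equivalent atoms")
    da, db = distance_matrix(a), distance_matrix(b)
    return [[abs(x - y) for x, y in zip(row_a, row_b)] for row_a, row_b in zip(da, db)]


def rmsd(a: list[Point], b: list[Point]) -> float:
    """Root-mean-square deviation of equivalent atoms that are already superposed."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty lists of equivalent atoms")
    return math.sqrt(sum(math.dist(p, q) ** 2 for p, q in zip(a, b)) / len(a))


if __name__ == "__main__":
    a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
    b = [(x + 5.0, y + 5.0, z + 5.0) for x, y, z in a]  # same shape, shifted
    print(max(max(row) for row in difference_distance(a, b)))  # 0.0: same shape
    print(rmsd(a, b))  # large until the structures are superposed
```

The example shows why both measures are useful: the DD matrix already reports identical shapes, while the RMSD only becomes meaningful after the rotation and translation step.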

Echinobase is a Model Organism Database (MOD). It supports the international research community by providing a centralized, integrated web-based resource for accessing the rich and diverse functional genomics data on echinoderm evolution, development and gene regulatory networks.
