Minimotif Miner

Content
Description: database expansion and significantly improved reduction of false-positive predictions from consensus sequences
Contact
Laboratory: Sanguthevar Rajasekaran and Martin R. Schiller
Authors: Mi, Tian; Merlin Jerlin Camilus; Deverasetty Sandeep; Gryk Michael R; Bill Travis J; Brooks Andrew W; Lee Logan Y; Rathnayake Viraj; Ross Christian A; Sargeant David P; Strong Christy L; Watts Paula; Rajasekaran Sanguthevar; Schiller Martin R
Primary citation: Mi, Tian; et al. (2012) [1]
Release date: 2011
Access
Website: http://mnm.engr.uconn.edu, http://minimotifminer.org

Minimotif Miner (MnM) is a program and database designed to identify minimotifs in any protein. [2] [3] [4] Minimotifs are short, contiguous peptide sequences that are known to have a function in at least one protein. They are also called sequence motifs or short linear motifs (SLiMs). Minimotifs are generally restricted to one secondary structure element and are fewer than 15 amino acids in length.

Description

A minimotif's function can be to bind another macromolecule or small compound, to induce a covalent modification of the minimotif, or to take part in trafficking of the protein that contains the minimotif. The basic premise of Minimotif Miner is that if a short peptide sequence is known to have a function in one protein, it may have a similar function in another query protein. The current release of the database, MnM 3.0, has ~300,000 minimotifs and can be searched at the website.
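The premise above amounts to pattern matching of known motifs against a query sequence. The following is a minimal sketch of that idea, not the MnM implementation: the two-entry library, the function name `scan`, and the use of regular expressions for consensus patterns are assumptions made for illustration, and the entries are not taken from the MnM database.

```python
import re

# Hypothetical mini-library: name -> (regex for the consensus, annotated function).
# The entries are illustrative only and are not drawn from the MnM database.
MOTIF_LIBRARY = {
    "PxxP": ("P..P", "binds SH3 domain (illustrative)"),
    "RGD": ("RGD", "binds integrin (illustrative)"),
}


def scan(protein, library=MOTIF_LIBRARY):
    """Return (name, 1-based start, matched text, function) for each occurrence."""
    hits = []
    for name, (pattern, function) in library.items():
        # The lookahead makes each match zero-width, so overlapping
        # occurrences of the same motif are all reported.
        for match in re.finditer(f"(?=({pattern}))", protein):
            hits.append((name, match.start() + 1, match.group(1), function))
    # Order the table by position in the query sequence.
    return sorted(hits, key=lambda hit: hit[1])
```

Calling `scan("MAPPLPRGDS")` reports one `PxxP` occurrence starting at residue 3 and one `RGD` occurrence starting at residue 7. Every such hit is only a candidate, since a short pattern can also occur by chance.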

Two workflows are of particular interest to scientists who use Minimotif Miner. (1) Entering any query protein into Minimotif Miner returns a table listing the minimotif sequences and functions whose sequence pattern matches the query sequence. These matches suggest potential new functions for the query protein. (2) With the view single nucleotide polymorphism (SNP) function, SNPs from dbSNP are mapped in the sequence window. A user can select any set of the SNPs and then identify any minimotif that is introduced or eliminated by the SNP or mutation. This helps to identify minimotifs involved in generating organism diversity or those that may be associated with a disease.
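The second workflow can be pictured as comparing motif matches before and after a substitution. The sketch below assumes the SNP has already been translated into a single amino-acid substitution at a 1-based position; the function names and the regex patterns are hypothetical and do not reflect how MnM or dbSNP represent variants.

```python
import re


def motif_positions(sequence, pattern):
    """Return the set of 1-based start positions where the pattern matches."""
    # Zero-width lookahead so that overlapping matches are counted.
    return {m.start() + 1 for m in re.finditer(f"(?=(?:{pattern}))", sequence)}


def apply_substitution(sequence, position, new_residue):
    """Replace the residue at a 1-based position with new_residue."""
    index = position - 1
    return sequence[:index] + new_residue + sequence[index + 1:]


def motif_changes(sequence, position, new_residue, patterns):
    """Report motif start positions introduced or eliminated by one substitution."""
    variant = apply_substitution(sequence, position, new_residue)
    changes = {}
    for name, pattern in patterns.items():
        before = motif_positions(sequence, pattern)
        after = motif_positions(variant, pattern)
        changes[name] = {
            "introduced": sorted(after - before),
            "eliminated": sorted(before - after),
        }
    return changes
```

For the toy sequence `MAPPLPRGDS` with patterns `{"PxxP": "P..P", "RGD": "RGD"}`, substituting residue 7 with `P` eliminates the `RGD` match at position 7 and introduces a new `PxxP` match at position 4.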

A typical MnM search predicts more than 50 new minimotifs for a query protein. A major limitation of this type of analysis is that the low sequence complexity of short minimotifs produces false-positive predictions, in which the sequence occurs in a protein by random chance and not because it carries the predicted function. MnM 3.0 introduces a library of advanced heuristics and filters that greatly reduce false-positive predictions. These filters use minimotif complexity, protein surface location, molecular processes, cellular processes, protein-protein interactions, and genetic interactions. The heuristics were later combined into a single compound filter, which makes significant progress toward solving this problem: a performance benchmarking study that evaluated both sensitivity and specificity found high accuracy of minimotif prediction.
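Sensitivity and specificity have standard definitions: sensitivity is TP / (TP + FN) and specificity is TN / (TN + FP). The sketch below shows how a compound filter could be scored against a labelled benchmark. How MnM actually combines its filters is not described here, so the logical AND in `compound_filter`, the field names `surface_exposed` and `complexity`, and the 0.5 threshold are all assumptions made for illustration.

```python
def compound_filter(prediction, filters):
    """Keep a prediction only if every individual filter accepts it (assumed AND)."""
    return all(accepts(prediction) for accepts in filters)


def sensitivity_specificity(predicted, actual):
    """Compute (sensitivity, specificity) from parallel lists of booleans."""
    pairs = list(zip(predicted, actual))
    tp = sum(p and a for p, a in pairs)
    tn = sum((not p) and (not a) for p, a in pairs)
    fp = sum(p and (not a) for p, a in pairs)
    fn = sum((not p) and a for p, a in pairs)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical filters over hypothetical prediction fields.
FILTERS = [
    lambda p: p["surface_exposed"],
    lambda p: p["complexity"] >= 0.5,
]

# Made-up benchmark records; "true_site" marks a validated minimotif.
BENCHMARK = [
    {"surface_exposed": True, "complexity": 0.9, "true_site": True},
    {"surface_exposed": True, "complexity": 0.2, "true_site": False},
    {"surface_exposed": False, "complexity": 0.8, "true_site": False},
    {"surface_exposed": True, "complexity": 0.7, "true_site": False},
    {"surface_exposed": False, "complexity": 0.9, "true_site": True},
]

predicted = [compound_filter(p, FILTERS) for p in BENCHMARK]
actual = [p["true_site"] for p in BENCHMARK]
sensitivity, specificity = sensitivity_specificity(predicted, actual)
```

On this made-up benchmark the filter keeps one of the two true sites and rejects two of the three false ones, giving a sensitivity of 0.5 and a specificity of 2/3. A stricter filter removes more false positives but also discards more true sites.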

Related Research Articles

<span class="mw-page-title-main">Sequence alignment</span> Process in bioinformatics that identifies equivalent sites within molecular sequences

In bioinformatics, a sequence alignment is a way of arranging the sequences of DNA, RNA, or protein to identify regions of similarity that may be a consequence of functional, structural, or evolutionary relationships between the sequences. Aligned sequences of nucleotide or amino acid residues are typically represented as rows within a matrix. Gaps are inserted between the residues so that identical or similar characters are aligned in successive columns. Sequence alignments are also used for non-biological sequences such as calculating the distance cost between strings in a natural language, or to display financial data.

In bioinformatics, sequence analysis is the process of subjecting a DNA, RNA or peptide sequence to any of a wide range of analytical methods to understand its features, function, structure, or evolution. It can be performed on the entire genome, transcriptome or proteome of an organism, and can also involve only selected segments or regions, like tandem repeats and transposable elements. Methodologies used include sequence alignment, searches against biological databases, and others.

<span class="mw-page-title-main">Protein structure prediction</span> Type of biological prediction

Protein structure prediction is the inference of the three-dimensional structure of a protein from its amino acid sequence—that is, the prediction of its secondary and tertiary structure from primary structure. Structure prediction is different from the inverse problem of protein design. Protein structure prediction is one of the most important goals pursued by computational biology; it is important in medicine and biotechnology.

In biology, a sequence motif is a nucleotide or amino-acid sequence pattern that is widespread and usually assumed to be related to the biological function of the macromolecule. For example, an N-glycosylation site motif can be defined as Asn, followed by any residue but Pro, followed by either Ser or Thr, followed by any residue but Pro.
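The N-glycosylation definition above translates directly into a regular expression over one-letter amino-acid codes: `N[^P][ST][^P]`. The sketch below is one way to locate such sites; the function name is hypothetical, and the input is assumed to be an uppercase one-letter protein sequence.

```python
import re

# Asn, then any residue but Pro, then Ser or Thr, then any residue but Pro.
# The pattern sits inside a lookahead so that overlapping sites are reported.
N_GLYCOSYLATION = re.compile(r"(?=(N[^P][ST][^P]))")


def find_n_glycosylation_sites(protein):
    """Return (1-based position of the Asn, matched tetrapeptide) for each site."""
    return [(m.start() + 1, m.group(1)) for m in N_GLYCOSYLATION.finditer(protein)]
```

For the toy sequence `MNGTANPSANKSPNASQ`, the function returns the sites `NGTA` at position 2 and `NASQ` at position 14. It skips `NPSA`, where Pro follows the Asn, and `NKSP`, where Pro occupies the fourth position.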

<span class="mw-page-title-main">Protein family</span> Group of evolutionarily-related proteins

A protein family is a group of evolutionarily related proteins. In many cases, a protein family has a corresponding gene family, in which each gene encodes a corresponding protein with a 1:1 relationship. The term "protein family" should not be confused with family as it is used in taxonomy.

<span class="mw-page-title-main">Structural alignment</span> Aligning molecular sequences using sequence and structural information

Structural alignment attempts to establish homology between two or more polymer structures based on their shape and three-dimensional conformation. This process is usually applied to protein tertiary structures but can also be used for large RNA molecules. In contrast to simple structural superposition, where at least some equivalent residues of the two structures are known, structural alignment requires no a priori knowledge of equivalent positions. Structural alignment is a valuable tool for the comparison of proteins with low sequence similarity, where evolutionary relationships between proteins cannot be easily detected by standard sequence alignment techniques. Structural alignment can therefore be used to imply evolutionary relationships between proteins that share very little common sequence. However, caution should be used in using the results as evidence for shared evolutionary ancestry because of the possible confounding effects of convergent evolution by which multiple unrelated amino acid sequences converge on a common tertiary structure.

<span class="mw-page-title-main">Single-nucleotide polymorphism</span> Single nucleotide in genomic DNA at which different sequence alternatives exist

In genetics and bioinformatics, a single-nucleotide polymorphism is a germline substitution of a single nucleotide at a specific position in the genome. Although certain definitions require the substitution to be present in a sufficiently large fraction of the population, many publications do not apply such a frequency threshold.

In computational biology, gene prediction or gene finding refers to the process of identifying the regions of genomic DNA that encode genes. This includes protein-coding genes as well as RNA genes, but may also include prediction of other functional elements such as regulatory regions. Gene finding is one of the first and most important steps in understanding the genome of a species once it has been sequenced.

<span class="mw-page-title-main">Protein structure</span> Three-dimensional arrangement of atoms in an amino acid-chain molecule

Protein structure is the three-dimensional arrangement of atoms in an amino acid-chain molecule. Proteins are polymers – specifically polypeptides – formed from sequences of amino acids, which are the monomers of the polymer. A single amino acid monomer may also be called a residue, which indicates a repeating unit of a polymer. Proteins form by amino acids undergoing condensation reactions, in which the amino acids lose one water molecule per reaction in order to attach to one another with a peptide bond. By convention, a chain under 30 amino acids is often identified as a peptide, rather than a protein. To be able to perform their biological function, proteins fold into one or more specific spatial conformations driven by a number of non-covalent interactions, such as hydrogen bonding, ionic interactions, Van der Waals forces, and hydrophobic packing. To understand the functions of proteins at a molecular level, it is often necessary to determine their three-dimensional structure. This is the topic of the scientific field of structural biology, which employs techniques such as X-ray crystallography, NMR spectroscopy, cryo-electron microscopy (cryo-EM) and dual polarisation interferometry, to determine the structure of proteins.

Protein–protein interaction prediction is a field combining bioinformatics and structural biology in an attempt to identify and catalog physical interactions between pairs or groups of proteins. Understanding protein–protein interactions is important for the investigation of intracellular signaling pathways, modelling of protein complex structures and for gaining insights into various biochemical processes.

The Eukaryotic Linear Motif (ELM) resource is a computational biology resource for investigating short linear motifs (SLiMs) in eukaryotic proteins. It is currently the largest collection of linear motif classes with annotated and experimentally validated linear motif instances.

<span class="mw-page-title-main">Short linear motif</span>

In molecular biology, short linear motifs (SLiMs), linear motifs or minimotifs are short stretches of protein sequence that mediate protein–protein interaction.

Protein function prediction methods are techniques that bioinformatics researchers use to assign biological or biochemical roles to proteins. These proteins are usually ones that are poorly studied or predicted based on genomic sequence data. These predictions are often driven by data-intensive computational procedures. Information may come from nucleic acid sequence homology, gene expression profiles, protein domain structures, text mining of publications, phylogenetic profiles, phenotypic profiles, and protein-protein interaction. Protein function is a broad term: the roles of proteins range from catalysis of biochemical reactions to transport to signal transduction, and a single protein may play a role in multiple processes or cellular pathways.

HIVToolbox is an internet application that helps scientists identify and develop new hypotheses as well as facilitate interpretation of experiments, with the goal of helping scientists better understand HIV, the virus that causes AIDS, and to identify new potential drug targets.

The HH-suite is an open-source software package for sensitive protein sequence searching. It contains programs that can search for similar protein sequences in protein sequence databases. Sequence searches are a standard tool in modern biology with which the function of unknown proteins can be inferred from the functions of proteins with similar sequences. HHsearch and HHblits are two main programs in the package and the entry point to its search function, the latter being a faster iteration. HHpred is an online server for protein structure prediction that uses homology information from HH-suite.

PredictProtein (PP) is an automatic service that searches up-to-date public sequence databases, creates alignments, and predicts aspects of protein structure and function. Users send a protein sequence and receive a single file with results from database comparisons and prediction methods. PP went online in 1992 at the European Molecular Biology Laboratory; since 1999 it has operated from Columbia University and in 2009 it moved to the Technische Universität München. Although many servers have implemented particular aspects, PP remains the most widely used public server for structure prediction: over 1.5 million requests from users in 104 countries have been handled, and over 13,000 users have submitted 10 or more different queries. PP web pages are mirrored in 17 countries on 4 continents. The system is optimized to meet the demands of experimentalists not experienced in bioinformatics. Accordingly, it incorporates only high-quality methods and collates results, omitting less reliable or less important ones.

In the field of computational biology, a planted motif search (PMS), also known as an (l, d)-motif search (LDMS), is a method for identifying conserved motifs within a set of nucleic acid or peptide sequences.
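In the usual formulation, an (l, d)-motif is a string of length l that occurs in every input sequence with at most d mismatches. The following brute-force sketch enumerates every candidate string over the alphabet, so it is practical only for very small l. It is meant to illustrate the problem statement and is not one of the published PMS algorithms; the function names are hypothetical.

```python
from itertools import product


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def occurs_within(motif, sequence, d):
    """True if some window of the sequence is within Hamming distance d of motif."""
    length = len(motif)
    return any(
        hamming(motif, sequence[i:i + length]) <= d
        for i in range(len(sequence) - length + 1)
    )


def planted_motifs(sequences, l, d, alphabet="ACGT"):
    """Return every length-l string found in all sequences with <= d mismatches."""
    motifs = []
    for letters in product(alphabet, repeat=l):
        candidate = "".join(letters)
        if all(occurs_within(candidate, s, d) for s in sequences):
            motifs.append(candidate)
    return motifs
```

For the toy input `["AACGT", "CGTTT", "GGCGT"]` with l = 3 and d = 0, the only 3-mer shared exactly by all three sequences is `CGT`. Raising d to 1 enlarges the result set.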

<span class="mw-page-title-main">Multiple Epidermal Growth Factor-like Domains 8</span> Protein-coding gene in the species Homo sapiens

MEGF8, also known as Multiple Epidermal Growth Factor-like Domains 8, is a protein-coding gene that encodes a single-pass membrane protein known to participate in developmental regulation and cellular communication. It is located on chromosome 19 at the 49th open reading frame in humans (19q13.2). Two isoforms are known for MEGF8, which differ by a 67 amino acid indel. Isoform 2 is 2785 amino acids long and has a predicted mass of 296.6 kDa. Isoform 1 is composed of 2845 amino acids and has a predicted mass of 303.1 kDa. BLAST searches found orthologs primarily in mammals, but MEGF8 is also conserved in invertebrates and fishes, and rarely in birds, reptiles, and amphibians. A notable paralog of MEGF8 is ATRNL1, which is also a single-pass transmembrane protein and shares several key features and motifs with MEGF8, as indicated by the Simple Modular Architecture Research Tool (SMART), which is hosted by the European Molecular Biology Laboratory in Heidelberg, Germany. MEGF8 has been predicted to be a key player in several developmental processes, such as left-right patterning and limb formation. MEGF8 mutations have been found to cause Carpenter syndrome subtype 2.

Single nucleotide polymorphism annotation is the process of predicting the effect or function of an individual SNP using SNP annotation tools. In SNP annotation the biological information is extracted, collected and displayed in a clear form amenable to query. SNP functional annotation is typically performed based on the available information on nucleic acid and protein sequences.

LOC105377021 is a protein which in humans is encoded by the LOC105377021 gene. LOC105377021 exhibits expressional pathology related to breast cancer, specifically triple negative breast cancer. LOC105377021 contains a serine rich region in addition to predicted alpha helix motifs.

References

  1. Mi, Tian; Merlin Jerlin Camilus; Deverasetty Sandeep; Gryk Michael R; Bill Travis J; Brooks Andrew W; Lee Logan Y; Rathnayake Viraj; Ross Christian A; Sargeant David P; Strong Christy L; Watts Paula; Rajasekaran Sanguthevar; Schiller Martin R (Jan 2012). "Minimotif Miner 3.0: database expansion and significantly improved reduction of false-positive predictions from consensus sequences". Nucleic Acids Research. 40 (1): D252–60. doi:10.1093/nar/gkr1189. PMC 3245078. PMID 22146221.
  2. Schiller, Martin R. (2007). "Minimotif Miner: A Computational Tool to Investigate Protein Function, Disease, and Genetic Diversity". Current Protocols in Protein Science. 48: 2.12.1–2.12.14. doi:10.1002/0471140864.ps0212s48. ISBN 978-0-471-14086-3. PMID 18429315. S2CID 10406520.
  3. Rajasekaran, Sanguthevar; Balla, Sudha; Gradie, Patrick; Gryk, Michael R.; Kadaveru, Krishna; Kundeti, Vamsi; Maciejewski, Mark W.; Mi, Tian; et al. (2009). "Minimotif miner 2nd release: a database and web system for motif search". Nucleic Acids Research. 37 (Database issue): D185–90. doi:10.1093/nar/gkn865. PMC 2686579. PMID 18978024.
  4. Balla, Sudha; Thapar, Vishal; Verma, Snigda; Luong, ThaiBinh; Faghri, Tanaz; Huang, Chun-Hsi; Rajasekaran, Sanguthevar; del Campo, Jacob J; Shinn, Jessica H; Mohler, William A; Maciejewski, Mark W; Gryk, Michael R; Piccirillo, Bryan; Schiller, Stanley R; Schiller, Martin R (2006). "Minimotif Miner, a tool for investigating protein function". Nature Methods. 3 (3): 175–177. doi:10.1038/nmeth856. PMID 16489333. S2CID 15571142.

Further reading