MutationTaster

Content
Description: In silico tool to predict the disease-causing potential of DNA variants
Contact
Research center: Charité Berlin
Authors: Jana Marie Schwarz and Dominik Seelow
Primary citation: PMID 24681721
Access
Website: www.mutationtaster.org

MutationTaster is a free web-based application that evaluates DNA sequence variants for their disease-causing potential. The software performs a battery of in silico tests to estimate the impact of a variant on the gene product (protein). Tests are run at both the protein and the DNA level, so MutationTaster is not limited to substitutions of single amino acids but can also handle synonymous or intronic variants. [1] [2]

Background

Many genetic disorders can be caused by mutation of a single gene. New sequencing techniques, however, have shown that a single individual can carry up to 3.5 million alterations across the whole genome, most of which have no detrimental effect on health. [3] The challenge for prediction tools is thus to separate harmless mutations from disease-causing ones. These tools are not designed to predict the causes of complex diseases such as cancer, which usually are not monogenic but result from multiple gene defects that develop cumulatively into a disease.

Approach and tests

MutationTaster is written in Perl and can process next-generation sequencing data from all major platforms (Roche 454, Illumina Genome Analyzer and ABI SOLiD). The program first discards mutations that are known, harmless polymorphisms by comparing them with the integrated databases. The remaining single-nucleotide polymorphisms (SNPs) are tested according to the gene alteration they cause:

Multiple tests are performed to determine the nature of the given SNP. These tests comprise (among others):

Integrated data sources (among others):

The individual test results are then assessed by a Naive Bayes classifier, which decides whether their combined effect might be deleterious for the protein. The 'raw' accuracy of MutationTaster is about 90%; when knowledge about common (harmless) polymorphisms and known disease mutations is included, the actual rate of correct classifications is much higher. The output states whether the alteration is a known or predicted harmless or disease-causing mutation and gives detailed information about the mutation.
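The combination step described above can be illustrated with a minimal Naive Bayes sketch. This is not MutationTaster's implementation: the feature names (`conserved_position`, `protein_truncated`), the class labels and the six training variants are hypothetical and chosen only to show how independent test results are multiplied into one posterior probability.

```python
import math
from collections import Counter, defaultdict


def train(samples, labels, alpha=1.0):
    """Count class and feature-value frequencies for a categorical Naive Bayes model."""
    class_counts = Counter(labels)
    # feature_counts[label][feature][value] -> number of training variants
    feature_counts = defaultdict(lambda: defaultdict(Counter))
    feature_values = defaultdict(set)
    for features, label in zip(samples, labels):
        for name, value in features.items():
            feature_counts[label][name][value] += 1
            feature_values[name].add(value)
    return {
        "class_counts": class_counts,
        "feature_counts": feature_counts,
        "feature_values": feature_values,
        "alpha": alpha,  # Laplace smoothing constant
        "total": len(labels),
    }


def posterior(model, features):
    """Return P(label | features), treating all features as independent."""
    log_scores = {}
    for label, n_label in model["class_counts"].items():
        log_p = math.log(n_label / model["total"])  # class prior
        for name, value in features.items():
            count = model["feature_counts"][label][name][value]
            n_values = len(model["feature_values"][name])
            log_p += math.log(
                (count + model["alpha"]) / (n_label + model["alpha"] * n_values)
            )
        log_scores[label] = log_p
    # Normalise in a numerically stable way so the posteriors sum to 1.
    max_log = max(log_scores.values())
    weights = {k: math.exp(v - max_log) for k, v in log_scores.items()}
    z = sum(weights.values())
    return {k: w / z for k, w in weights.items()}


# Hypothetical training variants (invented for illustration only).
samples = [
    {"conserved_position": True, "protein_truncated": True},
    {"conserved_position": True, "protein_truncated": False},
    {"conserved_position": True, "protein_truncated": True},
    {"conserved_position": False, "protein_truncated": False},
    {"conserved_position": False, "protein_truncated": False},
    {"conserved_position": True, "protein_truncated": False},
]
labels = ["disease_causing"] * 3 + ["polymorphism"] * 3

model = train(samples, labels)
result = posterior(model, {"conserved_position": True, "protein_truncated": True})
print({label: round(p, 3) for label, p in result.items()})
```

With these toy counts, a variant at a conserved position that truncates the protein receives a posterior of about 0.86 for the hypothetical `disease_causing` class. A real classifier would be trained on large sets of known disease mutations and polymorphisms and would use many more test results as features.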

Importantly, predictions of the clinical effects of mutations suffer from a lack of specificity, which appears to be a constraint common to all recently used prediction methods. Despite that, the predictions made by these methods are associated with nearly absolute sensitivity. The outcomes of prediction methods are often used uncritically, particularly by non-specialists in the field. [4]
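The difference between sensitivity and specificity noted above can be made concrete with a short sketch. The counts below are hypothetical and are not taken from the cited study; they only show how a tool can detect nearly every disease-causing variant while still flagging many harmless ones.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of truly disease-causing variants that are predicted as such."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Fraction of truly harmless variants that are predicted as harmless."""
    return true_neg / (true_neg + false_pos)


# Hypothetical evaluation: 100 disease-causing and 100 harmless variants.
sens = sensitivity(true_pos=98, false_neg=2)
spec = specificity(true_neg=60, false_pos=40)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

In this invented example only 2 of 100 disease-causing variants are missed, yet 40 of 100 harmless variants are wrongly reported as disease-causing, which is why a positive prediction on its own is weak evidence.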

Development

Development of MutationTaster started in 2007, and the software has been available online since 2009. MutationTaster is hosted at the Charité Berlin, and its current developers are Olivia Ebner, Daniela Hombach, Markus Schülke, Jana Marie Schwarz and Dominik Seelow. Current efforts focus on integrating mutations that do not directly alter protein-coding genes but affect gene regulation and expression.

References

  1. Schwarz, Jana Marie; Rödelsperger, Christian; Schuelke, Markus; Seelow, Dominik (2010-08-01). "MutationTaster evaluates disease-causing potential of sequence alterations". Nature Methods. 7 (8): 575–576. doi:10.1038/nmeth0810-575. ISSN 1548-7105. PMID 20676075. S2CID 26892938.
  2. Schwarz, Jana Marie; Cooper, David N; Schuelke, Markus; Seelow, Dominik (2014-03-28). "MutationTaster2: mutation prediction for the deep-sequencing age". Nature Methods. 11 (4): 361–362. doi:10.1038/nmeth.2890. ISSN 1548-7105. PMID 24681721. S2CID 19382079.
  3. Wheeler, David A. (2008-04-17). "The complete genome of an individual by massively parallel DNA sequencing". Nature. 452 (7189): 872–876. Bibcode:2008Natur.452..872W. doi:10.1038/nature06884. PMID 18421352.
  4. Simcikova D, Heneberg P (December 2019). "Refinement of evolutionary medicine predictions based on clinical evidence for the manifestations of Mendelian diseases". Scientific Reports. 9 (1): 18577. Bibcode:2019NatSR...918577S. doi:10.1038/s41598-019-54976-4. PMC 6901466. PMID 31819097.