Genetic saturation

Genetic saturation is the result of multiple substitutions at the same site in a sequence, or identical substitutions in different sequences, such that the apparent sequence divergence is lower than the actual divergence that has occurred. [1] When two or more nucleotide sequences are compared, the observed differences are only differences in the final state of each sequence. Nucleotides that undergo genetic saturation change multiple times, sometimes back to their original nucleotide or to a nucleotide shared with the sequence being compared. Without genetic information from intermediate taxa, it is difficult to know how much saturation, if any, has occurred in an observed sequence. [2] Genetic saturation occurs most rapidly in fast-evolving sequences, such as the hypervariable region of mitochondrial DNA, or in short tandem repeats such as those on the Y chromosome. [3] [4]

In phylogenetics, saturation effects result in long branch attraction, where the most distant lineages have misleadingly short branch lengths. Saturation also decreases the phylogenetic information contained in the sequences. [5]

Phylogenetic saturation

Multiple substitutions

Multiple substitutions take place when single nucleotides undergo several changes before reaching their final nucleotide identity. A sequence is said to be saturated when mutation has acted multiple times on the same nucleotides, so that the observed change in sequence is less than the historical change in sequence. [1]
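The gap between observed and historical change can be illustrated with a small simulation. The following is only a sketch under simplifying assumptions that are not part of the cited sources: every site is equally likely to be hit, every substitution picks one of the three other bases uniformly, and the function name `simulate_observed_differences` is hypothetical.

```python
import random

BASES = "ACGT"

def simulate_observed_differences(length, n_substitutions, seed=0):
    """Apply n_substitutions random substitutions to a random sequence and
    return how many sites differ from the ancestor afterwards."""
    rng = random.Random(seed)
    ancestor = [rng.choice(BASES) for _ in range(length)]
    descendant = list(ancestor)
    for _ in range(n_substitutions):
        site = rng.randrange(length)
        # A substitution always replaces the current base with a different one,
        # but repeated hits on a site can hide earlier changes or revert them.
        descendant[site] = rng.choice([b for b in BASES if b != descendant[site]])
    return sum(a != d for a, d in zip(ancestor, descendant))

if __name__ == "__main__":
    for n in (50, 500, 5000):
        observed = simulate_observed_differences(1000, n)
        print(f"actual substitutions: {n:5d}  observed differences: {observed}")
```

Under these assumptions the observed count can never exceed the number of substitutions applied, and for a heavily saturated sequence it levels off near three quarters of the sites, however many substitutions actually occurred.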

Detection

The amount of saturation that a sequence might have undergone can be estimated from the substitution rate of the sequence and the time that has passed since divergence. Divergence rates are estimated from a variety of sources, including ancestral DNA, fossil records and biogeographical events. [6] This use of molecular clocks to determine divergence is controversial because of its potential for inaccuracy and the assumptions made in the model (such as a consistent mutation rate for all branches), and it is used mostly as an estimation tool. [6] Genetic saturation can also be estimated by comparing the number of observed differences in nucleotide sequences between multiple pairs of species. The number of observed substitutions between sequences of different species can be compared with the number of substitutions inferred from branch length, to find the approximate point where the number of inferred substitutions surpasses the number of observed substitutions. [6] [7] This method can give researchers an idea of the level of saturation of a particular gene but is thought to underestimate the amount of saturation, especially for very long branches. [2]
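The comparison of observed and inferred substitutions can be sketched in a few lines. The example below is an illustration rather than the method of the cited studies: it substitutes a model-corrected pairwise distance (the Jukes-Cantor 1969 correction) for substitutions inferred from branch lengths on a tree, and the function names are hypothetical.

```python
import math

def p_distance(seq_a, seq_b):
    """Observed proportion of sites that differ between two aligned sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be aligned, equal-length and non-empty")
    return sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)

def jc69_distance(p):
    """Inferred substitutions per site under the Jukes-Cantor model.

    The correction is undefined once p reaches 0.75, the level expected
    between two unrelated random sequences; infinity is returned there.
    """
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1 - 4 * p / 3)

if __name__ == "__main__":
    for p in (0.05, 0.20, 0.40, 0.60, 0.70):
        print(f"observed p = {p:.2f}  inferred d = {jc69_distance(p):.3f}")
```

Plotting observed against inferred distance for many pairs of taxa gives a curve that follows the diagonal at first and then flattens; the region where it flattens is where the marker is approaching saturation.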

Figure: The effects of saturation can affect expected divergence times, leading to inaccurate estimates.

Impact on phylogenetics

In molecular phylogenetics, the distances and relationships between species are investigated by comparing the DNA, RNA or amino acid sequences of organisms. When phylogenetic trees are constructed without considering possible saturation, multiple substitutions can cause the distance between taxa to appear much smaller than the true distance. Multiple sequence alignment, a common technique used in constructing phylogenies, relies on the comparison of homologous sequences. It can easily be confounded by genetic saturation because the homologous loci under investigation give no indication of whether more than one substitution at each nucleotide separates the taxa being described. [1] Saturation decreases the amount of phylogenetic information contained in sequences, especially when deep branches are involved. This is particularly evident in studies examining arthropod groups. [8] Furthermore, saturation effects can lead to a gross underestimation of divergence time. This is mainly attributed to the randomization of the phylogenetic signal as the number of mutations and substitutions at each site increases. By masking the true amount of divergence, saturation can lead to inaccurate phylogenetic trees. [1] [2]

Figure: Three possible phylogenetic trees derived from the genetic sequences of four different species when genetic saturation and parsimony are taken into account.

The principle of parsimony in genetic saturation analysis

Parsimony plays a fundamental role in genetic saturation analysis. This principle gives preference to the simplest explanation that can account for the data. With regard to genetic saturation, parsimony means that the preferred hypothesized relationship is the one that requires the smallest number of character changes. Using parsimony to analyze genetic saturation can lead to conflict when creating a phylogenetic tree. [7] When only sequence data are used, it is possible to arrive at several phylogenetic trees that are equally parsimonious.
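Counting character changes for competing trees can be sketched with Fitch's small-parsimony algorithm, applied here to the three possible groupings of four taxa. This is a minimal illustration: the taxon labels A to D, the toy sequences and the function names are all made up for the example and do not come from the cited studies.

```python
def fitch(tree, site):
    """Return (candidate ancestral states, minimum number of changes) for one site.

    A tree is either a taxon name (leaf) or a pair of subtrees.
    """
    if isinstance(tree, str):
        return {site[tree]}, 0
    left, right = tree
    left_states, left_cost = fitch(left, site)
    right_states, right_cost = fitch(right, site)
    shared = left_states & right_states
    if shared:
        return shared, left_cost + right_cost
    # No state in common: one extra change is needed on this part of the tree.
    return left_states | right_states, left_cost + right_cost + 1

def parsimony_score(tree, alignment):
    """Sum the minimum number of changes over every column of the alignment."""
    length = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(length)
    )

# The three ways of pairing four taxa.
QUARTET_TREES = {
    "(A,B),(C,D)": (("A", "B"), ("C", "D")),
    "(A,C),(B,D)": (("A", "C"), ("B", "D")),
    "(A,D),(B,C)": (("A", "D"), ("B", "C")),
}

def score_quartet(alignment):
    return {label: parsimony_score(tree, alignment) for label, tree in QUARTET_TREES.items()}

if __name__ == "__main__":
    informative = {"A": "ACGT", "B": "ACTA", "C": "GTGA", "D": "GTTT"}
    conflicting = {"A": "AGT", "B": "ATA", "C": "GGA", "D": "GTT"}
    print(score_quartet(informative))
    print(score_quartet(conflicting))
```

In the first toy alignment two sites favor grouping A with B, so that tree needs the fewest changes. In the second, each site favors a different grouping and all three trees tie, which is the kind of conflict that saturated sites tend to produce.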

Long branch attraction

Genetic saturation contributes to long-branch attraction because it can greatly scramble sequences without easily observable associated phenotypic changes. Long-branch attraction occurs when two distantly related taxa appear to be closely linked. [1] The more substitutions that occur, the more likely it is for previously dissimilar sequences to share nucleotides by chance and, as a result, to appear homologous in phylogenetic tree calculations. Long-branch attraction due to saturation has been proposed as the cause of apparent links in ancient phylogenies, and it calls into question even some of the earliest inferred relationships between eukaryotes, archaea, and eubacteria. [2]

Other uses of "Saturation" in genetics

Gene site saturation mutagenesis

Gene site saturation mutagenesis (GSSM) is a mutagenesis technique applied to one or more codons in a gene to create a library of variants covering all other codons at that position. [9] It is used in biochemistry and protein engineering to explore the functions and characteristics of specific amino acid sequences. [9] This systematic identification of amino acid substitutions allows researchers to look at every possible variant of each position. It provides structural information about the protein of interest and identifies the amino acid positions that are most vital to the function of the protein. [9] [10]

Figure: The types of codon sets that can be used for GSSM, and the number of codons and amino acids that each can produce.

Researchers often prefer a one-step PCR-based approach when using GSSM to explore the specific effects of different variations in an amino acid of interest within a protein. [11] With a one-step PCR-based approach, researchers create a primer whose two ends have sequences corresponding to the protein of interest. Only one codon of a three-codon amino acid sequence is substituted. [10]

The type of codon set determines the number of sequences that can be derived from GSSM. To determine which codon set to use, researchers need to check the library quality at the DNA level, which requires a large amount of sequence data. If each of the three codon positions can be substituted with any of the four nucleotides, the library can code for all 20 amino acids. [10] Although it is possible to code for all 20 amino acids this way, it is not the most efficient method. The most efficient method is to use NNK codon degeneracy, also known as a limited codon set. [12] This method results in only 32 codons rather than 64. [10]
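The codon counts can be checked by enumerating a degenerate codon against the standard genetic code. The sketch below assumes the usual IUPAC meanings (N is any of A, C, G or T; K is G or T) and the standard codon table; the helper names `expand` and `summarize` are hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G; "*" marks a stop.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# IUPAC degenerate symbols used by the two codon sets compared here.
DEGENERATE = {"N": "ACGT", "K": "GT"}

def expand(pattern):
    """List every concrete codon matched by a degenerate pattern such as 'NNK'."""
    return ["".join(c) for c in product(*(DEGENERATE[s] for s in pattern))]

def summarize(pattern):
    """Return (number of codons, number of amino acids encoded, stop codons)."""
    codons = expand(pattern)
    amino_acids = {CODON_TABLE[c] for c in codons} - {"*"}
    stops = sorted(c for c in codons if CODON_TABLE[c] == "*")
    return len(codons), len(amino_acids), stops

if __name__ == "__main__":
    for pattern in ("NNN", "NNK"):
        n_codons, n_amino_acids, stops = summarize(pattern)
        print(f"{pattern}: {n_codons} codons, {n_amino_acids} amino acids, stops: {stops}")
```

Under the standard code, NNK still reaches all 20 amino acids while halving the library to 32 codons and leaving TAG as the only stop codon.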

Advantages of GSSM

In comparison to other techniques, GSSM is able to offer unique advantages such as:

GSSM was able to open up a whole frontier in genetic research, as it revolutionized fundamental beliefs about DNA. Before GSSM, researchers mutated DNA through radiation or with various chemicals. Both of these methods are imprecise. [13]

References

  1. Philippe H, Brinkmann H, Lavrov DV, Littlewood DT, Manuel M, Wörheide G, Baurain D (March 2011). "Resolving difficult phylogenetic questions: why more sequences are not enough". PLOS Biology. 9 (3): e1000602. doi:10.1371/journal.pbio.1000602. PMC 3057953. PMID 21423652.
  2. Philippe H, Forterre P (October 1999). "The rooting of the universal tree of life is not reliable". Journal of Molecular Evolution. 49 (4): 509–23. Bibcode:1999JMolE..49..509P. doi:10.1007/PL00006573. PMID 10486008. S2CID 20350374.
  3. Henn BM, Gignoux CR, Feldman MW, Mountain JL (January 2009). "Characterizing the time dependency of human mitochondrial DNA mutation rate estimates". Molecular Biology and Evolution. 26 (1): 217–30. doi:10.1093/molbev/msn244. PMID 18984905.
  4. Ho SY, Phillips MJ, Cooper A, Drummond AJ (July 2005). "Time dependency of molecular rate estimates and systematic overestimation of recent divergence times". Molecular Biology and Evolution. 22 (7): 1561–8. doi:10.1093/molbev/msi145. PMID 15814826.
  5. Abylgazieva NA (2003-01-01). "[Case of "renal diabetes"]". Zdravookhranenie Kirgizii. 26 (3): 49–51. doi:10.1016/S1055-7903(02)00326-3. PMID 7903.
  6. van Tuinen M, Dyke GJ (January 2004). "Calibration of galliform molecular clocks using multiple fossils and genetic partitions". Molecular Phylogenetics and Evolution. 30 (1): 74–86. doi:10.1016/S1055-7903(03)00164-7. PMID 15022759.
  7. Dávalos LM, Perkins SL (May 2008). "Saturation and base composition bias explain phylogenomic conflict in Plasmodium". Genomics. 91 (5): 433–42. doi:10.1016/j.ygeno.2008.01.006. PMID 18313259.
  8. Sanders KL, Lee MS (April 20, 2009). "Arthropod molecular divergence times and the Cambrian origin of pentastomids". Systematics and Biodiversity. 8 (1): 63–74. doi:10.1080/14772000903562012. S2CID 84880682.
  9. Zheng L, Baumann U, Reymond JL (August 2004). "An efficient one-step site-directed and site-saturation mutagenesis protocol". Nucleic Acids Research. 32 (14): e115. doi:10.1093/nar/gnh110. PMC 514394. PMID 15304544.
  10. Lopez P, Forterre P, Philippe H (October 1999). "The root of the tree of life in the light of the covarion model". Journal of Molecular Evolution. 49 (4): 496–508. Bibcode:1999JMolE..49..496L. doi:10.1007/pl00006572. PMID 10486007. S2CID 22835829.
  11. Li A, Acevedo-Rocha CG, Reetz MT (July 2018). "Boosting the efficiency of site-saturation mutagenesis for a difficult-to-randomize gene by a two-step PCR strategy". Applied Microbiology and Biotechnology. 102 (14): 6095–6103. doi:10.1007/s00253-018-9041-2. PMC 6013526. PMID 29785500.
  12. Kretz KA, Richardson TH, Gray KA, Robertson DE, Tan X, Short JM (Aug 6, 2004). "Gene site saturation mutagenesis: a comprehensive mutagenesis approach". Protein Engineering. Methods in Enzymology. Vol. 388. pp. 3–11. doi:10.1016/S0076-6879(04)88001-7. ISBN 9780121827939. PMID 15289056.
  13. Smith I, Payne J, Keay B. "How Michael Smith put B.C.'s life sciences community on the map with a Nobel Prize 25 years ago". Vancouver Sun. Retrieved 24 September 2018.