Covarion

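Covarion rate matrix (rows: original state; columns: destination state; states A1 to T1 have rate = 1, states A0 to T0 have rate = 0; diagonal entries, blank in the source, are shown as "-")

|    | A1 | G1 | C1 | T1 | A0 | G0 | C0 | T0 |
|----|----|----|----|----|----|----|----|----|
| A1 | -  | α  | β  | γ  | δ  | 0  | 0  | 0  |
| G1 | α  | -  | γ  | β  | 0  | δ  | 0  | 0  |
| C1 | β  | γ  | -  | α  | 0  | 0  | δ  | 0  |
| T1 | γ  | β  | α  | -  | 0  | 0  | 0  | δ  |
| A0 | κδ | 0  | 0  | 0  | -  | 0  | 0  | 0  |
| G0 | 0  | κδ | 0  | 0  | 0  | -  | 0  | 0  |
| C0 | 0  | 0  | κδ | 0  | 0  | 0  | -  | 0  |
| T0 | 0  | 0  | 0  | κδ | 0  | 0  | 0  | -  |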


The method of covarions, or concomitantly variable codons, is a technique in computational phylogenetics that allows the hypothesized rate of molecular evolution at individual codons in a set of nucleotide sequences to vary in an autocorrelated manner. Under the covarion model, the rates of evolution on different branches of a hypothesized phylogenetic tree vary in an autocorrelated way, and the rates of evolution at different codon sites in an aligned set of DNA or RNA sequences vary in a separate but likewise autocorrelated manner. This provides additional and more realistic constraints on evolutionary rates than the simpler technique of allowing the rate of evolution on each branch to be selected randomly from a suitable probability distribution such as the gamma distribution. The covarion model is a concrete form of the more general concept of heterotachy.

Developing a computational algorithm suitable for identifying sites with high evolutionary rates from a static dataset is a challenge due to the constraints of autocorrelation. The original statement of the method used a rough stochastic model of the evolutionary process designed to identify transiently high-variability codon sites. Abandoning the requirement that rates be autocorrelated on a given DNA or RNA molecule allows extension of substitution matrix methods to the covarion model.

The matrix above represents a covarion-based modification to the three-parameter Kimura substitution model, where the vertical axis represents the original state and the horizontal axis the destination state. The two rates, 0 and 1, define a pair of mutation states; a site can switch between state 0 and state 1 at any time, but nucleotides can only mutate in state 1. That is, the rate of mutation in state 0 is 0. Here α is the standard Kimura parameter for transitions, β and γ are the parameters for the two classes of transversions, κδ is the rate at which a site switches from invariant (state 0) to variable (state 1), and δ is the rate at which a site switches from variable (state 1) to invariant (state 0). Because nucleotide sequences do not themselves reveal whether a site is in state 0 or state 1, an observation of a given nucleotide is treated as ambiguous; that is, if a given site contains a C nucleotide, it is ambiguous between the C0 and C1 states.
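As a rough illustration of the matrix described here, the following Python sketch builds the 8-state rate matrix from the parameters α, β, γ, κ and δ and encodes an observed nucleotide as ambiguous between its two hidden states. The function names, state ordering and parameter values are hypothetical choices made for this example. The diagonal convention (each row summing to zero, as is usual for continuous-time Markov rate matrices) is an assumption, since the source table leaves the diagonal blank.

```python
# States: A1, G1, C1, T1 are variable ("on"); A0, G0, C0, T0 are invariant ("off").
STATES = ["A1", "G1", "C1", "T1", "A0", "G0", "C0", "T0"]


def covarion_rate_matrix(alpha, beta, gamma, kappa, delta):
    """Return the 8x8 rate matrix (rows: original state, columns: destination)."""
    # Kimura three-parameter block for the "on" states, nucleotide order A, G, C, T.
    k3 = [
        [0.0, alpha, beta, gamma],
        [alpha, 0.0, gamma, beta],
        [beta, gamma, 0.0, alpha],
        [gamma, beta, alpha, 0.0],
    ]
    q = [[0.0] * 8 for _ in range(8)]
    for i in range(4):
        for j in range(4):
            q[i][j] = k3[i][j]        # nucleotide changes happen only while "on"
        q[i][i + 4] = delta           # X1 -> X0: the site becomes invariant
        q[i + 4][i] = kappa * delta   # X0 -> X1: the site becomes variable
    # Assumed convention: each diagonal entry makes its row sum to zero.
    for i in range(8):
        q[i][i] = -sum(q[i][j] for j in range(8) if j != i)
    return q


def observation_vector(nucleotide):
    """Mark every hidden state compatible with an observed nucleotide as 1.0."""
    return [1.0 if state[0] == nucleotide else 0.0 for state in STATES]


if __name__ == "__main__":
    # Arbitrary illustrative parameter values, not taken from any dataset.
    q = covarion_rate_matrix(alpha=2.0, beta=1.0, gamma=0.5, kappa=0.25, delta=0.1)
    for name, row in zip(STATES, q):
        print(name, [round(x, 3) for x in row])
    print("C observed:", observation_vector("C"))
```

Here `observation_vector("C")` flags both C1 and C0 as compatible with an observed C, which is one way the ambiguity between hidden states could enter a likelihood calculation.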

Related Research Articles

Genetic code Rules by which information encoded within genetic material is translated into proteins.

The genetic code is the set of rules used by living cells to translate information encoded within genetic material into proteins. Translation is accomplished by the ribosome, which links proteinogenic amino acids in an order specified by messenger RNA (mRNA), using transfer RNA (tRNA) molecules to carry amino acids and to read the mRNA three nucleotides at a time. The genetic code is highly similar among all organisms and can be expressed in a simple table with 64 entries.

Molecular phylogenetics The branch of phylogeny that analyzes genetic, hereditary molecular differences

Molecular phylogenetics is the branch of phylogeny that analyzes genetic, hereditary molecular differences, predominantly in DNA sequences, to gain information on an organism's evolutionary relationships. From these analyses, it is possible to determine the processes by which diversity among species has been achieved. The result of a molecular phylogenetic analysis is expressed in a phylogenetic tree. Molecular phylogenetics is one aspect of molecular systematics, a broader term that also includes the use of molecular data in taxonomy and biogeography.

Molecular evolution Process of change in the sequence composition of cellular molecules across generations

Molecular evolution is the process of change in the sequence composition of cellular molecules such as DNA, RNA, and proteins across generations. The field of molecular evolution uses principles of evolutionary biology and population genetics to explain patterns in these changes. Major topics in molecular evolution concern the rates and impacts of single nucleotide changes, neutral evolution vs. natural selection, origins of new genes, the genetic nature of complex traits, the genetic basis of speciation, evolution of development, and ways that evolutionary forces influence genomic and phenotypic changes.

The neutral theory of molecular evolution holds that most evolutionary changes at the molecular level, and most of the variation within and between species, are due to random genetic drift of mutant alleles that are selectively neutral. The theory applies only to evolution at the molecular level, and is compatible with phenotypic evolution being shaped by natural selection as postulated by Charles Darwin. The neutral theory allows for the possibility that most mutations are deleterious, but holds that because these are rapidly removed by natural selection, they do not make significant contributions to variation within and between species at the molecular level. A neutral mutation is one that does not affect an organism's ability to survive and reproduce. The neutral theory assumes that most mutations that are not deleterious are neutral rather than beneficial. Because only a fraction of gametes are sampled in each generation of a species, the neutral theory suggests that a mutant allele can arise within a population and reach fixation by chance, rather than by selective advantage.

Codon usage bias A genetic bias towards the preferential use of one of the redundant codons that encode the same amino acid over the others

Codon usage bias refers to differences in the frequency of occurrence of synonymous codons in coding DNA. A codon is a series of three nucleotides that encodes a specific amino acid residue in a polypeptide chain or signals the termination of translation.

Molecular clock Technique to deduce the time in prehistory when two or more life forms diverged

The molecular clock is a figurative term for a technique that uses the mutation rate of biomolecules to deduce the time in prehistory when two or more life forms diverged. The biomolecular data used for such calculations are usually nucleotide sequences for DNA, RNA, or amino acid sequences for proteins. The benchmarks for determining the mutation rate are often fossil or archaeological dates. The molecular clock was first tested in 1962 on the hemoglobin protein variants of various animals, and is commonly used in molecular evolution to estimate times of speciation or radiation. It is sometimes called a gene clock or an evolutionary clock.

In bioinformatics and evolutionary biology, a substitution matrix either describes the rate at which a character in a nucleotide sequence or a protein sequence changes to other character states over evolutionary time or it describes the log odds of finding two specific character states aligned. It is an application of a stochastic matrix. Substitution matrices are usually seen in the context of amino acid or DNA sequence alignments, where the similarity between sequences depends on their divergence time and the substitution rates as represented in the matrix.

Substitution model Description of the process by which states in sequences change into each other and back

In biology, substitution models, also called models of DNA sequence evolution, are Markov models that describe changes over evolutionary time. These models describe evolutionary changes in macromolecules represented as sequences of symbols. Substitution models are used to calculate the likelihood of phylogenetic trees using multiple sequence alignment data. Thus, substitution models are central to maximum likelihood estimation of phylogeny as well as Bayesian inference in phylogeny. Estimates of evolutionary distances are typically calculated using substitution models. Substitution models are also central to phylogenetic invariants since they can be used to predict site pattern frequencies given a tree topology. Substitution models are necessary to simulate sequence data for a group of organisms related by a specific tree.

Computational phylogenetics is the application of computational algorithms, methods, and programs to phylogenetic analyses. The goal is to assemble a phylogenetic tree representing a hypothesis about the evolutionary ancestry of a set of genes, species, or other taxa. For example, these techniques have been used to explore the family tree of hominid species and the relationships between specific genes shared by many types of organisms.

Multiple sequence alignment Alignment of more than two molecular sequences

Multiple sequence alignment (MSA) may refer to the process or the result of sequence alignment of three or more biological sequences, generally protein, DNA, or RNA. In many cases, the input set of query sequences is assumed to have an evolutionary relationship by which the sequences share a linkage and are descended from a common ancestor. From the resulting MSA, sequence homology can be inferred and phylogenetic analysis can be conducted to assess the sequences' shared evolutionary origins. Visual depictions of the alignment illustrate mutation events such as point mutations that appear as differing characters in a single alignment column, and insertion or deletion mutations that appear as hyphens in one or more of the sequences in the alignment. Multiple sequence alignment is often used to assess sequence conservation of protein domains, tertiary and secondary structures, and even individual amino acids or nucleotides.

In genetics, the Ka/Ks ratio, also known as ω or dN/dS ratio, is used to estimate the balance between neutral mutations, purifying selection and beneficial mutations acting on a set of homologous protein-coding genes. It is calculated as the ratio of the number of nonsynonymous substitutions per non-synonymous site (Ka), in a given period of time, to the number of synonymous substitutions per synonymous site (Ks), in the same period. The latter are assumed to be neutral, so that the ratio indicates the net balance between deleterious and beneficial mutations. Values of Ka/Ks significantly above 1 are unlikely to occur without at least some of the mutations being advantageous. If beneficial mutations are assumed to make little contribution, then Ks estimates the degree of evolutionary constraint.

Neutral mutations are changes in DNA sequence that are neither beneficial nor detrimental to the ability of an organism to survive and reproduce. In population genetics, mutations in which natural selection does not affect the spread of the mutation in a species are termed neutral mutations. Neutral mutations that are inheritable and not linked to any genes under selection will either be lost or will replace all other alleles of the gene. This loss or fixation of the gene proceeds based on random sampling known as genetic drift. A neutral mutation that is in linkage disequilibrium with other alleles that are under selection may proceed to loss or fixation via genetic hitchhiking and/or background selection.

Masatoshi Nei

Masatoshi Nei is a Japanese-born American evolutionary biologist currently affiliated with the Department of Biology at Temple University as a Carnell Professor. He was, until recently, Evan Pugh Professor of Biology at Pennsylvania State University and Director of the Institute of Molecular Evolutionary Genetics; he was there from 1990 to 2015.

A number of different Markov models of DNA sequence evolution have been proposed. These substitution models differ in terms of the parameters used to describe the rates at which one nucleotide replaces another during evolution. These models are frequently used in molecular phylogenetic analyses. In particular, they are used during the calculation of likelihood of a tree and they are used to estimate the evolutionary distance between sequences from the observed differences between the sequences.

A nonsynonymous substitution is a nucleotide mutation that alters the amino acid sequence of a protein. Nonsynonymous substitutions differ from synonymous substitutions, which do not alter amino acid sequences and are (sometimes) silent mutations. As nonsynonymous substitutions result in a biological change in the organism, they are subject to natural selection.

Molecular Evolutionary Genetics Analysis Software for statistical analysis of molecular evolution

Molecular Evolutionary Genetics Analysis (MEGA) is computer software for conducting statistical analysis of molecular evolution and for constructing phylogenetic trees. It includes many sophisticated methods and tools for phylogenomics and phylomedicine. It is licensed as proprietary freeware. The project for developing this software was initiated under the leadership of Masatoshi Nei in his laboratory at the Pennsylvania State University in collaboration with his graduate student Sudhir Kumar and postdoctoral fellow Koichiro Tamura. Nei wrote a monograph (pp. 130) outlining the scope of the software and presenting new statistical methods that were included in MEGA. The entire set of computer programs was written by Kumar and Tamura. Personal computers at the time lacked the ability to send the monograph and software electronically, so they were delivered by postal mail. From the start, MEGA was intended to be easy to use and to include only solid statistical methods.

Ziheng Yang FRS is a Chinese biologist. He holds the R.A. Fisher Chair of Statistical Genetics at University College London, and is the Director of R.A. Fisher Centre for Computational Biology at UCL. He was elected a Fellow of the Royal Society in 2006.

T-REX is a freely available web server, developed at the Department of Computer Science of the Université du Québec à Montréal, dedicated to the inference, validation and visualization of phylogenetic trees and phylogenetic networks. The T-REX web server allows users to perform several popular methods of phylogenetic analysis as well as some new phylogenetic applications for inferring, drawing and validating phylogenetic trees and networks.

The rate of evolution is quantified as the speed of genetic or morphological change in a lineage over a period of time. The speed at which a molecular entity evolves is of considerable interest in evolutionary biology since determining the evolutionary rate is the first step in characterizing its evolution. Calculating rates of evolutionary change is also useful when studying phenotypic changes in phylogenetic comparative biology. In either case, it can be beneficial to consider and compare both genomic data and paleontological data, especially with regard to estimating the timing of divergence events and establishing geological time scales.

Genetic saturation is the result of multiple substitutions at the same site in a sequence, or identical substitutions in different sequences, such that the apparent sequence divergence rate is lower than the actual divergence that has occurred. When comparing two or more genetic sequences consisting of single nucleotides, the differences observed are only differences in the final state of the nucleotide sequence. Single nucleotides that undergo genetic saturation change multiple times, sometimes back to their original nucleotide or to a nucleotide common to the compared genetic sequence. Without genetic information from intermediate taxa, it is difficult to know whether, or how much, saturation has occurred in an observed sequence. Genetic saturation occurs most rapidly in fast-evolving sequences, such as the hypervariable region of mitochondrial DNA, or in short tandem repeats such as those on the Y chromosome.
