The molecular clock is a figurative term for a technique that uses the mutation rate of biomolecules to deduce the time in prehistory when two or more life forms diverged. The biomolecular data used for such calculations are usually nucleotide sequences for DNA, RNA, or amino acid sequences for proteins.
The notion of the existence of a so-called "molecular clock" was first attributed to Émile Zuckerkandl and Linus Pauling who, in 1962, noticed that the number of amino acid differences in hemoglobin between different lineages changes roughly linearly with time, as estimated from fossil evidence. [1] They generalized this observation to assert that the rate of evolutionary change of any specified protein was approximately constant over time and over different lineages (known as the molecular clock hypothesis).
The genetic equidistance phenomenon was first noted in 1963 by Emanuel Margoliash, who wrote: "It appears that the number of residue differences between cytochrome c of any two species is mostly conditioned by the time elapsed since the lines of evolution leading to these two species originally diverged. If this is correct, the cytochrome c of all mammals should be equally different from the cytochrome c of all birds. Since fish diverges from the main stem of vertebrate evolution earlier than either birds or mammals, the cytochrome c of both mammals and birds should be equally different from the cytochrome c of fish. Similarly, all vertebrate cytochrome c should be equally different from the yeast protein." [2] For example, the difference between the cytochrome c of a carp and that of a frog, turtle, chicken, rabbit, or horse is a nearly constant 13% to 14%. Similarly, the difference between the cytochrome c of a bacterium and that of yeast, wheat, moth, tuna, pigeon, or horse ranges from 64% to 69%. Together with the work of Émile Zuckerkandl and Linus Pauling, the genetic equidistance result led directly to the formal postulation of the molecular clock hypothesis in the early 1960s. [3]
Similarly, Vincent Sarich and Allan Wilson in 1967 demonstrated that molecular differences among modern Primates in albumin proteins showed that approximately constant rates of change had occurred in all the lineages they assessed. [4] The basic logic of their analysis involved recognizing that if one species lineage had evolved more quickly than a sister species lineage since their common ancestor, then the molecular differences between an outgroup (more distantly related) species and the faster-evolving species should be larger (since more molecular changes would have accumulated on that lineage) than the molecular differences between the outgroup species and the slower-evolving species. This method is known as the relative rate test. Sarich and Wilson's paper reported, for example, that human (Homo sapiens) and chimpanzee (Pan troglodytes) albumin immunological cross-reactions suggested they were about equally different from Ceboidea (New World Monkey) species (within experimental error). This meant that they had both accumulated approximately equal changes in albumin since their shared common ancestor. This pattern was also found for all the primate comparisons they tested. When calibrated with the few well-documented fossil branch points (such as no Primate fossils of modern aspect found before the K-T boundary), this led Sarich and Wilson to argue that the human-chimp divergence probably occurred only ~4–6 million years ago. [5]
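The logic of the relative rate test can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the tolerance, and the distance values are hypothetical and are not Sarich and Wilson's measurements or procedure.

```python
def relative_rate_difference(d_a_out: float, d_b_out: float) -> float:
    """Return d(A, outgroup) - d(B, outgroup).

    A value near zero is consistent with equal rates on the A and B
    lineages since their common ancestor. A positive value suggests
    that lineage A has accumulated more change than lineage B.
    """
    return d_a_out - d_b_out


def passes_relative_rate_test(d_a_out: float, d_b_out: float, tolerance: float) -> bool:
    """True when the two ingroup-to-outgroup distances agree within `tolerance`.

    `tolerance` stands in for experimental error (an assumption of this sketch).
    """
    return abs(relative_rate_difference(d_a_out, d_b_out)) <= tolerance


# Hypothetical distances in arbitrary units, chosen only for illustration.
human_vs_outgroup = 2.6
chimp_vs_outgroup = 2.5
print(passes_relative_rate_test(human_vs_outgroup, chimp_vs_outgroup, tolerance=0.2))
```

Because both ingroup species share the same path from their common ancestor to the outgroup, any difference between the two distances can only come from the two ingroup branches, which is why no fossil date is needed for the test itself.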
The observation of a clock-like rate of molecular change was originally purely phenomenological. Later, the work of Motoo Kimura [6] developed the neutral theory of molecular evolution, which predicted a molecular clock. Let there be N individuals, and to keep this calculation simple, let the individuals be haploid (i.e. have one copy of each gene). Let the rate of neutral mutations (i.e. mutations with no effect on fitness) in a new individual be μ. The probability that this new mutation will become fixed in the population is then 1/N, since each copy of the gene is as good as any other. Every generation, each individual can have new mutations, so there are μN new neutral mutations in the population as a whole. That means that each generation, μN × 1/N = μ new neutral mutations will become fixed. If most changes seen during molecular evolution are neutral, then fixations in a population will accumulate at a clock-rate that is equal to the rate of neutral mutations in an individual.
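The arithmetic behind this argument can be checked with a short sketch. Here `mu` denotes the per-individual neutral mutation rate per generation and `population_size` the number of haploid individuals N. The function name and the example values are assumptions made for illustration only.

```python
from fractions import Fraction


def neutral_substitution_rate(population_size: int, mu: Fraction) -> Fraction:
    """Expected number of neutral mutations fixed per generation.

    New neutral mutations arising per generation: N * mu.
    Probability that any one of them eventually fixes: 1 / N.
    The product is mu, independent of N.
    """
    new_mutations_per_generation = population_size * mu
    fixation_probability = Fraction(1, population_size)
    return new_mutations_per_generation * fixation_probability


mu = Fraction(1, 1_000_000)  # hypothetical neutral mutation rate
for n in (10, 1_000, 1_000_000):
    # The population size cancels, so every line reports the same rate.
    print(n, neutral_substitution_rate(n, mu))
```

Exact fractions are used so that the cancellation of N is visible without floating-point rounding.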
To use molecular clocks to estimate divergence times, molecular clocks need to be "calibrated". This is because molecular data alone do not contain any information on absolute times. For viral phylogenetics and ancient DNA studies, two areas of evolutionary biology where it is possible to sample sequences over an evolutionary timescale, the dates of the intermediate samples can be used to calibrate the molecular clock. However, most phylogenies require that the molecular clock be calibrated using independent evidence about dates, such as the fossil record. [7] There are two general methods for calibrating the molecular clock using fossils: node calibration and tip calibration. [8]
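In its simplest form, calibration amounts to converting one dated split into a rate and then applying that rate elsewhere. The sketch below assumes a strict clock (one rate on every lineage) and genetic distances already expressed as substitutions per site. The function names and numbers are hypothetical.

```python
def calibrate_rate(distance: float, divergence_time_myr: float) -> float:
    """Per-lineage rate (substitutions per site per Myr) from one dated split.

    The pairwise distance accumulates along both lineages since the split,
    hence the factor of 2.
    """
    if divergence_time_myr <= 0:
        raise ValueError("divergence time must be positive")
    return distance / (2.0 * divergence_time_myr)


def estimate_divergence_time(distance: float, rate: float) -> float:
    """Divergence time in Myr for a pair with the given distance, under a strict clock."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return distance / (2.0 * rate)


# Hypothetical calibration: a pair at distance 0.10 whose split is dated to 50 Myr.
rate = calibrate_rate(distance=0.10, divergence_time_myr=50.0)

# Apply the calibrated rate to an undated pair at distance 0.02.
print(estimate_divergence_time(distance=0.02, rate=rate))
```

Real analyses replace the single point calibration with constraints or probability densities on node ages, and often relax the assumption of a single rate.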
Sometimes referred to as node dating, node calibration is a method for time-scaling phylogenetic trees by specifying time constraints for one or more nodes in the tree. Early methods of clock calibration only used a single fossil constraint (e.g. non-parametric rate smoothing), [9] but newer methods (BEAST [10] and r8s [11] ) allow for the use of multiple fossils to calibrate molecular clocks. The oldest fossil of a clade is used to constrain the minimum possible age for the node representing the most recent common ancestor of the clade. However, due to incomplete fossil preservation and other factors, clades are typically older than their oldest fossils. [8] In order to account for this, nodes are allowed to be older than the minimum constraint in node calibration analyses. However, determining how much older the node is allowed to be is challenging. There are a number of strategies for deriving the maximum bound for the age of a clade including those based on birth-death models, fossil stratigraphic distribution analyses, or taphonomic controls. [12] Alternatively, instead of a maximum and a minimum, a probability density can be used to represent the uncertainty about the age of the clade. These calibration densities can take the shape of standard probability densities (e.g. normal, lognormal, exponential, gamma) that can be used to express the uncertainty associated with divergence time estimates. [10] Determining the shape and parameters of the probability distribution is not trivial, but there are methods that use not only the oldest fossil but a larger sample of the fossil record of clades to estimate calibration densities empirically. [13] Studies have shown that increasing the number of fossil constraints increases the accuracy of divergence time estimation. [14]
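A calibration density of the kind described above can be illustrated with a lognormal distribution shifted so that it starts at the age of the oldest fossil. This is a minimal sketch and not the interface of BEAST or r8s. The function names, the parameters `log_mean` and `log_sd`, and all numeric values are assumptions chosen for illustration.

```python
import math


def offset_lognormal_pdf(age: float, min_age: float, log_mean: float, log_sd: float) -> float:
    """Density of the node age when (age - min_age) is lognormally distributed.

    Ages at or below the fossil minimum get zero density, so the node
    can be older than the oldest fossil but never younger.
    """
    x = age - min_age
    if x <= 0:
        return 0.0
    z = (math.log(x) - log_mean) / log_sd
    return math.exp(-0.5 * z * z) / (x * log_sd * math.sqrt(2.0 * math.pi))


def offset_lognormal_cdf(age: float, min_age: float, log_mean: float, log_sd: float) -> float:
    """Probability that the node age is no greater than `age`."""
    x = age - min_age
    if x <= 0:
        return 0.0
    z = (math.log(x) - log_mean) / (log_sd * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))


# Hypothetical clade whose oldest fossil is 60 Myr old.
params = dict(min_age=60.0, log_mean=1.5, log_sd=0.8)
print(offset_lognormal_pdf(59.0, **params))        # younger than the fossil
print(1.0 - offset_lognormal_cdf(70.0, **params))  # prior probability the node is older than 70 Myr
```

The long right tail expresses the idea that the clade is probably only somewhat older than its oldest fossil, while still allowing much older ages with low probability.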
Sometimes referred to as tip dating, tip calibration is a method of molecular clock calibration in which fossils are treated as taxa and placed on the tips of the tree. This is achieved by creating a matrix that includes a molecular dataset for the extant taxa along with a morphological dataset for both the extinct and the extant taxa. [12] Unlike node calibration, this method reconstructs the tree topology and places the fossils simultaneously. Molecular and morphological models work together simultaneously, allowing morphology to inform the placement of fossils. [8] Tip calibration makes use of all relevant fossil taxa during clock calibration, rather than relying on only the oldest fossil of each clade. This method does not rely on the interpretation of negative evidence to infer maximum clade ages. [12]
Demographic changes in populations can be detected as fluctuations in historical coalescent effective population size from a sample of extant genetic variation in the population using coalescent theory. [15] [16] [17] Ancient population expansions that are well documented and dated in the geological record can be used to calibrate a rate of molecular evolution in a manner similar to node calibration. However, instead of calibrating from the known age of a node, expansion calibration uses a two-epoch model of constant population size followed by population growth, with the time of transition between epochs being the parameter of interest for calibration. [18] [19] Expansion calibration works at shorter, intraspecific timescales in comparison to node calibration, because expansions can only be detected after the most recent common ancestor of the species in question. Expansion dating has been used to show that molecular clock rates can be inflated at short timescales (< 1 MY) [18] due to incomplete fixation of alleles, as discussed below. [20] [21]
Total evidence dating is an approach to tip calibration that goes a step further by simultaneously estimating fossil placement, topology, and the evolutionary timescale. In this method, the age of a fossil can inform its phylogenetic position in addition to morphology. By allowing all aspects of tree reconstruction to occur simultaneously, the risk of biased results is decreased. [8] This approach has been improved upon by pairing it with different models. One current method of molecular clock calibration is total evidence dating paired with the fossilized birth-death (FBD) model and a model of morphological evolution. [22] The FBD model is novel in that it allows for "sampled ancestors", which are fossil taxa that are the direct ancestor of a living taxon or lineage. This allows fossils to be placed on a branch above an extant organism, rather than being confined to the tips. [23]
Bayesian methods can provide more appropriate estimates of divergence times, especially if large datasets—such as those yielded by phylogenomics—are employed. [24]
Sometimes only a single divergence date can be estimated from fossils, with all other dates inferred from that. Other sets of species have abundant fossils available, allowing the hypothesis of constant divergence rates to be tested. DNA sequences experiencing low levels of negative selection showed divergence rates of 0.7–0.8% per Myr in bacteria, mammals, invertebrates, and plants. [25] In the same study, genomic regions experiencing very high negative or purifying selection (encoding rRNA) were considerably slower (1% per 50 Myr).
In addition to such variation in rate with genomic position, since the early 1990s variation among taxa has proven fertile ground for research too, [26] even over comparatively short periods of evolutionary time (for example mockingbirds [27] ). Tube-nosed seabirds have molecular clocks that on average run at half the speed of those of many other birds, [28] possibly due to long generation times, and many turtles have a molecular clock running at one-eighth the speed it does in small mammals, or even slower. [29] Effects of small population size are also likely to confound molecular clock analyses. Researchers such as Francisco J. Ayala have more fundamentally challenged the molecular clock hypothesis. [30] [31] [32] According to Ayala's 1999 study, five factors combine to limit the application of molecular clock models:
Molecular clock users have developed workaround solutions using a number of statistical approaches including maximum likelihood techniques and later Bayesian modeling. In particular, models that take into account rate variation across lineages have been proposed in order to obtain better estimates of divergence times. These models are called relaxed molecular clocks [33] because they represent an intermediate position between the 'strict' molecular clock hypothesis and Joseph Felsenstein's many-rates model [34] and are made possible through MCMC techniques that explore a weighted range of tree topologies and simultaneously estimate parameters of the chosen substitution model. It must be remembered that divergence dates inferred using a molecular clock are based on statistical inference and not on direct evidence.
The molecular clock runs into particular challenges at very short and very long timescales. At long timescales, the problem is saturation. When enough time has passed, many sites have undergone more than one change, but it is impossible to detect more than one. This means that the observed number of changes is no longer linear with time, but instead flattens out. Even at intermediate genetic distances, with phylogenetic data still sufficient to estimate topology, signal for the overall scale of the tree can be weak under complex likelihood models, leading to highly uncertain molecular clock estimates. [35]
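Saturation can be made concrete with the Jukes-Cantor (JC69) substitution model, which is used here only as the simplest available illustration and is not a model named in the text above. Under JC69, the expected proportion of differing sites approaches 0.75 as the true number of substitutions per site grows. The function names below are hypothetical.

```python
import math


def expected_p_distance(true_distance: float) -> float:
    """Expected proportion of differing sites under JC69.

    `true_distance` is the actual number of substitutions per site,
    including repeated changes at the same site.
    """
    return 0.75 * (1.0 - math.exp(-4.0 * true_distance / 3.0))


def jc69_corrected_distance(p: float) -> float:
    """Invert the JC69 formula to estimate substitutions per site from observed p."""
    if not 0.0 <= p < 0.75:
        raise ValueError("observed proportion must lie in [0, 0.75) for the correction to exist")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Observed differences track the true distance at first, then flatten out.
for d in (0.05, 0.5, 2.0, 5.0):
    print(d, round(expected_p_distance(d), 3))
```

Near the plateau, a small error in the observed proportion translates into a very large error in the corrected distance, which is one reason date estimates become unreliable at long timescales.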
At very short time scales, many differences between samples do not represent fixation of different sequences in the different populations. Instead, they represent alternative alleles that were both present as part of a polymorphism in the common ancestor. The inclusion of differences that have not yet become fixed leads to a potentially dramatic inflation of the apparent rate of the molecular clock at very short timescales. [21] [36]
The molecular clock technique is an important tool in molecular systematics, macroevolution, and phylogenetic comparative methods. Estimation of the dates of phylogenetic events, including those not documented by fossils, such as the divergences between living taxa, has allowed the study of macroevolutionary processes in organisms that have limited fossil records. Phylogenetic comparative methods rely heavily on calibrated phylogenies.
Eulipotyphla is an order of mammals suggested by molecular methods of phylogenetic reconstruction, which includes the laurasiatherian members of the now-invalid polyphyletic order Lipotyphla, but not the afrotherian members.
The Batrachia are a clade of amphibians that includes frogs and salamanders, but not caecilians nor the extinct allocaudates. The name Batrachia was first used by French zoologist Pierre André Latreille in 1800 to refer to frogs, but has more recently been defined in a phylogenetic sense as a node-based taxon that includes the last common ancestor of frogs and salamanders and all of its descendants. The idea that frogs and salamanders are more closely related to each other than either is to caecilians is strongly supported by morphological and molecular evidence; they are, for instance, the only vertebrates able to raise and lower their eyes. However, an alternative hypothesis exists in which salamanders and caecilians are each other's closest relatives as part of a clade called the Procera, with frogs positioned as the sister taxon of this group.
In biology, substitution models, also called models of sequence evolution, are Markov models that describe changes over evolutionary time. These models describe evolutionary changes in macromolecules, such as DNA sequences or protein sequences, that can be represented as sequences of symbols. Substitution models are used to calculate the likelihood of phylogenetic trees using multiple sequence alignment data. Thus, substitution models are central to maximum likelihood estimation of phylogeny as well as Bayesian inference in phylogeny. Estimates of evolutionary distances are typically calculated using substitution models. Substitution models are also central to phylogenetic invariants because they are necessary to predict site pattern frequencies given a tree topology. Substitution models are also necessary to simulate sequence data for a group of organisms related by a specific tree.
Émile Zuckerkandl was an Austrian-born French biologist considered one of the founders of the field of molecular evolution. He introduced, with Linus Pauling, the concept of the "molecular clock", which enabled the neutral theory of molecular evolution.
In evolutionary biology, conserved sequences are identical or similar sequences in nucleic acids or proteins across species, or within a genome, or between donor and receptor taxa. Conservation indicates that a sequence has been maintained by natural selection.
Computational phylogenetics, phylogeny inference, or phylogenetic inference focuses on computational and optimization algorithms, heuristics, and approaches involved in phylogenetic analyses. The goal is to find a phylogenetic tree representing optimal evolutionary ancestry between a set of genes, species, or taxa. Maximum likelihood, parsimony, Bayesian, and minimum evolution are typical optimality criteria used to assess how well a phylogenetic tree topology describes the sequence data. Nearest Neighbour Interchange (NNI), Subtree Prune and Regraft (SPR), and Tree Bisection and Reconnection (TBR), known as tree rearrangements, are deterministic algorithms used to search for the optimal phylogenetic tree. The space and the landscape of searching for the optimal phylogenetic tree is known as phylogeny search space.
In genetics, the Ka/Ks ratio, also known as ω or dN/dS ratio, is used to estimate the balance between neutral mutations, purifying selection and beneficial mutations acting on a set of homologous protein-coding genes. It is calculated as the ratio of the number of nonsynonymous substitutions per non-synonymous site (Ka), in a given period of time, to the number of synonymous substitutions per synonymous site (Ks), in the same period. The latter are assumed to be neutral, so that the ratio indicates the net balance between deleterious and beneficial mutations. Values of Ka/Ks significantly above 1 are unlikely to occur without at least some of the mutations being advantageous. If beneficial mutations are assumed to make little contribution, then Ka/Ks estimates the degree of evolutionary constraint.
Neutral mutations are changes in DNA sequence that are neither beneficial nor detrimental to the ability of an organism to survive and reproduce. In population genetics, mutations in which natural selection does not affect the spread of the mutation in a species are termed neutral mutations. Neutral mutations that are inheritable and not linked to any genes under selection will be lost or will replace all other alleles of the gene. That loss or fixation of the gene proceeds based on random sampling known as genetic drift. A neutral mutation that is in linkage disequilibrium with other alleles that are under selection may proceed to loss or fixation via genetic hitchhiking and/or background selection.
Masatoshi Nei was a Japanese-born American evolutionary biologist.
Ancestral reconstruction is the extrapolation back in time from measured characteristics of individuals, populations, or species to their common ancestors. It is an important application of phylogenetics, the reconstruction and study of the evolutionary relationships among individuals, populations or species to their ancestors. In the context of evolutionary biology, ancestral reconstruction can be used to recover different kinds of ancestral character states of organisms that lived millions of years ago. These states include the genetic sequence, the amino acid sequence of a protein, the composition of a genome, a measurable characteristic of an organism (phenotype), and the geographic range of an ancestral population or species. This is desirable because it allows us to examine parts of phylogenetic trees corresponding to the distant past, clarifying the evolutionary history of the species in the tree. Since modern genetic sequences are essentially a variation of ancient ones, access to ancient sequences may identify other variations and organisms which could have arisen from those sequences. In addition to genetic sequences, one might attempt to track the changing of one character trait to another, such as fins turning to legs.
Human evolutionary genetics studies how one human genome differs from another human genome, the evolutionary past that gave rise to the human genome, and its current effects. Differences between genomes have anthropological, medical, historical and forensic implications and applications. Genetic data can provide important insights into human evolution.
The history of molecular evolution starts in the early 20th century with "comparative biochemistry", but the field of molecular evolution came into its own in the 1960s and 1970s, following the rise of molecular biology. The advent of protein sequencing allowed molecular biologists to create phylogenies based on sequence comparison, and to use the differences between homologous sequences as a molecular clock to estimate the time since the last common ancestor. In the late 1960s, the neutral theory of molecular evolution provided a theoretical basis for the molecular clock, though both the clock and the neutral theory were controversial, since most evolutionary biologists held strongly to panselectionism, with natural selection as the only important cause of evolutionary change. After the 1970s, nucleic acid sequencing allowed molecular evolution to reach beyond proteins to highly conserved ribosomal RNA sequences, the foundation of a reconceptualization of the early history of life.
The chimpanzee–human last common ancestor (CHLCA) is the last common ancestor shared by the extant Homo (human) and Pan genera of Hominini. Estimates of the divergence date vary widely from thirteen to five million years ago.
The human mitochondrial molecular clock is the rate at which mutations have been accumulating in the mitochondrial genome of hominids during the course of human evolution. The archeological record of human activity from early periods in human prehistory is relatively limited and its interpretation has been controversial. Because of the uncertainties from the archeological record, scientists have turned to molecular dating techniques in order to refine the timeline of human evolution. A major goal of scientists in the field is to develop an accurate hominid mitochondrial molecular clock which could then be used to confidently date events that occurred during the course of human evolution.
Ziheng Yang FRS is a Chinese biologist. He holds the R.A. Fisher Chair of Statistical Genetics at University College London, and is the Director of R.A. Fisher Centre for Computational Biology at UCL. He was elected a Fellow of the Royal Society in 2006.
TimeTree is a free public database developed by S. Blair Hedges and Sudhir Kumar, now at Temple University, for presenting times of divergence in the tree of life. The basic concept has been to produce and present a community consensus of the timetree of life from published studies, and allow easy access to that information on the web or mobile device. The database permits searching for average node times between two species or higher taxa, viewing a timeline from the perspective of a taxon, which shows all divergences back to the origin of life, and building a timetree of a chosen taxon or user-submitted group of taxa. TimeTree has been used in public education to conceptualize the evolution of life, such as in high school settings. David Attenborough's Emmy Award-winning film and television program Rise of Animals used Hedges and Kumar's circular timetree of life, generated from the TimeTree database, as a framework for the production. The timetree was brought to life using animated computer-generated imagery in scenes every 10 minutes during the 2-hour movie. The original development of TimeTree, by Hedges and Kumar, dates to the late 1990s, with initial support from NASA Astrobiology Institute. Since then, it has been supported by additional grants from NASA, and by NSF and NIH. The current version (v5) was released in 2022 and contains data from 4,075 studies and 137,306 species.
The relative rate test is a genetic comparative test between two ingroups and an outgroup or “reference species” to compare mutation and evolutionary rates between the species. Each ingroup species is compared independently to the outgroup to determine how closely related the two species are without knowing the exact time of divergence from their closest common ancestor. If more change has occurred on one lineage relative to another lineage since their shared common ancestor, then the outgroup species will be more different from the faster-evolving lineage's species than it is from the slower-evolving lineage's species. This is because the faster-evolving lineage will, by definition, have accumulated more differences since the common ancestor than the slower-evolving lineage. This method can be applied to averaged data, or individual molecules. It is possible for individual molecules to show evidence of approximately constant rates of change in different lineages even while the rates differ between different molecules. The relative rate test is a direct internal test of the molecular clock, for a given molecule and a given set of species, and shows that the molecular clock does not need to be assumed: It can be directly assessed from the data itself. Note that the logic can also be applied to any kind of data for which a distance measure can be defined.
Phaethoquornithes is a clade of birds that contains Eurypygimorphae and Aequornithes, which was first recovered by genome analysis in 2014. Members of Eurypygimorphae were originally classified in the obsolete group Metaves, and Aequornithes were classified as the sister taxon to Musophagiformes or Gruiformes.
The rate of evolution is quantified as the speed of genetic or morphological change in a lineage over a period of time. The speed at which a molecular entity evolves is of considerable interest in evolutionary biology since determining the evolutionary rate is the first step in characterizing its evolution. Calculating rates of evolutionary change is also useful when studying phenotypic changes in phylogenetic comparative biology. In either case, it can be beneficial to consider and compare both genomic data and paleontological data, especially in regards to estimating the timing of divergence events and establishing geological time scales.
Genetic saturation is the result of multiple substitutions at the same site in a sequence, or identical substitutions in different sequences, such that the apparent sequence divergence rate is lower than the actual divergence that has occurred. When comparing two or more genetic sequences consisting of single nucleotides, differences in sequence observed are only differences in the final state of the nucleotide sequence. Single nucleotides undergoing genetic saturation change multiple times, sometimes back to their original nucleotide or to a nucleotide common to the compared genetic sequence. Without genetic information from intermediate taxa, it is difficult to know how much saturation, if any, has occurred on an observed sequence. Genetic saturation occurs most rapidly on fast-evolving sequences, such as the hypervariable region of mitochondrial DNA, or in short tandem repeats such as on the Y-chromosome.