Incomplete lineage sorting

Incomplete lineage sorting, [1] [2] [3] also termed hemiplasy, deep coalescence, retention of ancestral polymorphism, or trans-species polymorphism, is a phenomenon in population genetics in which ancestral gene copies, traced backwards in time, fail to coalesce into a common ancestral copy until before an earlier speciation event. It is caused by lineage sorting of genetic polymorphisms that were retained across successive nodes in the species tree. [4] As a result, the tree produced by a single gene can differ from the population or species level tree, producing a discordant tree. Whatever the mechanism, a species level tree may differ depending on which genes are selected for assessment. [5] [6] This contrasts with complete lineage sorting, where the tree produced by the gene is the same as the population or species level tree. Both are common results in phylogenetic analysis, although how often each occurs depends on the gene, organism, and sampling technique.

Concept

Figure 1. Incomplete lineage sorting: see the text for an explanation.
Figure 2. Apparent incomplete lineage sorting: see the text for an explanation.

The concept of incomplete lineage sorting has some important implications for phylogenetic techniques. The persistence of polymorphisms across different speciation events can cause incomplete lineage sorting. Suppose two successive speciation events occur, in which an ancestral species gives rise first to species A, and then to species B and C. A single gene can have multiple versions (alleles) that cause different characters to appear (polymorphisms). In the example shown in Figure 1, the gene G has two alleles, G0 and G1. The ancestor of A, B and C originally had only one version of gene G, G0. At some point, a mutation occurred and the ancestral population became polymorphic, with some individuals having G0 and others G1. When species A split off, it retained only G1, while the ancestor of B and C remained polymorphic. When B and C diverged, B retained only G1 and C only G0; neither was now polymorphic in G. The tree for gene G shows A and B as sisters, whereas the species tree shows B and C as sisters. If the phylogeny of these species is based on gene G, it will not represent the actual relationships between the species. In other words, the most closely related species will not necessarily inherit the most closely related genes. This is a simplified example of incomplete lineage sorting; in real research the situation is usually more complex, involving more genes and species. [7] [8]
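The Figure 1 scenario can be written out as a small Python sketch. It is illustrative only: the nested-tuple tree encoding and the helper names `sister_pair` and `gene_sisters` are assumptions made here, and grouping species by shared allele is a deliberate simplification of how a gene tree would be inferred in practice.

```python
# Species tree from the text: A splits off first, then B and C diverge.
SPECIES_TREE = ("A", ("B", "C"))

# Allele of gene G fixed in each species in the Figure 1 scenario.
ALLELES = {"A": "G1", "B": "G1", "C": "G0"}


def sister_pair(tree):
    """Return the two leaves forming the inner pair of a rooted three-taxon tree."""
    for child in tree:
        if isinstance(child, tuple):
            return frozenset(child)
    raise ValueError("expected a rooted three-taxon tree")


def gene_sisters(alleles):
    """Treat the two species that share an allele as sisters in the gene tree."""
    groups = {}
    for species, allele in alleles.items():
        groups.setdefault(allele, set()).add(species)
    pairs = [frozenset(g) for g in groups.values() if len(g) == 2]
    if len(pairs) != 1:
        raise ValueError("expected exactly one pair of species sharing an allele")
    return pairs[0]


species_pair = sister_pair(SPECIES_TREE)
gene_pair = gene_sisters(ALLELES)
discordant = species_pair != gene_pair

print(sorted(species_pair), sorted(gene_pair), discordant)
# prints: ['B', 'C'] ['A', 'B'] True
```

The species tree pairs B with C, while gene G pairs A with B, so a phylogeny based on G alone would be discordant with the species tree.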

However, other mechanisms can lead to the same apparent discordance. For example, alleles can move across species boundaries via hybridization, and DNA can be transferred between species by viruses. [9] This is illustrated in Figure 2. Here the ancestor of A, B and C, and the ancestor of B and C, had only the G0 version of gene G. A mutation occurred at the divergence of B and C, and B acquired a mutated version, G1. Some time later, as the arrow shows, G1 was transferred from B to A by some means (e.g. hybridization or horizontal gene transfer). Studying only the final states of G in the three species makes it appear that A and B are sisters rather than B and C, as in Figure 1, but in Figure 2 this is not caused by incomplete lineage sorting.

Implications

Incomplete lineage sorting has important implications for phylogenetic research. Because of it, a phylogenetic tree built from a single gene may not reflect the actual relationships between species. Gene flow between lineages by hybridization or horizontal gene transfer can produce the same conflicting phylogenetic tree. Distinguishing these processes can be difficult, but statistical approaches have been, and continue to be, developed to gain greater insight into these evolutionary dynamics. [10] One way to reduce the impact of incomplete lineage sorting is to use multiple genes when constructing species or population phylogenies. In general, the more genes used, the more reliable the phylogeny becomes. [8]
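The benefit of using many genes can be illustrated with a small simulation. The Python sketch below is not taken from the cited sources. It assumes a three-species tree (A,(B,C)), one sampled gene copy per species, independent loci, no gene flow, and an internal branch length `t` in coalescent units. The function names and parameter values are hypothetical. Each locus "votes" for a sister pair, and the most frequent topology is taken as the species tree estimate.

```python
import math
import random
from collections import Counter


def simulate_gene_tree(t, rng):
    """Draw the sister pair of one gene tree for the species tree (A,(B,C)).

    t is the length of the internal branch (ancestor of B and C) in
    coalescent units; this is an assumed model parameter.
    """
    # The B and C lineages coalesce on the internal branch with probability 1 - exp(-t).
    if rng.random() < 1.0 - math.exp(-t):
        return "BC"
    # Otherwise all three lineages reach the root population, where each
    # pair is equally likely to coalesce first.
    return rng.choice(["BC", "AB", "AC"])


def majority_topology(n_loci, t, seed=0):
    """Simulate n_loci independent gene trees and return the most common sister pair."""
    rng = random.Random(seed)
    counts = Counter(simulate_gene_tree(t, rng) for _ in range(n_loci))
    return counts.most_common(1)[0][0], counts


# A single locus may be discordant, but the majority over many loci is
# expected to recover the true pair "BC" in this three-species setting.
top, counts = majority_topology(n_loci=2000, t=0.5, seed=1)
print(top, dict(counts))
```

This simple majority vote is only a sketch for three species. With more species or with gene flow, dedicated multilocus species tree methods such as those evaluated in [10] are needed.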

In diploid organisms

Incomplete lineage sorting is common in sexually reproducing species because a species cannot be traced back to a single individual or breeding pair. When populations are large (i.e. thousands of individuals), each gene has some diversity, and the gene tree includes lineages that predate the species. The larger the population, the longer these ancestral lineages persist. When large ancestral populations are combined with closely timed speciation events, different pieces of DNA retain conflicting affiliations. This makes it hard to determine a common ancestor or the points of branching. [5]
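The effect of population size and closely timed speciation events can be made quantitative with a standard result from coalescent theory, which is not stated in the sources cited above. For three species with one sampled gene copy each, a constant diploid effective population size N in the ancestral population, and no gene flow, the probability that a gene tree disagrees with the species tree is (2/3)·exp(−t/(2N)), where t is the number of generations between the two speciation events. The function name and the parameter values in this Python sketch are illustrative.

```python
import math


def discordance_probability(generations, effective_size):
    """Probability that a three-species gene tree disagrees with the species tree.

    generations: generations between the two speciation events.
    effective_size: diploid effective size N of the ancestral population.
    Assumes one sampled copy per species, constant N, and no gene flow.
    """
    if generations < 0 or effective_size <= 0:
        raise ValueError("generations must be >= 0 and effective_size > 0")
    t = generations / (2.0 * effective_size)  # branch length in coalescent units
    return (2.0 / 3.0) * math.exp(-t)


# The same interval between speciation events, with two ancestral population sizes.
small_pop = discordance_probability(generations=20_000, effective_size=10_000)
large_pop = discordance_probability(generations=20_000, effective_size=100_000)
print(round(small_pop, 3), round(large_pop, 3))
```

Under these assumptions, the larger ancestral population gives a higher probability of discordance for the same interval, and the probability approaches 2/3 as the two speciation events become simultaneous.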

In primate evolution

Among primates, chimpanzees and bonobos are more closely related to each other than to any other taxon and are thus sister taxa. Still, for 1.6% of the bonobo genome, sequences are more closely related to their human homologues than to those of chimpanzees, which is probably a result of incomplete lineage sorting. [5] A study of more than 23,000 DNA sequence alignments in the family Hominidae (great apes, including humans) showed that about 23% did not support the known sister relationship of chimpanzees and humans. [9]
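As a rough illustration of how such a figure can be interpreted, the Python sketch below inverts the standard three-species coalescent relation P(discordant) = (2/3)·exp(−T). It assumes that all of the reported 23% discordance is due to incomplete lineage sorting, with no gene flow, no inference error, and a constant ancestral population size. None of these assumptions comes from the cited study, and the function name is hypothetical.

```python
import math


def implied_internal_branch(discordant_fraction):
    """Internal branch length T (coalescent units) implied by a discordance fraction.

    Inverts P = (2/3) * exp(-T), assuming discordance is caused only by
    incomplete lineage sorting in a three-species model.
    """
    if not 0.0 < discordant_fraction <= 2.0 / 3.0:
        raise ValueError("fraction must lie in (0, 2/3] under this model")
    return -math.log(1.5 * discordant_fraction)


t_units = implied_internal_branch(0.23)
print(round(t_units, 2))  # prints: 1.06
```

Under these simplifying assumptions, the interval between the two speciation events would correspond to roughly one coalescent unit, that is, about 2N generations for an ancestral population of effective size N.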

In human evolution

In studies of human evolution, incomplete lineage sorting is used to describe hominin lineages that may not have sorted out by the time speciation occurred. [11] With the advent of genetic testing and genome sequencing, researchers found that the genetic relationships between hominin lineages might disagree with previous understandings of their relatedness based on physical characteristics. [11] Moreover, divergence from the last common ancestor (LCA) does not necessarily occur at the same time as speciation. [12] Analysis of lineage sorting allows paleoanthropologists to explore genetic relationships and divergences that may not fit their previous speciation models based on phylogeny alone. [11]

Incomplete lineage sorting of the human family tree is an area of great interest. There are a number of unknowns when considering both the transition from archaic humans to modern humans and divergence of the other great apes from the hominin lineage. [13]

Ape and hominin / human divergence

Incomplete lineage sorting means that the average divergence time between genes may differ from the divergence time between species. Models suggest that the average divergence time between genes in the human and chimpanzee genomes is older than the split between humans and gorillas. This means that the common ancestor of humans and chimpanzees retained genetic variation that was already present in the common ancestor of humans, chimpanzees, and gorillas. [12] The gene tree therefore differs slightly from the species tree. [14] For the species tree relating humans, bonobos, chimpanzees, and gorillas, the results show that the separation of bonobos and chimpanzees occurred relatively close in time to the split between the bonobo-chimpanzee ancestor and humans, [12] and that humans and chimpanzees shared a common ancestor for several million years after its separation from gorillas. These conditions give rise to incomplete lineage sorting. Researchers now rely on DNA fragments to study the evolutionary relationships among humans and their relatives, in the hope that genomes from different types of humans will provide information about speciation and ancestral processes. [15]

In viruses

Figure 3. The pre-transmission interval and incomplete lineage sorting in the phylogeny of a human-transmissible virus. The shaded tree represents a transmission chain where each region represents the pathogen population in each of three patients. The width of the shaded regions corresponds to the genetic diversity. In this scenario, A infects B with an imperfect transmission bottleneck, and then B infects C. The genealogy at the bottom is reconstructed from a sample of a single lineage from each patient at three distinct time points. When diversity exists in donor A, a pre-transmission interval will occur at each inferred transmission event (MRCA(A,B) precedes transmission from A to B), and the order of transmission events may become randomized in the virus genealogy. Note that the pre-transmission interval is also a random variable defined by the donor's diversity at the time of each transmission. Terminal branch lengths are also elongated due to these processes.

Incomplete lineage sorting is a common feature in viral phylodynamics. The tree describing transmission of a disease from one person to the next, which is to say the population level tree, often does not correspond to the tree created from a genetic analysis, because of the population bottlenecks that are an inherent feature of viral transmission. Figure 3 illustrates how this can occur. This is relevant to criminal transmission of HIV: in some criminal cases, a phylogenetic analysis of one or two genes from the strains of the accused and the victim has been used to infer transmission. However, because incomplete lineage sorting is common, transmission cannot be inferred solely on the basis of such a basic analysis. [16]

In linguistics

Jacques and List (2019) [17] show that the concept of incomplete lineage sorting can be applied to account for non-treelike phenomena in language evolution. Kalyan and François (2019), proponents of the method of historical glottometry, a model challenging the applicability of the tree model in historical linguistics, concur that "Historical Glottometry does not challenge the family tree model once incomplete lineage sorting has been taken into account." [18]

References

  1. Simpson, Michael G (2010-07-19). Plant Systematics. Academic Press. ISBN 9780080922089.
  2. Kuritzin, A; Kischka, T; Schmitz, J; Churakov, G (2016). "Incomplete Lineage Sorting and Hybridization Statistics for Large-Scale Retroposon Insertion Data". PLOS Computational Biology. 12 (3): e1004812. Bibcode:2016PLSCB..12E4812K. doi:10.1371/journal.pcbi.1004812. PMC 4788455. PMID 26967525.
  3. Suh, A; Smeds, L; Ellegren, H (2015). "The Dynamics of Incomplete Lineage Sorting across the Ancient Adaptive Radiation of Neoavian Birds". PLOS Biology. 13 (8): e1002224. doi:10.1371/journal.pbio.1002224. PMC 4540587. PMID 26284513.
  4. Maddison, Wayne P. (1997-09-01). Wiens, John J. (ed.). "Gene Trees in Species Trees". Systematic Biology. Oxford University Press (OUP). 46 (3): 523–536. doi:10.1093/sysbio/46.3.523. ISSN 1076-836X.
  5. Rogers, Jeffrey; Gibbs, Richard A. (2014-05-01). "Comparative primate genomics: emerging patterns of genome content and dynamics". Nature Reviews Genetics. 15 (5): 347–359. doi:10.1038/nrg3707. PMC 4113315. PMID 24709753.
  6. Shen, Xing-Xing; Hittinger, Chris Todd; Rokas, Antonis (2017). "Contentious relationships in phylogenomic studies can be driven by a handful of genes". Nature Ecology & Evolution. 1 (5): 126. doi:10.1038/s41559-017-0126. ISSN 2397-334X. PMC 5560076. PMID 28812701.
  7. Copetti, Dario; Búrquez, Alberto; Bustamante, Enriquena; Charboneau, Joseph L. M.; Childs, Kevin L.; Eguiarte, Luis E.; Lee, Seunghee; Liu, Tiffany L.; McMahon, Michelle M.; Whiteman, Noah K.; Wing, Rod A.; Wojciechowski, Martin F. & Sanderson, Michael J. (2017-11-07). "Extensive gene tree discordance and hemiplasy shaped the genomes of North American columnar cacti". Proceedings of the National Academy of Sciences. 114 (45): 12003–12008. Bibcode:2017PNAS..11412003C. doi:10.1073/pnas.1706367114. PMC 5692538. PMID 29078296.
  8. Futuyma, Douglas J. (2013-07-15). Evolution (3rd ed.). Sunderland, Massachusetts U.S.A. ISBN 9781605351155. OCLC 824532153.
  9. Avise, John C. & Robinson, Terence J. (2008). "Hemiplasy: A New Term in the Lexicon of Phylogenetics". Systematic Biology. 57 (3): 503–507. doi:10.1080/10635150802164587. PMID 18570042.
  10. Warnow, Tandy; Bayzid, Md Shamsuzzoha; Mirarab, Siavash (2016-05-01). "Evaluating Summary Methods for Multilocus Species Tree Estimation in the Presence of Incomplete Lineage Sorting". Systematic Biology. 65 (3): 366–380. doi:10.1093/sysbio/syu063. ISSN 1063-5157. PMID 25164915.
  11. Maddison, Wayne P. (1997-09-01). "Gene Trees in Species Trees". Systematic Biology. 46 (3): 523–536. doi:10.1093/sysbio/46.3.523. ISSN 1076-836X.
  12. Mailund, Thomas; Munch, Kasper; Schierup, Mikkel Heide (2014-11-23). "Lineage Sorting in Apes". Annual Review of Genetics. 48 (1): 519–535. doi:10.1146/annurev-genet-120213-092532. ISSN 0066-4197. PMID 25251849.
  13. Nichols, Richard (July 2001). "Gene trees and species trees are not the same". Trends in Ecology & Evolution. 16 (7): 358–364. doi:10.1016/s0169-5347(01)02203-0. ISSN 0169-5347. PMID 11403868.
  14. "Primate Speciation: A Case Study of African Apes | Learn Science at Scitable". www.nature.com. Retrieved 2020-05-30.
  15. Peyrégne, Stéphane; Boyle, Michael James; Dannemann, Michael; Prüfer, Kay (September 2017). "Detecting ancient positive selection in humans using extended lineage sorting". Genome Research. 27 (9): 1563–1572. doi:10.1101/gr.219493.116. ISSN 1088-9051. PMC 5580715. PMID 28720580.
  16. Leitner, Thomas (May 2019). "Phylogenetics in HIV transmission: taking within-host diversity into account". Current Opinion in HIV and AIDS. 14 (3): 181–187. doi:10.1097/COH.0000000000000536. ISSN 1746-630X. PMC 6449181. PMID 30920395.
  17. Jacques, Guillaume; List, Johann-Mattis (2019). "Why we need tree models in linguistic reconstruction (and when we should apply them)". Journal of Historical Linguistics. 9 (1): 128–167. doi:10.1075/jhl.17008.mat. hdl:21.11116/0000-0004-4D2E-4. ISSN 2210-2116. S2CID 52220491.
  18. Kalyan, Siva; François, Alexandre (2019). "When the waves meet the trees". Journal of Historical Linguistics. 9 (1): 168–177. doi:10.1075/jhl.18019.kal. ISSN 2210-2116. S2CID 198707375.