Incomplete lineage sorting


Incomplete lineage sorting (ILS) [1] [2] [3] (also referred to as hemiplasy, deep coalescence, retention of ancestral polymorphism, or trans-species polymorphism) is a phenomenon in evolutionary biology and population genetics that results in discordance between species trees and gene trees. [4] [5] By contrast, complete lineage sorting results in concordant species and gene trees. ILS occurs when a gene in an ancestral species exists as multiple alleles. If a speciation event occurs in this situation, either complete lineage sorting will occur, and both daughter species will inherit all alleles of the gene in question, or incomplete lineage sorting will occur, in which one or both daughter species inherit only a subset of the alleles present in the parental species. For example, if two alleles of a gene are present when a speciation event occurs, one daughter species might inherit both alleles while the other inherits only one of the two. In this case, incomplete lineage sorting has occurred. [6]


Concept

Figure 1. Incomplete lineage sorting: see the text for an explanation.
Figure 2. Apparent incomplete lineage sorting: see the text for an explanation.

The concept of incomplete lineage sorting has important implications for phylogenetic techniques. Incomplete lineage sorting can arise when polymorphisms persist across successive speciation events. Suppose two successive speciation events occur in which an ancestral species gives rise first to species A, and then to species B and C. A single gene can have multiple versions (alleles), which cause different character states to appear (polymorphisms). In the example shown in Figure 1, the gene G has two alleles, G0 and G1. The ancestor of A, B and C originally had only one version of gene G, G0. At some point, a mutation occurred and the ancestral population became polymorphic, with some individuals having G0 and others G1. When species A split off, it retained only G1, while the ancestor of B and C remained polymorphic. When B and C diverged, B retained only G1 and C only G0; neither was polymorphic in G any longer. The tree for gene G shows A and B as sisters, whereas the species tree shows B and C as sisters. If the phylogeny of these species is based on gene G alone, it will not represent the actual relationships between the species. In other words, the most closely related species will not necessarily inherit the most closely related genes. This is a simplified example of incomplete lineage sorting; in real research the situation is usually more complex, involving more genes and species. [7] [8]
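The Figure 1 scenario can be written down as a small check. The following is a minimal sketch, not a phylogenetic method: it assumes each species is fixed for a single allele, and the names `ALLELES`, `SPECIES_TREE_SISTERS` and `sisters_by_allele` are hypothetical, introduced only for illustration.

```python
from itertools import combinations

# Final allele states of gene G in Figure 1 (each species is fixed for one allele).
ALLELES = {"A": "G1", "B": "G1", "C": "G0"}

# In the species tree described in the text, B and C are sisters.
SPECIES_TREE_SISTERS = frozenset({"B", "C"})


def sisters_by_allele(states):
    """Return the one pair of species sharing an allele, or None if ambiguous."""
    pairs = [
        frozenset(pair)
        for pair in combinations(sorted(states), 2)
        if states[pair[0]] == states[pair[1]]
    ]
    return pairs[0] if len(pairs) == 1 else None


gene_tree_sisters = sisters_by_allele(ALLELES)
is_discordant = gene_tree_sisters != SPECIES_TREE_SISTERS
print(sorted(gene_tree_sisters), is_discordant)
```

Grouping by allele pairs A with B, so the gene tree disagrees with the species tree, which is the discordance the example describes.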

However, other mechanisms can lead to the same apparent discordance. For example, alleles can move across species boundaries via hybridization, and DNA can be transferred between species by viruses. [9] This is illustrated in Figure 2. Here the ancestor of A, B and C, and the ancestor of B and C, had only the G0 version of gene G. A mutation occurred at the divergence of B and C, and B acquired a mutated version, G1. Some time later, as the arrow shows, G1 was transferred from B to A by some means (e.g. hybridization or horizontal gene transfer). Studying only the final states of G in the three species makes it appear that A and B are sisters rather than B and C, as in Figure 1, but in Figure 2 this is not caused by incomplete lineage sorting.

Implications

Incomplete lineage sorting has important implications for phylogenetic research. A phylogenetic tree may fail to reflect the actual relationships among species because of incomplete lineage sorting. However, gene flow between lineages by hybridization or horizontal gene transfer may produce the same conflicting phylogenetic tree. Distinguishing these processes can be difficult, but statistical approaches have been, and continue to be, developed to gain greater insight into these evolutionary dynamics. [10] One way to reduce the impact of incomplete lineage sorting is to use multiple genes when constructing species or population phylogenies. The more genes used, the more reliable the phylogeny becomes. [8]
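The effect of adding genes can be illustrated with a small simulation. This is a sketch under stated assumptions, not one of the summary methods evaluated in the cited literature: it assumes three species with species tree ((B,C),A), independent loci, no gene flow, and the standard coalescent result that a gene tree matches the species tree with probability 1 - (2/3)e^(-T), where T is the internal branch length in coalescent units. The species tree is then estimated by a simple majority vote over loci. All function names and the default T = 0.3 are hypothetical choices for illustration.

```python
import math
import random
from collections import Counter

# First entry is the species tree; the other two are discordant gene trees.
TOPOLOGIES = ["((B,C),A)", "((A,B),C)", "((A,C),B)"]


def sample_gene_tree(t_internal, rng):
    """Draw one gene-tree topology for a three-species tree under ILS only."""
    p_concordant = 1.0 - (2.0 / 3.0) * math.exp(-t_internal)
    p_discordant_each = (1.0 - p_concordant) / 2.0
    weights = [p_concordant, p_discordant_each, p_discordant_each]
    return rng.choices(TOPOLOGIES, weights=weights)[0]


def majority_topology(n_loci, t_internal, rng):
    """Most frequent gene-tree topology across loci (ties broken arbitrarily)."""
    counts = Counter(sample_gene_tree(t_internal, rng) for _ in range(n_loci))
    return counts.most_common(1)[0][0]


def recovery_rate(n_loci, t_internal=0.3, replicates=2000, seed=7):
    """Fraction of replicates in which the majority vote equals the species tree."""
    rng = random.Random(seed)
    hits = sum(
        majority_topology(n_loci, t_internal, rng) == TOPOLOGIES[0]
        for _ in range(replicates)
    )
    return hits / replicates


if __name__ == "__main__":
    for n in (1, 5, 21, 51):
        print(n, recovery_rate(n))
```

With a short internal branch a single gene recovers the species tree only about half the time, while the majority over several dozen loci recovers it in most replicates. A majority vote is adequate for three species; larger trees generally call for dedicated species-tree methods.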

In diploid organisms

Incomplete lineage sorting commonly happens in sexually reproducing species because a species cannot be traced back to a single individual or breeding pair. When populations are large (i.e. thousands of individuals), each gene has some diversity, and the gene tree includes lineages that pre-date the species. The larger the population, the longer these ancestral lineages persist. When large ancestral populations are combined with closely timed speciation events, different segments of DNA retain conflicting affiliations. This makes it hard to determine a common ancestor or the points of branching. [4]
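The dependence on population size and on the spacing of speciation events can be made concrete for three species. The sketch below assumes the standard neutral coalescent with one gene copy sampled per species, constant population size, and no gene flow. Under those assumptions the probability that a gene tree disagrees with the species tree ((B,C),A) is (2/3)e^(-T), where T is the time between the two speciation events measured in units of 2N generations (N being the diploid effective size of the B-C ancestral population). The function names are hypothetical.

```python
import math
import random


def simulate_first_coalescence(t_internal, rng):
    """Return the pair of species whose gene copies coalesce first.

    t_internal is the length of the branch ancestral to B and C only,
    in coalescent units (generations divided by 2N).
    """
    # Within the B-C ancestor, the two lineages coalesce at rate 1.
    waiting_time = rng.expovariate(1.0)
    if waiting_time < t_internal:
        return ("B", "C")  # sorted before the deeper split: concordant
    # Otherwise three lineages reach the root population, where each
    # of the three possible pairs is equally likely to coalesce first.
    return rng.choice([("B", "C"), ("A", "B"), ("A", "C")])


def simulated_discordance(t_internal, n_loci=100_000, seed=1):
    """Fraction of simulated loci whose gene tree is not ((B,C),A)."""
    rng = random.Random(seed)
    discordant = sum(
        simulate_first_coalescence(t_internal, rng) != ("B", "C")
        for _ in range(n_loci)
    )
    return discordant / n_loci


def expected_discordance(t_internal):
    """Analytic discordance probability for three species."""
    return (2.0 / 3.0) * math.exp(-t_internal)


if __name__ == "__main__":
    for t in (0.1, 0.5, 1.0, 3.0):
        print(t, simulated_discordance(t), round(expected_discordance(t), 4))
```

Because T is generations divided by 2N, either doubling the population size or halving the interval between speciation events halves T and raises the expected discordance, which matches the qualitative statement above.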

In primate evolution

Among primates, chimpanzees and bonobos are more closely related to each other than to any other taxon and are thus sister taxa. Still, for 1.6% of the bonobo genome, sequences are more closely related to human homologues than to chimpanzee homologues, which is probably a result of incomplete lineage sorting. [4] A study of more than 23,000 DNA sequence alignments in the family Hominidae (great apes, including humans) showed that about 23% did not support the known sister relationship of chimpanzees and humans. [9]

In human evolution

In human evolution, incomplete lineage sorting is used to diagram hominin lineages that may have failed to sort out at the same time that speciation occurred in prehistory. [11] With the advent of genetic testing and genome sequencing, researchers found that the genetic relationships between hominin lineages might disagree with previous understandings of their relatedness based on physical characteristics. [11] Moreover, divergence from the last common ancestor (LCA) does not necessarily occur at the same time as speciation. [12] Analysis of lineage sorting allows paleoanthropologists to explore genetic relationships and divergences that may not fit their previous speciation models based on phylogeny alone. [11]

Incomplete lineage sorting of the human family tree is an area of great interest. There are a number of unknowns when considering both the transition from archaic humans to modern humans and divergence of the other great apes from the hominin lineage. [13]

Ape and hominin / human divergence

Incomplete lineage sorting means that the average divergence time between genes may differ from the divergence time between species. Models suggest that the average divergence time between the genes in the human and chimpanzee genomes is older than the split between humans and gorillas. This means that the common ancestor of humans and chimpanzees carried genetic material that was already present in the common ancestor of humans, chimpanzees, and gorillas. [12] However, the gene tree differs slightly from the species tree. [14] In the species tree relating humans, bonobos, chimpanzees, and gorillas, the results show that the separation of bonobos and chimpanzees occurred close in time to the common ancestor of the bonobo-chimpanzee ancestor and humans, [12] indicating that humans and chimpanzees shared a common ancestor for several million years after separation from gorillas. This gives rise to incomplete lineage sorting. Today researchers rely on DNA fragments to study the evolutionary relationships among humans and their relatives, in the hope that genomes from different types of humans will provide information about speciation and ancestral processes. [15]
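The gap between gene divergence and species divergence can be stated as a formula. This is a sketch under the standard neutral coalescent, assuming a diploid, randomly mating ancestral population of constant effective size; the symbols T_S (species split time, in generations) and N_A (ancestral effective population size) are introduced here and do not come from the cited sources. Two gene copies sampled from different descendant species cannot coalesce until the lineages meet in the ancestral population, where the expected waiting time is 2N_A generations:

```latex
\mathbb{E}\left[T_{\text{gene}}\right] = T_S + 2N_A
```

The larger the ancestral population, the further the average gene divergence pre-dates the species split, and the more likely it is that some gene lineages trace back beyond an earlier speciation event.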

In viruses

Figure 3. The pretransmission interval and incomplete lineage sorting in the phylogeny of a human-transmissible virus. The shaded tree represents a transmission chain where each region represents the pathogen population in each of three patients. The width of the shaded regions corresponds to the genetic diversity. In this scenario, A infects B with an imperfect transmission bottleneck, and then B infects C. The genealogy at the bottom is reconstructed from a sample of a single lineage from each patient at three distinct time points. When diversity exists in donor A, a pre-transmission interval will occur at each inferred transmission event (MRCA(A,B) precedes transmission from A to B), and the order of transmission events may become randomized in the virus genealogy. Note that the pre-transmission interval also is a random variable defined by the donor’s diversity at time of each transmission. Terminal branch lengths are also elongated due to these processes.

Incomplete lineage sorting is a common feature in viral phylodynamics. The transmission history of a disease from one person to the next, that is, the population-level tree, often does not correspond to the tree inferred from genetic analysis, because of the population bottlenecks that are an inherent feature of viral transmission. Figure 3 illustrates how this can occur. This is relevant to criminal transmission of HIV: in some criminal cases, a phylogenetic analysis of one or two genes from the strains of the accused and the victim has been used to infer transmission. However, because incomplete lineage sorting is so common, transmission cannot be inferred solely on the basis of such a basic analysis. [16]

In linguistics

Jacques and List (2019) [17] show that the concept of incomplete lineage sorting can be applied to account for non-treelike phenomena in language evolution. Kalyan and François (2019), proponents of the method of historical glottometry, a model challenging the applicability of the tree model in historical linguistics, concur that "Historical Glottometry does not challenge the family tree model once incomplete lineage sorting has been taken into account." [18]


References

  1. Simpson, Michael G (2010-07-19). Plant Systematics. Academic Press. ISBN 9780080922089.
  2. Kuritzin, A; Kischka, T; Schmitz, J; Churakov, G (2016). "Incomplete Lineage Sorting and Hybridization Statistics for Large-Scale Retroposon Insertion Data". PLOS Computational Biology. 12 (3): e1004812. Bibcode:2016PLSCB..12E4812K. doi:10.1371/journal.pcbi.1004812. PMC 4788455. PMID 26967525.
  3. Suh, A; Smeds, L; Ellegren, H (2015). "The Dynamics of Incomplete Lineage Sorting across the Ancient Adaptive Radiation of Neoavian Birds". PLOS Biology. 13 (8): e1002224. doi:10.1371/journal.pbio.1002224. PMC 4540587. PMID 26284513.
  4. Rogers, Jeffrey; Gibbs, Richard A. (2014-05-01). "Comparative primate genomics: emerging patterns of genome content and dynamics". Nature Reviews Genetics. 15 (5): 347–359. doi:10.1038/nrg3707. PMC 4113315. PMID 24709753.
  5. Shen, Xing-Xing; Hittinger, Chris Todd; Rokas, Antonis (2017). "Contentious relationships in phylogenomic studies can be driven by a handful of genes". Nature Ecology & Evolution. 1 (5): 126. doi:10.1038/s41559-017-0126. ISSN 2397-334X. PMC 5560076. PMID 28812701.
  6. Maddison, Wayne P. (1997-09-01). Wiens, John J. (ed.). "Gene Trees in Species Trees". Systematic Biology. 46 (3). Oxford University Press (OUP): 523–536. doi:10.1093/sysbio/46.3.523. ISSN 1076-836X.
  7. Copetti, Dario; Búrquez, Alberto; Bustamante, Enriquena; Charboneau, Joseph L. M.; Childs, Kevin L.; Eguiarte, Luis E.; Lee, Seunghee; Liu, Tiffany L.; McMahon, Michelle M.; Whiteman, Noah K.; Wing, Rod A.; Wojciechowski, Martin F. & Sanderson, Michael J. (2017-11-07). "Extensive gene tree discordance and hemiplasy shaped the genomes of North American columnar cacti". Proceedings of the National Academy of Sciences. 114 (45): 12003–12008. Bibcode:2017PNAS..11412003C. doi:10.1073/pnas.1706367114. PMC 5692538. PMID 29078296.
  8. Futuyma, Douglas J. (2013-07-15). Evolution (3rd ed.). Sunderland, Massachusetts U.S.A. ISBN 9781605351155. OCLC 824532153.
  9. Avise, John C. & Robinson, Terence J. (2008). "Hemiplasy: A New Term in the Lexicon of Phylogenetics". Systematic Biology. 57 (3): 503–507. doi:10.1080/10635150802164587. PMID 18570042.
  10. Warnow, Tandy; Bayzid, Md Shamsuzzoha; Mirarab, Siavash (2016-05-01). "Evaluating Summary Methods for Multilocus Species Tree Estimation in the Presence of Incomplete Lineage Sorting". Systematic Biology. 65 (3): 366–380. doi:10.1093/sysbio/syu063. ISSN 1063-5157. PMID 25164915.
  11. Maddison, Wayne P. (1997-09-01). "Gene Trees in Species Trees". Systematic Biology. 46 (3): 523–536. doi:10.1093/sysbio/46.3.523. ISSN 1076-836X.
  12. Mailund, Thomas; Munch, Kasper; Schierup, Mikkel Heide (2014-11-23). "Lineage Sorting in Apes". Annual Review of Genetics. 48 (1): 519–535. doi:10.1146/annurev-genet-120213-092532. ISSN 0066-4197. PMID 25251849.
  13. Nichols, Richard (July 2001). "Gene trees and species trees are not the same". Trends in Ecology & Evolution. 16 (7): 358–364. doi:10.1016/s0169-5347(01)02203-0. ISSN 0169-5347. PMID 11403868.
  14. "Primate Speciation: A Case Study of African Apes | Learn Science at Scitable". www.nature.com. Retrieved 2020-05-30.
  15. Peyrégne, Stéphane; Boyle, Michael James; Dannemann, Michael; Prüfer, Kay (September 2017). "Detecting ancient positive selection in humans using extended lineage sorting". Genome Research. 27 (9): 1563–1572. doi:10.1101/gr.219493.116. ISSN 1088-9051. PMC 5580715. PMID 28720580.
  16. Leitner, Thomas (May 2019). "Phylogenetics in HIV transmission: taking within-host diversity into account". Current Opinion in HIV and AIDS. 14 (3): 181–187. doi:10.1097/COH.0000000000000536. ISSN 1746-630X. PMC 6449181. PMID 30920395.
  17. Jacques, Guillaume; List, Johann-Mattis (2019). "Why we need tree models in linguistic reconstruction (and when we should apply them)". Journal of Historical Linguistics. 9 (1): 128–167. doi:10.1075/jhl.17008.mat. hdl:21.11116/0000-0004-4D2E-4. ISSN 2210-2116. S2CID 52220491.
  18. Kalyan, Siva; François, Alexandre (2019). "When the waves meet the trees". Journal of Historical Linguistics. 9 (1): 168–177. doi:10.1075/jhl.18019.kal. ISSN 2210-2116. S2CID 198707375.