
In biology, phylogenetics (/ˌfaɪloʊdʒəˈnɛtɪks, -lə-/) [1] [2] [3] is the study of the evolutionary history and relationships among or within groups of organisms. These relationships are determined by phylogenetic inference methods that focus on observed heritable traits, such as DNA sequences, protein amino acid sequences, or morphology. The result of such an analysis is a phylogenetic tree: a diagram containing a hypothesis of relationships that reflects the evolutionary history of a group of organisms. [4]


The tips of a phylogenetic tree can be living taxa or fossils, and represent the "end" or the present time in an evolutionary lineage. A phylogenetic diagram can be rooted or unrooted. A rooted tree diagram indicates the hypothetical common ancestor of the tree. An unrooted tree diagram (a network) makes no assumption about the ancestral line, and does not show the origin or "root" of the taxa in question or the direction of inferred evolutionary transformations. [5]

In addition to their use for inferring phylogenetic patterns among taxa, phylogenetic analyses are often employed to represent relationships among genes or individual organisms. Such uses have become central to understanding biodiversity, evolution, ecology, and genomes.

Phylogenetics is a component of systematics that uses similarities and differences in the characteristics of species to interpret their evolutionary relationships and origins. Phylogenetics focuses on whether the characteristics of a species reinforce a phylogenetic inference that it diverged from the most recent common ancestor of a taxonomic group. [6]

In the field of cancer research, phylogenetics can be used to study the clonal evolution of tumors and molecular chronology, predicting and showing how cell populations vary throughout the progression of the disease and during treatment, using whole genome sequencing techniques. [7] The evolutionary processes behind cancer progression are quite different from those in species and are important to phylogenetic inference; these differences manifest in at least four areas: the types of aberrations that occur, the rates of mutation, the intensity, and the high heterogeneity (variability) of tumor cell subclones. [8]

Phylogenetics can also aid in drug design and discovery. It allows scientists to organize species and to show which species are likely to have inherited particular traits that are medically useful, such as producing biologically active compounds (those that have effects on the human body). For example, venom-producing animals are particularly useful in drug discovery: venoms from these animals have yielded several important drugs, e.g., ACE inhibitors and Prialt (Ziconotide). To find new venoms, scientists turn to phylogenetics to screen for closely related species that may have the same useful traits. A phylogenetic tree of fish shows in which lineages venom originated and which related fish may therefore share the trait. Using this approach, biologists are able to identify fish species that may be venomous. Biologists have used the same approach in many other groups, such as snakes and lizards. [9] In forensic science, phylogenetic tools are useful to assess DNA evidence for court cases. The simple phylogenetic tree of viruses A-E shows the relationships between viruses, e.g., all viruses are descendants of Virus A.

HIV forensics uses phylogenetic analysis to track the differences in HIV genes and determine the relatedness of two samples. Phylogenetic analysis has been used in criminal trials to exonerate or hold individuals. HIV forensics does have limitations: it cannot be the sole proof of transmission between individuals, and a phylogenetic analysis that shows the relatedness of samples does not indicate the direction of transmission. [10]

Taxonomy and classification

One small clade of fish, showing how venom has evolved multiple times.

Taxonomy is the identification, naming, and classification of organisms. Compared to systematization, classification emphasizes whether a species has characteristics of a taxonomic group. [6] The Linnaean classification system developed in the 1700s by Carolus Linnaeus is the foundation for modern classification methods. Linnaean classification relies on an organism's phenotype or physical characteristics to group and organize species. [11] With the emergence of biochemistry, organism classifications are now usually based on phylogenetic data, and many systematists contend that only monophyletic taxa should be recognized as named groups. The degree to which classification depends on inferred evolutionary history differs depending on the school of taxonomy: phenetics ignores phylogenetic speculation altogether, trying to represent the similarity between organisms instead; cladistics (phylogenetic systematics) tries to reflect phylogeny in its classifications by only recognizing groups based on shared, derived characters (synapomorphies); evolutionary taxonomy tries to take into account both the branching pattern and "degree of difference" to find a compromise between them.

Inference of a phylogenetic tree

Usual methods of phylogenetic inference involve computational approaches implementing the optimality criteria and methods of parsimony, maximum likelihood (ML), and MCMC-based Bayesian inference. All these depend upon an implicit or explicit mathematical model describing the evolution of characters observed. [12]
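As a rough illustration of the parsimony criterion named above, the following sketch counts the minimum number of character-state changes that a rooted binary tree requires, using Fitch's small-parsimony procedure. The nested-tuple tree encoding, the function names, and the toy alignment are assumptions made for this example; they are not taken from any particular phylogenetics package.

```python
def fitch(tree, states):
    """Return (possible ancestral states, changes so far) for one character.

    tree   -- a leaf name (str) or a 2-tuple of subtrees (rooted, binary)
    states -- dict mapping each leaf name to its observed state
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_changes = fitch(tree[0], states)
    right_set, right_changes = fitch(tree[1], states)
    shared = left_set & right_set
    if shared:
        # Children agree on at least one state: no extra change is needed.
        return shared, left_changes + right_changes
    # Children disagree: take the union and count one change.
    return left_set | right_set, left_changes + right_changes + 1


def parsimony_length(tree, alignment):
    """Sum the Fitch change counts over all alignment columns."""
    n_sites = len(next(iter(alignment.values())))
    total = 0
    for i in range(n_sites):
        column = {taxon: seq[i] for taxon, seq in alignment.items()}
        total += fitch(tree, column)[1]
    return total


# Hypothetical four-taxon alignment, invented for illustration only.
alignment = {"A": "AAGT", "B": "AAGC", "C": "ACTC", "D": "ACTC"}
tree_1 = (("A", "B"), ("C", "D"))
tree_2 = (("A", "C"), ("B", "D"))

print(parsimony_length(tree_1, alignment))  # 3
print(parsimony_length(tree_2, alignment))  # 5
```

Under the parsimony criterion, `tree_1` would be preferred for this toy data because it requires fewer changes. A real analysis would also have to search over candidate trees, which this sketch does not attempt.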

Phenetics, popular in the mid-20th century but now largely obsolete, used distance matrix-based methods to construct trees based on overall similarity in morphology or similar observable traits (i.e. in the phenotype or the overall similarity of DNA, not the DNA sequence), which was often assumed to approximate phylogenetic relationships.
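A distance-matrix approach of this kind can be sketched as follows: compute an overall dissimilarity between every pair of taxa, then repeatedly merge the two closest clusters. The sketch uses average-linkage clustering (UPGMA) as one representative distance method; the choice of UPGMA, the character strings, and all names are assumptions for illustration, and only the branching order is returned, not branch lengths.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of aligned positions at which two character strings differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def upgma(distances, names):
    """Average-linkage clustering; returns the tree as nested tuples."""
    # clusters maps an integer id to (subtree, number of leaves).
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    dist = {frozenset((i, j)): distances[names[i]][names[j]]
            for i, j in combinations(range(len(names)), 2)}
    next_id = len(names)
    while len(clusters) > 1:
        pair = min(dist, key=dist.get)          # closest pair of clusters
        i, j = sorted(pair)
        (tree_i, n_i), (tree_j, n_j) = clusters.pop(i), clusters.pop(j)
        del dist[pair]
        for k in clusters:
            d_ik = dist.pop(frozenset((i, k)))
            d_jk = dist.pop(frozenset((j, k)))
            # Size-weighted mean keeps the distance an average over leaf pairs.
            dist[frozenset((next_id, k))] = (n_i * d_ik + n_j * d_jk) / (n_i + n_j)
        clusters[next_id] = ((tree_i, tree_j), n_i + n_j)
        next_id += 1
    return next(iter(clusters.values()))[0]


# Hypothetical character data, invented for illustration only.
seqs = {"A": "AAGTCA", "B": "AAGTCC", "C": "ACTTGA", "D": "ACTTGC"}
names = list(seqs)
distances = {a: {b: p_distance(seqs[a], seqs[b]) for b in names} for a in names}

print(upgma(distances, names))  # (('A', 'B'), ('C', 'D'))
```

Because the grouping reflects overall similarity only, it matches the true phylogeny only to the extent that similarity tracks descent, which is the assumption noted in the paragraph above.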

Prior to 1950, phylogenetic inferences were generally presented as narrative scenarios. Such methods are often ambiguous and lack explicit criteria for evaluating alternative hypotheses. [13] [14] [15]

Impacts of taxon sampling

In phylogenetic analysis, taxon sampling selects a small group of taxa to represent the evolutionary history of a broader population. [16] This process is also known as stratified sampling or clade-based sampling. [17] The practice is necessary because resources for comparing and analyzing every species within a target population are limited. [16] The construction and accuracy of phylogenetic trees vary with the representative group selected, which affects the phylogenetic inferences derived from them. [17]

Unavailable datasets, such as an organism's incomplete DNA and protein amino acid sequences in genomic databases, directly restrict taxonomic sampling. [17] Consequently, inadequate taxon sampling is a significant source of error in phylogenetic analysis. Accuracy may be improved by increasing the number of genetic samples within the monophyletic group of interest. Conversely, increasing sampling from outgroups extraneous to the target stratified population may decrease accuracy. This effect is attributed to long branch attraction, in which unrelated branches are incorrectly grouped together, implying a shared evolutionary history. [16]

Percentage of inter-ordinal branches reconstructed with a constant number of bases and four phylogenetic tree construction models; neighbor-joining (NJ), minimum evolution (ME), unweighted maximum parsimony (MP), and maximum likelihood (ML). Demonstrates phylogenetic analysis with fewer taxa and more genes per taxon matches more often with the replicable consensus tree. The dotted line demonstrates an equal accuracy increase between the two taxon sampling methods. Figure is property of Michael S. Rosenberg and Sudhir Kumar as presented in the journal article Taxon Sampling, Bioinformatics, and Phylogenomics.

It is debated whether increasing the number of taxa sampled improves phylogenetic accuracy more than increasing the number of genes sampled per taxon. The two sampling strategies differ in the number of nucleotide sites used in a sequence alignment, which may contribute to the disagreement. For example, phylogenetic trees constructed from a larger total number of nucleotides are generally more accurate, as supported by the bootstrap replicability of trees built from random resamples.
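The bootstrapping mentioned above can be sketched as resampling alignment columns with replacement and recording how often a grouping of interest is recovered. In this minimal sketch the grouping test is a deliberately simple stand-in (a count of differing sites) rather than a full tree reconstruction; the function names, the toy alignment, and the default of 200 replicates are all assumptions for illustration.

```python
import random


def bootstrap_replicate(alignment, rng):
    """Resample alignment columns with replacement, keeping the length fixed."""
    n_sites = len(next(iter(alignment.values())))
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {taxon: "".join(seq[i] for i in columns)
            for taxon, seq in alignment.items()}


def bootstrap_support(alignment, supports_grouping, replicates=200, seed=1):
    """Fraction of replicates in which `supports_grouping` returns True."""
    rng = random.Random(seed)
    hits = sum(supports_grouping(bootstrap_replicate(alignment, rng))
               for _ in range(replicates))
    return hits / replicates


def a_groups_with_b(alignment):
    """Toy criterion: A differs from B at fewer sites than it differs from C."""
    def differences(x, y):
        return sum(p != q for p, q in zip(alignment[x], alignment[y]))
    return differences("A", "B") < differences("A", "C")


# Hypothetical three-taxon alignment, invented for illustration only.
toy = {"A": "AACGTTGA", "B": "AACGTTGC", "C": "GACTTAGC"}
print(bootstrap_support(toy, a_groups_with_b))
```

In practice the criterion would be replaced by a full tree-building step on each replicate, and support would be reported per branch. With more sites in the alignment, the resampled results tend to vary less from replicate to replicate.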

The graphic presented in Taxon Sampling, Bioinformatics, and Phylogenomics compares the correctness of phylogenetic trees generated using fewer taxa and more sites per taxon on the x-axis to more taxa and fewer sites per taxon on the y-axis. With fewer taxa, more genes are sampled amongst the taxonomic group; in comparison, with more taxa added to the taxonomic sampling group, fewer genes are sampled. Each method has the same total number of nucleotide sites sampled. Furthermore, the dotted line represents a 1:1 accuracy between the two sampling methods. As seen in the graphic, most of the plotted points are located below the dotted line, which indicates gravitation toward increased accuracy when sampling fewer taxa with more sites per taxon. The research performed utilizes four different phylogenetic tree construction models to verify the theory: neighbor-joining (NJ), minimum evolution (ME), unweighted maximum parsimony (MP), and maximum likelihood (ML). In the majority of models, sampling fewer taxa with more sites per taxon demonstrated higher accuracy.

Generally, with the alignment of a relatively equal number of total nucleotide sites, sampling more genes per taxon has higher bootstrapping replicability than sampling more taxa. However, unbalanced datasets within genomic databases make increasing the gene comparison per taxon in uncommonly sampled organisms increasingly difficult. [17]



The term "phylogeny" derives from the German Phylogenie, introduced by Haeckel in 1866, [18] and the Darwinian approach to classification became known as the "phyletic" approach. [19] The principle of parsimony can be traced back to Aristotle, who wrote in his Posterior Analytics, "We may assume the superiority ceteris paribus [other things being equal] of the demonstration which derives from fewer postulates or hypotheses."

Ernst Haeckel's recapitulation theory

The modern concept of phylogenetics evolved primarily as a disproof of a previously widely accepted theory. During the late 19th century, Ernst Haeckel's recapitulation theory, or "biogenetic fundamental law", was widely accepted. It was often expressed as "ontogeny recapitulates phylogeny", i.e. the development of a single organism during its lifetime, from germ to adult, successively mirrors the adult stages of successive ancestors of the species to which it belongs. But this theory has long been rejected. [20] [21] Instead, ontogeny evolves: the phylogenetic history of a species cannot be read directly from its ontogeny, as Haeckel thought would be possible, but characters from ontogeny can be (and have been) used as data for phylogenetic analyses; the more closely related two species are, the more apomorphies their embryos share.

Timeline of key points

Branching tree diagram from Heinrich Georg Bronn's work (1858)
Phylogenetic tree suggested by Haeckel (1866)

Outside biology

Phylogeny of Indo-European languages

Phylogenetic tools and representations (trees and networks) can also be applied to philology, the study of the evolution of oral languages and written text and manuscripts, such as in the field of quantitative comparative linguistics. [79]

Computational phylogenetics can be used to investigate a language as an evolutionary system. The evolution of human language closely corresponds with humans' biological evolution, which allows phylogenetic methods to be applied. The concept of a "tree" serves as an efficient way to represent relationships between languages and language splits. It also serves as a way of testing hypotheses about the connections and ages of language families. For example, relationships among languages can be shown by using cognates as characters. [80] [81] The phylogenetic tree of Indo-European languages shows the relationships between several of the languages in a timeline, as well as the similarity between words and word order.

There are three types of criticism of using phylogenetics in philology. The first argues that languages and species are different entities, so the same methods cannot be used to study both. The second concerns how phylogenetic methods are applied to linguistic data. The third concerns the types of data used to construct the trees. [80]

Bayesian phylogenetic methods, which are sensitive to how treelike the data are, allow for the reconstruction of relationships among languages, locally and globally. The two main reasons for using Bayesian phylogenetics are that (1) diverse scenarios can be included in calculations and (2) the output is a sample of trees rather than a single tree claimed to be true. [82]

The same process can be applied to texts and manuscripts. In paleography, the study of historical writings and manuscripts, texts were replicated by scribes who copied from their source, and alterations (i.e., "mutations") occurred when the scribe did not copy the source precisely. [83]

Phylogenetic screens

Phylogenetic screens involve the pharmacological examination of closely related groups of organisms. Advances in cladistic analysis through fast computer programs and molecular techniques have improved the precision of phylogenetic determination, allowing for the identification of species with pharmacological potential.

Phylogenetic screens have been used in a rudimentary manner in the past, such as studying the Apocynaceae family of plants known for their alkaloid-producing species like Catharanthus, which produces vincristine, an antileukemia drug. However, modern techniques now enable researchers to study close relatives of a species to uncover either (1) higher abundance of important bioactive compounds (e.g., species of Taxus for taxol) or (2) natural variants of known pharmaceuticals (e.g., species of Catharanthus for different forms of vincristine or vinblastine). [citation needed]

The figure below shows a phylogenetic screen of biodiversity within the fungi. The subtrees inside the circle were obtained by phylogenetic analysis. Such relationships help in understanding the evolutionary history of various groups of organisms, identifying relationships between different species, and predicting future evolutionary changes. Existing biodiversity knowledge may miss relationships between species or subgroups, but emerging imaging systems and new analysis techniques allow more genetic relationships to be found in biodiverse fields. The figure can help with conservation efforts, as it includes rare species of fungi that could be beneficial to ecosystems. [84]

Phylogenetic subtree of fungi containing different biodiverse sections of the fungi group (P4ATPase in Fungi). Blue: Ascomycota; red: Basidiomycota; green: Zygomycota; cyan: Chytridiomycota; orange: Entomophthoromycota; pink: Mucoromycota; purple: Glomeromycota.

Phylogenetic tree shapes

Whole-genome sequence data of pathogens obtained from outbreaks or epidemics of infectious diseases can provide important insights into transmission dynamics and inform public health strategies. Previous studies have relied on integrating genomic and epidemiological data to reconstruct transmission events. However, recent research has explored the possibility of deducing transmission patterns solely from genomic data using phylodynamics, which involves analyzing the properties of pathogen phylogenies. Phylodynamics uses theoretical models to compare predicted branch lengths with actual branch lengths in phylogenies to infer transmission patterns. Additionally, coalescent theory, which describes probability distributions on trees based on population size, has been adapted for epidemiological purposes. Another potential source of information within phylogenies that has been explored is "tree shape". These approaches are computationally intensive but have the potential to provide valuable insights into pathogen transmission dynamics. [85]

Pathogen Transmission Trees

The structure of the host contact network has a profound impact on the dynamics of outbreaks or epidemics, and outbreak management strategies depend on the type of transmission pattern driving the outbreak. Pathogen genomes spreading through different contact network structures, such as chains, homogeneous networks, or networks with super-spreaders, can be expected to accumulate mutations in distinct patterns, resulting in noticeable differences in the shape of phylogenetic trees, as illustrated in Fig. 1. The structural characteristics of phylogenetic trees generated from simulated bacterial genome evolution across multiple types of contact networks were analyzed. Simple topological properties of these trees, when combined, can be used to classify them as reflecting chain-like, homogeneous, or super-spreading dynamics. These properties form the basis of a computational classifier that is used to classify real-world outbreaks. The computational predictions of overall transmission dynamics for each outbreak align with known epidemiology. [86]

Graphical Representation of Phylogenetic Tree analysis

Different transmission networks result in quantitatively different tree shapes. To determine whether tree shapes capture information about the underlying disease transmission patterns within an outbreak, the evolution of a bacterial genome was simulated over three types of outbreak contact network (homogeneous, super-spreading, and chain), and the resulting phylogenies were summarized with five metrics describing tree shape. Figures 2 and 3 illustrate the distributions of these metrics across the three types of outbreaks, revealing clear differences in tree topology depending on the underlying host contact network. Super-spreader networks gave rise to phylogenies with higher Colless imbalance, longer ladder patterns, lower Δw, and deeper trees than transmission networks with a homogeneous distribution of contacts. Trees derived from chain-like networks were less variable, deeper, more imbalanced, and narrower than the other trees. Other topological summary metrics considered did not resolve the three outbreak types as fully.

Scatter plots can be used in pathogen transmission analysis to visualize the relationship between two variables, such as the number of cases of a pathogen and the time since the first case was reported. Such a plot helps to identify trends, such as whether the spread of the pathogen is increasing or decreasing over time, as well as outliers or clusters of data points, which can provide insight into potential transmission routes or super-spreader events and so inform public health interventions and control measures. [86]
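Two of the tree-shape summaries named above, Colless imbalance and tree depth, can be computed with a short recursion. The sketch below assumes rooted binary trees written as nested tuples and uses the unnormalized Colless index (the sum, over internal nodes, of the absolute difference in leaf counts between the two child subtrees); whether and how the cited study normalizes the index is not stated here, so that choice is an assumption.

```python
def leaves_and_colless(tree):
    """Return (number of leaves, unnormalized Colless index) for a subtree."""
    if isinstance(tree, str):
        return 1, 0
    n_left, c_left = leaves_and_colless(tree[0])
    n_right, c_right = leaves_and_colless(tree[1])
    return n_left + n_right, c_left + c_right + abs(n_left - n_right)


def depth(tree):
    """Number of edges on the longest path from the root to a leaf."""
    if isinstance(tree, str):
        return 0
    return 1 + max(depth(tree[0]), depth(tree[1]))


# Two hypothetical four-tip topologies, invented for illustration only.
balanced = (("A", "B"), ("C", "D"))
ladder = ((("A", "B"), "C"), "D")

print(leaves_and_colless(balanced)[1], depth(balanced))  # 0 2
print(leaves_and_colless(ladder)[1], depth(ladder))      # 3 3
```

The ladder-like tree scores higher on both measures, which is the direction of difference the paragraph describes for chain-like and super-spreading outbreaks relative to homogeneous ones.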

Pathogen Transfer Box Plot data

The box plots display the pathogen transmission data. Box plots, also known as box-and-whisker plots, summarize the distribution of a dataset by showing its range, median, quartiles, and potential outliers. They are commonly used to compare different groups or to visualize changes in a single group over time, and they are especially helpful with large datasets or several groups, because differences and similarities stand out quickly. This makes box plots a valuable tool for analyzing pathogen transmission data, as they help to identify important features in the distribution of the data. [86]
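The quantities a box plot draws can be computed directly. The sketch below derives the five-number summary and flags potential outliers with the common 1.5 × IQR rule; the sample values are invented for illustration, and both the outlier rule and the "inclusive" quartile method are assumptions, since plotting tools differ in their conventions.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, first quartile, median, third quartile, maximum)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


def potential_outliers(values):
    """Values lying more than 1.5 interquartile ranges outside the box."""
    _, q1, _, q3, _ = five_number_summary(values)
    spread = 1.5 * (q3 - q1)
    return [v for v in values if v < q1 - spread or v > q3 + spread]


# Hypothetical tree-shape metric values for one outbreak type (made up).
metric = [2, 3, 3, 4, 4, 5, 5, 6, 30]

print(five_number_summary(metric))  # (2, 3.0, 4.0, 5.0, 30)
print(potential_outliers(metric))   # [30]
```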

See also

Related Research Articles

Cladistics is an approach to biological classification in which organisms are categorized in groups ("clades") based on hypotheses of most recent common ancestry. The evidence for hypothesized relationships is typically shared derived characteristics (synapomorphies) that are not present in more distant groups and ancestors. However, from an empirical perspective, common ancestors are inferences based on a cladistic hypothesis of relationships of taxa whose character states can be observed. Theoretically, a last common ancestor and all its descendants constitute a (minimal) clade. Importantly, all descendants stay in their overarching ancestral clade. For example, if the terms worms or fishes were used within a strict cladistic framework, these terms would include humans. Many of these terms are normally used paraphyletically, outside of cladistics, e.g. as a 'grade', which are fruitless to precisely delineate, especially when including extinct species. Radiation results in the generation of new subclades by bifurcation, but in practice sexual hybridization may blur very closely related groupings.

Cladogram: Diagram used to show relations among groups of organisms with common origins

A cladogram is a diagram used in cladistics to show relations among organisms. A cladogram is not, however, an evolutionary tree because it does not show how ancestors are related to descendants, nor does it show how much they have changed, so many differing evolutionary trees can be consistent with the same cladogram. A cladogram uses lines that branch off in different directions ending at a clade, a group of organisms with a last common ancestor. There are many shapes of cladograms but they all have lines that branch off from other lines. The lines can be traced back to where they branch off. These branching off points represent a hypothetical ancestor which can be inferred to exhibit the traits shared among the terminal taxa above it. This hypothetical ancestor might then provide clues about the order of evolution of various features, adaptation, and other evolutionary narratives about ancestors. Although traditionally such cladograms were generated largely on the basis of morphological characters, DNA and RNA sequencing data and computational phylogenetics are now very commonly used in the generation of cladograms, either on their own or in combination with morphology.

Phylogenetic tree: Branching diagram of evolutionary relationships between organisms

A phylogenetic tree, phylogeny or evolutionary tree is a graphical representation which shows the evolutionary history between a set of species or taxa during a specific time. In other words, it is a branching diagram or a tree showing the evolutionary relationships among various biological species or other entities based upon similarities and differences in their physical or genetic characteristics. In evolutionary biology, all life on Earth is theoretically part of a single phylogenetic tree, indicating common ancestry. Phylogenetics is the study of phylogenetic trees. The main challenge is to find a phylogenetic tree representing optimal evolutionary ancestry between a set of species or taxa. Computational phylogenetics focuses on the algorithms involved in finding the optimal phylogenetic tree in the phylogenetic landscape.

Molecular phylogenetics is the branch of phylogeny that analyzes genetic, hereditary molecular differences, predominantly in DNA sequences, to gain information on an organism's evolutionary relationships. From these analyses, it is possible to determine the processes by which diversity among species has been achieved. The result of a molecular phylogenetic analysis is expressed in a phylogenetic tree. Molecular phylogenetics is one aspect of molecular systematics, a broader term that also includes the use of molecular data in taxonomy and biogeography.

Phylogenesis

Phylogenesis is the biological process by which a taxon appears. The science that studies these processes is called phylogenetics.

Outgroup (cladistics)

In cladistics or phylogenetics, an outgroup is a more distantly related group of organisms that serves as a reference group when determining the evolutionary relationships of the ingroup, the set of organisms under study, and is distinct from sociological outgroups. The outgroup is used as a point of comparison for the ingroup and specifically allows for the phylogeny to be rooted. Because the polarity (direction) of character change can be determined only on a rooted phylogeny, the choice of outgroup is essential for understanding the evolution of traits along a phylogeny.

In phylogenetics and computational phylogenetics, maximum parsimony is an optimality criterion under which the phylogenetic tree that minimizes the total number of character-state changes is preferred. Under the maximum-parsimony criterion, the optimal tree will minimize the amount of homoplasy. In other words, under this criterion, the shortest possible tree that explains the data is considered best. Some of the basic ideas behind maximum parsimony were presented by James S. Farris in 1970 and Walter M. Fitch in 1971.

Substitution model: Description of the process by which states in sequences change into each other and back

In biology, a substitution model, also called a model of DNA sequence evolution, is a Markov model that describes changes over evolutionary time. These models describe evolutionary changes in macromolecules represented as sequences of symbols. Substitution models are used to calculate the likelihood of phylogenetic trees using multiple sequence alignment data. Thus, substitution models are central to maximum likelihood estimation of phylogeny as well as Bayesian inference in phylogeny. Estimates of evolutionary distances are typically calculated using substitution models. Substitution models are also central to phylogenetic invariants because they are necessary to predict site pattern frequencies given a tree topology. Substitution models are also necessary to simulate sequence data for a group of organisms related by a specific tree.
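As a concrete instance, the simplest nucleotide substitution model, Jukes-Cantor (JC69), assumes equal base frequencies and equal rates for every change. The sketch below gives its transition probabilities and the corresponding distance correction; JC69 is chosen here only for illustration and is not singled out by the text above, and the function names are assumptions.

```python
import math


def jc69_probability(same, d):
    """Probability of ending in a given base after d expected substitutions per site.

    same -- True for the starting base, False for one specific other base
    """
    decay = math.exp(-4.0 * d / 3.0)
    return 0.25 + 0.75 * decay if same else 0.25 - 0.25 * decay


def jc69_distance(p):
    """Expected substitutions per site given the observed proportion p of differing sites."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


d = 0.3
stay = jc69_probability(True, d)
change = jc69_probability(False, d)
print(stay + 3 * change)          # the four probabilities sum to 1 (up to rounding)
print(jc69_distance(3 * change))  # recovers d (up to rounding)
```

The second line shows why such models matter for distance estimates: the observed proportion of differing sites understates the true amount of change, because repeated substitutions at the same site are hidden, and the model corrects for that.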

In phylogenetics, long branch attraction (LBA) is a form of systematic error whereby distantly related lineages are incorrectly inferred to be closely related. LBA arises when the amount of molecular or morphological change accumulated within a lineage is sufficient to cause that lineage to appear similar to another long-branched lineage, solely because they have both undergone a large amount of change, rather than because they are related by descent. Such bias is more common when the overall divergence of some taxa results in long branches within a phylogeny. Long branches are often attracted to the base of a phylogenetic tree, because the lineage included to represent an outgroup is often also long-branched. The frequency of true LBA is unclear and often debated, and some authors view it as untestable and therefore irrelevant to empirical phylogenetic inference. Although often viewed as a failing of parsimony-based methodology, LBA could in principle result from a variety of scenarios and be inferred under multiple analytical paradigms.

Computational phylogenetics, phylogeny inference, or phylogenetic inference focuses on computational and optimization algorithms, heuristics, and approaches involved in phylogenetic analyses. The goal is to find a phylogenetic tree representing optimal evolutionary ancestry between a set of genes, species, or taxa. Maximum likelihood, parsimony, Bayesian, and minimum evolution are typical optimality criteria used to assess how well a phylogenetic tree topology describes the sequence data. Nearest Neighbour Interchange (NNI), Subtree Prune and Regraft (SPR), and Tree Bisection and Reconnection (TBR), known as tree rearrangements, are deterministic algorithms used to search for the optimal phylogenetic tree. The space and the landscape of searching for the optimal phylogenetic tree is known as the phylogeny search space.

Ancestral reconstruction is the extrapolation back in time from measured characteristics of individuals, populations, or species to their common ancestors. It is an important application of phylogenetics, the reconstruction and study of the evolutionary relationships among individuals, populations or species to their ancestors. In the context of evolutionary biology, ancestral reconstruction can be used to recover different kinds of ancestral character states of organisms that lived millions of years ago. These states include the genetic sequence, the amino acid sequence of a protein, the composition of a genome, a measurable characteristic of an organism (phenotype), and the geographic range of an ancestral population or species. This is desirable because it allows us to examine parts of phylogenetic trees corresponding to the distant past, clarifying the evolutionary history of the species in the tree. Since modern genetic sequences are essentially a variation of ancient ones, access to ancient sequences may identify other variations and organisms which could have arisen from those sequences. In addition to genetic sequences, one might attempt to track the changing of one character trait to another, such as fins turning to legs.

Bayesian inference of phylogeny combines the information in the prior and in the data likelihood to create the so-called posterior probability of trees, which is the probability that the tree is correct given the data, the prior and the likelihood model. Bayesian inference was introduced into molecular phylogenetics in the 1990s by three independent groups: Bruce Rannala and Ziheng Yang in Berkeley, Bob Mau in Madison, and Shuying Li at the University of Iowa, the last two being PhD students at the time. The approach has become very popular since the release of the MrBayes software in 2001, and is now one of the most popular methods in molecular phylogenetics.
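The combination of prior and likelihood described above is Bayes' rule applied to trees. The sketch below works it through for three candidate topologies with made-up prior and likelihood values; all numbers and names are hypothetical. Real analyses cannot enumerate every tree in this way, which is why software in this area approximates the posterior by sampling.

```python
def posterior(priors, likelihoods):
    """Normalize prior x likelihood over an explicit list of candidate trees."""
    joint = {tree: priors[tree] * likelihoods[tree] for tree in priors}
    total = sum(joint.values())
    return {tree: value / total for tree, value in joint.items()}


# Hypothetical values for three candidate topologies (invented for illustration).
priors = {"T1": 1 / 3, "T2": 1 / 3, "T3": 1 / 3}
likelihoods = {"T1": 0.002, "T2": 0.001, "T3": 0.001}

for tree, probability in posterior(priors, likelihoods).items():
    print(tree, round(probability, 3))
```

With a uniform prior, the posterior is simply the likelihoods rescaled to sum to one, so `T1` receives half of the posterior probability in this toy case.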

Phylogenetic comparative methods (PCMs) use information on the historical relationships of lineages (phylogenies) to test evolutionary hypotheses. The comparative method has a long history in evolutionary biology; indeed, Charles Darwin used differences and similarities between species as a major source of evidence in The Origin of Species. However, the fact that closely related lineages share many traits and trait combinations as a result of the process of descent with modification means that lineages are not independent. This realization inspired the development of explicitly phylogenetic comparative methods. Initially, these methods were primarily developed to control for phylogenetic history when testing for adaptation; however, in recent years the use of the term has broadened to include any use of phylogenies in statistical tests. Although most studies that employ PCMs focus on extant organisms, many methods can also be applied to extinct taxa and can incorporate information from the fossil record.

Wayne Paul Maddison is a professor and Canada Research Chair in Biodiversity in the departments of zoology and botany at the University of British Columbia, and the director of the Spencer Entomological Collection at the Beaty Biodiversity Museum.

Implied weighting describes a group of methods used in phylogenetic analysis to assign the greatest importance to characters that are most likely to be homologous. These are a posteriori methods, which also include dynamic weighting, as opposed to a priori methods, which include adaptive, independent, and chemical categories.
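
A common formulation, due to Goloboff (1993), scores each character on a tree with a concave fit function k / (k + h), where h is the number of extra steps (homoplasy) the character requires and k is a concavity constant; the preferred tree is the one with the highest total fit. The sketch below assumes that formulation. The per-character step counts, the function names, and the choice k = 3 are illustrative assumptions, not values from any real dataset.

```python
def implied_weight_fit(extra_steps, k=3.0):
    """Fit of one character: 1.0 with no homoplasy, lower as homoplasy grows."""
    return k / (k + extra_steps)


def total_fit(extra_steps_per_character, k=3.0):
    """Total fit of a tree: sum of the per-character fits."""
    return sum(implied_weight_fit(h, k) for h in extra_steps_per_character)


# Hypothetical extra-step counts for three characters on two candidate trees.
tree_a = [0, 0, 4]  # two characters fit perfectly, one is very homoplastic
tree_b = [1, 1, 1]  # every character shows a little homoplasy

print(sum(tree_a), sum(tree_b))              # equal weights: fewer steps wins
print(total_fit(tree_a), total_fit(tree_b))  # implied weights: higher fit wins
```

In this made-up case equal-weights parsimony prefers the second tree (3 extra steps against 4), while implied weighting prefers the first, because the one highly homoplastic character is downweighted instead of being allowed to outvote two characters that fit perfectly.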

Cross-species transmission (CST), also called interspecies transmission, host jump, or spillover, is the transmission of an infectious pathogen, such as a virus, between hosts belonging to different species. Once introduced into an individual of a new host species, the pathogen may cause disease for the new host and/or acquire the ability to infect other individuals of the same species, allowing it to spread through the new host population. The phenomenon is most commonly studied in virology, but cross-species transmission may also occur with bacterial pathogens or other types of microorganisms.

Character evolution is the process by which a character or trait evolves along the branches of an evolutionary tree. Character evolution usually refers to single changes within a lineage that distinguish that lineage from others. These changes are called character state changes, and they are often used in the study of evolution to provide a record of common ancestry. Character state changes can be phenotypic changes, nucleotide substitutions, or amino acid substitutions. These small changes in a species can help identify when a new lineage diverged from an older one.

Minimum evolution is a distance method employed in phylogenetic modeling. It shares with maximum parsimony the aspect of searching for the phylogeny that has the shortest total sum of branch lengths.

In phylogenetics, reconciliation is an approach to connect the history of two or more coevolving biological entities. The general idea of reconciliation is that a phylogenetic tree representing the evolution of an entity can be drawn within another phylogenetic tree representing an encompassing entity to reveal their interdependence and the evolutionary events that have marked their shared history. The development of reconciliation approaches started in the 1980s, mainly to depict the coevolution of a gene and a genome, and of a host and a symbiont, which can be mutualistic, commensal, or parasitic. It has also been used, for example, to detect horizontal gene transfer or to understand the dynamics of genome evolution.


  1. "phylogenetic". Unabridged (Online). n.d.
  2. "phylogenetic". Dictionary .
  3. From Greek φυλή/φῦλον [phylé/phylon] "tribe, clan, race", and γενετικός [genetikós] "origin, source, birth". Liddell, Henry George; Scott, Robert; Jones, Henry Stuart (1968). A Greek-English lexicon (9 ed.). Oxford: Clarendon Press. p. 1961.
  4. "phylogeny". Biology online. Retrieved 15 February 2013.
  5. Itzik, Peer (1 January 2001). "Phylogenetic Trees".
  6. Harris, Katherine (23 June 2019). Taxonomy & Phylogeny. Biology LibreTexts. Retrieved 19 April 2023.
  7. Herberts, Cameron; Annala, Matti; Sipola, Joonatan; Ng, Sarah W. S.; Chen, Xinyi E.; Nurminen, Anssi; Korhonen, Olga V.; Munzur, Aslı D.; Beja, Kevin; Schönlau, Elena; Bernales, Cecily Q.; Ritch, Elie; Bacon, Jack V. W.; Lack, Nathan A.; Nykter, Matti (August 2022). "Deep whole-genome ctDNA chronology of treatment-resistant prostate cancer". Nature. 608 (7921): 199–208. Bibcode:2022Natur.608..199H. doi:10.1038/s41586-022-04975-9. ISSN   1476-4687. PMID   35859180. S2CID   250730778.
  8. Schwartz, Russell; Schäffer, Alejandro A. (April 2017). "The evolution of tumour phylogenetics: principles and practice". Nature Reviews Genetics. 18 (4): 213–229. doi:10.1038/nrg.2016.170. ISSN   1471-0056. PMC   5886015 . PMID   28190876.
  9. "Drug discovery - Understanding Evolution". 7 July 2021. Retrieved 23 April 2023.
  10. Bernard, EJ; Azad, Y; Vandamme, AM; Weait, M; Geretti, AM (2007). "HIV forensics: pitfalls and acceptable standards in the use of phylogenetic analysis as evidence in criminal investigations of HIV transmission". HIV Medicine. 8 (6): 382–387. doi: 10.1111/j.1468-1293.2007.00486.x . ISSN   1464-2662. PMID   17661846. S2CID   38883310.
  11. CK-12 Foundation (6 March 2021). Linnaean Classification. Biology LibreTexts. Retrieved 19 April 2023.
  12. Phylogenetic Inference. 15 February 2024.
  13. Richard C. Brusca & Gary J. Brusca (2003). Invertebrates (2nd ed.). Sunderland, Massachusetts: Sinauer Associates. ISBN   978-0-87893-097-5.
  14. Bock, W. J. (2004). Explanations in systematics. Pp. 49–56. In Williams, D. M. and Forey, P. L. (eds) Milestones in Systematics. London: Systematics Association Special Volume Series 67. CRC Press, Boca Raton, Florida.
  15. Auyang, Sunny Y. (1998). Narratives and Theories in Natural History. In: Foundations of complex-system theories: in economics, evolutionary biology, and statistical physics. Cambridge, U.K.; New York: Cambridge University Press.
  16. Rosenberg, Michael (28 August 2001). "Incomplete taxon sampling is not a problem for phylogenetic inference". Proceedings of the National Academy of Sciences. 98 (19): 10751–10756. Bibcode:2001PNAS...9810751R. doi:10.1073/pnas.191248498. PMC 58547. PMID 11526218.
  17. Rosenberg, Michael; Kumar, Sudhir (1 February 2003). "Taxon Sampling, Bioinformatics, and Phylogenomics". Systematic Biology. 52 (1): 119–124. doi:10.1080/10635150390132894. PMC 2796430. PMID 12554445. Retrieved 19 April 2023.
  18. Harper, Douglas (2010). "Phylogeny". Online Etymology Dictionary .
  19. Stuessy 2009.
  20. Blechschmidt, Erich (1977) The Beginnings of Human Life. Springer-Verlag Inc., p. 32: "The so-called basic law of biogenetics is wrong. No buts or ifs can mitigate this fact. It is not even a tiny bit correct or correct in a different form, making it valid in a certain percentage. It is totally wrong."
  21. Ehrlich, Paul; Richard Holm; Dennis Parnell (1963) The Process of Evolution. New York: McGraw–Hill, p. 66: "Its shortcomings have been almost universally pointed out by modern authors, but the idea still has a prominent place in biological mythology. The resemblance of early vertebrate embryos is readily explained without resort to mysterious forces compelling each individual to reclimb its phylogenetic tree."
  22. Bayes, Mr; Price, Mr (1763). "An Essay towards Solving a Problem in the Doctrine of Chances. By the Late Rev. Mr. Bayes, F. R. S. Communicated by Mr. Price, in a Letter to John Canton, A. M. F. R. S". Philosophical Transactions of the Royal Society of London. 53: 370–418. doi: 10.1098/rstl.1763.0053 .
  23. Strickberger, Monroe. 1996. Evolution, 2nd. ed. Jones & Bartlett.
  24. The Theory of Evolution, Teaching Company course, Lecture 1
  25. Darwin's Tree of Life Archived 13 March 2014 at the Wayback Machine
  26. Archibald, J. David (1 August 2009). "Edward Hitchcock's Pre-Darwinian (1840) "Tree of Life"". Journal of the History of Biology. 42 (3): 561–592. doi:10.1007/s10739-008-9163-y. ISSN   1573-0387. PMID   20027787. S2CID   16634677.
  27. Archibald, J. David (2008). "Edward Hitchcock's Pre-Darwinian (1840) 'Tree of Life'". Journal of the History of Biology. 42 (3): 561–92. doi:10.1007/s10739-008-9163-y. PMID 20027787. S2CID 16634677.
  28. Darwin, Charles; Wallace, Alfred (1858). "On the Tendency of Species to form Varieties; and on the Perpetuation of Varieties and Species by Natural Means of Selection". Journal of the Proceedings of the Linnean Society of London. Zoology. 3 (9): 45–62. doi: 10.1111/j.1096-3642.1858.tb02500.x .
  29. Dollo, Louis. 1893. Les lois de l'évolution. Bull. Soc. Belge Géol. Paléont. Hydrol. 7: 164–66.
  30. Galis, Frietson; Arntzen, Jan W.; Lande, Russell (2010). "Dollo's Law and the Irreversibility of Digit Loss in Bachia". Evolution. 64 (8): 2466–76, discussion 2477–85. doi:10.1111/j.1558-5646.2010.01041.x. PMID   20500218. S2CID   24520027 . Retrieved 23 April 2023.
  31. Tillyard, R. J (2012). "A New Classification of the Order Perlaria". The Canadian Entomologist. 53 (2): 35–43. doi:10.4039/Ent5335-2. S2CID   90171163.
  32. Hennig, Willi (1950). Grundzüge einer Theorie der Phylogenetischen Systematik [Basic features of a theory of phylogenetic systematics] (in German). Berlin: Deutscher Zentralverlag. OCLC 12126814.
  33. Wagner, Warren Herbert (1952). "The fern genus Diellia: structure, affinities, and taxonomy". University of California Publications in Botany. 26 (1–6): 1–212. OCLC   4228844.
  34. Webster's 9th New Collegiate Dictionary
  35. Cain, A. J; Harrison, G. A (2009). "Phyletic Weighting". Proceedings of the Zoological Society of London. 135 (1): 1–31. doi:10.1111/j.1469-7998.1960.tb05828.x.
  36. "The reconstruction of evolution" in "Abstracts of Papers". Annals of Human Genetics. 27 (1): 103–5. 1963. doi:10.1111/j.1469-1809.1963.tb00786.x.
  37. Camin, Joseph H; Sokal, Robert R (1965). "A Method for Deducing Branching Sequences in Phylogeny". Evolution. 19 (3): 311–26. doi: 10.1111/j.1558-5646.1965.tb01722.x . S2CID   20957422.
  38. Wilson, Edward O (1965). "A Consistency Test for Phylogenies Based on Contemporaneous Species". Systematic Zoology. 14 (3): 214–20. doi:10.2307/2411550. JSTOR   2411550.
  39. Hennig, W. (1966). Phylogenetic systematics. Illinois University Press, Urbana.
  40. Farris, James S (1969). "A Successive Approximations Approach to Character Weighting". Systematic Zoology. 18 (4): 374–85. doi:10.2307/2412182. JSTOR   2412182.
  41. Kluge, A. G; Farris, J. S (1969). "Quantitative Phyletics and the Evolution of Anurans". Systematic Biology. 18 (1): 1–32. doi:10.1093/sysbio/18.1.1.
  42. Quesne, Walter J. Le (1969). "A Method of Selection of Characters in Numerical Taxonomy". Systematic Zoology. 18 (2): 201–205. doi:10.2307/2412604. JSTOR   2412604.
  43. Farris, J. S (1970). "Methods for Computing Wagner Trees". Systematic Biology. 19: 83–92. doi:10.1093/sysbio/19.1.83.
  44. Neyman, Jerzy (1971). "Molecular studies of evolution: a source of novel statistical problems". Statistical Decision Theory and Related Topics. pp. 1–27. doi:10.1016/B978-0-12-307550-5.50005-8. ISBN   978-0-12-307550-5.
  45. Fitch, W. M (1971). "Toward Defining the Course of Evolution: Minimum Change for a Specific Tree Topology". Systematic Biology. 20 (4): 406–16. doi:10.1093/sysbio/20.4.406. JSTOR   2412116.
  46. Robinson, D.F (1971). "Comparison of labeled trees with valency three". Journal of Combinatorial Theory . Series B. 11 (2): 105–19. doi: 10.1016/0095-8956(71)90020-7 .
  47. Kidd, K. K; Sgaramella-Zonta, L. A (1971). "Phylogenetic analysis: Concepts and methods". American Journal of Human Genetics. 23 (3): 235–52. PMC   1706731 . PMID   5089842.
  48. Adams, E. N (1972). "Consensus Techniques and the Comparison of Taxonomic Trees". Systematic Biology. 21 (4): 390–397. doi:10.1093/sysbio/21.4.390.
  49. Farris, James S (1976). "Phylogenetic Classification of Fossils with Recent Species". Systematic Zoology. 25 (3): 271–282. doi:10.2307/2412495. JSTOR   2412495.
  50. Farris, J. S (1977). "Phylogenetic Analysis Under Dollo's Law". Systematic Biology. 26: 77–88. doi:10.1093/sysbio/26.1.77.
  51. Nelson, G (1979). "Cladistic Analysis and Synthesis: Principles and Definitions, with a Historical Note on Adanson's Familles Des Plantes (1763-1764)". Systematic Biology. 28: 1–21. doi:10.1093/sysbio/28.1.1.
  52. Gordon, A. D (1979). "A Measure of the Agreement between Rankings". Biometrika. 66 (1): 7–15. doi:10.1093/biomet/66.1.7. JSTOR   2335236.
  53. Efron B. (1979). Bootstrap methods: another look at the jackknife. Ann. Stat. 7: 1–26.
  54. Margush, T; McMorris, F (1981). "Consensus-trees". Bulletin of Mathematical Biology. 43 (2): 239. doi:10.1016/S0092-8240(81)90019-7 (inactive 15 February 2024).
  55. Sokal, Robert R; Rohlf, F. James (1981). "Taxonomic Congruence in the Leptopodomorpha Re-Examined". Systematic Zoology. 30 (3): 309. doi:10.2307/2413252. JSTOR   2413252.
  56. Felsenstein, Joseph (1981). "Evolutionary trees from DNA sequences: A maximum likelihood approach". Journal of Molecular Evolution. 17 (6): 368–76. Bibcode:1981JMolE..17..368F. doi:10.1007/BF01734359. PMID   7288891. S2CID   8024924.
  57. Hendy, M.D; Penny, David (1982). "Branch and bound algorithms to determine minimal evolutionary trees". Mathematical Biosciences. 59 (2): 277. doi:10.1016/0025-5564(82)90027-X.
  58. Lipscomb, Diana (1985). "The Eukaryotic Kingdoms". Cladistics. 1 (2): 127–40. doi:10.1111/j.1096-0031.1985.tb00417.x. PMID   34965673. S2CID   84151309.
  59. Felsenstein, J (1985). "Confidence limits on phylogenies: an approach using the bootstrap". Evolution. 39 (4): 783–791. doi:10.2307/2408678. JSTOR   2408678. PMID   28561359.
  60. Lanyon, S. M (1985). "Detecting Internal Inconsistencies in Distance Data". Systematic Biology. 34 (4): 397–403. doi:10.1093/sysbio/34.4.397.
  61. Saitou, N.; Nei, M. (1987). "The neighbor-joining method: A new method for reconstructing phylogenetic trees". Molecular Biology and Evolution. 4 (4): 406–25. doi: 10.1093/oxfordjournals.molbev.a040454 . PMID   3447015.
  62. Bremer, Kåre (1988). "The Limits of Amino Acid Sequence Data in Angiosperm Phylogenetic Reconstruction". Evolution. 42 (4): 795–803. doi:10.1111/j.1558-5646.1988.tb02497.x. PMID   28563878. S2CID   13647124.
  63. Farris, James S (1989). "The Retention Index and the Rescaled Consistency Index". Cladistics. 5 (4): 417–419. doi:10.1111/j.1096-0031.1989.tb00573.x. PMID   34933481. S2CID   84287895.
  64. Archie, James W (1989). "Homoplasy Excess Ratios: New Indices for Measuring Levels of Homoplasy in Phylogenetic Systematics and a Critique of the Consistency Index". Systematic Zoology. 38 (3): 253–269. doi:10.2307/2992286. JSTOR   2992286.
  65. Bremer, Kåre (1990). "Combinable Component Consensus". Cladistics. 6 (4): 369–372. doi: 10.1111/j.1096-0031.1990.tb00551.x . PMID   34933485. S2CID   84151348.
  66. D. L. Swofford and G. J. Olsen. 1990. Phylogeny reconstruction. In D. M. Hillis and G. Moritz (eds.), Molecular Systematics, pages 411–501. Sinauer Associates, Sunderland, Mass.
  67. Goloboff, Pablo A (1991). "Homoplasy and the Choice Among Cladograms". Cladistics. 7 (3): 215–232. doi: 10.1111/j.1096-0031.1991.tb00035.x . PMID   34933469. S2CID   85418697.
  68. Goloboff, Pablo A (1991). "Random Data, Homoplasy and Information". Cladistics. 7 (4): 395–406. doi:10.1111/j.1096-0031.1991.tb00046.x. S2CID   85132346.
  69. Goloboff, Pablo A (1993). "Estimating Character Weights During Tree Search". Cladistics. 9 (1): 83–91. doi:10.1111/j.1096-0031.1993.tb00209.x. PMID   34929936. S2CID   84231334.
  70. Wilkinson, M (1994). "Common Cladistic Information and its Consensus Representation: Reduced Adams and Reduced Cladistic Consensus Trees and Profiles". Systematic Biology. 43 (3): 343–368. doi:10.1093/sysbio/43.3.343.
  71. Wilkinson, Mark (1995). "More on Reduced Consensus Methods". Systematic Biology. 44 (3): 435–439. doi:10.2307/2413604. JSTOR   2413604.
  72. Li, Shuying; Pearl, Dennis K; Doss, Hani (2000). "Phylogenetic Tree Construction Using Markov Chain Monte Carlo". Journal of the American Statistical Association. 95 (450): 493. doi:10.1080/01621459.2000.10474227. JSTOR 2669394. S2CID 122459537.
  73. Mau, Bob; Newton, Michael A; Larget, Bret (1999). "Bayesian Phylogenetic Inference via Markov Chain Monte Carlo Methods". Biometrics. 55 (1): 1–12. doi:10.1111/j.0006-341X.1999.00001.x. JSTOR 2533889. PMID 11318142. S2CID 932887.
  74. Rannala, Bruce; Yang, Ziheng (1996). "Probability distribution of molecular evolutionary trees: A new method of phylogenetic inference". Journal of Molecular Evolution. 43 (3): 304–11. Bibcode:1996JMolE..43..304R. doi:10.1007/BF02338839. PMID   8703097. S2CID   8269826.
  75. Goloboff, P (2003). "Improvements to resampling measures of group support". Cladistics. 19 (4): 324–32. doi:10.1111/j.1096-0031.2003.tb00376.x. hdl: 11336/101057 . S2CID   55516104.
  76. Li, M.; Chen, X.; Li, X.; Ma, B.; Vitanyi, P.M.B. (December 2004). "The Similarity Metric". IEEE Transactions on Information Theory. 50 (12): 3250–3264. doi:10.1109/TIT.2004.838101. S2CID   221927.
  77. Cilibrasi, R.; Vitanyi, P.M.B. (April 2005). "Clustering by Compression". IEEE Transactions on Information Theory. 51 (4): 1523–1545. arXiv: cs/0312044 . doi:10.1109/TIT.2005.844059. S2CID   911.
  78. Pagel, Mark (2017). "Darwinian perspectives on the evolution of human languages". Psychonomic Bulletin & Review. 24 (1): 151–157. doi:10.3758/s13423-016-1072-z. ISSN   1069-9384. PMC   5325856 . PMID   27368626.
  79. Heggarty, Paul (2006). "Interdisciplinary Indiscipline? Can Phylogenetic Methods Meaningfully Be Applied to Language Data — and to Dating Language?" (PDF). In Peter Forster; Colin Renfrew (eds.). Phylogenetic Methods and the Prehistory of Languages. McDonald Institute Monographs. McDonald Institute for Archaeological Research. Archived from the original (PDF) on 28 January 2021. Retrieved 19 January 2021.
  80. Bowern, Claire (14 January 2018). "Computational Phylogenetics". Annual Review of Linguistics. 4 (1): 281–296. doi:10.1146/annurev-linguistics-011516-034142. ISSN 2333-9683.
  81. Retzlaff, Nancy; Stadler, Peter F. (2018). "Phylogenetics beyond biology". Theory in Biosciences. 137 (2): 133–143. doi:10.1007/s12064-018-0264-7. ISSN   1431-7613. PMC   6208858 . PMID   29931521.
  82. Hoffmann, Konstantin; Bouckaert, Remco; Greenhill, Simon J; Kühnert, Denise (25 November 2021). "Bayesian phylogenetic analysis of linguistic data using BEAST". Journal of Language Evolution. 6 (2): 119–135. doi: 10.1093/jole/lzab005 . ISSN   2058-458X.
  83. Spencer, Matthew; Davidson, Elizabeth A; Barbrook, Adrian C; Howe, Christopher J (21 April 2004). "Phylogenetics of artificial manuscripts". Journal of Theoretical Biology. 227 (4): 503–511. Bibcode:2004JThBi.227..503S. doi:10.1016/j.jtbi.2003.11.022. ISSN   0022-5193. PMID   15038985.
  84. "Phylogenetics - an overview | ScienceDirect Topics". Retrieved 28 April 2023.
  85. Colijn, Caroline; Gardy, Jennifer (1 January 2014). "Phylogenetic tree shapes resolve disease transmission patterns". Evolution, Medicine, and Public Health. 2014 (1): 96–108. doi:10.1093/emph/eou018. ISSN   2050-6201. PMC   4097963 . PMID   24916411.
  86. Colijn, Caroline; Gardy, Jennifer (9 June 2014). "Phylogenetic tree shapes resolve disease transmission patterns". Evolution, Medicine, and Public Health. 2014 (1): 96–108. doi:10.1093/emph/eou018. PMC 4097963. PMID 24916411.