Implied weighting

Implied weighting describes a group of methods used in phylogenetic analysis to assign the greatest importance to characters that are most likely to be homologous. These are a posteriori methods, which also include dynamic weighting, as opposed to a priori methods, which include adaptive, independent, and chemical categories (see Weighting at the American Museum of Natural History's website).

The first attempt to implement such a technique was by Farris (1969), [1] who called it successive approximations weighting. A tree was constructed with equal weights, and characters that appeared as homoplasies on this tree were downweighted based on the CI (consistency index) or RCI (rescaled consistency index), which are measures of homology. The analysis was then repeated with these new weights and the characters were re-weighted again, with iteration continuing until a stable state was reached. Farris suggested that each character could be considered independently with respect to a weight implied by its frequency of change. However, the final tree depended strongly on the starting weights and the finishing criteria. [2]
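The reweighting loop described above can be sketched as follows. This is a minimal illustration, not Farris's original weighting function: it assumes the standard definitions CI = m/s and RCI = CI × RI with RI = (g − s)/(g − m), where m, s and g are a character's minimum, observed and maximum possible steps. The `search` callback and `toy_search` are hypothetical stand-ins for a real tree search.

```python
from typing import Callable, List, Sequence, Tuple

# One character's step counts on a given tree:
# (m = minimum possible steps, s = observed steps, g = maximum possible steps)
Steps = Tuple[int, int, int]


def consistency_index(m: int, s: int) -> float:
    """CI = m / s; equals 1 when the character shows no homoplasy."""
    return m / s


def rescaled_consistency_index(m: int, s: int, g: int) -> float:
    """RCI = CI * RI, with RI = (g - s) / (g - m).

    Assumes an informative character (g > m), so RI is defined.
    """
    retention_index = (g - s) / (g - m)
    return consistency_index(m, s) * retention_index


def successive_weighting(
    search: Callable[[Sequence[float]], List[Steps]],
    n_characters: int,
    max_rounds: int = 20,
    tolerance: float = 1e-9,
) -> List[float]:
    """Iterate search -> reweight until the weights stop changing.

    `search` is a hypothetical callback: given character weights, it returns
    the (m, s, g) step counts of each character on the best tree it finds.
    """
    weights = [1.0] * n_characters  # first round: equal weights
    for _ in range(max_rounds):
        steps = search(weights)
        new_weights = [rescaled_consistency_index(m, s, g) for m, s, g in steps]
        if all(abs(a - b) <= tolerance for a, b in zip(weights, new_weights)):
            break  # stable state reached
        weights = new_weights
    return weights


def toy_search(weights: Sequence[float]) -> List[Steps]:
    """Stand-in for a real tree search; always reports the same step counts."""
    return [(1, 1, 4), (1, 2, 4), (1, 4, 4)]


print(successive_weighting(toy_search, n_characters=3))
```

In the toy run the character with no homoplasy keeps full weight, while the character that changes as often as it possibly could drops to zero weight. A real search would return different step counts as the weights change, which is where the dependence on starting weights and stopping criteria comes from.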

The most widely used and implemented method, called implied weighting, follows from Goloboff (1993). [2] The first time a character changes state on a tree, this state change is given the weight '1'; subsequent changes are less 'expensive' and are given smaller weights as the character's tendency for homoplasy becomes more apparent. The trees that maximize the concave function of homoplasy resolve character conflict in favour of the characters which have more homology (less homoplasy) and imply that the average weight for the characters is as high as possible.
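A small sketch may make the concave function concrete. It assumes the fit function usually attributed to Goloboff (1993), f = k / (k + e), where e is the number of extra (homoplastic) steps a character takes on the tree and k is the concavity constant. The function names are illustrative and do not come from any particular software package.

```python
from typing import List, Sequence


def fit(extra_steps: int, k: float = 3.0) -> float:
    """Fit of one character: 1 with no homoplasy, falling towards 0 as it grows."""
    return k / (k + extra_steps)


def total_fit(extra_steps_per_character: Sequence[int], k: float = 3.0) -> float:
    """Score of a tree: the sum of character fits (higher is better)."""
    return sum(fit(e, k) for e in extra_steps_per_character)


def marginal_costs(max_extra_steps: int, k: float = 3.0) -> List[float]:
    """Fit lost by the 1st, 2nd, ... extra step of a single character."""
    return [fit(e - 1, k) - fit(e, k) for e in range(1, max_extra_steps + 1)]


# Each additional homoplastic step costs less fit than the one before it.
print(marginal_costs(4, k=3.0))

# Three characters with 0, 1 and 3 extra steps on some tree.
print(total_fit([0, 1, 3], k=3.0))
```

Because the cost of each further step shrinks, a character that is already highly homoplastic has little influence on which tree is preferred, which is how the method resolves conflict in favour of less homoplastic characters.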

Goloboff recognizes that trees with the heaviest average weights give the most 'respect' to the data: a low average weight implies that most characters are being 'ignored' by the tree-building algorithms. [2]

Though originally proposed with a severe weighting (concavity constant k = 3), Goloboff now prefers more 'gentle' concavities (e.g. k = 12), [3] which have been shown to be more effective in simulated and real-world cases. [4]
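The effect of the concavity constant can be illustrated with a toy comparison. The sketch below assumes the cost form e / (e + k), which is minimised and is equivalent to maximising the fit k / (k + e). The two trees and their extra-step counts are invented for illustration only.

```python
from typing import Dict, List


def homoplasy_cost(extra_steps: List[int], k: float) -> float:
    """Implied-weights cost of a tree: sum of e / (e + k) over characters."""
    return sum(e / (e + k) for e in extra_steps)


def preferred_tree(trees: Dict[str, List[int]], k: float) -> str:
    """Name of the tree with the lowest implied-weights cost at this k."""
    return min(trees, key=lambda name: homoplasy_cost(trees[name], k))


# Hypothetical extra steps per character on two candidate trees.
TREES = {
    "A": [0, 0, 0, 6],  # one very homoplastic character, three perfect ones
    "B": [1, 1, 1, 1],  # a little homoplasy in every character
}

for k in (3.0, 12.0):
    costs = {name: round(homoplasy_cost(e, k), 3) for name, e in TREES.items()}
    print(k, costs, preferred_tree(TREES, k))
```

With these made-up numbers, k = 3 prefers tree A, which concentrates all the homoplasy in one character, while k = 12 prefers tree B, as equal weighting would (4 extra steps against 6). Gentler concavities therefore stay closer to equal-weights parsimony.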

References

  1. Farris, James S. (December 1969). "A Successive Approximations Approach to Character Weighting". Systematic Zoology. 18 (4): 374–385. doi:10.2307/2412182. JSTOR 2412182.
  2. Goloboff, Pablo A. (March 1993). "Estimating Character Weights During Tree Search". Cladistics. 9 (1): 83–91. doi:10.1111/j.1096-0031.1993.tb00209.x. PMID 34929936. S2CID 84231334.
  3. Goloboff, Pablo A.; Torres, Ambrosio; Arias, J. Salvador (August 2018). "Weighted parsimony outperforms other methods of phylogenetic inference under models appropriate for morphology". Cladistics. 34 (4): 407–437. doi:10.1111/cla.12205. PMID 34649370.
  4. Smith, Martin R. (February 2019). "Bayesian and parsimony approaches reconstruct informative trees from simulated morphological datasets". Biology Letters. 15 (2): 20180632. doi:10.1098/rsbl.2018.0632. PMC 6405459. PMID 30958126.