PAUP*

Phylogenetic Analysis Using Parsimony *and other methods
Original author(s) David L. Swofford
Stable release
4.0b10
Preview release
4.0a164
Written in C
Operating system Windows, macOS, Unix-like
Platform Cross-platform
Type Science
License Quasi-commercial
Website PAUP*

PAUP* (Phylogenetic Analysis Using Parsimony *and other methods) is a computational phylogenetics program for inferring evolutionary trees (phylogenies), written by David L. Swofford. Originally, as the name implies, PAUP only implemented parsimony, but from version 4.0 (when the program became known as PAUP*) it also supports distance matrix and likelihood methods. Version 3.0 ran on Macintosh computers and supported a rich, user-friendly graphical interface. Together with the program MacClade [1], with which it shares the NEXUS data format [2], PAUP* was the phylogenetic software of choice for many phylogeneticists [3].

Version 4.0 added support for Windows (graphical shell and command line) and Unix (command line only) platforms. However, the graphical user interface for the Macintosh does not support versions of macOS later than 10.14, although a GUI for later versions is planned. PAUP* is also available as a plugin for Geneious. PAUP*, which now carries the self-referential title PAUP* (* Phylogenetic Analysis Using PAUP), is updated frequently and now includes the "species tree" method SVDquartets [4][5] in addition to parsimony, likelihood, and distance methods for phylogenetics.

Related Research Articles

In biology, phylogenetics is the study of the evolutionary history and relationships among or within groups of organisms. These relationships are determined by phylogenetic inference, methods that focus on observed heritable traits, such as DNA sequences, protein amino acid sequences, or morphology. The result of such an analysis is a phylogenetic tree—a diagram containing a hypothesis of relationships that reflects the evolutionary history of a group of organisms.

A phylogenetic tree, phylogeny or evolutionary tree is a graphical representation that shows the evolutionary history of a set of species or taxa over a specific period of time. In other words, it is a branching diagram showing the evolutionary relationships among various biological species or other entities based upon similarities and differences in their physical or genetic characteristics. In evolutionary biology, all life on Earth is theoretically part of a single phylogenetic tree, indicating common ancestry. Phylogenetics is the study of phylogenetic trees. The main challenge is to find a phylogenetic tree representing optimal evolutionary ancestry between a set of species or taxa. Computational phylogenetics focuses on the algorithms involved in finding the optimal phylogenetic tree in the phylogenetic landscape.

In phylogenetics and computational phylogenetics, maximum parsimony is an optimality criterion under which the phylogenetic tree that minimizes the total number of character-state changes is preferred. Under the maximum-parsimony criterion, the optimal tree will minimize the amount of homoplasy. In other words, under this criterion, the shortest possible tree that explains the data is considered best. Some of the basic ideas behind maximum parsimony were presented by James S. Farris in 1970 and Walter M. Fitch in 1971.
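The idea of counting character-state changes on a fixed tree can be sketched with Fitch's small-parsimony procedure. This is a minimal illustration, not PAUP*'s implementation: the nested-tuple tree encoding, the function names `fitch_score` and `parsimony_length`, and the toy four-taxon alignment are all assumptions made for this example, and all changes are given equal (unit) cost.

```python
def fitch_score(tree, states):
    """Return (state_set, changes) for one character on a rooted binary tree.

    tree   -- a leaf name (str) or a 2-tuple (left, right) of subtrees
              (hypothetical encoding chosen for this sketch)
    states -- dict mapping each leaf name to its observed character state
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch_score(tree[0], states)
    right_set, right_cost = fitch_score(tree[1], states)
    common = left_set & right_set
    if common:
        # The children can agree on a state: no extra change is needed here.
        return common, left_cost + right_cost
    # The children disagree: at least one change is required on this node.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_length(tree, alignment):
    """Sum the Fitch change counts over every column of the alignment."""
    n_sites = len(next(iter(alignment.values())))
    total = 0
    for i in range(n_sites):
        column = {taxon: seq[i] for taxon, seq in alignment.items()}
        total += fitch_score(tree, column)[1]
    return total


# Toy data (illustrative only): four taxa, three aligned sites.
alignment = {"A": "AAG", "B": "AAG", "C": "CAT", "D": "CGT"}
tree_1 = (("A", "B"), ("C", "D"))
tree_2 = (("A", "C"), ("B", "D"))

print(parsimony_length(tree_1, alignment))  # 3
print(parsimony_length(tree_2, alignment))  # 5
```

Under the parsimony criterion, `tree_1` would be preferred over `tree_2` for this toy alignment because it requires fewer changes. A real analysis must also search over tree topologies, which this sketch does not attempt.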

In phylogenetics, long branch attraction (LBA) is a form of systematic error whereby distantly related lineages are incorrectly inferred to be closely related. LBA arises when the amount of molecular or morphological change accumulated within a lineage is sufficient to cause that lineage to appear similar to another long-branched lineage, solely because they have both undergone a large amount of change, rather than because they are related by descent. Such bias is more common when the overall divergence of some taxa results in long branches within a phylogeny. Long branches are often attracted to the base of a phylogenetic tree, because the lineage included to represent an outgroup is often also long-branched. The frequency of true LBA is unclear and often debated, and some authors view it as untestable and therefore irrelevant to empirical phylogenetic inference. Although often viewed as a failing of parsimony-based methodology, LBA could in principle result from a variety of scenarios and be inferred under multiple analytical paradigms.

Computational phylogenetics, phylogeny inference, or phylogenetic inference focuses on computational and optimization algorithms, heuristics, and approaches involved in phylogenetic analyses. The goal is to find a phylogenetic tree representing optimal evolutionary ancestry between a set of genes, species, or taxa. Maximum likelihood, parsimony, Bayesian, and minimum evolution are typical optimality criteria used to assess how well a phylogenetic tree topology describes the sequence data. Nearest Neighbour Interchange (NNI), Subtree Prune and Regraft (SPR), and Tree Bisection and Reconnection (TBR), known as tree rearrangements, are deterministic algorithms used to search for the optimal phylogenetic tree. The space searched for the optimal phylogenetic tree is known as the phylogeny search space.

PHYLogeny Inference Package (PHYLIP) is a free computational phylogenetics package of programs for inferring evolutionary trees (phylogenies). It consists of 65 portable programs whose source code is written in the programming language C. As of version 3.696, it is licensed as open-source software; versions 3.695 and older were proprietary freeware. Releases occur as source code and as precompiled executables for many operating systems, including Windows, Mac OS 8, Mac OS 9, OS X, Linux, and FreeBSD (from FreeBSD.org). Full documentation is written for all the programs in the package and is included therein. The programs in the PHYLIP package were written by Professor Joseph Felsenstein, of the Department of Genome Sciences and the Department of Biology, University of Washington, Seattle.

The extensible NEXUS file format is widely used in bioinformatics. It stores information about taxa, morphological and molecular characters, distances, genetic codes, assumptions, sets, trees, etc. Several popular phylogenetic programs such as PAUP*, MrBayes, Mesquite, MacClade and SplitsTree use this format.
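To give a sense of what a NEXUS file looks like, the following sketch writes a small alignment as a NEXUS `DATA` block. It is an illustration under stated assumptions: the helper name `to_nexus` and the example taxa are hypothetical, only a minimal subset of the format is produced, and taxon names are assumed to contain no whitespace or punctuation (such names would need quoting in a real file).

```python
def to_nexus(alignment, datatype="DNA"):
    """Render a dict of {taxon_name: aligned_sequence} as a minimal NEXUS DATA block."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same length")
    nchar = lengths.pop()
    width = max(len(name) for name in alignment)
    lines = [
        "#NEXUS",
        "BEGIN DATA;",
        f"  DIMENSIONS NTAX={len(alignment)} NCHAR={nchar};",
        f"  FORMAT DATATYPE={datatype} MISSING=? GAP=-;",
        "  MATRIX",
    ]
    for name, seq in alignment.items():
        # Pad names so the sequences line up in columns.
        lines.append(f"    {name.ljust(width)}  {seq}")
    lines += ["  ;", "END;"]
    return "\n".join(lines) + "\n"


# Hypothetical example data.
example = {"taxon_1": "ACGT", "taxon_2": "ACGA", "taxon_3": "A-GT"}
print(to_nexus(example))
```

The file begins with the `#NEXUS` token, and each block is delimited by `BEGIN ...;` and `END;`, which is what lets different programs read the blocks they understand and skip the rest.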

Ancestral reconstruction is the extrapolation back in time from measured characteristics of individuals, populations, or species to their common ancestors. It is an important application of phylogenetics, the reconstruction and study of the evolutionary relationships among individuals, populations or species to their ancestors. In the context of evolutionary biology, ancestral reconstruction can be used to recover different kinds of ancestral character states of organisms that lived millions of years ago. These states include the genetic sequence, the amino acid sequence of a protein, the composition of a genome, a measurable characteristic of an organism (phenotype), and the geographic range of an ancestral population or species. This is desirable because it allows us to examine parts of phylogenetic trees corresponding to the distant past, clarifying the evolutionary history of the species in the tree. Since modern genetic sequences are essentially a variation of ancient ones, access to ancient sequences may identify other variations and organisms which could have arisen from those sequences. In addition to genetic sequences, one might attempt to track the changing of one character trait to another, such as fins turning to legs.

Bayesian inference of phylogeny combines the information in the prior and in the data likelihood to create the so-called posterior probability of trees, which is the probability that the tree is correct given the data, the prior and the likelihood model. Bayesian inference was introduced into molecular phylogenetics in the 1990s by three independent groups: Bruce Rannala and Ziheng Yang in Berkeley, Bob Mau in Madison, and Shuying Li at the University of Iowa, the last two being PhD students at the time. The approach has become very popular since the release of the MrBayes software in 2001, and is now one of the most popular methods in molecular phylogenetics.
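The combination of prior and likelihood described above is Bayes' theorem applied to trees. The symbols below are notation chosen for this illustration rather than taken from the text: \tau is a tree topology, D is the data (for example, a sequence alignment), and the sum runs over all candidate topologies.

```latex
P(\tau \mid D) \;=\; \frac{P(D \mid \tau)\, P(\tau)}{\sum_{\tau'} P(D \mid \tau')\, P(\tau')}
```

In practice the likelihood P(D \mid \tau) itself averages over branch lengths and substitution-model parameters, and the sum in the denominator is intractable for all but very small numbers of taxa, which is why programs such as MrBayes approximate the posterior by Markov chain Monte Carlo sampling rather than evaluating it directly.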

Phylogenetic comparative methods (PCMs) use information on the historical relationships of lineages (phylogenies) to test evolutionary hypotheses. The comparative method has a long history in evolutionary biology; indeed, Charles Darwin used differences and similarities between species as a major source of evidence in The Origin of Species. However, the fact that closely related lineages share many traits and trait combinations as a result of the process of descent with modification means that lineages are not independent. This realization inspired the development of explicitly phylogenetic comparative methods. Initially, these methods were primarily developed to control for phylogenetic history when testing for adaptation; however, in recent years the use of the term has broadened to include any use of phylogenies in statistical tests. Although most studies that employ PCMs focus on extant organisms, many methods can also be applied to extinct taxa and can incorporate information from the fossil record.

The Jamaican monkey is an extinct species of New World monkey that was endemic to Jamaica. It was first uncovered at Long Mile Cave by Harold Anthony in 1920.

The Hispaniola monkey is an extinct primate that was endemic on the island of Hispaniola, in the present-day Dominican Republic. The species is thought to have gone extinct around the 16th century. The exact timing and cause of the extinction are unclear, but it is likely related to the settlement of Hispaniola by Europeans after 1492.

A supertree is a single phylogenetic tree assembled from a combination of smaller phylogenetic trees, which may have been assembled using different datasets or a different selection of taxa. Supertree algorithms can highlight areas where additional data would most usefully resolve any ambiguities. The input trees of a supertree should behave as samples from the larger tree.

A patrocladogram is a cladistic branching pattern that has been precisely modified by use of patristic distances; it is a type of phylogram. The patristic distance is defined as "the number of apomorphic step changes separating two taxa on a cladogram," and is used exclusively to determine the amount of divergence of a characteristic from a common ancestor. This means that cladistic and patristic distances are combined to construct a new tree using various phenetic algorithms. The purpose of the patrocladogram in biological classification is to form a hypothesis about which evolutionary processes are actually involved before making a taxonomic decision. Patrocladograms are based on biostatistics that include but are not limited to: parsimony, distance matrix, likelihood methods, and Bayesian probability. Some examples of genomically related data that can be used as inputs for these methods are: molecular sequences, whole genome sequences, gene frequencies, restriction sites, distance matrices, unique characters, mutations such as SNPs, and mitochondrial genome data.

Wayne Paul Maddison is a professor and Canada Research Chair in Biodiversity at the departments of zoology and botany at the University of British Columbia, and the Director of the Spencer Entomological Collection at the Beaty Biodiversity Museum.

T-REX is a freely available web server, developed at the department of Computer Science of the Université du Québec à Montréal, dedicated to the inference, validation and visualization of phylogenetic trees and phylogenetic networks. The T-REX web server allows the users to perform several popular methods of phylogenetic analysis as well as some new phylogenetic applications for inferring, drawing and validating phylogenetic trees and networks.

Mesquite is a software package primarily designed for phylogenetic analyses. It was developed as a successor to MacClade, when the authors recognized that implementing a modular architecture in MacClade would be infeasible. Mesquite is largely written in Java and uses NEXUS-formatted files as input. Mesquite is available as a compiled executable for Macintosh, Windows, and Unix-like platforms, and the source code is available on GitHub.

Multispecies Coalescent Process is a stochastic process model that describes the genealogical relationships for a sample of DNA sequences taken from several species. It represents the application of coalescent theory to the case of multiple species. The multispecies coalescent results in cases where the relationships among species for an individual gene can differ from the broader history of the species. It has important implications for the theory and practice of phylogenetics and for understanding genome evolution.

In phylogenetics, reconciliation is an approach to connect the history of two or more coevolving biological entities. The general idea of reconciliation is that a phylogenetic tree representing the evolution of an entity can be drawn within another phylogenetic tree representing an encompassing entity to reveal their interdependence and the evolutionary events that have marked their shared history. The development of reconciliation approaches started in the 1980s, mainly to depict the coevolution of a gene and a genome, and of a host and a symbiont, which can be mutualist, commensalist or parasitic. It has also been used for example to detect horizontal gene transfer, or understand the dynamics of genome evolution.

References

  1. Maddison DR, Maddison WP (2001-02-08). MacClade 4: Analysis of Phylogeny and Character Evolution. ISBN 0-8789-3470-7. Archived from the original on 2011-09-27. Retrieved 2015-12-08.
  2. Maddison DR, Swofford DL, Maddison WP (December 1997). "NEXUS: an extensible file format for systematic information". Systematic Biology. 46 (4): 590–621. doi:10.1093/sysbio/46.4.590. PMID 11975335.
  3. Hall BG (2001). Phylogenetic Trees Made Easy. Sinauer Associates. ISBN 0-8789-3311-5.
  4. Chifman J, Kubatko L (December 2014). "Quartet inference from SNP data under the coalescent model". Bioinformatics. 30 (23): 3317–24. doi:10.1093/bioinformatics/btu530. PMC 4296144. PMID 25104814.
  5. Chifman J, Kubatko L (June 2015). "Identifiability of the unrooted species tree topology under the coalescent model with time-reversible substitution processes, site-specific rate variation, and invariable sites". Journal of Theoretical Biology. 374: 35–47. arXiv:1406.4811. doi:10.1016/j.jtbi.2015.03.006. PMID 25791286. S2CID 17167792.