Neighbor-net

An example of a neighbor-net phylogenetic network generated by SplitsTree v4.6.

Neighbor-net [1] is an algorithm for constructing phylogenetic networks that is loosely based on the neighbor-joining algorithm. Like neighbor joining, the method takes a distance matrix as input and works by agglomerating clusters. Unlike neighbor joining, however, Neighbor-net can produce collections of clusters that overlap and do not form a hierarchy; these are represented using a type of phylogenetic network called a splits graph. If the distance matrix satisfies the Kalmanson combinatorial conditions (that is, it is consistent with some circular ordering of the taxa), then Neighbor-net will return the corresponding circular ordering. [2] [3] The method is implemented in the SplitsTree and R/Phangorn [4] [5] packages.
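The Kalmanson condition can be stated concretely. For a circular ordering x1, x2, ..., xn of the taxa, a distance matrix d is Kalmanson if, for every i < j < k < l, d(xi, xk) + d(xj, xl) ≥ max(d(xi, xj) + d(xk, xl), d(xi, xl) + d(xj, xk)). The Python sketch below is an illustration written for this article, not code from SplitsTree or phangorn, and it does not implement Neighbor-net itself. The hypothetical helper `split_metric` builds the distance matrix induced by a weighted set of splits (the distance between two taxa is the total weight of the splits separating them, which is what a splits graph draws), and `is_kalmanson` checks the condition for a given ordering.

```python
from itertools import combinations


def split_metric(n, splits):
    """Distance matrix on taxa 0..n-1 induced by weighted splits.

    splits: list of (side, weight) pairs, where side is the set of taxa on
    one side of the split. d[i][j] is the total weight of the splits that
    put i and j on different sides.
    """
    d = [[0.0] * n for _ in range(n)]
    for side, weight in splits:
        for i, j in combinations(range(n), 2):
            if (i in side) != (j in side):
                d[i][j] += weight
                d[j][i] += weight
    return d


def is_kalmanson(d, order, tol=1e-9):
    """Check the Kalmanson conditions for d with respect to a circular order."""
    # combinations() keeps the input order, so w, x, y, z appear in this
    # sequence around the circle (they play the roles of xi, xj, xk, xl).
    for w, x, y, z in combinations(order, 4):
        crossing = d[w][y] + d[x][z]
        if crossing + tol < max(d[w][x] + d[y][z], d[w][z] + d[x][y]):
            return False
    return True


# Four trivial splits, plus two incompatible but circular splits.
splits = [({0}, 1.0), ({1}, 1.0), ({2}, 1.0), ({3}, 1.0),
          ({0, 1}, 1.0), ({1, 2}, 0.5)]
D = split_metric(4, splits)
print(is_kalmanson(D, [0, 1, 2, 3]))  # True
print(is_kalmanson(D, [0, 2, 1, 3]))  # False
```

The two non-trivial splits {0,1}|{2,3} and {1,2}|{0,3} are incompatible, so no tree fits this matrix exactly (the two largest of the three four-point sums, 7 and 6, differ). Both splits are contiguous in the ordering 0, 1, 2, 3, however, so the matrix is Kalmanson for that ordering and a splits graph can display both splits.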

Examples of the application of Neighbor-net can be found in virology, [6] horticulture, [7] dinosaur genetics, [8] comparative linguistics, [9] and archaeology. [10]

Related Research Articles

In bioinformatics, neighbor joining is a bottom-up (agglomerative) clustering method for constructing phylogenetic trees, introduced by Naruya Saitou and Masatoshi Nei in 1987. Usually applied to DNA or protein sequence data, the algorithm requires the distance between each pair of taxa as input.
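The agglomeration step can be sketched in Python. This is a minimal illustration of the standard neighbor-joining update written for this article, not code taken from any particular package, and ties are broken by taking the first pair found. At each step with n active clusters, the pair (f, g) minimizing Q(f, g) = (n - 2) d(f, g) - Σk d(f, k) - Σk d(g, k) is joined into a new node u; the branch from f to u gets length d(f, g)/2 + (Σk d(f, k) - Σk d(g, k)) / (2(n - 2)), and distances to the new node are d(u, k) = (d(f, k) + d(g, k) - d(f, g)) / 2. The function name `neighbor_joining` and its return format are hypothetical.

```python
from itertools import combinations


def neighbor_joining(names, dist):
    """Run neighbor joining on a symmetric distance matrix (at least 2 taxa).

    Returns (joins, last_edge): joins lists (f, g, len_f, len_g) for each
    agglomeration; last_edge is (a, b, length) for the final two clusters.
    """
    # Dict-of-dicts so merged clusters can be added by label.
    d = {a: {b: dist[i][j] for j, b in enumerate(names)}
         for i, a in enumerate(names)}
    active = list(names)
    joins = []
    while len(active) > 2:
        n = len(active)
        r = {a: sum(d[a][b] for b in active) for a in active}  # row sums
        # Q-criterion: join the pair minimizing (n-2)d(f,g) - r(f) - r(g).
        f, g = min(combinations(active, 2),
                   key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]])
        len_f = d[f][g] / 2 + (r[f] - r[g]) / (2 * (n - 2))
        len_g = d[f][g] - len_f
        u = f"({f},{g})"
        d[u] = {u: 0.0}
        for k in active:
            if k not in (f, g):
                # Distance from the new node to every remaining cluster.
                d[u][k] = d[k][u] = (d[f][k] + d[g][k] - d[f][g]) / 2
        active = [k for k in active if k not in (f, g)] + [u]
        joins.append((f, g, len_f, len_g))
    a, b = active
    return joins, (a, b, d[a][b])


D = [[0, 5, 9, 9, 8],
     [5, 0, 10, 10, 9],
     [9, 10, 0, 8, 7],
     [9, 10, 8, 0, 3],
     [8, 9, 7, 3, 0]]
joins, last = neighbor_joining(list("abcde"), D)
print(joins[0])  # ('a', 'b', 2.0, 3.0)
```

On this five-taxon matrix, taxa a and b are joined first, with branch lengths 2 and 3. Neighbor-net keeps the same distance-based, agglomerative flavour but does not force the result into a hierarchy.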

Substitution model: description of the process by which states in sequences change into each other and back

In biology, substitution models, also called models of DNA sequence evolution, are Markov models that describe changes over evolutionary time. These models describe evolutionary changes in macromolecules represented as sequences of symbols. Substitution models are used to calculate the likelihood of phylogenetic trees from multiple sequence alignment data, which makes them central to maximum likelihood estimation of phylogeny and to Bayesian inference in phylogeny. Estimates of evolutionary distances are typically calculated using substitution models. Substitution models are also central to phylogenetic invariants, because they are needed to predict site pattern frequencies given a tree topology, and they are necessary to simulate sequence data for a group of organisms related by a specific tree.
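As a concrete instance of how a substitution model turns observed differences into an evolutionary distance, the Jukes-Cantor (JC69) model, the simplest nucleotide substitution model, corrects the observed proportion of differing sites p to d = -(3/4) ln(1 - 4p/3) expected substitutions per site. The sketch below assumes two already-aligned, ungapped sequences of equal length; the function names are illustrative. Pairwise distances like these are the kind of input that distance methods such as neighbor joining and Neighbor-net take.

```python
import math


def p_distance(s1, s2):
    """Proportion of aligned sites at which two sequences differ."""
    if len(s1) != len(s2) or not s1:
        raise ValueError("sequences must be non-empty and of equal length")
    mismatches = sum(a != b for a, b in zip(s1, s2))
    return mismatches / len(s1)


def jc69_distance(s1, s2):
    """Jukes-Cantor (JC69) corrected distance, in expected substitutions per site."""
    p = p_distance(s1, s2)
    if p >= 0.75:
        # The log argument would be <= 0: the sequences are saturated under JC69.
        raise ValueError("p >= 0.75: JC69 distance is undefined")
    return -0.75 * math.log(1 - 4 * p / 3)


print(round(jc69_distance("ACGTACGT", "ACGTACGA"), 4))  # 0.1367
```

The correction exceeds the raw p-distance (here 0.125) because some sites are assumed to have changed more than once.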

A phylogenetic network is any graph used to visualize evolutionary relationships between nucleotide sequences, genes, chromosomes, genomes, or species. Such networks are employed when reticulation events such as hybridization, horizontal gene transfer, recombination, or gene duplication and loss are believed to be involved. They differ from phylogenetic trees in explicitly modeling richly linked networks, by including hybrid nodes (nodes with more than one parent) as well as tree nodes. Phylogenetic trees are a subset of phylogenetic networks. Phylogenetic networks can be inferred and visualized with software such as SplitsTree, the R package phangorn, and, more recently, Dendroscope. A standard format for representing phylogenetic networks is a variant of the Newick format that is extended to support networks as well as trees.

Computational phylogenetics, phylogeny inference, or phylogenetic inference focuses on computational and optimization algorithms, heuristics, and approaches involved in phylogenetic analyses. The goal is to find a phylogenetic tree representing optimal evolutionary ancestry between a set of genes, species, or taxa. Maximum likelihood, parsimony, Bayesian, and minimum evolution are typical optimality criteria used to assess how well a phylogenetic tree topology describes the sequence data. Nearest Neighbor Interchange (NNI), Subtree Prune and Regraft (SPR), and Tree Bisection and Reconnection (TBR), known as tree rearrangements, are deterministic algorithms used to search for the optimal phylogenetic tree. The set of candidate trees explored by such a search, together with the landscape of their scores, is known as the phylogeny search space.
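The size of this search space explains why heuristics such as NNI, SPR and TBR are used instead of exhaustive enumeration. An unrooted binary tree with k leaves has 2k - 3 edges, and a new leaf can be attached to any of them, so the number of distinct unrooted binary topologies on n labelled taxa is 3 × 5 × ... × (2n - 5) = (2n - 5)!!. A short Python sketch (the function name is illustrative):

```python
def num_unrooted_binary_trees(n):
    """Number of unrooted binary tree topologies on n labelled leaves, (2n-5)!!."""
    if n < 3:
        raise ValueError("need at least 3 leaves")
    count = 1
    # Adding leaf k+1 to a tree with k leaves: attach it to any of 2k-3 edges.
    for k in range(3, n):
        count *= 2 * k - 3
    return count


for n in (4, 5, 10, 20):
    print(n, num_unrooted_binary_trees(n))
```

Already at 10 taxa there are 2,027,025 topologies, and the count grows faster than exponentially with n.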

Multiple sequence alignment: alignment of more than two molecular sequences

Multiple sequence alignment (MSA) may refer to the process or the result of sequence alignment of three or more biological sequences, generally protein, DNA, or RNA. In many cases, the input set of query sequences is assumed to have an evolutionary relationship, sharing a linkage and descent from a common ancestor. From the resulting MSA, sequence homology can be inferred and phylogenetic analysis can be conducted to assess the sequences' shared evolutionary origins. Visual depictions of the alignment illustrate mutation events such as point mutations, which appear as differing characters in a single alignment column, and insertion or deletion mutations, which appear as hyphens in one or more of the sequences. Multiple sequence alignment is often used to assess sequence conservation of protein domains, tertiary and secondary structures, and even individual amino acids or nucleotides.

The extensible NEXUS file format is widely used in bioinformatics. It stores information about taxa, morphological and molecular characters, distances, genetic codes, assumptions, sets, trees, etc. Several popular phylogenetic programs such as PAUP*, MrBayes, Mesquite, MacClade and SplitsTree use this format.
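For example, a distance matrix, the input that Neighbor-net takes, can be written as a NEXUS file with a TAXA block and a DISTANCES block. The sketch below assumes the DISTANCES block layout of the NEXUS standard (a FORMAT command with TRIANGLE=BOTH, DIAGONAL and LABELS, followed by MATRIX); the function name is hypothetical, and taxon labels are assumed to be simple tokens without spaces or punctuation, which would otherwise need quoting. Check the output against the documentation of the program that will read it, such as SplitsTree.

```python
def nexus_distances(labels, matrix):
    """Return a NEXUS string with a TAXA block and a full square DISTANCES block.

    Assumes labels are simple tokens (no spaces or punctuation to quote).
    """
    n = len(labels)
    lines = [
        "#NEXUS",
        "",
        "BEGIN TAXA;",
        f"  DIMENSIONS NTAX={n};",
        "  TAXLABELS " + " ".join(labels) + ";",
        "END;",
        "",
        "BEGIN DISTANCES;",
        f"  DIMENSIONS NTAX={n};",
        "  FORMAT TRIANGLE=BOTH DIAGONAL LABELS;",
        "  MATRIX",
    ]
    for label, row in zip(labels, matrix):
        # One row per taxon: the label followed by its distances.
        lines.append(f"    {label} " + " ".join(f"{x:g}" for x in row))
    lines += ["  ;", "END;"]
    return "\n".join(lines) + "\n"


print(nexus_distances(["a", "b", "c"], [[0, 5, 9], [5, 0, 10], [9, 10, 0]]))
```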

Ancestral reconstruction is the extrapolation back in time from measured characteristics of individuals to their common ancestors. It is an important application of phylogenetics, the reconstruction and study of the evolutionary relationships among individuals, populations or species. In the context of evolutionary biology, ancestral reconstruction can be used to recover different kinds of ancestral character states of organisms that lived millions of years ago. These states include the genetic sequence, the amino acid sequence of a protein, the composition of a genome, a measurable characteristic of an organism (phenotype), and the geographic range of an ancestral population or species. This is desirable because it allows us to examine parts of phylogenetic trees corresponding to the distant past, clarifying the evolutionary history of the species in the tree. Since modern genetic sequences are essentially variations of ancient ones, access to ancient sequences may identify other variations and organisms that could have arisen from those sequences. In addition to genetic sequences, one might attempt to track the change from one character state to another, such as fins turning into legs.

Bayesian inference of phylogeny combines the information in the prior and in the data likelihood to create the so-called posterior probability of trees, which is the probability that the tree is correct given the data, the prior and the likelihood model. Bayesian inference was introduced into molecular phylogenetics in the 1990s by three independent groups: Bruce Rannala and Ziheng Yang in Berkeley, Bob Mau in Madison, and Shuying Li at the University of Iowa, the last two being PhD students at the time. The approach has become very popular since the release of the MrBayes software in 2001 and is now one of the most widely used methods in molecular phylogenetics.

TREE-PUZZLE is a computer program used to construct phylogenetic trees from sequence data by maximum likelihood analysis. Branch lengths can be calculated with and without the molecular clock hypothesis.

SplitsTree

SplitsTree is a popular freeware program for inferring phylogenetic trees, phylogenetic networks, or, more generally, splits graphs, from various types of data such as a sequence alignment, a distance matrix or a set of trees. SplitsTree implements published methods such as split decomposition, neighbor-net, consensus networks, supernetwork methods, and methods for computing hybridization or simple recombination networks. It uses the NEXUS file format, with the splits graph defined in a special data block.

Quantitative comparative linguistics is the use of quantitative analysis as applied to comparative linguistics. Examples include the statistical fields of lexicostatistics and glottochronology, and the borrowing of phylogenetics from biology.

A supertree is a single phylogenetic tree assembled from a combination of smaller phylogenetic trees, which may have been assembled using different datasets or a different selection of taxa. Supertree algorithms can highlight areas where additional data would most usefully resolve any ambiguities. The input trees of a supertree should behave as samples from the larger tree.

T-REX is a freely available web server, developed at the department of Computer Science of the Université du Québec à Montréal, dedicated to the inference, validation and visualization of phylogenetic trees and phylogenetic networks. The T-REX web server allows the users to perform several popular methods of phylogenetic analysis as well as some new phylogenetic applications for inferring, drawing and validating phylogenetic trees and networks.

In bioinformatics, alignment-free sequence analysis approaches to molecular sequence and structure data provide alternatives to alignment-based approaches.

The multispecies coalescent process is a stochastic process model that describes the genealogical relationships for a sample of DNA sequences taken from several species. It represents the application of coalescent theory to the case of multiple species. The multispecies coalescent explains how the genealogy of an individual gene can differ from the broader history of the species. This has important implications for the theory and practice of phylogenetics and for understanding genome evolution.
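In the simplest case, three species with species tree ((A,B),C), this discordance has a closed form: if the internal branch of the species tree has length t in coalescent units, the gene tree of a single locus matches the species tree with probability 1 - (2/3)e^(-t), and each of the two discordant topologies has probability (1/3)e^(-t). A minimal Python sketch, assuming one sampled lineage per species and t measured in coalescent units (generations divided by 2N for a diploid population of effective size N, assumed constant along the branch); the function name is illustrative:

```python
import math


def gene_tree_probabilities(t):
    """Gene tree topology probabilities for species tree ((A,B),C).

    t: internal branch length in coalescent units. Returns
    (p_concordant, p_each_discordant).
    """
    if t < 0:
        raise ValueError("branch length must be non-negative")
    # Probability that the A and B lineages fail to coalesce on the internal branch.
    no_coalescence = math.exp(-t)
    # If they fail, all three lineages meet in the ancestral population and
    # each of the three topologies is equally likely.
    p_discordant_each = no_coalescence / 3
    p_concordant = 1 - 2 * p_discordant_each
    return p_concordant, p_discordant_each


for t in (0.1, 1.0, 3.0):
    print(t, gene_tree_probabilities(t))
```

Short internal branches therefore leave many loci with gene trees that disagree with the species tree, which is one reason conflicting signals appear in networks built from multilocus data.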

Minimum evolution is a distance method used in phylogenetics. It shares with maximum parsimony the approach of searching for the phylogeny with the shortest total sum of branch lengths.

Katharina Theresia Huber is a German applied mathematician and mathematical biologist whose research concerns phylogenetic trees, evolutionary analysis, their mathematical foundations, and their mathematical visualization. She is an associate professor in the School of Computing Sciences at the University of East Anglia in England, and the school's director of postgraduate research.

Phylogenetic reconciliation: technique in evolutionary study

In phylogenetics, reconciliation is an approach to connect the history of two or more coevolving biological entities. The general idea of reconciliation is that a phylogenetic tree representing the evolution of an entity can be drawn within another phylogenetic tree representing an encompassing entity to reveal their interdependence and the evolutionary events that have marked their shared history. The development of reconciliation approaches started in the 1980s, mainly to depict the coevolution of a gene and a genome, and of a host and a symbiont, which can be mutualist, commensalist or parasitic. It has also been used, for example, to detect horizontal gene transfer or to understand the dynamics of genome evolution.

References

  1. Bryant D, Moulton V (February 2004). "Neighbor-net: an agglomerative method for the construction of phylogenetic networks". Molecular Biology and Evolution. 21 (2): 255–65. doi:10.1093/molbev/msh018. PMID 14660700.
  2. Bryant D, Moulton V, Spillner A (June 2007). "Consistency of the neighbor-net algorithm". Algorithms for Molecular Biology. 2: 8. doi:10.1186/1748-7188-2-8. PMC 1948893. PMID 17597551.
  3. Levy D, Pachter L (August 2011). "The neighbor-net algorithm". Advances in Applied Mathematics. 47 (2): 240–58. doi:10.1016/j.aam.2010.09.002.
  4. Schliep KP (February 2011). "phangorn: phylogenetic analysis in R". Bioinformatics. 27 (4): 592–3. doi:10.1093/bioinformatics/btq706. PMC 3035803. PMID 21169378.
  5. Schliep K, Potts AA, Morrison DA, Grimm GW (2017). "Intertwining phylogenetic trees and networks". Methods in Ecology and Evolution. 8 (10): 1212–20. doi:10.1111/2041-210X.12760.
  6. Schmidt-Chanasit J, Bialonski A, Heinemann P, Ulrich RG, Günther S, Rabenau HF, Doerr HW (March 2009). "A 10-year molecular survey of herpes simplex virus type 1 in Germany demonstrates a stable and high prevalence of genotypes A and B". Journal of Clinical Virology. 44 (3): 235–7. doi:10.1016/j.jcv.2008.12.016. PMID 19186100.
  7. Kilian B, Ozkan H, Deusch O, Effgen S, Brandolini A, Kohl J, et al. (January 2007). "Independent wheat B and G genome origins in outcrossing Aegilops progenitor haplotypes". Molecular Biology and Evolution. 24 (1): 217–27. doi:10.1093/molbev/msl151. hdl:11858/00-001M-0000-0012-38C8-E. PMID 17053048.
  8. Buckley M, Walker A, Ho SY, Yang Y, Smith C, Ashton P, et al. (January 2008). "Comment on 'Protein sequences from mastodon and Tyrannosaurus rex revealed by mass spectrometry'". Science. 319 (5859): 33, author reply 33. Bibcode:2008Sci...319...33B. doi:10.1126/science.1147046. PMC 2694913. PMID 18174420.
  9. Bowern C (2010). "Historical linguistics in Australia: trees, networks and their implications". Philosophical Transactions of the Royal Society B: Biological Sciences. 365 (1559): 3845–54. doi:10.1098/rstb.2010.0013. ISSN 0962-8436. PMC 2981908. PMID 21041209.
  10. Shennan S (2009). Pattern and process in cultural evolution. University of California Press.