SplitsTree

Last updated
SplitsTree
Developer(s) Daniel Huson and David Bryant
Stable release
4.17.1 / 2021
Preview release
5.3 / 2021
Operating system Windows, Linux, Mac OS X
Type Bioinformatics
License Proprietary
Website uni-tuebingen.de/fakultaeten/mathematisch-naturwissenschaftliche-fakultaet/fachbereiche/informatik/lehrstuehle/algorithms-in-bioinformatics/software/splitstree/

SplitsTree is a popular freeware program for inferring phylogenetic trees, phylogenetic networks, or, more generally, splits graphs, from various types of data such as a sequence alignment, a distance matrix or a set of trees. [1] [2] SplitsTree implements published methods such as split decomposition, [3] neighbor-net, consensus networks, [4] super networks methods or methods for computing hybridization or simple recombination networks. It uses the NEXUS file format. The splits graph is defined using a special data block (SPLITS block).

Contents

An example of a neighbor-net phylogenetic network generated by SplitsTree v4.6. Heterobranchia tree.png
An example of a neighbor-net phylogenetic network generated by SplitsTree v4.6.

See also

Related Research Articles

Phylogenetic tree Branching diagram of evolutionary relationships between organisms

A phylogenetic tree is a branching diagram or a tree showing the evolutionary relationships among various biological species or other entities based upon similarities and differences in their physical or genetic characteristics. All life on Earth is part of a single phylogenetic tree, indicating common ancestry.

Computational biology involves the development and application of data-analytical and theoretical methods, mathematical modelling and computational simulation techniques to the study of biological, ecological, behavioral, and social systems. The field is broadly defined and includes foundations in biology, applied mathematics, statistics, biochemistry, chemistry, biophysics, molecular biology, genetics, genomics, computer science, ecology, and evolution, but is most commonly thought of as the intersection of computer science, biology, and big data.

Graph drawing Visualization of node-link graphs

Graph drawing is an area of mathematics and computer science combining methods from geometric graph theory and information visualization to derive two-dimensional depictions of graphs arising from applications such as social network analysis, cartography, linguistics, and bioinformatics.

Biclustering, block clustering, co-clustering, or two-modeclustering is a data mining technique which allows simultaneous clustering of the rows and columns of a matrix. The term was first introduced by Boris Mirkin to name a technique introduced many years earlier, in 1972, by J. A. Hartigan.

A phylogenetic network is any graph used to visualize evolutionary relationships between nucleotide sequences, genes, chromosomes, genomes, or species. They are employed when reticulation events such as hybridization, horizontal gene transfer, recombination, or gene duplication and loss are believed to be involved. They differ from phylogenetic trees by the explicit modeling of richly linked networks, by means of the addition of hybrid nodes instead of only tree nodes. Phylogenetic trees are a subset of phylogenetic networks. Phylogenetic networks can be inferred and visualised with software such as SplitsTree, the R-package, phangorn, and, more recently, Dendroscope. A standard format for representing phylogenetic networks is a variant of Newick format which is extended to support networks as well as trees.

A disk-covering method is a divide-and-conquer meta-technique for large-scale phylogenetic analysis which has been shown to improve the performance of both heuristics for NP-hard optimization problems and polynomial-time distance-based methods. Disk-covering methods are a meta-technique in that they have flexibility in several areas, depending on the performance metrics that are being optimized for the base method. Such metrics can be efficiency, accuracy, or sequence length requirements for statistical performance. There have been several disk-covering methods developed, which have been applied to different "base methods". Disk-covering methods have been used with distance-based methods to produce "fast-converging methods", which are methods that will reconstruct the true tree from sequences that have at most a polynomial number of sites.

Computational phylogenetics is the application of computational algorithms, methods, and programs to phylogenetic analyses. The goal is to assemble a phylogenetic tree representing a hypothesis about the evolutionary ancestry of a set of genes, species, or other taxa. For example, these techniques have been used to explore the family tree of hominid species and the relationships between specific genes shared by many types of organisms.

Multiple sequence alignment Alignment of more than two molecular sequence

Multiple sequence alignment (MSA) may refer to the process or the result of sequence alignment of three or more biological sequences, generally protein, DNA, or RNA. In many cases, the input set of query sequences are assumed to have an evolutionary relationship by which they share a linkage and are descended from a common ancestor. From the resulting MSA, sequence homology can be inferred and phylogenetic analysis can be conducted to assess the sequences' shared evolutionary origins. Visual depictions of the alignment as in the image at right illustrate mutation events such as point mutations that appear as differing characters in a single alignment column, and insertion or deletion mutations that appear as hyphens in one or more of the sequences in the alignment. Multiple sequence alignment is often used to assess sequence conservation of protein domains, tertiary and secondary structures, and even individual amino acids or nucleotides.

The extensible NEXUS file format is widely used in bioinformatics. It stores information about taxa, morphological and molecular characters, distances, genetic codes, assumptions, sets, trees, etc. Several popular phylogenetic programs such as PAUP*, MrBayes, Mesquite, MacClade and SplitsTree use this format.

Dendroscope

Dendroscope is an interactive computer software program written in Java for viewing Phylogenetic trees. This program is designed to view trees of all sizes and is very useful for creating figures. Dendroscope can be used for a variety of analyses of molecular data sets but is particularly designed for metagenomics or analyses of uncultured environmental samples.

Median graph Graph with a median for each three vertices

In graph theory, a division of mathematics, a median graph is an undirected graph in which every three vertices a, b, and c have a unique median: a vertex m(a,b,c) that belongs to shortest paths between each pair of a, b, and c.

UGENE

UGENE is computer software for bioinformatics. It works on personal computer operating systems such as Windows, macOS, or Linux. It is released as free and open-source software, under a GNU General Public License (GPL) version 2.

Quantitative comparative linguistics is the use of quantitative analysis as applied to comparative linguistics. Examples include the statistical fields of lexicostatistics and glottochronology, and the borrowing of phylogenetics from biology.

The Robinson–Foulds or symmetric difference metric, often abbreviated as the RF distance, is a simple way to calculate the distance between phylogenetic trees. It is defined as where A is the number of partitions of data implied by the first tree but not the second tree and B is the number of partitions of data implied by the second tree but not the first tree. The partitions are calculated for each tree by removing each branch. Thus, the number of eligible partitions for each tree is equal to the number of branches in that tree. RF distances have been criticized as biased, but they represent a relatively intuitive measure of the distances between phylogenetic trees and therefore remain widely used. Nevertheless, the biases inherent to the RF distances suggest that researches should consider using "Generalized" Robinson–Foulds metrics that may have better theoretical and practical performance and avoid the biases and misleading attributes of the original metric.

A supertree is a single phylogenetic tree assembled from a combination of smaller phylogenetic trees, which may have been assembled using different datasets or a different selection of taxa. Supertree algorithms can highlight areas where additional data would most usefully resolve any ambiguities. The input trees of a supertree should behave as samples from the larger tree.

Neighbor-net

NeighborNet is an algorithm for constructing phylogenetic networks which is loosely based on the neighbor joining algorithm. Like neighbor joining, the method takes a distance matrix as input, and works by agglomerating clusters. However, the NeighborNet algorithm can lead to collections of clusters which overlap and do not form a hierarchy, and are represented using a type of phylogenetic network called a splits graph. If the distance matrix satisfies the Kalmanson combinatorial conditions then Neighbor-net will return the corresponding circular ordering. The method is implemented in the SplitsTree and R/Phangorn packages.

For a given set of taxa like X, and a set of splits S on X, usually together with a non-negative weighting, which may represent character changes distance, or may also have a more abstract interpretation, if the set of splits S is compatible, then it can be represented by an unrooted phylogenetic tree and each edge in the tree corresponds to exactly one of the splits. More generally, S can always be represented by a split network, which is an unrooted phylogenetic network with the property that every split s in S is represented by an array of parallel edges in the network.

T-REX is a freely available web server, developed at the department of Computer Science of the Université du Québec à Montréal, dedicated to the inference, validation and visualization of phylogenetic trees and phylogenetic networks. The T-REX web server allows the users to perform several popular methods of phylogenetic analysis as well as some new phylogenetic applications for inferring, drawing and validating phylogenetic trees and networks.

References

  1. Dress, A.; K. T. Huber; V. Moulton (2001). "Metric spaces in pure and applied mathematics" (PDF). Documenta Mathematica: 121–139.
  2. Huson, D. H.; D. Bryant (2006). "Application of Phylogenetic Networks in Evolutionary Studies". Mol. Biol. Evol. 23 (2): 254–267. doi: 10.1093/molbev/msj030 . PMID   16221896.
  3. Bandelt, H. J.; Dress, A. W. (1992). "Split decomposition: a new and useful approach to phylogenetic analysis of distance data". Molecular Phylogenetics and Evolution. 1 (3): 242–252. doi:10.1016/1055-7903(92)90021-8. ISSN   1055-7903. PMID   1342941.
  4. Holland, Barbara; Moulton, Vincent (2003). Benson, Gary; Page, Roderic D. M. (eds.). "Consensus Networks: A Method for Visualising Incompatibilities in Collections of Trees". Algorithms in Bioinformatics. Lecture Notes in Computer Science. Springer Berlin Heidelberg. 2812: 165–176. doi:10.1007/978-3-540-39763-2_13. ISBN   9783540397632.