Quartet distance

Last updated

The quartet distance [1] is a way of measuring the distance between two phylogenetic trees. It is defined as the number of subsets of four leaves that are not related by the same topology in both trees.

Computing the quartet distance

The most straightforward computation of the quartet distance would require time, where is the number of leaves in the trees.

For binary trees, better algorithms have been found to compute the distance in

and

Gerth Stølting Brodal et al. found an algorithm that takes time to compute the quartet distance between two multifurcating trees when is the maximum degree of the trees, [5] which is accessible in C, perl, and the R package Quartet.

Related Research Articles

Heap (data structure) Computer science data structure

In computer science, a heap is a specialized tree-based data structure which is essentially an almost complete tree that satisfies the heap property: in a max heap, for any given node C, if P is a parent node of C, then the key of P is greater than or equal to the key of C. In a min heap, the key of P is less than or equal to the key of C. The node at the "top" of the heap is called the root node.

In computer science, a priority queue is an abstract data type similar to a regular queue or stack data structure in which each element additionally has a "priority" associated with it. In a priority queue, an element with high priority is served before an element with low priority. In some implementations, if two elements have the same priority, they are served according to the order in which they were enqueued, while in other implementations, ordering of elements with the same priority is undefined.

Minimum spanning tree data structure, subgraph of a weighted graph

A minimum spanning tree (MST) or minimum weight spanning tree is a subset of the edges of a connected, edge-weighted undirected graph that connects all the vertices together, without any cycles and with the minimum possible total edge weight. That is, it is a spanning tree whose sum of edge weights is as small as possible. More generally, any edge-weighted undirected graph has a minimum spanning forest, which is a union of the minimum spanning trees for its connected components.

Binary heap

A binary heap is a heap data structure that takes the form of a binary tree. Binary heaps are a common way of implementing priority queues. The binary heap was introduced by J. W. J. Williams in 1964, as a data structure for heapsort.

In computer science, a binomial heap is a data structure that acts as a priority queue but also allows pairs of heaps to be merged. It is important as an implementation of the mergeable heap abstract data type, which is a priority queue supporting merge operation. It is implemented as a heap similar to a binary heap but using a special tree structure that is different from the complete binary trees used by binary heaps. Binomial heaps were invented in 1978 by Jean Vuillemin.

In computer science, a Fibonacci heap is a data structure for priority queue operations, consisting of a collection of heap-ordered trees. It has a better amortized running time than many other priority queue data structures including the binary heap and binomial heap. Michael L. Fredman and Robert E. Tarjan developed Fibonacci heaps in 1984 and published them in a scientific journal in 1987. Fibonacci heaps are named after the Fibonacci numbers, which are used in their running time analysis.

In computer science, a suffix array is a sorted array of all suffixes of a string. It is a data structure used in, among others, full text indices, data compression algorithms, and the field of bibliometrics.

A pairing heap is a type of heap data structure with relatively simple implementation and excellent practical amortized performance, introduced by Michael Fredman, Robert Sedgewick, Daniel Sleator, and Robert Tarjan in 1986. Pairing heaps are heap-ordered multiway tree structures, and can be considered simplified Fibonacci heaps. They are considered a "robust choice" for implementing such algorithms as Prim's MST algorithm, and support the following operations :

Range searching

In data structures, the range searching problem most generally consists of preprocessing a set S of objects, in order to determine which objects from S intersect with a query object, called a range. For example, if S is a set of points corresponding to the coordinates of several cities, a geometric variant of the problem is to find cities within a certain latitude and longitude range.

The dynamic convex hull problem is a class of dynamic problems in computational geometry. The problem consists in the maintenance, i.e., keeping track, of the convex hull for input data undergoing a sequence of discrete changes, i.e., when input data elements may be inserted, deleted, or modified. It should be distinguished from the kinetic convex hull, which studies similar problems for continuously moving points. Dynamic convex hull problems may be distinguished by the types of the input data and the allowed types of modification of the input data.

Relative neighborhood graph

In computational geometry, the relative neighborhood graph (RNG) is an undirected graph defined on a set of points in the Euclidean plane by connecting two points and by an edge whenever there does not exist a third point that is closer to both and than they are to each other. This graph was proposed by Godfried Toussaint in 1980 as a way of defining a structure from a set of points that would match human perceptions of the shape of the set.

In computer science, lattice problems are a class of optimization problems related to mathematical objects called lattices. The conjectured intractability of such problems is central to the construction of secure lattice-based cryptosystems: Lattice problems are an example of NP-hard problems which have been shown to be average-case hard, providing a test case for the security of cryptographic algorithms. In addition, some lattice problems which are worst-case hard can be used as a basis for extremely secure cryptographic schemes. The use of worst-case hardness in such schemes makes them among the very few schemes that are very likely secure even against quantum computers. For applications in such cryptosystems, lattices over vector space or free modules are generally considered.

In graph theory and theoretical computer science, the longest path problem is the problem of finding a simple path of maximum length in a given graph. A path is called simple if it does not have any repeated vertices; the length of a path may either be measured by its number of edges, or by the sum of the weights of its edges. In contrast to the shortest path problem, which can be solved in polynomial time in graphs without negative-weight cycles, the longest path problem is NP-hard and the decision version of the problem, which asks whether a path exists of at least some given length, is NP-complete. This means that the decision problem cannot be solved in polynomial time for arbitrary graphs unless P = NP. Stronger hardness results are also known showing that it is difficult to approximate. However, it has a linear time solution for directed acyclic graphs, which has important applications in finding the critical path in scheduling problems.

In graph theory, the planar separator theorem is a form of isoperimetric inequality for planar graphs, that states that any planar graph can be split into smaller pieces by removing a small number of vertices. Specifically, the removal of vertices from an -vertex graph can partition the graph into disjoint subgraphs each of which has at most vertices.

Kenneth L. Clarkson American computer scientist

Kenneth Lee Clarkson is an American computer scientist known for his research in computational geometry. He is a researcher at the IBM Almaden Research Center, and co-editor-in-chief of the Journal of Computational Geometry.

In computer science, the Brodal queue is a heap/priority queue structure with very low worst case time bounds: for insertion, find-minimum, meld and decrease-key and for delete-minimum and general deletion. They are the first heap variant to achieve these bounds without resorting to amortization of operational costs. Brodal queues are named after their inventor Gerth Stølting Brodal.

Finger search tree

In computer science, finger search trees are a type of binary search tree that keeps pointers to interior nodes, called fingers. The fingers speed up searches, insertions, and deletions for elements close to the fingers, giving amortized O(log n) lookups, and amortized O(1) insertions and deletions. It should not be confused with a finger tree nor a splay tree, although both can be used to implement finger search trees.

Funnelsort is a comparison-based sorting algorithm. It is similar to mergesort, but it is a cache-oblivious algorithm, designed for a setting where the number of elements to sort is too large to fit in a cache where operations are done. It was introduced by Matteo Frigo, Charles Leiserson, Harald Prokop, and Sridhar Ramachandran in 1999 in the context of the cache oblivious model.

Greedy geometric spanner

In computational geometry, a greedy geometric spanner is an undirected graph whose vertices represent points in a Euclidean space, and whose edges are selected by a greedy algorithm to make the shortest path distances in the graph approximate the Euclidean distances between pairs of points. The greedy spanner was first described in the PhD thesis of Gautam Das and conference paper and subsequent journal paper by Ingo Althöfer et al. These sources also credited Marshall Bern (unpublished) with the independent discovery of the same construction.

Max-min item allocation is a fair item allocation problem, in which the goal is to maximize the minimum value of an agent. That is, among all possible allocations, the goal is to find an allocation in which the smallest value of an agent is as large as possible. Another term for a max-min allocation is egalitarian item allocation.

References

  1. Estabrook, George F.; McMorris, F. R.; Meacham, Christopher A. (1985). "Comparison of Undirected Phylogenetic Trees Based on Subtrees of Four Evolutionary Units". Systematic Zoology. 34 (2): 193–200. doi:10.2307/2413326. JSTOR   2413326.
  2. Bryant, D.; J. Tsang; P. E. Kearney; M. Li. (11 Jan 2000). "Computing the quartet distance between evolutionary trees". Proceedings of the Eleventh Annual ACM-SIAM Symposium on Discrete Algorithms. N.Y.: ACM Press: 285–286.
  3. Brodal, Gerth Stølting; Fagerberg, Rolf; Pedersen, Christian N. S. (2001). "Computing the Quartet Distance between Evolutionary Trees in Time ". Algorithms and Computation. Lecture Notes in Computer Science. 2223. pp. 731–742. doi:10.1007/3-540-45678-3_62. ISBN   978-3-540-42985-2.
  4. Brodal, Gerth Stølting; Rolf Fagerberg; Christian Nørgaard Storm Pedersen (2003). "Computing the Quartet Distance Between Evolutionary Trees in Time ". Algorithmica. 38 (2): 377–395. doi:10.1007/s00453-003-1065-y. S2CID   6911940.CS1 maint: discouraged parameter (link)
  5. Brodal, Gerth Stølting; Rolf Fagerberg; T Mailund; Christian Nørgaard Storm Pedersen; A Sand (2013). "Efficient algorithms for computing the triplet and quartet distance between trees of arbitrary degree" (PDF). Proceedings of the Twenty-Fourth Annual ACM-SIAM Symposium on Discrete Algorithms. SIAM: 1814–1832. doi: 10.1137/1.9781611973105.130 . ISBN   978-1-61197-251-1.CS1 maint: discouraged parameter (link)