Unrooted binary tree

Last updated
A cladogram in the form of an unrooted binary tree, representing the similarities and evolutionary history among species of actinobacteria. TreeActinobacteria.svg
A cladogram in the form of an unrooted binary tree, representing the similarities and evolutionary history among species of actinobacteria.

In mathematics and computer science, an unrooted binary tree is an unrooted tree in which each vertex has either one or three neighbors.

Contents

Definitions

A free tree or unrooted tree is a connected undirected graph with no cycles. The vertices with one neighbor are the leaves of the tree, and the remaining vertices are the internal nodes of the tree. The degree of a vertex is its number of neighbors; in a tree with more than one node, the leaves are the vertices of degree one. An unrooted binary tree is a free tree in which all internal nodes have degree exactly three.

In some applications it may make sense to distinguish subtypes of unrooted binary trees: a planar embedding of the tree may be fixed by specifying a cyclic ordering for the edges at each vertex, making it into a plane tree. In computer science, binary trees are often rooted and ordered when they are used as data structures, but in the applications of unrooted binary trees in hierarchical clustering and evolutionary tree reconstruction, unordered trees are more common. [1]

Additionally, one may distinguish between trees in which all vertices have distinct labels, trees in which the leaves only are labeled, and trees in which the nodes are not labeled. In an unrooted binary tree with n leaves, there will be n  2 internal nodes, so the labels may be taken from the set of integers from 1 to 2n  1 when all nodes are to be labeled, or from the set of integers from 1 to n when only the leaves are to be labeled. [1]

Rooted binary trees

An unrooted binary tree T may be transformed into a full rooted binary tree (that is, a rooted tree in which each non-leaf node has exactly two children) by choosing a root edgee of T, placing a new root node in the middle of e, and directing every edge of the resulting subdivided tree away from the root node. Conversely, any full rooted binary tree may be transformed into an unrooted binary tree by removing the root node, replacing the path between its two children by a single undirected edge, and suppressing the orientation of the remaining edges in the graph. For this reason, there are exactly 2n 3 times as many full rooted binary trees with n leaves as there are unrooted binary trees with n leaves. [1]

Hierarchical clustering

A hierarchical clustering of a collection of objects may be formalized as a maximal family of sets of the objects in which no two sets cross. That is, for every two sets S and T in the family, either S and T are disjoint or one is a subset of the other, and no more sets can be added to the family while preserving this property. If T is an unrooted binary tree, it defines a hierarchical clustering of its leaves: for each edge (u,v) in T there is a cluster consisting of the leaves that are closer to u than to v, and these sets together with the empty set and the set of all leaves form a maximal non-crossing family. Conversely, from any maximal non-crossing family of sets over a set of n elements, one can form a unique unrooted binary tree that has a node for each triple (A,B,C) of disjoint sets in the family that together cover all of the elements. [2]

Evolutionary trees

According to simple forms of the theory of evolution, the history of life can be summarized as a phylogenetic tree in which each node describes a species, the leaves represent the species that exist today, and the edges represent ancestor-descendant relationships between species. This tree has a natural orientation from ancestors to descendants, and a root at the common ancestor of the species, so it is a rooted tree. However, some methods of reconstructing binary trees can reconstruct only the nodes and the edges of this tree, but not their orientations.

For instance, cladistic methods such as maximum parsimony use as data a set of binary attributes describing features of the species. These methods seek a tree with the given species as leaves in which the internal nodes are also labeled with features, and attempt to minimize the number of times some feature is present at only one of the two endpoints of an edge in the tree. Ideally, each feature should only have one edge for which this is the case. Changing the root of a tree does not change this number of edge differences, so methods based on parsimony are not capable of determining the location of the tree root and will produce an unrooted tree, often an unrooted binary tree. [3]

Unrooted binary trees also are produced by methods for inferring evolutionary trees based on quartet data specifying, for each four leaf species, the unrooted binary tree describing the evolution of those four species, and by methods that use quartet distance to measure the distance between trees. [4]

Branch-decomposition

Unrooted binary trees are also used to define branch-decompositions of graphs, by forming an unrooted binary tree whose leaves represent the edges of the given graph. That is, a branch-decomposition may be viewed as a hierarchical clustering of the edges of the graph. Branch-decompositions and an associated numerical quantity, branch-width, are closely related to treewidth and form the basis for efficient dynamic programming algorithms on graphs. [5]

Enumeration

Because of their applications in hierarchical clustering, the most natural graph enumeration problem on unrooted binary trees is to count the number of trees with n labeled leaves and unlabeled internal nodes. An unrooted binary tree on n labeled leaves can be formed by connecting the nth leaf to a new node in the middle of any of the edges of an unrooted binary tree on n  1 labeled leaves. There are 2n  5 edges at which the nth node can be attached; therefore, the number of trees on n leaves is larger than the number of trees on n  1 leaves by a factor of 2n  5. Thus, the number of trees on n labeled leaves is the double factorial

[6]

The numbers of trees on 2, 3, 4, 5, ... labeled leaves are

1, 1, 3, 15, 105, 945, 10395, 135135, 2027025, 34459425, ... (sequence A001147 in the OEIS ).

Fundamental Equalities

The leaf-to-leaf path-length on a fixed Unrooted Binary Tree (UBT) T encodes the number of edges belonging to the unique path in T connecting a given leaf to another leaf. For example, by referring to the UBT shown in the image on the right, the path-length between the leaves 1 and 2 is equal to 2 whereas the path-length between the leaves 1 and 3 is equal to 3. The path-length sequence from a given leaf on a fixed UBT T encodes the lengths of the paths from the given leaf to all the remaining ones. For example, by referring to the UBT shown in the image on the right, the path-length sequence from the leaf 1 is . The set of path-length sequences associated to the leaves of T is usually referred to as the path-length sequence collection of T [7] .

An example of an unrooted binary tree with four leaves An example of an unrooted binary tree with four leaves.pdf
An example of an unrooted binary tree with four leaves

Daniele Catanzaro, Raffaele Pesenti and Laurence Wolsey showed [7] that the path-length sequence collection encoding a given UBT with n leaves must satisfy specific equalities, namely

These equalities are proved to be necessary and independent for a path-length collection to encode an UBT with n leaves. [7] It is currently unknown whether they are also sufficient.

Alternative names

Unrooted binary trees have also been called free binary trees, [8] cubic trees, [9] ternary trees [5] and unrooted ternary trees. [10] However, the "free binary tree" name has also been applied to unrooted trees that may have degree-two nodes [11] and to rooted binary trees with unordered children, [12] and the "ternary tree" name is more frequently used to mean a rooted tree with three children per node.

Notes

  1. 1 2 3 Furnas (1984).
  2. See e.g. Eppstein (2009) for the same correspondence between clusterings and trees, but using rooted binary trees instead of unrooted trees and therefore including an arbitrary choice of the root node.
  3. Hendy & Penny (1989).
  4. St. John et al. (2003).
  5. 1 2 Robertson & Seymour (1991).
  6. Balding, Bishop & Cannings (2007).
  7. 1 2 3 4 Catanzaro D, Pesenti R, Wolsey L (2020). "On the Balanced Minimum Evolution Polytope". Discrete Optimization. 36: 100570. doi:10.1016/j.disopt.2020.100570. S2CID   213389485.
  8. Czumaj & Gibbons (1996).
  9. Exoo (1996).
  10. Cilibrasi & Vitanyi (2006).
  11. Harary, Palmer & Robinson (1992).
  12. Przytycka & Larmore (1994).

Related Research Articles

<span class="mw-page-title-main">Binary tree</span> Limited form of tree data structure

In computer science, a binary tree is a tree data structure in which each node has at most two children, referred to as the left child and the right child. That is, it is a k-ary tree with k = 2. A recursive definition using set theory is that a binary tree is a tuple (L, S, R), where L and R are binary trees or the empty set and S is a singleton set containing the root.

<span class="mw-page-title-main">Huffman coding</span> Technique to compress data

In computer science and information theory, a Huffman code is a particular type of optimal prefix code that is commonly used for lossless data compression. The process of finding or using such a code is Huffman coding, an algorithm developed by David A. Huffman while he was a Sc.D. student at MIT, and published in the 1952 paper "A Method for the Construction of Minimum-Redundancy Codes".

In computer science, a red–black tree is a specialised binary search tree data structure noted for fast storage and retrieval of ordered information, and a guarantee that operations will complete within a known time. Compared to other self-balancing binary search trees, the nodes in a red-black tree hold an extra bit called "color" representing "red" and "black" which is used when re-organising the tree to ensure that it is always approximately balanced.

<span class="mw-page-title-main">Tree (data structure)</span> Abstract data type simulating a hierarchical tree structure and represented as a set of linked nodes

In computer science, a tree is a widely used abstract data type that represents a hierarchical tree structure with a set of connected nodes. Each node in the tree can be connected to many children, but must be connected to exactly one parent, except for the root node, which has no parent. These constraints mean there are no cycles or "loops", and also that each child can be treated like the root node of its own subtree, making recursion a useful technique for tree traversal. In contrast to linear data structures, many trees cannot be represented by relationships between neighboring nodes in a single straight line.

<span class="mw-page-title-main">Tree (graph theory)</span> Undirected, connected and acyclic graph

In graph theory, a tree is an undirected graph in which any two vertices are connected by exactly one path, or equivalently a connected acyclic undirected graph. A forest is an undirected graph in which any two vertices are connected by at most one path, or equivalently an acyclic undirected graph, or equivalently a disjoint union of trees.

<span class="mw-page-title-main">Breadth-first search</span> Algorithm to search the nodes of a graph

Breadth-first search (BFS) is an algorithm for searching a tree data structure for a node that satisfies a given property. It starts at the tree root and explores all nodes at the present depth prior to moving on to the nodes at the next depth level. Extra memory, usually a queue, is needed to keep track of the child nodes that were encountered but not yet explored.

<span class="mw-page-title-main">Phylogenetic tree</span> Branching diagram of evolutionary relationships between organisms

A phylogenetic tree, phylogeny or evolutionary tree is a graphical representation which shows the evolutionary history between a set of species or taxa during a specific time. In other words, it is a branching diagram or a tree showing the evolutionary relationships among various biological species or other entities based upon similarities and differences in their physical or genetic characteristics. In evolutionary biology, all life on Earth is theoretically part of a single phylogenetic tree, indicating common ancestry. Phylogenetics is the study of phylogenetic trees. The main challenge is to find a phylogenetic tree representing optimal evolutionary ancestry between a set of species or taxa. Computational phylogenetics focuses on the algorithms involved in finding optimal phylogenetic tree in the phylogenetic landscape.

This is a glossary of graph theory. Graph theory is the study of graphs, systems of nodes or vertices connected in pairs by lines or edges.

In computer science, a binary decision diagram (BDD) or branching program is a data structure that is used to represent a Boolean function. On a more abstract level, BDDs can be considered as a compressed representation of sets or relations. Unlike other compressed representations, operations are performed directly on the compressed representation, i.e. without decompression.

<span class="mw-page-title-main">Quadtree</span> Tree data structure in which each internal node has exactly four children, to partition a 2D area

A quadtree is a tree data structure in which each internal node has exactly four children. Quadtrees are the two-dimensional analog of octrees and are most often used to partition a two-dimensional space by recursively subdividing it into four quadrants or regions. The data associated with a leaf cell varies by application, but the leaf cell represents a "unit of interesting spatial information".

In mathematics, computer science and especially graph theory, a distance matrix is a square matrix containing the distances, taken pairwise, between the elements of a set. Depending upon the application involved, the distance being used to define this matrix may or may not be a metric. If there are N elements, this matrix will have size N×N. In graph-theoretic applications, the elements are more often referred to as points, nodes or vertices.

The Wedderburn–Etherington numbers are an integer sequence named for Ivor Malcolm Haddon Etherington and Joseph Wedderburn that can be used to count certain kinds of binary trees. The first few numbers in the sequence are

In computational phylogenetics, tree alignment is a computational problem concerned with producing multiple sequence alignments, or alignments of three or more sequences of DNA, RNA, or protein. Sequences are arranged into a phylogenetic tree, modeling the evolutionary relationships between species or taxa. The edit distances between sequences are calculated for each of the tree's internal vertices, such that the sum of all edit distances within the tree is minimized. Tree alignment can be accomplished using one of several algorithms with various trade-offs between manageable tree size and computational effort.

In constraint satisfaction, a decomposition method translates a constraint satisfaction problem into another constraint satisfaction problem that is binary and acyclic. Decomposition methods work by grouping variables into sets, and solving a subproblem for each set. These translations are done because solving binary acyclic problems is a tractable problem.

In mathematics, Newick tree format is a way of representing graph-theoretical trees with edge lengths using parentheses and commas. It was adopted by James Archie, William H. E. Day, Joseph Felsenstein, Wayne Maddison, Christopher Meacham, F. James Rohlf, and David Swofford, at two meetings in 1986, the second of which was at Newick's restaurant in Dover, New Hampshire, US. The adopted format is a generalization of the format developed by Meacham in 1984 for the first tree-drawing programs in Felsenstein's PHYLIP package.

In graph theory, the planar separator theorem is a form of isoperimetric inequality for planar graphs, that states that any planar graph can be split into smaller pieces by removing a small number of vertices. Specifically, the removal of vertices from an n-vertex graph can partition the graph into disjoint subgraphs each of which has at most vertices.

<span class="mw-page-title-main">Top tree</span>

A top tree is a data structure based on a binary tree for unrooted dynamic trees that is used mainly for various path-related operations. It allows simple divide-and-conquer algorithms. It has since been augmented to maintain dynamically various properties of a tree such as diameter, center and median.

In graph theory and theoretical computer science, the level ancestor problem is the problem of preprocessing a given rooted tree T into a data structure that can determine the ancestor of a given node at a given distance from the root of the tree.

In the mathematical field of graph theory, an agreement forest for two given trees is any forest which can, informally speaking, be obtained from both trees by removing a common number of edges.

In discrete mathematics and theoretical computer science, the rotation distance between two binary trees with the same number of nodes is the minimum number of tree rotations needed to reconfigure one tree into another. Because of a combinatorial equivalence between binary trees and triangulations of convex polygons, rotation distance is equivalent to the flip distance for triangulations of convex polygons.

References