Tree contraction

Last updated October 27, 2023

In computer science, parallel tree contraction is a broadly applicable technique for the parallel solution of a large number of tree problems, and is used as an algorithm design technique for the design of a large number of parallel graph algorithms. Parallel tree contraction was introduced by Gary L. Miller and John H. Reif,^[1] and has subsequently been modified to improve efficiency by X. He and Y. Yesha,^[2] Hillel Gazit, Gary L. Miller and Shang-Hua Teng^[3] and many others.^[4]

Tree contraction has been used in designing many efficient parallel algorithms, including expression evaluation, finding lowest common ancestors, tree isomorphism, graph isomorphism, maximal subtree isomorphism, common subexpression elimination, computing the 3-connected components of a graph, and finding an explicit planar embedding of a planar graph ^[5]

Based on the research and work on parallel tree contraction, various algorithms have been proposed targeting to improve the efficiency or simplicity of this topic. This article hereby focuses on a particular solution, which is a variant of the algorithm by Miller and Reif, and its application.

Introduction

Over the past several decades there has been significant research on deriving new parallel algorithms for a variety of problems, with the goal of designing highly parallel (polylogarithmic depth), work-efficient (linear in the sequential running time) algorithms.^[1] For some problems, tree turns out to be a nice solution. Addressing these problems, we can sometimes get more parallelism simply by representing our problem as a tree.

Considering a generic definition of a tree, there is a root vertex, and several child vertices attached to the root.^[6] And the child vertices might have children themselves, and so on so forth. Eventually, the paths come down to leaves, which are defined to be the terminal of a tree. Then based on this generic tree, we can further come up with some special cases: (1) balanced binary tree; (2) linked list.^[7] A balanced binary tree has exactly two branches for each vertex except for leaves. This gives a O(log n) bound on the depth of the tree.^[8] A linked list is also a tree where every vertex has only one child. We can also achieve O(log n) depth using symmetry breaking.^[9]

Given the general case of a tree, we would like to keep the bound at O(log n) no matter it is unbalanced or list-like or a mix of both. To address this problem, we make use of an algorithm called prefix sum by using the Euler tour technique.^[10] With the Euler tour technique, a tree could be represented in a flat style, and thus prefix sum could be applied to an arbitrary tree in this format. In fact, prefix sum can be used on any set of values and binary operation which form a group: the binary operation must be associative, every value must have an inverse, and there exists an identity value.

With a bit of thought, we can find some exceptional cases where prefix sum becomes incapable or inefficient. Consider the example of multiplication when the set of values includes 0. Or there are some commonly desired operations are max() and min() which do not have inverses. The goal is to seek an algorithm which works on all trees, in expected O(n) work and O(log n) depth. In the following sections, a Rake/Compress algorithm will be proposed to fulfill this goal.^[11]

Definitions

Fig. 1: Rake Operation Rake-1.png — Fig. 1: Rake Operation

Fig. 2: Compress Operation Compress-1.png — Fig. 2: Compress Operation

Before going into the algorithm itself, we first look at a few terminologies that will be used later.

Rake^[12] – Rake step joins every left leaf of binary nodes to the parent. By join, we mean that it undergoes a functional process which achieves the operation we want to make. An example of rake is given in Figure 1.
Compress^[12] – Compress step is actually a sequence of several events: (1) Find an independent set of unary nodes. (Independence here is defined such that no two are neighbors, meaning no parent to child relation) (2) Join each node in independent set with its child (Note that independent set is not unique). An example of compress is given in Figure 2.

And in order to solve actual problems using tree contraction, the algorithm has a structure:

Repeat until tree becomes a unary node {     Rake;     Compress; }

Analysis

For the moment, let us assume that all nodes have less than three children, namely binary. Generally speaking, as long as the degree is bounded, the bounds will hold.^[13] But we will analyze the binary case for simplicity. In the two “degenerate” cases listed above, the rake is the best tool for dealing with balanced binary trees, and compress is the best for linked lists. However, arbitrary trees will have to require a combination of these operations. By this combination, we claim a theorem that

Theorem: After O(log n) expected rake and compress steps, a tree is reduced to a single node.

Now rephrase the tree contraction algorithm as follows:

Input: A binary tree rooted at r
Output: A single node
Operation: A sequence of contraction steps, each consisting of a rake operation and a compress operation (in any order). The rake operation removes all the leaf nodes in parallel. The compress operation finds an independent set of unary nodes and splice out the selected nodes.

To approach the theorem, we first take a look at a property of a binary tree. Given a binary tree T, we can partition the nodes of T into 3 groups: $T_{0}$ contains all leaf nodes, $T_{1}$ contains all nodes with 1 child, and $T_{2}$ contains all nodes with 2 children. It is easy to see that: $V(T)=T_{0}\cup T_{1}\cup T_{2}$ . Now we propose:

Claim: $|T_{0}|=|T_{2}|+1$

This claim can be proved by strong induction on the number of nodes. It is easy to see that the base case of n=1 trivially holds. And we further assume the claim also holds for any tree with at most n nodes. Then given a tree with n+1 nodes rooted at r, there appears to be two cases:

If r has only one subtree, consider the subtree of r. We know that the subtree has the same number of binary nodes and the same number of leaf nodes as the whole tree itself. This is true since the root is a unary node. And based the previous assumption, a unary node does not change either $T_{0}$ or $T_{2}$ .
If r has two subtrees, we define $T_{0}^{L},T_{2}^{L}$ to be the leaf nodes and binary nodes in the left subtree, respectively. Similarly, we define the same $T_{0}^{R},T_{2}^{R}$ for the right subtree. From previous, there is $|T_{0}^{L}|=|T_{2}^{L}|+1$ and $|T_{0}^{R}|=|T_{2}^{R}|+1$ . Also we know that T has $|T_{0}^{L}|+|T_{0}^{R}|$ leaf nodes and $|T_{2}^{L}|+|T_{2}^{R}|+1$ binary nodes. Thus, we can derive:

|T_{0}^{L}|+|T_{0}^{R}|=|T_{2}^{L}|+1+|T_{2}^{R}|+1=(|T_{2}^{L}|+|T_{2}^{R}|+1)+1

which proves the claim.

Following the claim, we then prove a lemma, which leads us to the theorem.

Lemma: The number of nodes of after a contraction step is reduced by a constant factor in expectation.

Assume the number of nodes before the contraction to be m, and m' after the contraction. By definition, the rake operation deletes all $T_{0}$ and the compress operation deletes at least 1/4 of $T_{1}$ in expectation. All $T_{2}$ remains. Therefore, we can see:

E[m']\leq |T_{2}|+{\tfrac {3}{4}}*|T_{1}|\leq {\tfrac {3}{4}}+{\tfrac {3}{4}}*|T_{1}|+{\tfrac {3}{2}}*|T_{2}|={\tfrac {3}{4}}(1+|T_{1}|+2*|T_{2}|)={\tfrac {3}{4}}(|T_{0}|+|T_{1}|+|T_{2}|)={\tfrac {3}{4}}m

Finally, based on this lemma, we can conclude that if the nodes are reduced by a constant factor in each iteration, after $O(\log n)$ , there will be only one node left.^[14]

Applications

Expression Evaluation

To evaluate an expression given as a binary tree (this problem also known as binary expression tree),^[15] consider that: An arithmetic expression is a tree where the leaves have values from some domain and each internal vertex has two children and a label from {+, x, %}. And further assume that these binary operations can be performed in constant time.

We now show the evaluation can be done with parallel tree contraction.^[16]

Step 1. Assign expressions to every node. The expression of a leaf is simply the value that it contains. Write L + R, L − R, or L × R for the operators, where L and R are the values of the expressions in the left and right subtrees, respectively.
Step 2. When a left (right) child with 0 children is merged into an operator, replace L (R) with the value of the child.
Step 3. When a node has 1 child, it has an expression that is a function of one variable. When a left (right) child with 1 child is merged into an operator, replace L (R) with the expression and change the variable in the expression to L (R) if appropriate.

In a node with 2 children, the operands in the expression are f(L) and g(R), where f and g are linear functions, and in a node with 1 child, the expression is h(x), where h is a linear function and x is either L or R. We prove this invariant by induction. At the beginning, the invariant is clearly satisfied. There are three types of merges that result in a not fully evaluated expression. (1) A 1-child node is merged into a 2-children node. (2) A leaf is merged into a 2-children node. (3) A 1-child node is merged into a 1-child node. All three types of merges do not change the invariant. Therefore, every merge simply evaluates or composes linear functions, which takes constant time ^[17]

Related Research Articles

<span class="mw-page-title-main">AVL tree</span> Self-balancing binary search tree

In computer science, an AVL tree is a self-balancing binary search tree. In an AVL tree, the heights of the two child subtrees of any node differ by at most one; if at any time they differ by more than one, rebalancing is done to restore this property. Lookup, insertion, and deletion all take $O(log n)$ time in both the average and worst cases, where $is the number of nodes in the tree prior to the operation. Insertions and deletions may require the tree to be rebalanced by one or more tree rotations.$

In computer science, a binary search tree (BST), also called an ordered or sorted binary tree, is a rooted binary tree data structure with the key of each internal node being greater than all the keys in the respective node's left subtree and less than the ones in its right subtree. The time complexity of operations on the binary search tree is linear with respect to the height of the tree.

In computer science, a binary tree is a tree data structure in which each node has at most two children, referred to as the left child and the right child. That is, it is a k-ary tree with $k = 2$ . A recursive definition using set theory is that a binary tree is a tuple (L, S, R), where L and R are binary trees or the empty set and S is a singleton set containing the root.

In computer science, a red–black tree is a specialised binary search tree data structure noted for fast storage and retrieval of ordered information, and a guarantee that operations will complete within a known time. Compared to other self-balancing binary search trees, the nodes in a red-black tree hold an extra bit called "color" representing "red" and "black" which is used when re-organising the tree to ensure that it is always approximately balanced.

In computer science, a tree is a widely used abstract data type that represents a hierarchical tree structure with a set of connected nodes. Each node in the tree can be connected to many children, but must be connected to exactly one parent, except for the root node, which has no parent. These constraints mean there are no cycles or "loops", and also that each child can be treated like the root node of its own subtree, making recursion a useful technique for tree traversal. In contrast to linear data structures, many trees cannot be represented by relationships between neighboring nodes in a single straight line.

A minimum spanning tree (MST) or minimum weight spanning tree is a subset of the edges of a connected, edge-weighted undirected graph that connects all the vertices together, without any cycles and with the minimum possible total edge weight. That is, it is a spanning tree whose sum of edge weights is as small as possible. More generally, any edge-weighted undirected graph has a minimum spanning forest, which is a union of the minimum spanning trees for its connected components.

<span class="mw-page-title-main">Treap</span>

In computer science, the treap and the randomized binary search tree are two closely related forms of binary search tree data structures that maintain a dynamic set of ordered keys and allow binary searches among the keys. After any sequence of insertions and deletions of keys, the shape of the tree is a random variable with the same probability distribution as a random binary tree; in particular, with high probability its height is proportional to the logarithm of the number of keys, so that each search, insertion, or deletion operation takes logarithmic time to perform.

In computer science, a binary decision diagram (BDD) or branching program is a data structure that is used to represent a Boolean function. On a more abstract level, BDDs can be considered as a compressed representation of sets or relations. Unlike other compressed representations, operations are performed directly on the compressed representation, i.e. without decompression.

A quadtree is a tree data structure in which each internal node has exactly four children. Quadtrees are the two-dimensional analog of octrees and are most often used to partition a two-dimensional space by recursively subdividing it into four quadrants or regions. The data associated with a leaf cell varies by application, but the leaf cell represents a "unit of interesting spatial information".

In computer science, tree traversal is a form of graph traversal and refers to the process of visiting each node in a tree data structure, exactly once. Such traversals are classified by the order in which the nodes are visited. The following algorithms are described for a binary tree, but they may be generalized to other trees as well.

The Wedderburn–Etherington numbers are an integer sequence named for Ivor Malcolm Haddon Etherington and Joseph Wedderburn that can be used to count certain kinds of binary trees. The first few numbers in the sequence are

In computer science, the Sethi–Ullman algorithm is an algorithm named after Ravi Sethi and Jeffrey D. Ullman, its inventors, for translating abstract syntax trees into machine code that uses as few registers as possible.

<i>k</i>-d tree Multidimensional search tree for points in k dimensional space

In computer science, a k-d tree is a space-partitioning data structure for organizing points in a k-dimensional space. K-dimensional is that which concerns exactly k orthogonal axes or a space of any number of dimensions. k-d trees are a useful data structure for several applications, such as:

<i>m</i>-ary tree Tree data structure in which each node has at most m children.

In graph theory, an m-ary tree is an arborescence in which each node has no more than m children. A binary tree is the special case where m = 2, and a ternary tree is another case with m = 3 that limits its children to three.

In computer science, a leftist tree or leftist heap is a priority queue implemented with a variant of a binary heap. Every node x has an s-value which is the distance to the nearest leaf in subtree rooted at x. In contrast to a binary heap, a leftist tree attempts to be very unbalanced. In addition to the heap property, leftist trees are maintained so the right descendant of each node has the lower s-value.

In computer science, weight-balanced binary trees (WBTs) are a type of self-balancing binary search trees that can be used to implement dynamic sets, dictionaries (maps) and sequences. These trees were introduced by Nievergelt and Reingold in the 1970s as trees of bounded balance, or BB[α] trees. Their more common name is due to Knuth.

In computer science, the prefix sum, cumulative sum, inclusive scan, or simply scan of a sequence of numbers $x 0, x 1, x 2, ...$ is a second sequence of numbers $y 0, y 1, y 2, ...$ , the sums of prefixes of the input sequence:

A tango tree is a type of binary search tree proposed by Erik D. Demaine, Dion Harmon, John Iacono, and Mihai Pătrașcu in 2004. It is named after Buenos Aires, of which the tango is emblematic.

A top tree is a data structure based on a binary tree for unrooted dynamic trees that is used mainly for various path-related operations. It allows simple divide-and-conquer algorithms. It has since been augmented to maintain dynamically various properties of a tree such as diameter, center and median.

In computer science, join-based tree algorithms are a class of algorithms for self-balancing binary search trees. This framework aims at designing highly-parallelized algorithms for various balanced binary search trees. The algorithmic framework is based on a single operation join. Under this framework, the join operation captures all balancing criteria of different balancing schemes, and all other functions join have generic implementation across different balancing schemes. The join-based algorithms can be applied to at least four balancing schemes: AVL trees, red–black trees, weight-balanced trees and treaps.

References

1 2 Gary L. Miller and John H. Reif, Parallel Tree Contraction--Part I: Fundamentals., 1989
↑ X. He and Y. Yesha, "Binary tree algebraic computation and parallel algorithms for simple graphs.", Journal of Algorithms, 1988, pp 92-113
↑ Hillel Gazit, Gary L. Miller and Shang-Hua Teng, Optimal tree contraction in the EREW model, Springer, 1988
↑ Karl Abrahamson and et al., "A simple parallel tree contraction algorithm ^{[ dead link ]}.", Journal of Algorithms, 1989, pp 287-302
↑ John H. Reif and Stephen R. Tate, Dynamic parallel tree contraction, Proceedings of the sixth annual ACM symposium on Parallel algorithms and architectures (ACM), 1994
↑ Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction to Algorithms , Second Edition. MIT Press and McGraw-Hill, 2001. ISBN 0-262-03293-7 . Section 10.4: Representing rooted trees, pp. 214–217. Chapters 12–14 (Binary Search Trees, Red-Black Trees, Augmenting Data Structures), pp. 253–320.
↑ Donald Knuth. The Art of Computer Programming: Fundamental Algorithms, Third Edition. Addison-Wesley, 1997. ISBN 0-201-89683-4 . Section 2.3: Trees, pp. 308–423.
↑ Nešetřil, Jaroslav; Ossona de Mendez, Patrice (2012), "Chapter 6. Bounded height trees and tree-depth", Sparsity: Graphs, Structures, and Algorithms, Algorithms and Combinatorics, vol. 28, Heidelberg: Springer, pp. 115–144, doi:10.1007/978-3-642-27875-4, ISBN 978-3-642-27874-7, MR 2920058 .
↑ Andrew Goldberg, Serge Plotkin, and Gregory Shannon, Parallel symmetry-breaking in sparse graphs, Proceedings of the nineteenth annual ACM symposium on Theory of computing (ACM), 1987
↑ Euler tour trees - in Lecture Notes in Advanced Data Structures. Prof. Erik Demaine; Scribe: Katherine Lai.
↑ Gary L. Miller and John H. Reif, Parallel tree contraction and its application, Defense Technical Information Center, 1985
1 2 Parallel Algorithms: Tree Operations, Guy Blelloch, Carnegie Mellon University, 2009
↑ MORIHATA, Akimasa, and Kiminori MATSUZAKI, A Parallel Tree Contraction Algorithm on Non-Binary Trees, MATHEMATICAL ENGINEERING TECHNICAL REPORTS, 2008
↑ Parallel Algorithms: Analyzing Parallel Tree Contraction, Guy Blelloch, 2007
↑ S Buss, Algorithms for boolean formula evaluation and for tree contraction, Arithmetic, Proof Theory, and Computational Complexity, 1993, pp. 96-115
↑ Bader, David A., Sukanya Sreshta, and Nina R. Weisse-Bernstein, Evaluating arithmetic expressions using tree contraction: A fast and scalable parallel implementation for symmetric multiprocessors (SMPs) ^{[ dead link ]}, High Performance Computing—HiPC 2002. Springer Berlin Heidelberg, 2002, pp. 63-75.
↑ Applications of Parallel Tree Contraction, Samuel Yeom, 2015

External links

6.851: Advanced Data Structures by Prof. Erik Demaine

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.

[Miller89book-1] 1 2 Gary L. Miller and John H. Reif, Parallel Tree Contraction--Part I: Fundamentals., 1989

[2] X. He and Y. Yesha, "Binary tree algebraic computation and parallel algorithms for simple graphs.", Journal of Algorithms, 1988, pp 92-113

[3] Hillel Gazit, Gary L. Miller and Shang-Hua Teng, Optimal tree contraction in the EREW model, Springer, 1988

[4] Karl Abrahamson and et al., "A simple parallel tree contraction algorithm ^{[ dead link ]}.", Journal of Algorithms, 1989, pp 287-302

[Reif94dynamic-5] John H. Reif and Stephen R. Tate, Dynamic parallel tree contraction, Proceedings of the sixth annual ACM symposium on Parallel algorithms and architectures (ACM), 1994

[6] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction to Algorithms , Second Edition. MIT Press and McGraw-Hill, 2001. ISBN 0-262-03293-7 . Section 10.4: Representing rooted trees, pp. 214–217. Chapters 12–14 (Binary Search Trees, Red-Black Trees, Augmenting Data Structures), pp. 253–320.

[7] Donald Knuth. The Art of Computer Programming: Fundamental Algorithms, Third Edition. Addison-Wesley, 1997. ISBN 0-201-89683-4 . Section 2.3: Trees, pp. 308–423.

[8] Nešetřil, Jaroslav; Ossona de Mendez, Patrice (2012), "Chapter 6. Bounded height trees and tree-depth", Sparsity: Graphs, Structures, and Algorithms, Algorithms and Combinatorics, vol. 28, Heidelberg: Springer, pp. 115–144, doi:10.1007/978-3-642-27875-4, ISBN 978-3-642-27874-7, MR 2920058 .

[9] Andrew Goldberg, Serge Plotkin, and Gregory Shannon, Parallel symmetry-breaking in sparse graphs, Proceedings of the nineteenth annual ACM symposium on Theory of computing (ACM), 1987

[10] Euler tour trees - in Lecture Notes in Advanced Data Structures. Prof. Erik Demaine; Scribe: Katherine Lai.

[Miller85app-11] Gary L. Miller and John H. Reif, Parallel tree contraction and its application, Defense Technical Information Center, 1985

[cmutrees-12] 1 2 Parallel Algorithms: Tree Operations, Guy Blelloch, Carnegie Mellon University, 2009

[13] MORIHATA, Akimasa, and Kiminori MATSUZAKI, A Parallel Tree Contraction Algorithm on Non-Binary Trees, MATHEMATICAL ENGINEERING TECHNICAL REPORTS, 2008

[14] Parallel Algorithms: Analyzing Parallel Tree Contraction, Guy Blelloch, 2007

[15] S Buss, Algorithms for boolean formula evaluation and for tree contraction, Arithmetic, Proof Theory, and Computational Complexity, 1993, pp. 96-115

[16] Bader, David A., Sukanya Sreshta, and Nina R. Weisse-Bernstein, Evaluating arithmetic expressions using tree contraction: A fast and scalable parallel implementation for symmetric multiprocessors (SMPs) ^{[ dead link ]}, High Performance Computing—HiPC 2002. Springer Berlin Heidelberg, 2002, pp. 63-75.

[17] Applications of Parallel Tree Contraction, Samuel Yeom, 2015

[1]

[2]

[3]

[4]

[5]

[6]

[7]

[8]

[9]

[10]

[11]

[12]

[13]

[14]

[15]

[16]

[17]

v t e Tree data structures
Search trees (dynamic sets/associative arrays)	2–3 2–3–4 AA (a,b) AVL B B+ B* B^x (Optimal) Binary search Dancing HTree Interval Order statistic (Left-leaning) Red–black Scapegoat Splay T Treap UB Weight-balanced
Heaps	Binary Binomial Brodal Fibonacci Leftist Pairing Skew van Emde Boas Weak
Tries	Ctrie C-trie (compressed ADT) Hash Radix Suffix Ternary search X-fast Y-fast
Spatial data partitioning trees	Ball BK BSP Cartesian Hilbert R k-d (implicit k-d) M Metric MVP Octree PH Priority R Quad R R+ R* Segment VP X
Other trees	Cover Exponential Fenwick Finger Fractal tree index Fusion Hash calendar iDistance K-ary Left-child right-sibling Link/cut Log-structured merge Merkle PQ Range SPQR Top