Join-based tree algorithms


In computer science, join-based tree algorithms are a class of algorithms for self-balancing binary search trees. This framework aims at designing highly parallelized algorithms for various balanced binary search trees. The algorithmic framework is based on a single operation join. [1] Under this framework, the join operation captures all balancing criteria of different balancing schemes, and all other functions have generic implementations across different balancing schemes. The join-based algorithms can be applied to at least four balancing schemes: AVL trees, red–black trees, weight-balanced trees and treaps.


The join operation takes as input two balanced binary trees t1 and t2 of the same balancing scheme, and a key k, and outputs a new balanced binary tree t whose in-order traversal is the in-order traversal of t1, then k, then the in-order traversal of t2. In particular, if the trees are search trees, which means that the in-order traversal of each tree maintains a total ordering on keys, join must satisfy the condition that all keys in t1 are smaller than k and all keys in t2 are greater than k.

History

The join operation was first defined by Tarjan [2] on red–black trees, which runs in worst-case logarithmic time. Later Sleator and Tarjan [3] described a join algorithm for splay trees which runs in amortized logarithmic time. Later Adams [4] extended join to weight-balanced trees and used it for fast set–set functions including union, intersection and set difference. In 1998, Blelloch and Reid-Miller extended join on treaps, and proved the bound of the set functions to be O(m log(n/m + 1)) for two trees of size m and n (m ≤ n), which is optimal in the comparison model. They also brought up parallelism in Adams' algorithm by using a divide-and-conquer scheme. In 2016, Blelloch et al. formally proposed the join-based algorithms, and formalized the join algorithm for four different balancing schemes: AVL trees, red–black trees, weight-balanced trees and treaps. In the same work they proved that Adams' algorithms on union, intersection and difference are work-optimal on all four balancing schemes.

Join algorithms

The function join performs the rebalancing, and thus depends on the input balancing scheme. If the two trees are balanced with respect to each other, join simply creates a new node with left subtree t1, root k and right subtree t2. Suppose that t1 is heavier (what "heavier" means depends on the balancing scheme) than t2 (the other case is symmetric). Join follows the right spine of t1 until it reaches a node c that is balanced with t2. At this point a new node with left child c, root k and right child t2 is created to replace c. The new node may invalidate the balancing invariant; this can be fixed with rotations.

The following are the join algorithms for different balancing schemes.

The join algorithm for AVL trees:

function joinRightAVL(TL, k, TR)
    (l, k', c) := expose(TL)
    if h(c) ≤ h(TR) + 1
        T' := Node(c, k, TR)
        if h(T') ≤ h(l) + 1
            return Node(l, k', T')
        else
            return rotateLeft(Node(l, k', rotateRight(T')))
    else
        T' := joinRightAVL(c, k, TR)
        T := Node(l, k', T')
        if h(T') ≤ h(l) + 1
            return T
        else
            return rotateLeft(T)

function joinLeftAVL(TL, k, TR)
    /* symmetric to joinRightAVL */

function join(TL, k, TR)
    if h(TL) > h(TR) + 1
        return joinRightAVL(TL, k, TR)
    else if h(TR) > h(TL) + 1
        return joinLeftAVL(TL, k, TR)
    else
        return Node(TL, k, TR)

Where h(v) is the height of the subtree rooted at v, expose(v) = (l, k, r) returns the left child l, key k and right child r of v, and Node(l, k, r) creates a new node from them.
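The AVL join above can be made concrete. The following is a runnable Python sketch; Node, h, rotate_left and rotate_right are names chosen for this example, with heights cached at construction so that h runs in constant time.

```python
# Python sketch of the AVL join. Heights are stored on each node, so h() is O(1).
class Node:
    def __init__(self, left, key, right):
        self.left, self.key, self.right = left, key, right
        self.height = 1 + max(h(left), h(right))

def h(t):
    # Height of a (possibly empty) subtree.
    return 0 if t is None else t.height

def rotate_left(t):
    # Standard left rotation: the right child becomes the new root.
    r = t.right
    return Node(Node(t.left, t.key, r.left), r.key, r.right)

def rotate_right(t):
    # Standard right rotation: the left child becomes the new root.
    l = t.left
    return Node(l.left, l.key, Node(l.right, t.key, t.right))

def join_right_avl(tl, k, tr):
    # Walk down the right spine of the taller tree tl until a subtree c
    # balanced with tr is found, then attach Node(c, k, tr) there.
    l, kp, c = tl.left, tl.key, tl.right
    if h(c) <= h(tr) + 1:
        t1 = Node(c, k, tr)
        if h(t1) <= h(l) + 1:
            return Node(l, kp, t1)
        return rotate_left(Node(l, kp, rotate_right(t1)))
    t1 = join_right_avl(c, k, tr)
    t2 = Node(l, kp, t1)
    if h(t1) <= h(l) + 1:
        return t2
    return rotate_left(t2)

def join_left_avl(tl, k, tr):
    # Mirror image of join_right_avl, walking the left spine of tr.
    l, kp, c = tr.left, tr.key, tr.right
    if h(l) <= h(tl) + 1:
        t1 = Node(tl, k, l)
        if h(t1) <= h(c) + 1:
            return Node(t1, kp, c)
        return rotate_right(Node(rotate_left(t1), kp, c))
    t1 = join_left_avl(tl, k, l)
    t2 = Node(t1, kp, c)
    if h(t1) <= h(c) + 1:
        return t2
    return rotate_right(t2)

def join(tl, k, tr):
    if h(tl) > h(tr) + 1:
        return join_right_avl(tl, k, tr)
    if h(tr) > h(tl) + 1:
        return join_left_avl(tl, k, tr)
    return Node(tl, k, tr)

def inorder(t):
    return [] if t is None else inorder(t.left) + [t.key] + inorder(t.right)

def is_avl(t):
    return t is None or (abs(h(t.left) - h(t.right)) <= 1
                         and is_avl(t.left) and is_avl(t.right))

# Join a height-3 tree of keys 1..7 with a single-node tree holding 10.
leaf = lambda k: Node(None, k, None)
left = Node(Node(leaf(1), 2, leaf(3)), 4, Node(leaf(5), 6, leaf(7)))
t = join(left, 8, leaf(10))
```

The resulting tree has the concatenated in-order traversal and still satisfies the AVL invariant, even though the two inputs had different heights.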

The join algorithm for red–black trees:

function joinRightRB(TL, k, TR)
    if r(TL) = ⌊r(TR)/2⌋ × 2
        return Node(TL, ⟨k, red⟩, TR)
    else
        (L', ⟨k', c'⟩, R') := expose(TL)
        T' := Node(L', ⟨k', c'⟩, joinRightRB(R', k, TR))
        if c' = black and T'.right.color = T'.right.right.color = red
            T'.right.right.color := black
            return rotateLeft(T')
        else
            return T'

function joinLeftRB(TL, k, TR)
    /* symmetric to joinRightRB */

function join(TL, k, TR)
    if ⌊r(TL)/2⌋ > ⌊r(TR)/2⌋
        T' := joinRightRB(TL, k, TR)
        if T'.color = red and T'.right.color = red
            T'.color := black
        return T'
    else if ⌊r(TR)/2⌋ > ⌊r(TL)/2⌋
        /* symmetric */
    else if TL.color = black and TR.color = black
        return Node(TL, ⟨k, red⟩, TR)
    else
        return Node(TL, ⟨k, black⟩, TR)

Where r(T) is the rank of the red–black tree T, defined in terms of its black height, so that two subtrees of matching rank can be joined directly.

The join algorithm for weight-balanced trees:

function joinRightWB(TL, k, TR)
    (l, k', c) := expose(TL)
    if w(TL) =α w(TR)
        return Node(TL, k, TR)
    else
        T' := joinRightWB(c, k, TR)
        (l1, k1, r1) := expose(T')
        if w(l) =α w(T')
            return Node(l, k', T')
        else if w(l) =α w(l1) and w(l) + w(l1) =α w(r1)
            return rotateLeft(Node(l, k', T'))
        else
            return rotateLeft(Node(l, k', rotateRight(T')))

function joinLeftWB(TL, k, TR)
    /* symmetric to joinRightWB */

function join(TL, k, TR)
    if w(TL) >α w(TR)
        return joinRightWB(TL, k, TR)
    else if w(TR) >α w(TL)
        return joinLeftWB(TL, k, TR)
    else
        return Node(TL, k, TR)

Where w(T) is the weight of T, =α means the two weights are α-weight-balanced with respect to each other, and >α means the first tree is too heavy to be balanced with the second.

Join-based algorithms

In the following, expose(v) = (l, k, r) extracts the left child l, key k, and right child r of node v into a tuple. Node(l, k, r) creates a node with left child l, key k and right child r. "s1 || s2" means that two statements s1 and s2 can run in parallel.

Split

To split a tree into two trees, those with keys smaller than a key x and those with keys larger than x, we first draw a path from the root by searching for x in the tree. After this search, all values less than x are found on the left of the path, and all values greater than x are found on the right. By applying Join, all the subtrees on the left side are merged bottom-up using keys on the path as intermediate nodes from bottom to top to form the left tree, and the right part is symmetric. For some applications, Split also returns a boolean value denoting whether x appears in the tree. The cost of Split is O(log n), on the order of the height of the tree.

The split algorithm is as follows:

function split(T, k)
    if T = nil
        return (nil, false, nil)
    else
        (L, m, R) := expose(T)
        if k < m
            (L', b, R') := split(L, k)
            return (L', b, join(R', m, R))
        else if k > m
            (L', b, R') := split(R, k)
            return (join(L, m, L'), b, R')
        else
            return (L, true, R)
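As a sketch of how split composes with join, the following Python uses a plain BST node and a join that merely allocates a node (no rebalancing); with a real balancing join the same code applies unchanged. All names here are chosen for the example.

```python
# Sketch of split. join() is a stand-in that only creates a node; a real
# implementation would use one of the rebalancing joins above.
class Node:
    def __init__(self, left, key, right):
        self.left, self.key, self.right = left, key, right

def join(l, k, r):
    return Node(l, k, r)  # stand-in for a rebalancing join

def split(t, k):
    # Returns (tree of keys < k, key-found flag, tree of keys > k).
    if t is None:
        return (None, False, None)
    l, m, r = t.left, t.key, t.right
    if k < m:
        lp, b, rp = split(l, k)
        return (lp, b, join(rp, m, r))
    if k > m:
        lp, b, rp = split(r, k)
        return (join(l, m, lp), b, rp)
    return (l, True, r)

def inorder(t):
    return [] if t is None else inorder(t.left) + [t.key] + inorder(t.right)

leaf = lambda k: Node(None, k, None)
t = Node(Node(leaf(1), 2, leaf(3)), 4, Node(leaf(5), 6, leaf(7)))
l, found, r = split(t, 5)
```

Splitting at key 5 yields the keys below 5 on the left, the keys above it on the right, and a flag indicating that 5 was present.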

Join2

This function is defined similarly to join but without the middle key. It first splits out the last key k of the left tree, and then joins the remaining part of the left tree with the right tree using k. The algorithm is as follows:

function splitLast(T)
    (L, k, R) := expose(T)
    if R = nil
        return (L, k)
    else
        (T', k') := splitLast(R)
        return (join(L, k, T'), k')

function join2(L, R)
    if L = nil
        return R
    else
        (L', k) := splitLast(L)
        return join(L', k, R)
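A Python sketch of join2 under the same simplifying assumption (a join that only allocates a node, standing in for a rebalancing join):

```python
# Sketch of splitLast/join2 on a plain BST node.
class Node:
    def __init__(self, left, key, right):
        self.left, self.key, self.right = left, key, right

def join(l, k, r):
    return Node(l, k, r)  # stand-in for a rebalancing join

def split_last(t):
    # Peels off the largest key: returns (tree without it, that key).
    if t.right is None:
        return (t.left, t.key)
    rest, k = split_last(t.right)
    return (join(t.left, t.key, rest), k)

def join2(l, r):
    # Join without a middle key: reuse the last key of l as the middle.
    if l is None:
        return r
    lp, k = split_last(l)
    return join(lp, k, r)

def inorder(t):
    return [] if t is None else inorder(t.left) + [t.key] + inorder(t.right)

leaf = lambda k: Node(None, k, None)
t = join2(Node(leaf(1), 2, leaf(3)), Node(leaf(5), 6, leaf(7)))
```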

The cost is O(log n) for a tree of size n.

Insert and delete

The insertion and deletion algorithms, when making use of join, can be independent of the balancing scheme. For an insertion, the algorithm compares the key to be inserted with the key in the root, inserts it into the left/right subtree if the key is smaller/greater than the key in the root, and joins the two subtrees back with the root. A deletion compares the key to be deleted with the key in the root. If they are equal, return join2 on the two subtrees. Otherwise, delete the key from the corresponding subtree, and join the two subtrees back with the root. The algorithms are as follows:

function insert(T, k)
    if T = nil
        return Node(nil, k, nil)
    else
        (L, k', R) := expose(T)
        if k < k'
            return join(insert(L, k), k', R)
        else if k > k'
            return join(L, k', insert(R, k))
        else
            return T

function delete(T, k)
    if T = nil
        return nil
    else
        (L, k', R) := expose(T)
        if k < k'
            return join(delete(L, k), k', R)
        else if k > k'
            return join(L, k', delete(R, k))
        else
            return join2(L, R)
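The following Python sketch shows both operations expressed purely through join and join2; as before, the join here only allocates a node, standing in for a rebalancing join.

```python
# Sketch of join-based insert and delete on a plain BST node.
class Node:
    def __init__(self, left, key, right):
        self.left, self.key, self.right = left, key, right

def join(l, k, r):
    return Node(l, k, r)  # stand-in for a rebalancing join

def split_last(t):
    if t.right is None:
        return (t.left, t.key)
    rest, k = split_last(t.right)
    return (join(t.left, t.key, rest), k)

def join2(l, r):
    if l is None:
        return r
    lp, k = split_last(l)
    return join(lp, k, r)

def insert(t, k):
    if t is None:
        return Node(None, k, None)
    l, kp, r = t.left, t.key, t.right
    if k < kp:
        return join(insert(l, k), kp, r)
    if k > kp:
        return join(l, kp, insert(r, k))
    return t  # key already present

def delete(t, k):
    if t is None:
        return None
    l, kp, r = t.left, t.key, t.right
    if k < kp:
        return join(delete(l, k), kp, r)
    if k > kp:
        return join(l, kp, delete(r, k))
    return join2(l, r)  # drop the root, merge its subtrees

def inorder(t):
    return [] if t is None else inorder(t.left) + [t.key] + inorder(t.right)

t = None
for k in [4, 2, 6, 1, 3, 5]:
    t = insert(t, k)
t = delete(t, 4)
```

Note that neither function mentions heights, colors, or weights; all rebalancing would live inside join and join2.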

Both insertion and deletion require O(log n) time for a tree of size n.

Set–set functions

Several set operations have been defined on weight-balanced trees: union, intersection and set difference. The union of two weight-balanced trees t1 and t2 representing sets A and B is a tree t that represents A ∪ B. The following recursive function computes this union:

function union(t1, t2)
    if t1 = nil
        return t2
    else if t2 = nil
        return t1
    else
        (l1, k1, r1) := expose(t1)
        (t<, b, t>) := split(t2, k1)
        l' := union(l1, t<) || r' := union(r1, t>)
        return join(l', k1, r')
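A sequential Python sketch of this union (the two recursive calls, which the pseudocode runs in parallel, execute one after the other here; join again only allocates a node):

```python
# Sketch of join-based union on a plain BST node.
class Node:
    def __init__(self, left, key, right):
        self.left, self.key, self.right = left, key, right

def join(l, k, r):
    return Node(l, k, r)  # stand-in for a rebalancing join

def split(t, k):
    if t is None:
        return (None, False, None)
    l, m, r = t.left, t.key, t.right
    if k < m:
        lp, b, rp = split(l, k)
        return (lp, b, join(rp, m, r))
    if k > m:
        lp, b, rp = split(r, k)
        return (join(l, m, lp), b, rp)
    return (l, True, r)

def union(t1, t2):
    if t1 is None:
        return t2
    if t2 is None:
        return t1
    l1, k1, r1 = t1.left, t1.key, t1.right
    lt, _, gt = split(t2, k1)
    # These two recursive calls are independent and could run in parallel.
    lp = union(l1, lt)
    rp = union(r1, gt)
    return join(lp, k1, rp)

def inorder(t):
    return [] if t is None else inorder(t.left) + [t.key] + inorder(t.right)

leaf = lambda k: Node(None, k, None)
t1 = Node(leaf(1), 3, leaf(5))   # {1, 3, 5}
t2 = Node(leaf(2), 3, leaf(6))   # {2, 3, 6}
u = union(t1, t2)
```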

Similarly, the algorithms of intersection and set-difference are as follows:

function intersection(t1, t2)
    if t1 = nil or t2 = nil
        return nil
    else
        (l1, k1, r1) := expose(t1)
        (t<, b, t>) := split(t2, k1)
        l' := intersection(l1, t<) || r' := intersection(r1, t>)
        if b
            return join(l', k1, r')
        else
            return join2(l', r')

function difference(t1, t2)
    if t1 = nil
        return nil
    else if t2 = nil
        return t1
    else
        (l1, k1, r1) := expose(t1)
        (t<, b, t>) := split(t2, k1)
        l' := difference(l1, t<) || r' := difference(r1, t>)
        if b
            return join2(l', r')
        else
            return join(l', k1, r')
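The two functions can be sketched in Python the same way as union; only the use of the key-found flag b differs (intersection keeps the root key when both trees contain it, difference keeps it only when the second tree does not). As before, join only allocates a node.

```python
# Sketch of join-based intersection and difference on a plain BST node.
class Node:
    def __init__(self, left, key, right):
        self.left, self.key, self.right = left, key, right

def join(l, k, r):
    return Node(l, k, r)  # stand-in for a rebalancing join

def split(t, k):
    if t is None:
        return (None, False, None)
    l, m, r = t.left, t.key, t.right
    if k < m:
        lp, b, rp = split(l, k)
        return (lp, b, join(rp, m, r))
    if k > m:
        lp, b, rp = split(r, k)
        return (join(l, m, lp), b, rp)
    return (l, True, r)

def split_last(t):
    if t.right is None:
        return (t.left, t.key)
    rest, k = split_last(t.right)
    return (join(t.left, t.key, rest), k)

def join2(l, r):
    if l is None:
        return r
    lp, k = split_last(l)
    return join(lp, k, r)

def intersection(t1, t2):
    if t1 is None or t2 is None:
        return None
    l1, k1, r1 = t1.left, t1.key, t1.right
    lt, b, gt = split(t2, k1)
    lp = intersection(l1, lt)
    rp = intersection(r1, gt)
    return join(lp, k1, rp) if b else join2(lp, rp)

def difference(t1, t2):
    if t1 is None:
        return None
    if t2 is None:
        return t1
    l1, k1, r1 = t1.left, t1.key, t1.right
    lt, b, gt = split(t2, k1)
    lp = difference(l1, lt)
    rp = difference(r1, gt)
    return join2(lp, rp) if b else join(lp, k1, rp)

def inorder(t):
    return [] if t is None else inorder(t.left) + [t.key] + inorder(t.right)

leaf = lambda k: Node(None, k, None)
t1 = Node(leaf(1), 2, Node(leaf(3), 4, None))  # {1, 2, 3, 4}
t2 = Node(leaf(3), 4, leaf(5))                 # {3, 4, 5}
```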

The complexity of each of union, intersection and difference is O(m log(n/m + 1)) for two weight-balanced trees of sizes m and n (m ≤ n). This complexity is optimal in terms of the number of comparisons. More importantly, since the recursive calls to union, intersection or difference are independent of each other, they can be executed in parallel with a parallel depth O(log m log n). [1] When m = 1, the join-based implementation applies the same computation as in a single-element insertion or deletion if the root of the larger tree is used to split the smaller tree.

Build

The algorithm for building a tree can make use of the union algorithm, with a divide-and-conquer scheme:

function build(A[], n)
    if n = 0
        return nil
    else if n = 1
        return Node(nil, A[0], nil)
    else
        l' := build(A, n/2) || r' := build(A + n/2, n - n/2)
        return union(l', r')

This algorithm costs O(n log n) work and has O(log³ n) depth. A more-efficient algorithm makes use of a parallel sorting algorithm.

function buildSorted(A[], n)
    if n = 0
        return nil
    else if n = 1
        return Node(nil, A[0], nil)
    else
        l' := buildSorted(A, n/2) || r' := buildSorted(A + n/2 + 1, n - n/2 - 1)
        return join(l', A[n/2], r')

function build(A[], n)
    A' := sort(A, n)
    return buildSorted(A', n)
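A Python sketch of the sorted build, with Python's built-in sorted standing in for a parallel sort and a non-rebalancing join standing in for a balancing one:

```python
# Sketch of buildSorted/build: the middle element becomes the join key.
class Node:
    def __init__(self, left, key, right):
        self.left, self.key, self.right = left, key, right

def join(l, k, r):
    return Node(l, k, r)  # stand-in for a rebalancing join

def build_sorted(a):
    # The midpoint key joins the trees built from the two halves.
    if not a:
        return None
    mid = len(a) // 2
    return join(build_sorted(a[:mid]), a[mid], build_sorted(a[mid + 1:]))

def build(a):
    return build_sorted(sorted(a))  # sorted() stands in for a parallel sort

def inorder(t):
    return [] if t is None else inorder(t.left) + [t.key] + inorder(t.right)

t = build([4, 1, 3, 2, 5])
```

Because each half is built independently, the two recursive calls are the natural parallelization point, mirroring the "||" in the pseudocode.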

This algorithm costs O(n log n) work and has O(log n) depth, assuming the sorting algorithm has O(n log n) work and O(log n) depth.

Filter

This function selects all entries in a tree satisfying a predicate p, and returns a tree containing all selected entries. It recursively filters the two subtrees and joins them with the root if the root satisfies p; otherwise it applies join2 to the two filtered subtrees.

function filter(T, p)
    if T = nil
        return nil
    else
        (l, k, r) := expose(T)
        l' := filter(l, p) || r' := filter(r, p)
        if p(k)
            return join(l', k, r')
        else
            return join2(l', r')
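A Python sketch of filter (named filter_tree here to avoid shadowing the built-in), again with a non-rebalancing join standing in for a balancing one:

```python
# Sketch of join-based filter on a plain BST node.
class Node:
    def __init__(self, left, key, right):
        self.left, self.key, self.right = left, key, right

def join(l, k, r):
    return Node(l, k, r)  # stand-in for a rebalancing join

def split_last(t):
    if t.right is None:
        return (t.left, t.key)
    rest, k = split_last(t.right)
    return (join(t.left, t.key, rest), k)

def join2(l, r):
    if l is None:
        return r
    lp, k = split_last(l)
    return join(lp, k, r)

def filter_tree(t, p):
    # Keep the root key only if it satisfies the predicate p.
    if t is None:
        return None
    lp = filter_tree(t.left, p)
    rp = filter_tree(t.right, p)
    if p(t.key):
        return join(lp, t.key, rp)
    return join2(lp, rp)

def inorder(t):
    return [] if t is None else inorder(t.left) + [t.key] + inorder(t.right)

leaf = lambda k: Node(None, k, None)
t = Node(Node(leaf(1), 2, leaf(3)), 4, Node(leaf(5), 6, None))
evens = filter_tree(t, lambda k: k % 2 == 0)
```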

This algorithm costs O(n) work and has polylogarithmic depth on a tree of size n, assuming p has constant cost.

Used in libraries

The join-based algorithms are applied to support interfaces for sets, maps, and augmented maps in libraries such as Hackage, SML/NJ, and PAM. [5]



    References

    1. Blelloch, Guy E.; Ferizovic, Daniel; Sun, Yihan (2016), "Just Join for Parallel Ordered Sets", Proc. of the 28th ACM Symposium on Parallel Algorithms and Architectures (SPAA 2016), ACM, pp. 253–264, arXiv:1602.02120, doi:10.1145/2935764.2935768, ISBN 978-1-4503-4210-0.
    2. Tarjan, Robert Endre (1983), Data Structures and Network Algorithms, SIAM, pp. 45–56.
    3. Sleator, Daniel Dominic; Tarjan, Robert Endre (1985), "Self-adjusting binary search trees", Journal of the ACM, ACM.
    4. Adams, Stephen (1992), "Implementing sets efficiently in a functional language", CiteSeerX 10.1.1.501.8427.
    5. Blelloch, Guy E.; Ferizovic, Daniel; Sun, Yihan (2018), "PAM: parallel augmented maps", Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, ACM, pp. 290–304.