Treap

Last updated
Treap
Treap.svg
Type Randomized binary search tree
Time complexity in big O notation
OperationAverageWorst case
SearchO(log n)O(n)
InsertO(log n)O(n)
DeleteO(log n)O(n)
Space complexity
SpaceO(n)O(n)

In computer science, the treap and the randomized binary search tree are two closely related forms of binary search tree data structures that maintain a dynamic set of ordered keys and allow binary searches among the keys. After any sequence of insertions and deletions of keys, the shape of the tree is a random variable with the same probability distribution as a random binary tree; in particular, with high probability its height is proportional to the logarithm of the number of keys, so that each search, insertion, or deletion operation takes logarithmic time to perform.

Contents

Description

A treap with alphabetic key and numeric max heap order TreapAlphaKey.svg
A treap with alphabetic key and numeric max heap order

The treap was first described by Raimund Seidel and Cecilia R. Aragon in 1989; [1] [2] its name is a portmanteau of tree and heap. It is a Cartesian tree in which each key is given a (randomly chosen) numeric priority. As with any binary search tree, the inorder traversal order of the nodes is the same as the sorted order of the keys. The structure of the tree is determined by the requirement that it be heap-ordered: that is, the priority number for any non-leaf node must be greater than or equal to the priority of its children. Thus, as with Cartesian trees more generally, the root node is the maximum-priority node, and its left and right subtrees are formed in the same manner from the subsequences of the sorted order to the left and right of that node.

An equivalent way of describing the treap is that it could be formed by inserting the nodes highest priority-first into a binary search tree without doing any rebalancing. Therefore, if the priorities are independent random numbers (from a distribution over a large enough space of possible priorities to ensure that two nodes are very unlikely to have the same priority) then the shape of a treap has the same probability distribution as the shape of a random binary search tree, a search tree formed by inserting the nodes without rebalancing in a randomly chosen insertion order. Because random binary search trees are known to have logarithmic height with high probability, the same is true for treaps. This mirrors the binary search tree argument that quicksort runs in expected time. If binary search trees are solutions to the dynamic problem version of sorting, then Treaps correspond specifically to dynamic quicksort where priorities guide pivot choices.

Aragon and Seidel also suggest assigning higher priorities to frequently accessed nodes, for instance by a process that, on each access, chooses a random number and replaces the priority of the node with that number if it is higher than the previous priority. This modification would cause the tree to lose its random shape; instead, frequently accessed nodes would be more likely to be near the root of the tree, causing searches for them to be faster.

Naor and Nissim [3] describe an application in maintaining authorization certificates in public-key cryptosystems.

Operations

Basic operations

Treaps support the following basic operations:

Building a treap

Bulk operations

In addition to the single-element insert, delete and lookup operations, several fast "bulk" operations have been defined on treaps: union, intersection and set difference. These rely on two helper operations, split and join.

Join performed on treaps
T
1
{\displaystyle T_{1}}
and
T
2
{\displaystyle T_{2}}
. Right child of
T
1
{\displaystyle T_{1}}
after the join is defined as a join of its former right child and
T
2
{\displaystyle T_{2}}
. Treap merge.svg
Join performed on treaps and . Right child of after the join is defined as a join of its former right child and .

The join algorithm is as follows:

function join(L, k, R)     if prior(k, k(L)) and prior(k, k(R)) return Node(L, k, R)     if prior(k(L), k(R)) return Node(left(L), k(L), join(right(L), k, R))     return Node(join(L, k, left(R)), k(R), right(R))
To split
T
{\displaystyle T}
by
x
{\displaystyle x}
, recursive split call is done to either left or right child of
T
{\displaystyle T}
. Treap split.svg
To split by , recursive split call is done to either left or right child of .

The split algorithm is as follows:

function split(T, k)     if (T = nil) return (nil, false, nil)     (L, (m, c), R) = expose(T)     if (k = m) return (L, true, R)     if (k < m)          (L', b, R') = split(L, k)         return (L', b, join(R', m, R))     if (k > m)          (L', b, R') = split(R, k)         return (join(L, m, L'), b, R'))

The union of two treaps t1 and t2, representing sets A and B is a treap t that represents AB. The following recursive algorithm computes the union:

function union(t1, t2):     if t1 = nil:         return t2if t2 = nil:         return t1if priority(t1) < priority(t2):         swap t1 and t2     t<, t> ← split t2 on key(t1)     return join(union(left(t1), t<), key(t1),                     union(right(t1), t>))

Here, split is presumed to return two trees: one holding the keys less than its input key, one holding the greater keys. (The algorithm is non-destructive, but an in-place destructive version exists as well.)

The algorithm for intersection is similar, but requires the join helper routine. The complexity of each of union, intersection and difference is O(m log n/m) for treaps of sizes m and n, with mn. Moreover, since the recursive calls to union are independent of each other, they can be executed in parallel. [4]

Split and Union call Join but do not deal with the balancing criteria of treaps directly, such an implementation is usually called the "join-based" implementation.

Note that if hash values of keys are used as priorities and structurally equal nodes are merged already at construction, then each merged node will be a unique representation of a set of keys. Provided that there can only be one simultaneous root node representing a given set of keys, two sets can be tested for equality by pointer comparison, which is constant in time.

This technique can be used to enhance the merge algorithms to perform fast also when the difference between two sets is small. If input sets are equal, the union and intersection functions could break immediately returning one of the input sets as result, while the difference function should return the empty set.

Let d be the size of the symmetric difference. The modified merge algorithms will then also be bounded by O(d log n/d). [5] [6]

Randomized binary search tree

The randomized binary search tree, introduced by Martínez and Roura subsequently to the work of Aragon and Seidel on treaps, [7] stores the same nodes with the same random distribution of tree shape, but maintains different information within the nodes of the tree in order to maintain its randomized structure.

Rather than storing random priorities on each node, the randomized binary search tree stores a small integer at each node, the number of its descendants (counting itself as one); these numbers may be maintained during tree rotation operations at only a constant additional amount of time per rotation. When a key x is to be inserted into a tree that already has n nodes, the insertion algorithm chooses with probability 1/(n + 1) to place x as the new root of the tree, and otherwise, it calls the insertion procedure recursively to insert x within the left or right subtree (depending on whether its key is less than or greater than the root). The numbers of descendants are used by the algorithm to calculate the necessary probabilities for the random choices at each step. Placing x at the root of a subtree may be performed either as in the treap by inserting it at a leaf and then rotating it upwards, or by an alternative algorithm described by Martínez and Roura that splits the subtree into two pieces to be used as the left and right children of the new node.

The deletion procedure for a randomized binary search tree uses the same information per node as the insertion procedure, but unlike the insertion procedure, it only needs on average O(1) random decisions to join the two subtrees descending from the left and right children of the deleted node into a single tree. That is because the subtrees to be joined are on average at depth Θ(log n); joining two trees of size n and m needs Θ(log(n+m)) random choices on average. If the left or right subtree of the node to be deleted is empty, the join operation is trivial; otherwise, the left or right child of the deleted node is selected as the new subtree root with probability proportional to its number of descendants, and the join proceeds recursively.

Comparison

The information stored per node in the randomized binary tree is simpler than in a treap (a small integer rather than a high-precision random number), but it makes a greater number of calls to the random number generator (O(log n) calls per insertion or deletion rather than one call per insertion) and the insertion procedure is slightly more complicated due to the need to update the numbers of descendants per node. A minor technical difference is that, in a treap, there is a small probability of a collision (two keys getting the same priority), and in both cases, there will be statistical differences between a true random number generator and the pseudo-random number generator typically used on digital computers. However, in any case, the differences between the theoretical model of perfect random choices used to design the algorithm and the capabilities of actual random number generators are vanishingly small.

Although the treap and the randomized binary search tree both have the same random distribution of tree shapes after each update, the history of modifications to the trees performed by these two data structures over a sequence of insertion and deletion operations may be different. For instance, in a treap, if the three numbers 1, 2, and 3 are inserted in the order 1, 3, 2, and then the number 2 is deleted, the remaining two nodes will have the same parent-child relationship that they did prior to the insertion of the middle number. In a randomized binary search tree, the tree after the deletion is equally likely to be either of the two possible trees on its two nodes, independently of what the tree looked like prior to the insertion of the middle number.

Implicit treap

An implicit treap [8] [ unreliable source? ] is a simple variation of an ordinary treap which can be viewed as a dynamic array that supports the following operations in :

The idea behind an implicit treap is to use the array index as a key, but to not store it explicitly. Otherwise, an update (insertion/deletion) would result in changes of the keys in nodes of the tree.

The key value (implicit key) of a node T is the number of nodes less than that node plus one. Note that such nodes can be present not only in its left subtree but also in left subtrees of its ancestors P, if T is in the right subtree of P.

Therefore we can quickly calculate the implicit key of the current node as we perform an operation by accumulating the sum of all nodes as we descend the tree. Note that this sum does not change when we visit the left subtree but it will increase by when we visit the right subtree.

The join algorithm for an implicit treap is as follows:

voidjoin(pitem&t,piteml,pitemr){if(!l||!r)t=l?l:r;elseif(l->prior>r->prior)join(l->r,l->r,r),t=l;elsejoin(r->l,l,r->l),t=r;upd_cnt(t);}

[8] The split algorithm for an implicit treap is as follows:

voidsplit(pitemt,pitem&l,pitem&r,intkey,intadd=0){if(!t)returnvoid(l=r=0);intcur_key=add+cnt(t->l);//implicit keyif(key<=cur_key)split(t->l,l,t->l,key,add),r=t;elsesplit(t->r,t->r,r,key,add+1+cnt(t->l)),l=t;upd_cnt(t);}

[8]

Operations

Insert element

To insert an element at position pos we divide the array into two subsections [0...pos-1] and [pos..sz] by calling the split function and we get two trees and . Then we merge with the new node by calling the join function. Finally we call the join function to merge and .

Delete element

We find the element to be deleted and perform a join on its children L and R. We then replace the element to be deleted with the tree that resulted from the join operation.

Find sum, minimum or maximum in a given range

To perform this calculation we will proceed as follows:

  • First we will create an additional field F to store the value of the target function for the range represented by that node. we will create a function that calculates the value F based on the values of the L and R children of the node. We will call this target function at the end of all functions that modify the tree, i.e., split and join.
  • Second we need to process a query for a given range [A..B]: We will call the split function twice and split the treap into which contains , which contains , and which contains . After the query is answered we will call the join function twice to restore the original treap.

Addition/painting in a given range

To perform this operation we will proceed as follows:

  • We will create an extra field D which will contain the added value for the subtree. We will create a function push which will be used to propagate this change from a node to its children. We will call this function at the beginning of all functions which modify the tree, i.e., split and join so that after any changes made to the tree the information will not be lost.

Reverse in a given range

To show that the subtree of a given node needs to be reversed for each node we will create an extra boolean field R and set its value to true. To propagate this change we will swap the children of the node and set R to true for all of them.

See also

Related Research Articles

<span class="mw-page-title-main">AVL tree</span> Self-balancing binary search tree

In computer science, an AVL tree is a self-balancing binary search tree. In an AVL tree, the heights of the two child subtrees of any node differ by at most one; if at any time they differ by more than one, rebalancing is done to restore this property. Lookup, insertion, and deletion all take O(log n) time in both the average and worst cases, where is the number of nodes in the tree prior to the operation. Insertions and deletions may require the tree to be rebalanced by one or more tree rotations.

<span class="mw-page-title-main">Binary search tree</span> Rooted binary tree data structure

In computer science, a binary search tree (BST), also called an ordered or sorted binary tree, is a rooted binary tree data structure with the key of each internal node being greater than all the keys in the respective node's left subtree and less than the ones in its right subtree. The time complexity of operations on the binary search tree is linear with respect to the height of the tree.

<span class="mw-page-title-main">Binary tree</span> Limited form of tree data structure

In computer science, a binary tree is a tree data structure in which each node has at most two children, referred to as the left child and the right child. That is, it is a k-ary tree with k = 2. A recursive definition using set theory is that a binary tree is a tuple (L, S, R), where L and R are binary trees or the empty set and S is a singleton set containing the root.

In computer science, a B-tree is a self-balancing tree data structure that maintains sorted data and allows searches, sequential access, insertions, and deletions in logarithmic time. The B-tree generalizes the binary search tree, allowing for nodes with more than two children. Unlike other self-balancing binary search trees, the B-tree is well suited for storage systems that read and write relatively large blocks of data, such as databases and file systems.

<span class="mw-page-title-main">Heap (data structure)</span> Computer science data structure

In computer science, a heap is a tree-based data structure that satisfies the heap property: In a max heap, for any given node C, if P is a parent node of C, then the key of P is greater than or equal to the key of C. In a min heap, the key of P is less than or equal to the key of C. The node at the "top" of the heap is called the root node.

In computer science, a red–black tree is a specialised binary search tree data structure noted for fast storage and retrieval of ordered information, and a guarantee that operations will complete within a known time. Compared to other self-balancing binary search trees, the nodes in a red-black tree hold an extra bit called "color" representing "red" and "black" which is used when re-organising the tree to ensure that it is always approximately balanced.

A splay tree is a binary search tree with the additional property that recently accessed elements are quick to access again. Like self-balancing binary search trees, a splay tree performs basic operations such as insertion, look-up and removal in O(log n) amortized time. For random access patterns drawn from a non-uniform random distribution, their amortized time can be faster than logarithmic, proportional to the entropy of the access pattern. For many patterns of non-random operations, also, splay trees can take better than logarithmic time, without requiring advance knowledge of the pattern. According to the unproven dynamic optimality conjecture, their performance on all access patterns is within a constant factor of the best possible performance that could be achieved by any other self-adjusting binary search tree, even one selected to fit that pattern. The splay tree was invented by Daniel Sleator and Robert Tarjan in 1985.

<span class="mw-page-title-main">Binary heap</span> Variant of heap data structure

A binary heap is a heap data structure that takes the form of a binary tree. Binary heaps are a common way of implementing priority queues. The binary heap was introduced by J. W. J. Williams in 1964, as a data structure for heapsort.

In computer science, a binomial heap is a data structure that acts as a priority queue but also allows pairs of heaps to be merged. It is important as an implementation of the mergeable heap abstract data type, which is a priority queue supporting merge operation. It is implemented as a heap similar to a binary heap but using a special tree structure that is different from the complete binary trees used by binary heaps. Binomial heaps were invented in 1978 by Jean Vuillemin.

<span class="mw-page-title-main">Self-balancing binary search tree</span> Any node-based binary search tree that automatically keeps its height the same

In computer science, a self-balancing binary search tree (BST) is any node-based binary search tree that automatically keeps its height small in the face of arbitrary item insertions and deletions. These operations when designed for a self-balancing binary search tree, contain precautionary measures against boundlessly increasing tree height, so that these abstract data structures receive the attribute "self-balancing".

<span class="mw-page-title-main">Quadtree</span> Tree data structure in which each internal node has exactly four children, to partition a 2D area

A quadtree is a tree data structure in which each internal node has exactly four children. Quadtrees are the two-dimensional analog of octrees and are most often used to partition a two-dimensional space by recursively subdividing it into four quadrants or regions. The data associated with a leaf cell varies by application, but the leaf cell represents a "unit of interesting spatial information".

In computer science, a scapegoat tree is a self-balancing binary search tree, invented by Arne Andersson in 1989 and again by Igal Galperin and Ronald L. Rivest in 1993. It provides worst-case lookup time and amortized insertion and deletion time.

An AA tree in computer science is a form of balanced tree used for storing and retrieving ordered data efficiently. AA trees are named after their originator, Swedish computer scientist Arne Andersson.

<i>k</i>-d tree Multidimensional search tree for points in k dimensional space

In computer science, a k-d tree is a space-partitioning data structure for organizing points in a k-dimensional space. K-dimensional is that which concerns exactly k orthogonal axes or a space of any number of dimensions. k-d trees are a useful data structure for several applications, such as:

A skew heap is a heap data structure implemented as a binary tree. Skew heaps are advantageous because of their ability to merge more quickly than binary heaps. In contrast with binary heaps, there are no structural constraints, so there is no guarantee that the height of the tree is logarithmic. Only two conditions must be satisfied:

In computer science, weight-balanced binary trees (WBTs) are a type of self-balancing binary search trees that can be used to implement dynamic sets, dictionaries (maps) and sequences. These trees were introduced by Nievergelt and Reingold in the 1970s as trees of bounded balance, or BB[α] trees. Their more common name is due to Knuth.

<span class="mw-page-title-main">Cartesian tree</span> Binary tree derived from a sequence of numbers

In computer science, a Cartesian tree is a binary tree derived from a sequence of distinct numbers. To construct the Cartesian tree, set its root to be the minimum number in the sequence, and recursively construct its left and right subtrees from the subsequences before and after this number. It is uniquely defined as a min-heap whose symmetric (in-order) traversal returns the original sequence.

<span class="mw-page-title-main">Random binary tree</span> Binary tree selected at random

In computer science and probability theory, a random binary tree is a binary tree selected at random from some probability distribution on binary trees. Different distributions have been used, leading to different properties for these trees.

In computer science, parallel tree contraction is a broadly applicable technique for the parallel solution of a large number of tree problems, and is used as an algorithm design technique for the design of a large number of parallel graph algorithms. Parallel tree contraction was introduced by Gary L. Miller and John H. Reif, and has subsequently been modified to improve efficiency by X. He and Y. Yesha, Hillel Gazit, Gary L. Miller and Shang-Hua Teng and many others.

In computer science, join-based tree algorithms are a class of algorithms for self-balancing binary search trees. This framework aims at designing highly-parallelized algorithms for various balanced binary search trees. The algorithmic framework is based on a single operation join. Under this framework, the join operation captures all balancing criteria of different balancing schemes, and all other functions join have generic implementation across different balancing schemes. The join-based algorithms can be applied to at least four balancing schemes: AVL trees, red–black trees, weight-balanced trees and treaps.

References

  1. Aragon, Cecilia R.; Seidel, Raimund (1989), "Randomized Search Trees" (PDF), 30th Annual Symposium on Foundations of Computer Science, Washington, D.C.: IEEE Computer Society Press, pp. 540–545, doi:10.1109/SFCS.1989.63531, ISBN   0-8186-1982-1
  2. Seidel, Raimund; Aragon, Cecilia R. (1996), "Randomized Search Trees", Algorithmica, 16 (4/5): 464–497, doi:10.1007/s004539900061
  3. Naor, M.; Nissim, K. (April 2000), "Certificate revocation and certificate update" (PDF), IEEE Journal on Selected Areas in Communications, 18 (4): 561–570, doi:10.1109/49.839932, S2CID   13833836 [ permanent dead link ].
  4. Blelloch, Guy E.; Reid-Miller, Margaret (1998), "Fast set operations using treaps", Proceedings of the tenth annual ACM symposium on Parallel algorithms and architectures - SPAA '98, New York, NY, USA: ACM, pp. 16–26, doi:10.1145/277651.277660, ISBN   0-89791-989-0, S2CID   7342709 .
  5. Liljenzin, Olle (2013). "Confluently Persistent Sets and Maps". arXiv: 1301.3388 . Bibcode:2013arXiv1301.3388L.{{cite journal}}: Cite journal requires |journal= (help)
  6. Confluent Sets and Maps on GitHub
  7. Martínez, Conrado; Roura, Salvador (1997), "Randomized binary search trees", Journal of the ACM, 45 (2): 288–323, doi: 10.1145/274787.274812 , S2CID   714621
  8. 1 2 3 "Treap - Competitive Programming Algorithms". cp-algorithms.com. Retrieved 2021-11-21.