Zip tree

Last updated

The zip tree was introduced as a variant of random binary search tree by Robert Tarjan, Caleb Levy, and Stephen Timmel. [1] Zip trees are similar to max treaps except ranks are generated through a geometric distribution and maintain their max-heap property during insertions and deletions through unzipping and zipping rather than tree rotations. Nodes of the tree contain a distinct, comparable key and a numeric rank. The tree is max heap ordered with respect to the ranks with ties broken in favor of smaller keys. Nodes of the tree must contain distinct keys but allow for duplicate ranks. The rank tie-breaker favoring smaller keys creates a bias in the tree favoring smaller nodes. A slightly modified zip tree variant, zip-zip trees address this bias by introducing a different tie-breaker with a second rank. [2]

Contents

Operations

Zip trees support the operations of a binary search tree. The main implementation differences come through the zip tree's implementation of insert and delete through unzipping and zipping paths to maintain the heap ordering of the tree.

The time it takes to insert or delete a node u is equal to the time to search for u plus the time for unzipping or zipping. The time it takes to unzip or zip is proportional to one plus the number of nodes on the path(s) being unzipped/zipped. The expected depth of any node in a zip tree is at most 1.5 log n, making the expected running time of the insert, delete, and search operations all O(log n). [1]

Insertion

When inserting a node x into a zip tree, first generate a new rank from a geometric distribution with a probability of success of 1/2. Let x.key be the key of the node x, and let x.rank be the rank of the node x.

Insertion and deletion of a node with the key 10 and rank 3 in a zip tree. Insert-delete-ziptree.jpg
Insertion and deletion of a node with the key 10 and rank 3 in a zip tree.

Then, follow the search path for x in the tree until finding a node u such that u.rank<= x.rank and u.key < x.key. Continue along the search path for x, "unzippping" every node v passed by placing them either in path P if v.key < x.key, or path Q if v.key > x.key. Keys must be unique so if at any point v.key = x.key, the search stops and no new node is inserted.

Once the search for x is complete, x is inserted in place of the node u. The top node of the path P becomes the left child of x and the top node of Q becomes the right child. The parent and child pointers between u and u.parent will be updated accordingly with x, and if u was previously the root node, x becomes the new root.

Deletion

When deleting a node x, first search the tree to find it. If no node is found with the same key as x.key, no deletion occurs.

Once x is found, continue two searches down the left and right subtrees of x, "zipping" together the right spine of the left subtree and left spine of the right subtree into one path R in decreasing order by rank. While creating this path top-down, nodes are added as left and right children of their parent accordingly based on their keys.

Once the path R is complete, the root of the path will replace x. The parent and child pointers between x and x.parent are updated accordingly, and if x was previously the root node, the top node of R is the new root.

Related Research Articles

<span class="mw-page-title-main">AVL tree</span> Self-balancing binary search tree

In computer science, an AVL tree is a self-balancing binary search tree. In an AVL tree, the heights of the two child subtrees of any node differ by at most one; if at any time they differ by more than one, rebalancing is done to restore this property. Lookup, insertion, and deletion all take O(log n) time in both the average and worst cases, where is the number of nodes in the tree prior to the operation. Insertions and deletions may require the tree to be rebalanced by one or more tree rotations.

<span class="mw-page-title-main">Binary search tree</span> Rooted binary tree data structure

In computer science, a binary search tree (BST), also called an ordered or sorted binary tree, is a rooted binary tree data structure with the key of each internal node being greater than all the keys in the respective node's left subtree and less than the ones in its right subtree. The time complexity of operations on the binary search tree is linear with respect to the height of the tree.

In computer science, a B-tree is a self-balancing tree data structure that maintains sorted data and allows searches, sequential access, insertions, and deletions in logarithmic time. The B-tree generalizes the binary search tree, allowing for nodes with more than two children. Unlike other self-balancing binary search trees, the B-tree is well suited for storage systems that read and write relatively large blocks of data, such as databases and file systems.

In computer science, a red–black tree is a self-balancing binary search tree data structure noted for fast storage and retrieval of ordered information. The nodes in a red-black tree hold an extra "color" bit, often drawn as red and black, which help ensure that the tree is always approximately balanced.

A splay tree is a binary search tree with the additional property that recently accessed elements are quick to access again. Like self-balancing binary search trees, a splay tree performs basic operations such as insertion, look-up and removal in O(log n) amortized time. For random access patterns drawn from a non-uniform random distribution, their amortized time can be faster than logarithmic, proportional to the entropy of the access pattern. For many patterns of non-random operations, also, splay trees can take better than logarithmic time, without requiring advance knowledge of the pattern. According to the unproven dynamic optimality conjecture, their performance on all access patterns is within a constant factor of the best possible performance that could be achieved by any other self-adjusting binary search tree, even one selected to fit that pattern. The splay tree was invented by Daniel Sleator and Robert Tarjan in 1985.

<span class="mw-page-title-main">Tree rotation</span> A local change in a binary tree that preserves leaf order

In discrete mathematics, tree rotation is an operation on a binary tree that changes the structure without interfering with the order of the elements. A tree rotation moves one node up in the tree and one node down. It is used to change the shape of the tree, and in particular to decrease its height by moving smaller subtrees down and larger subtrees up, resulting in improved performance of many tree operations.

<span class="mw-page-title-main">Binary heap</span> Variant of heap data structure

A binary heap is a heap data structure that takes the form of a binary tree. Binary heaps are a common way of implementing priority queues. The binary heap was introduced by J. W. J. Williams in 1964, as a data structure for heapsort.

<span class="mw-page-title-main">Treap</span> Random search tree data structure

In computer science, the treap and the randomized binary search tree are two closely related forms of binary search tree data structures that maintain a dynamic set of ordered keys and allow binary searches among the keys. After any sequence of insertions and deletions of keys, the shape of the tree is a random variable with the same probability distribution as a random binary tree; in particular, with high probability its height is proportional to the logarithm of the number of keys, so that each search, insertion, or deletion operation takes logarithmic time to perform.

In computer science, a binomial heap is a data structure that acts as a priority queue. It is an example of a mergeable heap, as it supports merging two heaps in logarithmic time. It is implemented as a heap similar to a binary heap but using a special tree structure that is different from the complete binary trees used by binary heaps. Binomial heaps were invented in 1978 by Jean Vuillemin.

In computer science, a Fibonacci heap is a data structure for priority queue operations, consisting of a collection of heap-ordered trees. It has a better amortized running time than many other priority queue data structures including the binary heap and binomial heap. Michael L. Fredman and Robert E. Tarjan developed Fibonacci heaps in 1984 and published them in a scientific journal in 1987. Fibonacci heaps are named after the Fibonacci numbers, which are used in their running time analysis.

In computer science, a disjoint-set data structure, also called a union–find data structure or merge–find set, is a data structure that stores a collection of disjoint (non-overlapping) sets. Equivalently, it stores a partition of a set into disjoint subsets. It provides operations for adding new sets, merging sets, and finding a representative member of a set. The last operation makes it possible to find out efficiently if any two elements are in the same or different sets.

In computer science, a leftist tree or leftist heap is a priority queue implemented with a variant of a binary heap. Every node x has an s-value which is the distance to the nearest leaf in subtree rooted at x. In contrast to a binary heap, a leftist tree attempts to be very unbalanced. In addition to the heap property, leftist trees are maintained so the right descendant of each node has the lower s-value.

A pairing heap is a type of heap data structure with relatively simple implementation and excellent practical amortized performance, introduced by Michael Fredman, Robert Sedgewick, Daniel Sleator, and Robert Tarjan in 1986. Pairing heaps are heap-ordered multiway tree structures, and can be considered simplified Fibonacci heaps. They are considered a "robust choice" for implementing such algorithms as Prim's MST algorithm, and support the following operations :

A skew heap is a heap data structure implemented as a binary tree. Skew heaps are advantageous because of their ability to merge more quickly than binary heaps. In contrast with binary heaps, there are no structural constraints, so there is no guarantee that the height of the tree is logarithmic. Only two conditions must be satisfied:

A link/cut tree is a data structure for representing a forest, a set of rooted trees, and offers the following operations:

<span class="mw-page-title-main">Cartesian tree</span> Binary tree derived from a sequence of numbers

In computer science, a Cartesian tree is a binary tree derived from a sequence of distinct numbers. To construct the Cartesian tree, set its root to be the minimum number in the sequence, and recursively construct its left and right subtrees from the subsequences before and after this number. It is uniquely defined as a min-heap whose symmetric (in-order) traversal returns the original sequence.

<span class="mw-page-title-main">Random binary tree</span> Binary tree selected at random

In computer science and probability theory, a random binary tree is a binary tree selected at random from some probability distribution on binary trees. Different distributions have been used, leading to different properties for these trees.

In computer science, a WAVL tree or weak AVL tree is a self-balancing binary search tree. WAVL trees are named after AVL trees, another type of balanced search tree, and are closely related both to AVL trees and red–black trees, which all fall into a common framework of rank balanced trees. Like other balanced binary search trees, WAVL trees can handle insertion, deletion, and search operations in time O(log n) per operation.

In computer science, join-based tree algorithms are a class of algorithms for self-balancing binary search trees. This framework aims at designing highly-parallelized algorithms for various balanced binary search trees. The algorithmic framework is based on a single operation join. Under this framework, the join operation captures all balancing criteria of different balancing schemes, and all other functions join have generic implementation across different balancing schemes. The join-based algorithms can be applied to at least four balancing schemes: AVL trees, red–black trees, weight-balanced trees and treaps.

In computer science, a strict Fibonacci heap is a priority queue data structure with low worst case time bounds. It matches the amortized time bounds of the Fibonacci heap in the worst case. To achieve these time bounds, strict Fibonacci heaps maintain several invariants by performing restoring transformations after every operation. These transformations can be done in constant time by using auxiliary data structures to track invariant violations, and the pigeonhole principle guarantees that these can be fixed. Strict Fibonacci heaps were invented in 2012 by Gerth S. Brodal, George Lagogiannis, and Robert E. Tarjan.

References

  1. 1 2 Tarjan, Robert E.; Levy, Caleb C.; Timmel, Stephen (2021-10-31). "Zip Trees". ACM Transactions on Algorithms. 17 (4): 1–12. arXiv: 1806.06726 . doi:10.1145/3476830. ISSN   1549-6325.
  2. Gila, Ofek; Goodrich, Michael T.; Tarjan, Robert E. (2023). "Zip-Zip Trees: Making Zip Trees More Balanced, Biased, Compact, or Persistent". In Morin, Pat; Suri, Subhash (eds.). Algorithms and Data Structures. Lecture Notes in Computer Science. Vol. 14079. Cham: Springer Nature Switzerland. pp. 474–492. arXiv: 2307.07660 . doi:10.1007/978-3-031-38906-1_31. ISBN   978-3-031-38906-1.