Binary search tree | ||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Type | tree | |||||||||||||||||||||||
Invented | 1960 | |||||||||||||||||||||||
Invented by | P.F. Windley, A.D. Booth, A.J.T. Colin, and T.N. Hibbard | |||||||||||||||||||||||
|
In computer science, a binary search tree (BST), also called an ordered or sorted binary tree, is a rooted binary tree data structure with the key of each internal node being greater than all the keys in the respective node's left subtree and less than the ones in its right subtree. The time complexity of operations on the binary search tree is linear with respect to the height of the tree.
Binary search trees allow binary search for fast lookup, addition, and removal of data items. Since the nodes in a BST are laid out so that each comparison skips about half of the remaining tree, the lookup performance is proportional to that of binary logarithm. BSTs were devised in the 1960s for the problem of efficient storage of labeled data and are attributed to Conway Berners-Lee and David Wheeler.
The performance of a binary search tree is dependent on the order of insertion of the nodes into the tree since arbitrary insertions may lead to degeneracy; several variations of the binary search tree can be built with guaranteed worst-case performance. The basic operations include: search, traversal, insert and delete. BSTs with guaranteed worst-case complexities perform better than an unsorted array, which would require linear search time.
The complexity analysis of BST shows that, on average, the insert, delete and search takes for nodes. In the worst case, they degrade to that of a singly linked list: . To address the boundless increase of the tree height with arbitrary insertions and deletions, self-balancing variants of BSTs are introduced to bound the worst lookup complexity to that of the binary logarithm. AVL trees were the first self-balancing binary search trees, invented in 1962 by Georgy Adelson-Velsky and Evgenii Landis.
Binary search trees can be used to implement abstract data types such as dynamic sets, lookup tables and priority queues, and used in sorting algorithms such as tree sort.
The binary search tree algorithm was discovered independently by several researchers, including P.F. Windley, Andrew Donald Booth, Andrew Colin, Thomas N. Hibbard. [1] [2] The algorithm is attributed to Conway Berners-Lee and David Wheeler, who used it for storing labeled data in magnetic tapes in 1960. [3] One of the earliest and popular binary search tree algorithm is that of Hibbard. [1]
The time complexities of a binary search tree increases boundlessly with the tree height if the nodes are inserted in an arbitrary order, therefore self-balancing binary search trees were introduced to bound the height of the tree to . [4] Various height-balanced binary search trees were introduced to confine the tree height, such as AVL trees, Treaps, and red–black trees. [5]
The AVL tree was invented by Georgy Adelson-Velsky and Evgenii Landis in 1962 for the efficient organization of information. [6] [7] It was the first self-balancing binary search tree to be invented. [8]
A binary search tree is a rooted binary tree in which nodes are arranged in strict total order in which the nodes with keys greater than any particular node A is stored on the right sub-trees to that node A and the nodes with keys equal to or less than A are stored on the left sub-trees to A, satisfying the binary search property. [9] : 298 [10] : 287
Binary search trees are also efficacious in sortings and search algorithms. However, the search complexity of a BST depends upon the order in which the nodes are inserted and deleted; since in worst case, successive operations in the binary search tree may lead to degeneracy and form a singly linked list (or "unbalanced tree") like structure, thus has the same worst-case complexity as a linked list. [11] [9] : 299-302
Binary search trees are also a fundamental data structure used in construction of abstract data structures such as sets, multisets, and associative arrays.
Searching in a binary search tree for a specific key can be programmed recursively or iteratively.
Searching begins by examining the root node. If the tree is nil, the key being searched for does not exist in the tree. Otherwise, if the key equals that of the root, the search is successful and the node is returned. If the key is less than that of the root, the search proceeds by examining the left subtree. Similarly, if the key is greater than that of the root, the search proceeds by examining the right subtree. This process is repeated until the key is found or the remaining subtree is . If the searched key is not found after a subtree is reached, then the key is not present in the tree. [10] : 290–291
The following pseudocode implements the BST search procedure through recursion. [10] : 290
Recursive-Tree-Search(x, key) if x = NIL or key = x.key thenreturn x if key < x.key thenreturn Recursive-Tree-Search(x.left, key) elsereturn Recursive-Tree-Search(x.right, key) end if |
The recursive procedure continues until a or the being searched for are encountered.
The recursive version of the search can be "unrolled" into a while loop. On most machines, the iterative version is found to be more efficient. [10] : 291
Iterative-Tree-Search(x, key) while x ≠ NIL and key ≠ x.key doif key < x.key then x := x.left else x := x.right end ifrepeatreturn x |
Since the search may proceed till some leaf node, the running time complexity of BST search is where is the height of the tree. However, the worst case for BST search is where is the total number of nodes in the BST, because an unbalanced BST may degenerate to a linked list. However, if the BST is height-balanced the height is . [10] : 290
For certain operations, given a node , finding the successor or predecessor of is crucial. Assuming all the keys of a BST are distinct, the successor of a node in a BST is the node with the smallest key greater than 's key. On the other hand, the predecessor of a node in a BST is the node with the largest key smaller than 's key. The following pseudocode finds the successor and predecessor of a node in a BST. [12] [13] [10] : 292–293
BST-Successor(x) if x.right ≠ NIL thenreturn BST-Minimum(x.right) end if y := x.parent while y ≠ NIL and x = y.right do x := y y := y.parent repeatreturn y | BST-Predecessor(x) if x.left ≠ NIL thenreturn BST-Maximum(x.left) end if y := x.parent while y ≠ NIL and x = y.left do x := y y := y.parent repeatreturn y |
Operations such as finding a node in a BST whose key is the maximum or minimum are critical in certain operations, such as determining the successor and predecessor of nodes. Following is the pseudocode for the operations. [10] : 291–292
BST-Maximum(x) while x.right ≠ NIL do x := x.right repeatreturn x | BST-Minimum(x) while x.left ≠ NIL do x := x.left repeatreturn x |
Operations such as insertion and deletion cause the BST representation to change dynamically. The data structure must be modified in such a way that the properties of BST continue to hold. New nodes are inserted as leaf nodes in the BST. [10] : 294–295 Following is an iterative implementation of the insertion operation. [10] : 294
1 BST-Insert(T, z) 2 y := NIL 3 x := T.root 4 while x ≠ NIL do 5 y := x 6 if z.key < x.key then 7 x := x.left 8 else 9 x := x.right 10 end if 11 repeat 12 z.parent := y 13 if y = NIL then 14 T.root := z 15 else if z.key < y.key then 16 y.left := z 17 else 18 y.right := z 19 end if |
The procedure maintains a "trailing pointer" as a parent of . After initialization on line 2, the while loop along lines 4-11 causes the pointers to be updated. If is , the BST is empty, thus is inserted as the root node of the binary search tree , if it is not , insertion proceeds by comparing the keys to that of on the lines 15-19 and the node is inserted accordingly. [10] : 295
The deletion of a node, say , from the binary search tree has three cases: [10] : 295-297
The following pseudocode implements the deletion operation in a binary search tree. [10] : 296-298
1 BST-Delete(BST, z) 2 if z.left = NIL then 3 Shift-Nodes(BST, z, z.right) 4 else if z.right = NIL then 5 Shift-Nodes(BST, z, z.left) 6 else 7 y := BST-Successor(z) 8 if y.parent ≠ z then 9 Shift-Nodes(BST, y, y.right) 10 y.right := z.right 11 y.right.parent := y 12 end if 13 Shift-Nodes(BST, z, y) 14 y.left := z.left 15 y.left.parent := y 16 end if |
1 Shift-Nodes(BST, u, v) 2 if u.parent = NIL then 3 BST.root := v 4 else if u = u.parent.left then 5 u.parent.left := v 5 else 6 u.parent.right := v 7 end if 8 if v ≠ NIL then 9 v.parent := u.parent 10 end if |
The procedure deals with the 3 special cases mentioned above. Lines 2-3 deal with case 1; lines 4-5 deal with case 2 and lines 6-16 for case 3. The helper function is used within the deletion algorithm for the purpose of replacing the node with in the binary search tree . [10] : 298 This procedure handles the deletion (and substitution) of from .
A BST can be traversed through three basic algorithms: inorder, preorder, and postorder tree walks. [10] : 287
Following is a recursive implementation of the tree walks. [10] : 287–289
Inorder-Tree-Walk(x) if x ≠ NIL then Inorder-Tree-Walk(x.left) visit node Inorder-Tree-Walk(x.right) end if | Preorder-Tree-Walk(x) if x ≠ NIL thenvisit node Preorder-Tree-Walk(x.left) Preorder-Tree-Walk(x.right) end if | Postorder-Tree-Walk(x) if x ≠ NIL then Postorder-Tree-Walk(x.left) Postorder-Tree-Walk(x.right) visit nodeend if |
Without rebalancing, insertions or deletions in a binary search tree may lead to degeneration, resulting in a height of the tree (where is number of items in a tree), so that the lookup performance is deteriorated to that of a linear search. [14] Keeping the search tree balanced and height bounded by is a key to the usefulness of the binary search tree. This can be achieved by "self-balancing" mechanisms during the updation operations to the tree designed to maintain the tree height to the binary logarithmic complexity. [4] [15] : 50
A tree is height-balanced if the heights of the left sub-tree and right sub-tree are guaranteed to be related by a constant factor. This property was introduced by the AVL tree and continued by the red–black tree. [15] : 50–51 The heights of all the nodes on the path from the root to the modified leaf node have to be observed and possibly corrected on every insert and delete operation to the tree. [15] : 52
In a weight-balanced tree, the criterion of a balanced tree is the number of leaves of the subtrees. The weights of the left and right subtrees differ at most by . [16] [15] : 61 However, the difference is bound by a ratio of the weights, since a strong balance condition of cannot be maintained with rebalancing work during insert and delete operations. The -weight-balanced trees gives an entire family of balance conditions, where each left and right subtrees have each at least a fraction of of the total weight of the subtree. [15] : 62
There are several self-balanced binary search trees, including T-tree, [17] treap, [18] red-black tree, [19] B-tree, [20] 2–3 tree, [21] and Splay tree. [22]
Binary search trees are used in sorting algorithms such as tree sort, where all the elements are inserted at once and the tree is traversed at an in-order fashion. [23] BSTs are also used in quicksort. [24]
Binary search trees are used in implementing priority queues, using the node's key as priorities. Adding new elements to the queue follows the regular BST insertion operation but the removal operation depends on the type of priority queue: [25]
In computer science, an AVL tree is a self-balancing binary search tree. In an AVL tree, the heights of the two child subtrees of any node differ by at most one; if at any time they differ by more than one, rebalancing is done to restore this property. Lookup, insertion, and deletion all take O(log n) time in both the average and worst cases, where is the number of nodes in the tree prior to the operation. Insertions and deletions may require the tree to be rebalanced by one or more tree rotations.
In computer science, a binary tree is a tree data structure in which each node has at most two children, referred to as the left child and the right child. That is, it is a k-ary tree with k = 2. A recursive definition using set theory is that a binary tree is a tuple (L, S, R), where L and R are binary trees or the empty set and S is a singleton set containing the root.
In computer science, a red–black tree is a self-balancing binary search tree data structure noted for fast storage and retrieval of ordered information. The nodes in a red-black tree hold an extra "color" bit, often drawn as red and black, which help ensure that the tree is always approximately balanced.
A splay tree is a binary search tree with the additional property that recently accessed elements are quick to access again. Like self-balancing binary search trees, a splay tree performs basic operations such as insertion, look-up and removal in O(log n) amortized time. For random access patterns drawn from a non-uniform random distribution, their amortized time can be faster than logarithmic, proportional to the entropy of the access pattern. For many patterns of non-random operations, also, splay trees can take better than logarithmic time, without requiring advance knowledge of the pattern. According to the unproven dynamic optimality conjecture, their performance on all access patterns is within a constant factor of the best possible performance that could be achieved by any other self-adjusting binary search tree, even one selected to fit that pattern. The splay tree was invented by Daniel Sleator and Robert Tarjan in 1985.
In computer science, a trie, also called digital tree or prefix tree, is a type of search tree: specifically, a k-ary tree data structure used for locating specific keys from within a set. These keys are most often strings, with links between nodes defined not by the entire key, but by individual characters. In order to access a key, the trie is traversed depth-first, following the links between nodes, which represent each character in the key.
In computer science, the treap and the randomized binary search tree are two closely related forms of binary search tree data structures that maintain a dynamic set of ordered keys and allow binary searches among the keys. After any sequence of insertions and deletions of keys, the shape of the tree is a random variable with the same probability distribution as a random binary tree; in particular, with high probability its height is proportional to the logarithm of the number of keys, so that each search, insertion, or deletion operation takes logarithmic time to perform.
In computer science, a self-balancing binary search tree (BST) is any node-based binary search tree that automatically keeps its height small in the face of arbitrary item insertions and deletions. These operations when designed for a self-balancing binary search tree, contain precautionary measures against boundlessly increasing tree height, so that these abstract data structures receive the attribute "self-balancing".
A quadtree is a tree data structure in which each internal node has exactly four children. Quadtrees are the two-dimensional analog of octrees and are most often used to partition a two-dimensional space by recursively subdividing it into four quadrants or regions. The data associated with a leaf cell varies by application, but the leaf cell represents a "unit of interesting spatial information".
In computer science, a search tree is a tree data structure used for locating specific keys from within a set. In order for a tree to function as a search tree, the key for each node must be greater than any keys in subtrees on the left, and less than any keys in subtrees on the right.
In computer science, a scapegoat tree is a self-balancing binary search tree, invented by Arne Andersson in 1989 and again by Igal Galperin and Ronald L. Rivest in 1993. It provides worst-case lookup time and amortized insertion and deletion time.
In computer science, an interval tree is a tree data structure to hold intervals. Specifically, it allows one to efficiently find all intervals that overlap with any given interval or point. It is often used for windowing queries, for instance, to find all roads on a computerized map inside a rectangular viewport, or to find all visible elements inside a three-dimensional scene. A similar data structure is the segment tree.
An AA tree in computer science is a form of balanced tree used for storing and retrieving ordered data efficiently. AA trees are named after their originator, Swedish computer scientist Arne Andersson.
In computer science, a leftist tree or leftist heap is a priority queue implemented with a variant of a binary heap. Every node x has an s-value which is the distance to the nearest leaf in subtree rooted at x. In contrast to a binary heap, a leftist tree attempts to be very unbalanced. In addition to the heap property, leftist trees are maintained so the right descendant of each node has the lower s-value.
In computer science, weight-balanced binary trees (WBTs) are a type of self-balancing binary search trees that can be used to implement dynamic sets, dictionaries (maps) and sequences. These trees were introduced by Nievergelt and Reingold in the 1970s as trees of bounded balance, or BB[α] trees. Their more common name is due to Knuth.
In computer science and probability theory, a random binary tree is a binary tree selected at random from some probability distribution on binary trees. Different distributions have been used, leading to different properties for these trees.
In computer science, an order statistic tree is a variant of the binary search tree (or more generally, a B-tree) that supports two additional operations beyond insertion, lookup and deletion:
In computer science, a fractal tree index is a tree data structure that keeps data sorted and allows searches and sequential access in the same time as a B-tree but with insertions and deletions that are asymptotically faster than a B-tree. Like a B-tree, a fractal tree index is a generalization of a binary search tree in that a node can have more than two children. Furthermore, unlike a B-tree, a fractal tree index has buffers at each node, which allow insertions, deletions and other changes to be stored in intermediate locations. The goal of the buffers is to schedule disk writes so that each write performs a large amount of useful work, thereby avoiding the worst-case performance of B-trees, in which each disk write may change a small amount of data on disk. Like a B-tree, fractal tree indexes are optimized for systems that read and write large blocks of data. The fractal tree index has been commercialized in databases by Tokutek. Originally, it was implemented as a cache-oblivious lookahead array, but the current implementation is an extension of the Bε tree. The Bε is related to the Buffered Repository Tree. The Buffered Repository Tree has degree 2, whereas the Bε tree has degree Bε. The fractal tree index has also been used in a prototype filesystem. An open source implementation of the fractal tree index is available, which demonstrates the implementation details outlined below.
In computer science, a WAVL tree or weak AVL tree is a self-balancing binary search tree. WAVL trees are named after AVL trees, another type of balanced search tree, and are closely related both to AVL trees and red–black trees, which all fall into a common framework of rank balanced trees. Like other balanced binary search trees, WAVL trees can handle insertion, deletion, and search operations in time O(log n) per operation.
In computer science, the list-labeling problem involves maintaining a totally ordered set S supporting the following operations:
In computer science, join-based tree algorithms are a class of algorithms for self-balancing binary search trees. This framework aims at designing highly-parallelized algorithms for various balanced binary search trees. The algorithmic framework is based on a single operation join. Under this framework, the join operation captures all balancing criteria of different balancing schemes, and all other functions join have generic implementation across different balancing schemes. The join-based algorithms can be applied to at least four balancing schemes: AVL trees, red–black trees, weight-balanced trees and treaps.
The 2–3 trees defined at the close of Section 6.2.3 are equivalent to B-Trees of order 3.