AA tree

An AA tree in computer science is a form of balanced tree used for storing and retrieving ordered data efficiently. AA trees are named after their originator, Swedish computer scientist Arne Andersson. [1]

AA trees are a variation of the red–black tree, a form of binary search tree which supports efficient addition and deletion of entries. Unlike in a red–black tree, a red node in an AA tree can only be added as a right child; no red node can be a left child. This results in the simulation of a 2–3 tree instead of a 2–3–4 tree, which greatly simplifies the maintenance operations. The maintenance algorithms for a red–black tree need to consider seven different shapes to properly balance the tree:

[Figure: the seven red–black tree shape cases]

An AA tree, on the other hand, only needs to consider two shapes due to the strict requirement that only right links can be red:

[Figure: the two AA tree shape cases]

Balancing rotations

Whereas red–black trees require one bit of balancing metadata per node (the color), AA trees require O(log(log(N))) bits of metadata per node, in the form of an integer "level". The following invariants hold for AA trees:

  1. The level of every leaf node is one.
  2. The level of every left child is exactly one less than that of its parent.
  3. The level of every right child is equal to or one less than that of its parent.
  4. The level of every right grandchild is strictly less than that of its grandparent.
  5. Every node of level greater than one has two children.

A link where the child's level is equal to that of its parent is called a horizontal link, and is analogous to a red link in the red–black tree. Individual right horizontal links are allowed, but consecutive ones are forbidden; all left horizontal links are forbidden. These are more restrictive constraints than the analogous ones on red–black trees, with the result that re-balancing an AA tree is procedurally much simpler than re-balancing a red–black tree.
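
For concreteness, the balancing metadata can be pictured as follows. This is a minimal Python sketch (the Node class and helper names are illustrative, not taken from Andersson's paper) of a node carrying the integer level, plus tests for horizontal links:

class Node:
    def __init__(self, value, level=1, left=None, right=None):
        self.value = value
        self.level = level      # leaf nodes start at level 1 (invariant 1)
        self.left = left
        self.right = right

def has_right_horizontal_link(node):
    # A right link is horizontal when the child's level equals the parent's.
    return node.right is not None and node.right.level == node.level

def has_left_horizontal_link(node):
    # Left horizontal links are forbidden in a valid AA tree; skew removes them.
    return node.left is not None and node.left.level == node.level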

Insertions and deletions may transiently cause an AA tree to become unbalanced (that is, to violate the AA tree invariants). Only two distinct operations are needed for restoring balance: "skew" and "split". Skew is a right rotation to replace a subtree containing a left horizontal link with one containing a right horizontal link instead. Split is a left rotation and level increase to replace a subtree containing two or more consecutive right horizontal links with one containing two fewer consecutive right horizontal links. Implementation of balance-preserving insertion and deletion is simplified by relying on the skew and split operations to modify the tree only if needed, instead of making their callers decide whether to skew or split.

function skew is
    input: T, a node representing an AA tree that needs to be rebalanced.
    output: Another node representing the rebalanced AA tree.

    if nil(T) then
        return Nil
    else if nil(left(T)) then
        return T
    else if level(left(T)) == level(T) then
        Swap the pointers of horizontal left links.
        L = left(T)
        left(T) := right(L)
        right(L) := T
        return L
    else
        return T
    end if
end function

[Figure: the skew operation]

function split is
    input: T, a node representing an AA tree that needs to be rebalanced.
    output: Another node representing the rebalanced AA tree.

    if nil(T) then
        return Nil
    else if nil(right(T)) or nil(right(right(T))) then
        return T
    else if level(T) == level(right(right(T))) then
        We have two horizontal right links. Take the middle node, elevate it, and return it.
        R = right(T)
        right(T) := left(R)
        left(R) := T
        level(R) := level(R) + 1
        return R
    else
        return T
    end if
end function

[Figure: the split operation]
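
The skew and split pseudocode translates almost line for line into Python. The sketch below reuses the illustrative Node class from above and uses None in place of Nil; it is one possible rendering, not a canonical implementation:

def skew(t):
    # Right rotation that replaces a left horizontal link with a right one.
    if t is None or t.left is None:
        return t
    if t.left.level == t.level:
        l = t.left
        t.left = l.right
        l.right = t
        return l
    return t

def split(t):
    # Left rotation and level increase that removes two consecutive
    # right horizontal links.
    if t is None or t.right is None or t.right.right is None:
        return t
    if t.level == t.right.right.level:
        r = t.right
        t.right = r.left
        r.left = t
        r.level += 1
        return r
    return t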

Insertion

Insertion begins with the normal binary tree search and insertion procedure. Then, as the call stack unwinds (assuming a recursive implementation of the search), it is easy to check the validity of the tree and perform any rotations as necessary. If a horizontal left link arises, a skew will be performed, and if two horizontal right links arise, a split will be performed, possibly incrementing the level of the new root node of the current subtree. Note the level increase performed by split in the code given above: this makes it necessary to continue checking the validity of the tree as the modifications bubble up from the leaves.

function insert is
    input: X, the value to be inserted, and T, the root of the tree to insert it into.
    output: A balanced version of T including X.

    Do the normal binary tree insertion procedure. Set the result of the
    recursive call to the correct child in case a new node was created or the
    root of the subtree changes.
    if nil(T) then
        Create a new leaf node with X.
        return node(X, 1, Nil, Nil)
    else if X < value(T) then
        left(T) := insert(X, left(T))
    else if X > value(T) then
        right(T) := insert(X, right(T))
    end if
    Note that the case of X == value(T) is unspecified. As given, an insert
    will have no effect. The implementor may desire different behavior.

    Perform skew and then split. The conditionals that determine whether or
    not a rotation will occur are inside of the procedures, as given above.
    T := skew(T)
    T := split(T)

    return T
end function
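
Rendered in Python under the same assumptions as the sketches above (and ignoring duplicate values, as the pseudocode does), insertion might look like this:

def insert(x, t):
    if t is None:
        return Node(x, level=1)      # new leaf node holding x
    if x < t.value:
        t.left = insert(x, t.left)
    elif x > t.value:
        t.right = insert(x, t.right)
    # x == t.value: no effect, as in the pseudocode
    t = skew(t)
    t = split(t)
    return t

# Example usage: build a tree from a handful of keys.
# root = None
# for key in (5, 2, 8, 1, 3):
#     root = insert(key, root)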

Deletion

As in most balanced binary trees, the deletion of an internal node can be turned into the deletion of a leaf node by swapping the internal node with either its closest predecessor or successor, depending on which is in the tree or on the implementor's whims. Retrieving a predecessor is simply a matter of following one left link and then all of the remaining right links. Similarly, the successor can be found by going right once and left until a null pointer is found. Because of the AA property that all nodes of level greater than one have two children, the successor or predecessor node will be at level 1, making its removal trivial.
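
The predecessor and successor walks described here are short enough to show directly; this Python sketch again assumes the illustrative Node class and is only called on nodes that have the relevant child:

def predecessor(t):
    # Largest value in the left subtree: one step left, then right to the end.
    node = t.left
    while node.right is not None:
        node = node.right
    return node

def successor(t):
    # Smallest value in the right subtree: one step right, then left to the end.
    node = t.right
    while node.left is not None:
        node = node.left
    return node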

To re-balance the tree, there are a few approaches. The one described by Andersson in his original paper is the simplest, and it is described here, although actual implementations may opt for a more optimized approach. After a removal, the first step to maintaining tree validity is to lower the level of any nodes whose children are two levels below them, or that are missing children. Then, the entire level must be skewed and split. This approach was favored because, when laid down conceptually, it has three easily understood separate steps:

  1. Decrease the level, if appropriate.
  2. Skew the level.
  3. Split the level.

However, we have to skew and split the entire level this time instead of just a node, complicating our code.

function delete is
    input: X, the value to delete, and T, the root of the tree from which it should be deleted.
    output: T, balanced, without the value X.

    if nil(T) then
        return T
    else if X > value(T) then
        right(T) := delete(X, right(T))
    else if X < value(T) then
        left(T) := delete(X, left(T))
    else
        If we're a leaf, easy, otherwise reduce to leaf case.
        if leaf(T) then
            return right(T)
        else if nil(left(T)) then
            L := successor(T)
            right(T) := delete(value(L), right(T))
            value(T) := value(L)
        else
            L := predecessor(T)
            left(T) := delete(value(L), left(T))
            value(T) := value(L)
        end if
    end if

    Rebalance the tree. Decrease the level of all nodes in this level if
    necessary, and then skew and split all nodes in the new level.
    T := decrease_level(T)
    T := skew(T)
    right(T) := skew(right(T))
    if not nil(right(T)) then
        right(right(T)) := skew(right(right(T)))
    end if
    T := split(T)
    right(T) := split(right(T))
    return T
end function

function decrease_level is
    input: T, a tree for which we want to remove links that skip levels.
    output: T with its level decreased.

    should_be = min(level(left(T)), level(right(T))) + 1
    if should_be < level(T) then
        level(T) := should_be
        if should_be < level(right(T)) then
            level(right(T)) := should_be
        end if
    end if
    return T
end function
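
A possible Python rendering of delete and decrease_level, assuming the skew, split, predecessor and successor sketches above and treating a missing child as having level 0; this follows the decrease/skew/split sequence of the pseudocode and is a sketch rather than a tuned implementation:

def decrease_level(t):
    # Remove links that skip levels after a removal.
    left_level = t.left.level if t.left is not None else 0
    right_level = t.right.level if t.right is not None else 0
    should_be = min(left_level, right_level) + 1
    if should_be < t.level:
        t.level = should_be
        if t.right is not None and should_be < t.right.level:
            t.right.level = should_be
    return t

def delete(x, t):
    if t is None:
        return t
    if x > t.value:
        t.right = delete(x, t.right)
    elif x < t.value:
        t.left = delete(x, t.left)
    else:
        if t.left is None and t.right is None:
            return None                      # leaf: just remove it
        elif t.left is None:
            l = successor(t)
            t.right = delete(l.value, t.right)
            t.value = l.value
        else:
            l = predecessor(t)
            t.left = delete(l.value, t.left)
            t.value = l.value
    # Rebalance: lower levels, then skew and split the whole level.
    # skew and split tolerate None, matching nil(T) in the pseudocode.
    t = decrease_level(t)
    t = skew(t)
    t.right = skew(t.right)
    if t.right is not None:
        t.right.right = skew(t.right.right)
    t = split(t)
    t.right = split(t.right)
    return t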

A good example of deletion by this algorithm is present in the Andersson paper.

Performance

The performance of an AA tree is equivalent to the performance of a red–black tree. While an AA tree makes more rotations than a red–black tree, the simpler algorithms tend to be faster, and all of this balances out to result in similar performance. A red–black tree is more consistent in its performance than an AA tree, but an AA tree tends to be flatter, which results in slightly faster search times. [2]

See also

AVL tree – Self-balancing binary search tree

In computer science, an AVL tree is a self-balancing binary search tree. In an AVL tree, the heights of the two child subtrees of any node differ by at most one; if at any time they differ by more than one, rebalancing is done to restore this property. Lookup, insertion, and deletion all take O(log n) time in both the average and worst cases, where n is the number of nodes in the tree prior to the operation. Insertions and deletions may require the tree to be rebalanced by one or more tree rotations.

Binary search tree – Rooted binary tree data structure

In computer science, a binary search tree (BST), also called an ordered or sorted binary tree, is a rooted binary tree data structure with the key of each internal node being greater than all the keys in the respective node's left subtree and less than the ones in its right subtree. The time complexity of operations on the binary search tree is linear with respect to the height of the tree.

Binary tree – Limited form of tree data structure

In computer science, a binary tree is a tree data structure in which each node has at most two children, referred to as the left child and the right child. That is, it is a k-ary tree with k = 2. A recursive definition using set theory is that a binary tree is a tuple (L, S, R), where L and R are binary trees or the empty set and S is a singleton set containing the root.

In computer science, a B-tree is a self-balancing tree data structure that maintains sorted data and allows searches, sequential access, insertions, and deletions in logarithmic time. The B-tree generalizes the binary search tree, allowing for nodes with more than two children. Unlike other self-balancing binary search trees, the B-tree is well suited for storage systems that read and write relatively large blocks of data, such as databases and file systems.

In computer science, a red–black tree is a specialised binary search tree data structure noted for fast storage and retrieval of ordered information, and a guarantee that operations will complete within a known time. Compared to other self-balancing binary search trees, the nodes in a red–black tree hold an extra bit called "color" representing "red" and "black" which is used when re-organising the tree to ensure that it is always approximately balanced.

A splay tree is a binary search tree with the additional property that recently accessed elements are quick to access again. Like self-balancing binary search trees, a splay tree performs basic operations such as insertion, look-up and removal in O(log n) amortized time. For random access patterns drawn from a non-uniform random distribution, their amortized time can be faster than logarithmic, proportional to the entropy of the access pattern. For many patterns of non-random operations, also, splay trees can take better than logarithmic time, without requiring advance knowledge of the pattern. According to the unproven dynamic optimality conjecture, their performance on all access patterns is within a constant factor of the best possible performance that could be achieved by any other self-adjusting binary search tree, even one selected to fit that pattern. The splay tree was invented by Daniel Sleator and Robert Tarjan in 1985.

Tree (data structure) – Linked node hierarchical data structure

In computer science, a tree is a widely used abstract data type that represents a hierarchical tree structure with a set of connected nodes. Each node in the tree can be connected to many children, but must be connected to exactly one parent, except for the root node, which has no parent. These constraints mean there are no cycles or "loops", and also that each child can be treated like the root node of its own subtree, making recursion a useful technique for tree traversal. In contrast to linear data structures, many trees cannot be represented by relationships between neighboring nodes in a single straight line.

Tree rotation – A local change in a binary tree that preserves leaf order

In discrete mathematics, tree rotation is an operation on a binary tree that changes the structure without interfering with the order of the elements. A tree rotation moves one node up in the tree and one node down. It is used to change the shape of the tree, and in particular to decrease its height by moving smaller subtrees down and larger subtrees up, resulting in improved performance of many tree operations.

Treap

In computer science, the treap and the randomized binary search tree are two closely related forms of binary search tree data structures that maintain a dynamic set of ordered keys and allow binary searches among the keys. After any sequence of insertions and deletions of keys, the shape of the tree is a random variable with the same probability distribution as a random binary tree; in particular, with high probability its height is proportional to the logarithm of the number of keys, so that each search, insertion, or deletion operation takes logarithmic time to perform.

Rope (data structure) – Data structure for storing strings

In computer programming, a rope, or cord, is a data structure composed of smaller strings that is used to efficiently store and manipulate a very long string. For example, a text editing program may use a rope to represent the text being edited, so that operations such as insertion, deletion, and random access can be done efficiently.

T-tree

In computer science, a T-tree is a type of binary tree data structure that is used by main-memory databases, such as Datablitz, eXtremeDB, MySQL Cluster, Oracle TimesTen and MobileLite.

In computer science, a scapegoat tree is a self-balancing binary search tree, invented by Arne Andersson in 1989 and again by Igal Galperin and Ronald L. Rivest in 1993. It provides worst-case O(log n) lookup time and O(log n) amortized insertion and deletion time.

In computer science, a leftist tree or leftist heap is a priority queue implemented with a variant of a binary heap. Every node x has an s-value which is the distance to the nearest leaf in subtree rooted at x. In contrast to a binary heap, a leftist tree attempts to be very unbalanced. In addition to the heap property, leftist trees are maintained so the right descendant of each node has the lower s-value.

In computer science, a ternary search tree is a type of trie where nodes are arranged in a manner similar to a binary search tree, but with up to three children rather than the binary tree's limit of two. Like other prefix trees, a ternary search tree can be used as an associative map structure with the ability for incremental string search. However, ternary search trees are more space efficient compared to standard prefix trees, at the cost of speed. Common applications for ternary search trees include spell-checking and auto-completion.

A skew heap is a heap data structure implemented as a binary tree. Skew heaps are advantageous because of their ability to merge more quickly than binary heaps. In contrast with binary heaps, there are no structural constraints, so there is no guarantee that the height of the tree is logarithmic. Only two conditions must be satisfied: the general heap order must be enforced, and every operation on two skew heaps must be done using a special skew heap merge.

In computer science, weight-balanced binary trees (WBTs) are a type of self-balancing binary search trees that can be used to implement dynamic sets, dictionaries (maps) and sequences. These trees were introduced by Nievergelt and Reingold in the 1970s as trees of bounded balance, or BB[α] trees. Their more common name is due to Knuth.

In computer science, an order statistic tree is a variant of the binary search tree (or more generally, a B-tree) that supports two additional operations beyond insertion, lookup and deletion: selecting the i-th smallest element stored in the tree, and finding the rank of a given element, i.e. its index in the sorted order of elements.

In computer science, a randomized meldable heap is a priority queue based data structure in which the underlying structure is also a heap-ordered binary tree. However, there are no restrictions on the shape of the underlying binary tree.

In computer science, a WAVL tree or weak AVL tree is a self-balancing binary search tree. WAVL trees are named after AVL trees, another type of balanced search tree, and are closely related both to AVL trees and red–black trees, which all fall into a common framework of rank balanced trees. Like other balanced binary search trees, WAVL trees can handle insertion, deletion, and search operations in time O(log n) per operation.

In computer science, join-based tree algorithms are a class of algorithms for self-balancing binary search trees. This framework aims at designing highly parallelized algorithms for various balanced binary search trees. The algorithmic framework is based on a single operation, join. Under this framework, the join operation captures all balancing criteria of different balancing schemes, and all other functions have generic implementations based on join across different balancing schemes. The join-based algorithms can be applied to at least four balancing schemes: AVL trees, red–black trees, weight-balanced trees and treaps.

References

  1. Andersson, Arne (1993). "Balanced Search Trees made Simple". WADS '93: Proceedings of the Third Workshop on Algorithms and Data Structures. Springer-Verlag: 60–71. ISBN 3540571558.
  2. "A Disquisition on The Performance Behavior of Binary Search Tree Data Structures" (PDF), pp. 67–75. Archived from the original (PDF) on 2014-03-27. Retrieved 2010-10-17.