Weak heap

Last updated

In computer science, a weak heap is a data structure for priority queues, combining features of the binary heap and binomial heap. It can be stored in an array as an implicit binary tree like a binary heap, and has the efficiency guarantees of binomial heaps.

Contents

A sorting algorithm using weak heaps, weak-heapsort, uses a number of comparisons that is close to the theoretical lower bound on the number of comparisons required to sort a list, so is particularly useful when comparison is expensive, such as when comparing strings using the full Unicode collation algorithm.

Description

A weak heap is most easily understood as a heap-ordered multi-way tree stored as a binary tree using the "right-child left-sibling" convention. (This is equivalent to, but reversed from, the usual left-child right-sibling binary tree.)

In the multi-way tree, and assuming a max-heap, each parent's key is greater than or equal to () all the child keys (and thus, by induction, all members of the subtree).

Expressed as a binary tree, this translates to the following invariants: [1]

The last condition is a consequence of the fact that an implicit binary tree is a complete binary tree.

The structure of this tree maps very neatly onto the traditional 1-based (Ahnentafel) implicit binary tree arrangement, where node k has a next sibling (left child) numbered 2k and a first child (right child) numbered 2k + 1, by adding an additional root numbered 0. This root has no siblings, only a first child, which is node 1 (2×0 + 1).

This structure is very similar to that of a binomial heap, with a tree of height h being composed of a root plus trees of heights h − 1, h − 2, ..., 1. A perfect (no missing leaves) weak heap with 2n elements is exactly isomorphic to a binomial heap of the same size, [2] but the two algorithms handle sizes which are not a power of 2 differently: a binomial heap uses multiple perfect trees, while a weak heap uses a single imperfect tree.

Weak heaps require the ability to exchange the left and right children (and associated subtrees) of a node. In an explicit (pointer-based) representation of the tree, this is straightforward. In an implicit (array) representation, this requires one "reverse bit" per internal node to indicate which child is considered the left child. A weak heap is thus not a strictly implicit data structure since it requires O(n) additional space (1/2 bit per node). However, it is often possible to find space for this extra bit within the node structure, such as by tagging a pointer which is already present.

In the implicit binary tree, node k with reverse bit rk has parent k/2, left child 2k + rk, and right child 2k + 1 − rk.

Viewed as a multi-way tree, each node in a weak heap is linked to two others: a "next sibling" and a "first child". In the implicit tree, the links are fixed, so which of the two links is the sibling and which the first child is indicated by the reverse bit.

Operations on weak heaps

Note that every node in a weak heap can be considered the root of a smaller weak heap by ignoring its next sibling. Nodes with no first child are automatically valid weak heaps.

A node of height h has h − 1 children: a first child of height h − 1, a second child of height h − 2, and so on to the last child of height 1. These may be found by following the first child link and then successive next sibling links.

It also has next siblings of height h − 1, h − 2, etc.

A node's parent in the multi-way tree is called its "distinguished ancestor". To find this in the binary tree, find the node's binary parent. If the node is the right child (first child), the parent is the distinguished ancestor. If the node is the left child (next sibling), its distinguished ancestor is the same as its binary parent's. In the implicit tree, finding the binary parent is easy, but its reverse bit must be consulted to determine which type of child the node is. (Early papers used the term "grandparent" for the distinguished ancestor, [3] a meaning confusingly different from the usual "parent of parent".)

Although the distinguished ancestor may be log2n levels high in the tree, the average distance is 2. (It's at least 1, and half of the time we recurse, so D = 1 + D/2, meaning that D = 2.) Thus, even a simple iterative algorithm for finding the distinguished ancestor is sufficient.

Like binomial heaps, the fundamental operation on weak heaps is merging two heaps of equal height h, to make a weak heap of height h+1. This requires exactly one comparison, between the roots. Whichever root is greater (assuming a max-heap) is the final root. Its first child is the losing root, which retains its children (right subtree). The winning root's children are installed as siblings of the losing root.

This operation can be performed on the implicit tree structure because the heaps being merged are never arbitrary. Rather, the two heaps are formed as part of sifting a node up the multi-way tree:

At the beginning, the heap invariants apply everywhere except possibly between the first root and its distinguished ancestor. All other nodes are less than or equal to their distinguished ancestors.

After comparing the two roots, the merge proceeds in one of two ways:

  1. (The distinguished ancestor is greater or equal.) Nothing needs to be moved, and the result of the merge is the distinguished ancestor.
  2. (The first root is greater.) The first root's binary children (first child and next sibling) are exchanged (using the reverse bit), and then the first root and its distinguished ancestor are exchanged (by copying).

The second case works because, in the multi-way tree, each node keeps its children with it. The first root is promoted up the tree because it is greater than its distinguished ancestor. Thus, it is safely greater than all of the ancestor's previous children.

The previous ancestor, however, is not a safe parent for the first root's old children, because it is less than the first root and so it's not guaranteed to be greater than or equal to all of its children.

By swapping the binary children, the appropriate subset of the demoted ancestor's old children (which are safely less than or equal to it) are demoted with it. The demoted ancestor's new siblings are the first root's old children, promoted, which are safely less than or equal to the promoted first root.

After this operation, it is uncertain whether the invariant is maintained between the new distinguished ancestor and its distinguished ancestor, so the operation is repeated until the root is reached.

Weak-heap sort

Weak heaps may be used to sort an array, in essentially the same way as a conventional heapsort. [3] First, a weak heap is built out of all of the elements of the array, and then the root is repeatedly exchanged with the last element, which is sifted down to its proper place.

A weak heap of n elements can be formed in n − 1 merges. It can be done on various orders, but a simple bottom-up implementation works from the end of the array to the beginning, merging each node with its distinguished ancestor. Note that finding the distinguished ancestor is simplified because the reverse bits in all parents of the heaps being merged are unmodified from their initial state ("not reversed"), and so do not need to be consulted.

As with heapsort, if the array to be sorted is larger than the CPU cache, performance is improved if subtrees are merged as soon as two of the same size become available, rather than merging all subtrees on one level before proceeding to the next. [4]

Sifting down in a weak heap can be done in h = ⌈log2n comparisons, as opposed to 2log2n for a binary heap, or 1.5log2n for the "bottom-up heapsort" variant. This is done by "merging up": after swapping the root with the last element of the heap, find the last (height 1) child of the root. Merge this with the root (its distinguished ancestor), resulting in a valid height-2 heap at the global root. Then go to the previous sibling (binary parent) of the last merged node, and merge again. Repeat until the root is reached, when it will be correct for the complete tree.

Priority queue operations

In a weak max-heap, the maximum value can be found (in constant time) as the value associated with the root node; similarly, in a weak min-heap, the minimum value can be found at the root.

As with binary heaps, weak heaps can support the typical operations of a priority queue data structure: insert, delete-min, delete, or decrease-key, in logarithmic time per operation.

Sifting up is done using the same process as in binary heaps. The new node is added at the leaf level, then compared with its distinguished ancestor and swapped if necessary (the merge operation). This is repeated until no more swaps are necessary or the root is reached.

Variants of the weak heap structure allow constant amortized time insertions and decrease-keys, matching the time for Fibonacci heaps. [2]

History and applications

Weak heaps were introduced by Dutton (1993), as part of a variant heap sort algorithm that (unlike the standard heap sort using binary heaps) could be used to sort n items using only n log2n + O(n) comparisons. [3] [5] They were later investigated as a more generally applicable priority queue data structure. [6] [7]

Related Research Articles

<span class="mw-page-title-main">AVL tree</span> Self-balancing binary search tree

In computer science, an AVL tree is a self-balancing binary search tree. In an AVL tree, the heights of the two child subtrees of any node differ by at most one; if at any time they differ by more than one, rebalancing is done to restore this property. Lookup, insertion, and deletion all take O(log n) time in both the average and worst cases, where is the number of nodes in the tree prior to the operation. Insertions and deletions may require the tree to be rebalanced by one or more tree rotations.

<span class="mw-page-title-main">Binary tree</span> Limited form of tree data structure

In computer science, a binary tree is a tree data structure in which each node has at most two children, referred to as the left child and the right child. That is, it is a k-ary tree with k = 2. A recursive definition using set theory is that a binary tree is a tuple (L, S, R), where L and R are binary trees or the empty set and S is a singleton set containing the root.

In computer science, a B-tree is a self-balancing tree data structure that maintains sorted data and allows searches, sequential access, insertions, and deletions in logarithmic time. The B-tree generalizes the binary search tree, allowing for nodes with more than two children. Unlike other self-balancing binary search trees, the B-tree is well suited for storage systems that read and write relatively large blocks of data, such as databases and file systems.

<span class="mw-page-title-main">Heapsort</span> A sorting algorithm which uses the heap data structure

In computer science, heapsort is a comparison-based sorting algorithm which can be thought of as "an implementation of selection sort using the right data structure." Like selection sort, heapsort divides its input into a sorted and an unsorted region, and it iteratively shrinks the unsorted region by extracting the largest element from it and inserting it into the sorted region. Unlike selection sort, heapsort does not waste time with a linear-time scan of the unsorted region; rather, heap sort maintains the unsorted region in a heap data structure to efficiently find the largest element in each step.

<span class="mw-page-title-main">Heap (data structure)</span> Computer science data structure

In computer science, a heap is a specialized tree-based data structure that satisfies the heap property: In a max heap, for any given node C, if P is a parent node of C, then the key of P is greater than or equal to the key of C. In a min heap, the key of P is less than or equal to the key of C. The node at the "top" of the heap is called the root node.

In computer science, a red–black tree is a specialised binary search tree data structure noted for fast storage and retrieval of ordered information, and a guarantee that operations will complete within a known time. Compared to other self-balancing binary search trees, the nodes in a red-black tree hold an extra bit called "color" representing "red" and "black" which is used when re-organising the tree to ensure that it is always approximately balanced.

<span class="mw-page-title-main">Tree (data structure)</span> Abstract data type simulating a hierarchical tree structure and represented as a set of linked nodes

In computer science, a tree is a widely used abstract data type that represents a hierarchical tree structure with a set of connected nodes. Each node in the tree can be connected to many children, but must be connected to exactly one parent, except for the root node, which has no parent. These constraints mean there are no cycles or "loops", and also that each child can be treated like the root node of its own subtree, making recursion a useful technique for tree traversal. In contrast to linear data structures, many trees cannot be represented by relationships between neighboring nodes in a single straight line.

<span class="mw-page-title-main">Binary heap</span> Variant of heap data structure

A binary heap is a heap data structure that takes the form of a binary tree. Binary heaps are a common way of implementing priority queues. The binary heap was introduced by J. W. J. Williams in 1964, as a data structure for heapsort.

<span class="mw-page-title-main">Smoothsort</span> Comparison-based sorting algorithm

In computer science, smoothsort is a comparison-based sorting algorithm. A variant of heapsort, it was invented and published by Edsger Dijkstra in 1981. Like heapsort, smoothsort is an in-place algorithm with an upper bound of O(n log n) operations (see big O notation), but it is not a stable sort. The advantage of smoothsort is that it comes closer to O(n) time if the input is already sorted to some degree, whereas heapsort averages O(n log n) regardless of the initial sorted state.

<span class="mw-page-title-main">Treap</span>

In computer science, the treap and the randomized binary search tree are two closely related forms of binary search tree data structures that maintain a dynamic set of ordered keys and allow binary searches among the keys. After any sequence of insertions and deletions of keys, the shape of the tree is a random variable with the same probability distribution as a random binary tree; in particular, with high probability its height is proportional to the logarithm of the number of keys, so that each search, insertion, or deletion operation takes logarithmic time to perform.

In computer science, a binomial heap is a data structure that acts as a priority queue but also allows pairs of heaps to be merged. It is important as an implementation of the mergeable heap abstract data type, which is a priority queue supporting merge operation. It is implemented as a heap similar to a binary heap but using a special tree structure that is different from the complete binary trees used by binary heaps. Binomial heaps were invented in 1978 by Jean Vuillemin.

In computer science, tree traversal is a form of graph traversal and refers to the process of visiting each node in a tree data structure, exactly once. Such traversals are classified by the order in which the nodes are visited. The following algorithms are described for a binary tree, but they may be generalized to other trees as well.

Adaptive Huffman coding is an adaptive coding technique based on Huffman coding. It permits building the code as the symbols are being transmitted, having no initial knowledge of source distribution, that allows one-pass encoding and adaptation to changing conditions in data.

In computer science, a scapegoat tree is a self-balancing binary search tree, invented by Arne Andersson in 1989 and again by Igal Galperin and Ronald L. Rivest in 1993. It provides worst-case lookup time and amortized insertion and deletion time.

In computer science, a leftist tree or leftist heap is a priority queue implemented with a variant of a binary heap. Every node x has an s-value which is the distance to the nearest leaf in subtree rooted at x. In contrast to a binary heap, a leftist tree attempts to be very unbalanced. In addition to the heap property, leftist trees are maintained so the right descendant of each node has the lower s-value.

A pairing heap is a type of heap data structure with relatively simple implementation and excellent practical amortized performance, introduced by Michael Fredman, Robert Sedgewick, Daniel Sleator, and Robert Tarjan in 1986. Pairing heaps are heap-ordered multiway tree structures, and can be considered simplified Fibonacci heaps. They are considered a "robust choice" for implementing such algorithms as Prim's MST algorithm, and support the following operations :

A skew heap is a heap data structure implemented as a binary tree. Skew heaps are advantageous because of their ability to merge more quickly than binary heaps. In contrast with binary heaps, there are no structural constraints, so there is no guarantee that the height of the tree is logarithmic. Only two conditions must be satisfied:

In computer science, adaptive heap sort is a comparison-based sorting algorithm of the adaptive sort family. It is a variant of heap sort that performs better when the data contains existing order. Published by Christos Levcopoulos and Ola Petersson in 1992, the algorithm utilizes a new measure of presortedness, Osc, as the number of oscillations. Instead of putting all the data into the heap as the traditional heap sort did, adaptive heap sort only take part of the data into the heap so that the run time will reduce significantly when the presortedness of the data is high.

<span class="mw-page-title-main">Cartesian tree</span> Binary tree derived from a sequence of numbers

In computer science, a Cartesian tree is a binary tree derived from a sequence of distinct numbers. To construct the Cartesian tree, set its root to be the minimum number in the sequence, and recursively construct its left and right subtrees from the subsequences before and after this number. It is uniquely defined as a min-heap whose symmetric (in-order) traversal returns the original sequence.

References

  1. Edelkamp, Stefan (26 May 2011), Pieterse, Vreda; Black, Paul E. (eds.), "weak-heap", Dictionary of Algorithms and Data Structures, retrieved 2015-12-01
  2. 1 2 Edelkamp, Stefan; Elmasry, Amr; Katajainen, Jyrki (October 2012), "The weak-heap data structure: variants and applications" (PDF), Journal of Discrete Algorithms, 16: 187–205, CiteSeerX   10.1.1.455.1213 , doi:10.1016/j.jda.2012.04.010, MR   2960353 .
  3. 1 2 3 Dutton, Ronald D. (1993), "Weak-heap sort", BIT, 33 (3): 372–381, doi:10.1007/bf01990520 .
  4. Bojesen, Jesper; Katajainen, Jyrki; Spork, Maz (2000). "Performance Engineering Case Study: Heap Construction" (PostScript). ACM Journal of Experimental Algorithmics. 5 (15). CiteSeerX   10.1.1.35.3248 . doi:10.1145/351827.384257. Alternate PDF source.
  5. Edelkamp, Stefan; Wegener, Ingo (2000), "On the performance of WEAK-HEAPSORT", Stacs 2000 (PDF), Lecture Notes in Computer Science, vol. 1770, Springer-Verlag, pp. 254–266, CiteSeerX   10.1.1.21.1863 , doi:10.1007/3-540-46541-3_21, ISBN   978-3-540-67141-1 .
  6. Bruun, Asger; Edelkamp, Stefan; Katajainen, Jyrki; Rasmussen, Jens (2010). "Policy-Based Benchmarking of Weak Heaps and Their Relatives" (PDF). Proceedings of the 9th International Symposium on Experimental Algorithms (SEA 2010). Lecture Notes in Computer Science. Vol. 6049. Springer-Verlag. pp. 424–435. doi:10.1007/978-3-642-13193-6_36. ISBN   978-3-642-13192-9. Archived from the original (PDF) on 2017-08-11..
  7. Edelkamp, Stefan; Elmasry, Amr; Katajainen, Jyrki (2012), "The Weak-heap Family of Priority Queues in Theory and Praxis" (PDF), Proceedings of the Eighteenth Computing: The Australasian Theory Symposium (CATS 2012), vol. 128, Darlinghurst, Australia: Australian Computer Society, Inc., pp. 103–112, ISBN   978-1-921770-09-8 .

Further reading