Shadow heap

Last updated February 21, 2019

In computer science, a shadow heap is a mergeable heap data structure which supports efficient heap merging in the amortized sense. More specifically, shadow heaps make use of the shadow merge algorithm to achieve insertion in O(f(n)) amortized time and deletion in O((log n log log n)/f(n)) amortized time, for any choice of 1 ≤ f(n) ≤ log log n.^[1]

Shadow merge

Shadow merge is an algorithm for merging two binary heaps efficiently if these heaps are implemented as arrays. Specifically, the running time of shadow merge on two heaps $A$ and $B$ is $O(|A|+\min\{\log |B|\log \log |B|,\log |A|\log |B|\})$.

Algorithm

We wish to merge the two binary min-heaps $A$ and $B$. The algorithm is as follows:

1. Concatenate the array $A$ at the end of the array $B$ to obtain an array $C$.
2. Identify the shadow of $A$ in $C$; that is, the ancestors of the last $|A|$ nodes in $C$ which destroy the heap property.
3. Identify the following two parts of the shadow from $C$:
   - The path $P$: the set of nodes in the shadow for which there are at most 2 at any depth of $C$;
   - The subtree $T$: the remainder of the shadow.
4. Extract and sort the smallest $|P|$ nodes from the shadow into an array $S$.
5. Transform $S$ as follows:
   - If $|S|>|C|$, then starting from the smallest element in the sorted array, sequentially insert each element of $S$ into $C$, replacing them with $C$'s smallest elements.
   - If $|S|\leq |C|$, then extract and sort the $|P|$ smallest elements from $C$, and merge this sorted list with $S$.
6. Replace the elements of $S$ into their original positions in $C$.
7. Make a heap out of $T$.
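
The first three steps above (concatenation, finding the shadow, and splitting it into path and subtree) can be sketched in Python as follows. This is only an illustration under stated assumptions, not the implementation from the cited report: the function name `shadow_parts` is hypothetical, heaps are 0-indexed arrays with the parent of index `i` at `(i - 1) // 2`, and the shadow is taken to be the appended positions together with all of their ancestors.

```python
from collections import defaultdict


def shadow_parts(b, a):
    """Return (c, path, subtree) for array-based heaps b and a.

    Assumption: the shadow is the set of positions holding elements of `a`
    after concatenation, plus every ancestor of those positions.
    """
    c = list(b) + list(a)  # step 1: append A after B

    # Step 2: walk up from each appended position, stopping as soon as
    # we reach a node that was already collected.
    shadow = set()
    for i in range(len(b), len(c)):
        j = i
        while j not in shadow:
            shadow.add(j)
            if j == 0:
                break
            j = (j - 1) // 2

    # Step 3: group shadow nodes by depth; depths holding at most two
    # shadow nodes form the path P, the remaining nodes form the subtree T.
    by_depth = defaultdict(list)
    for i in shadow:
        by_depth[(i + 1).bit_length() - 1].append(i)

    path, subtree = [], []
    for nodes in by_depth.values():
        (path if len(nodes) <= 2 else subtree).extend(nodes)
    return c, sorted(path), sorted(subtree)
```

For a 7-element heap and a 4-element heap, the appended positions are 7 to 10, so the path consists of indices 0, 1, 3 and 4 and the subtree of indices 7 to 10.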

Running time

Again, let $P$ denote the path and $T$ the subtree of the concatenated heap $C$. The number of nodes in $P$ is at most twice the depth of $C$, which is $O(\log |B|)$. Moreover, the number of nodes in $T$ at depth $d$ is at most 3/4 the number of nodes at depth $d+1$, so the subtree has size $O(|A|)$. Since $P$ has at most 2 nodes at each level, reading the smallest $|P|$ elements of the shadow into the sorted array $S$ takes $O(\log |B|)$ time.

If $|S|>|C|$, then combining $P$ and $C$ as in step 5 above takes time $O(\log |A|\log |B|)$. Otherwise, the time taken in this step is $O(|A|+\log |B|\log \log |B|)$. Finally, making a heap of the subtree $T$ takes $O(|A|)$ time. This amounts to a total running time for shadow merging of $O(|A|+\min\{\log |A|\log |B|,\log |B|\log \log |B|\})$.

Structure

A shadow heap $H$ consists of a threshold function $f(H)$ and an array in which the usual array-implemented binary heap property is upheld in the first entries but not necessarily in the remaining entries. Thus, the shadow heap is essentially a binary heap $B$ adjacent to an array $A$. To add an element to the shadow heap, place it in the array $A$. If the array becomes too large according to the specified threshold, we first build a heap out of $A$ using Floyd's algorithm for heap construction,^[2] and then merge this heap with $B$ using shadow merge. Finally, two shadow heaps are merged simply by sequentially inserting the elements of one heap into the other using the above insertion procedure.
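
The layout described above can be sketched in Python as follows. This is a minimal illustration under several assumptions: the class and method names are hypothetical, the default threshold `log2(n + 2)` is an arbitrary choice, and the flush step concatenates and re-heapifies as a simple stand-in for shadow merge, so it does not achieve the running times discussed in this article.

```python
import heapq
import math


class ShadowHeap:
    """A binary min-heap `heap` (B) next to an unordered buffer `buf` (A)."""

    def __init__(self, threshold=None):
        self.heap = []  # B: satisfies the heap property
        self.buf = []   # A: no ordering guaranteed
        # Hypothetical default threshold; any function of |H| can be passed.
        self.threshold = threshold or (lambda n: math.log2(n + 2))

    def __len__(self):
        return len(self.heap) + len(self.buf)

    def insert(self, x):
        self.buf.append(x)
        if len(self.buf) > self.threshold(len(self)):
            self._flush()

    def _flush(self):
        # Bottom-up heap construction on the buffer (heapq.heapify is linear).
        heapq.heapify(self.buf)
        # Stand-in for shadow merge: concatenate and rebuild the whole heap.
        self.heap.extend(self.buf)
        heapq.heapify(self.heap)
        self.buf = []

    def delete_min(self):
        if not self:
            raise IndexError("delete_min from an empty shadow heap")
        # Scan the buffer for its minimum: O(|A|).
        i = min(range(len(self.buf)), key=self.buf.__getitem__, default=None)
        if i is not None and (not self.heap or self.buf[i] < self.heap[0]):
            self.buf[i], self.buf[-1] = self.buf[-1], self.buf[i]
            return self.buf.pop()
        return heapq.heappop(self.heap)  # O(log |B|)

    def merge(self, other):
        # Sequential insertion of one shadow heap into the other.
        for x in other.heap + other.buf:
            self.insert(x)
```

Keeping the buffer unordered is what makes insertion cheap; the price is the linear scan of the buffer in `delete_min`.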

Analysis

We are given a shadow heap $H=(B,A)$, with threshold function $\log |H|\leq f(H)\leq \log |H|\log \log |H|$ as above. Suppose that the threshold function is such that any change in $|B|$ induces no larger a change than in $f(H)$. We derive the desired running time bounds for the mergeable heap operations using the potential method for amortized analysis. The potential $\Psi (H)$ of the heap is chosen to be:

$\Psi (H)=|A|(1+\min\{\log |B|\log \log |B|,\log |B|\log |A|\}/f(H))$

Using this potential, we can obtain the desired amortized running times:

create(H): initializes a new empty shadow heap $H$

Here, the potential $\Psi$ is unchanged, so the amortized cost of creation is $O(1)$, the actual cost.

insert(x, H): inserts $x$ into the shadow heap $H$

There are two cases:

- If the merge is employed, then the drop in the potential function is exactly the actual cost of merging $B$ and $A$, so the amortized cost is $O(1)$.
- If the merge is not done, then the amortized cost is $O(1+\min\{\log |B|\log \log |B|,\log |B|\log |A|\}/f(H))$.

By choice of the threshold function, we thus obtain that the amortized cost of insertion is:

$O(\log |H|\log \log |H|/f(H))$
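
As a quick check of this bound (a worked substitution added for illustration, not taken from the cited report), the two ends of the threshold range stated above give:

$f(H)=\log |H|\;\Rightarrow\;O(\log |H|\log \log |H|/\log |H|)=O(\log \log |H|)$

$f(H)=\log |H|\log \log |H|\;\Rightarrow\;O(\log |H|\log \log |H|/(\log |H|\log \log |H|))=O(1)$

A larger threshold therefore makes insertion cheaper, at the price of a larger unordered array $A$ that each deletion has to scan.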

delete_min(H): deletes the minimum priority element from $H$

Finding and deleting the minimum takes actual time $O(|A|+\log |B|)$. Moreover, the potential function can only increase after this deletion if the value of $f(H)$ decreases. By choice of $f$, we have that the amortized cost of this operation is the same as the actual cost.

Related algorithms & data structures

A naive binary heap merging algorithm will merge the two heaps $A$ and $B$ in time $O(|B|)$ by simply concatenating both heaps and making a heap out of the resulting array using Floyd's algorithm for heap construction. Alternatively, the heaps can simply be merged by sequentially inserting each element of $A$ into $B$, taking time $O(|A|\log |B|)$.
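
Both naive approaches are short enough to sketch with Python's standard `heapq` module. The function names are hypothetical, and both functions return a new list rather than modifying their inputs.

```python
import heapq


def merge_by_rebuild(a, b):
    """Concatenate both arrays and rebuild the heap bottom-up."""
    c = list(b) + list(a)
    heapq.heapify(c)  # linear-time heap construction
    return c


def merge_by_insertion(a, b):
    """Insert each element of `a` into a copy of `b`, one at a time."""
    c = list(b)
    for x in a:
        heapq.heappush(c, x)  # O(log |c|) per insertion
    return c
```

The two functions may return different array layouts for the same input, since a set of keys generally has many valid heap orderings.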

Sack and Strothotte proposed an algorithm for merging the binary heaps in $O(|A|+\log |A|\log |B|)$ time.^[3] Their algorithm is known to be more efficient than the second naive solution described above roughly when $|A|>\log |B|$. Shadow merge performs asymptotically better than their algorithm, even in the worst case.
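
To get a feel for how these bounds compare, the sketch below evaluates the expressions inside the $O(\cdot)$ bounds quoted in this article for concrete heap sizes. It ignores constant factors, so the numbers indicate growth only, not real running times; the function name and dictionary keys are hypothetical.

```python
import math


def merge_cost_bounds(size_a, size_b):
    """Evaluate the asymptotic merge bounds, with constants dropped.

    Assumes size_a >= 2 and size_b >= 4 so that every logarithm is positive.
    """
    la, lb = math.log2(size_a), math.log2(size_b)
    return {
        # Concatenate and rebuild: linear in the total size,
        # which is O(|B|) when |A| <= |B|.
        "rebuild": size_a + size_b,
        # Insert each element of A into B.
        "sequential_insert": size_a * lb,
        # Sack and Strothotte.
        "sack_strothotte": size_a + la * lb,
        # Shadow merge.
        "shadow_merge": size_a + min(lb * math.log2(lb), la * lb),
    }


bounds = merge_cost_bounds(2**10, 2**20)
for name, value in sorted(bounds.items(), key=lambda kv: kv[1]):
    print(f"{name:18s} {value:12.1f}")
```

With $|A|=2^{10}$ and $|B|=2^{20}$, the shadow merge expression is the smallest of the four, followed by the Sack and Strothotte expression, then sequential insertion, then the full rebuild.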

There are several other heaps which support faster merge times. For instance, Fibonacci heaps can be merged in $O(1)$ time. Since binary heaps require $\Omega (|A|)$ time to merge,^[4] shadow merge remains efficient.

References

[1] Kuszmaul, Christopher L. (2000). Efficient Merge and Insert Operations for Binary Heaps and Trees (PDF) (Technical report). NASA Advanced Supercomputing Division. 00-003.
[2] Suchenek, Marek A. (2012), "Elementary Yet Precise Worst-Case Analysis of Floyd's Heap-Construction Program", Fundamenta Informaticae, IOS Press, 120 (1): 75–92, doi:10.3233/FI-2012-751.
[3] Sack, Jörg-R.; Strothotte, Thomas (1985), "An Algorithm for Merging Heaps", Acta Informatica, Springer-Verlag, 22 (2): 171–186, doi:10.1007/BF00264229.
[4] Cormen, Thomas H.; Leiserson, Charles E.; Rivest, Ronald L. (1990). Introduction to Algorithms (1st ed.). MIT Press and McGraw-Hill. ISBN 0-262-03141-8.
