D-ary heap

Last updated

The d-ary heap or d-heap is a priority queue data structure, a generalization of the binary heap in which the nodes have d children instead of 2. [1] [2] [3] Thus, a binary heap is a 2-heap, and a ternary heap is a 3-heap. According to Tarjan [2] and Jensen et al., [4] d-ary heaps were invented by Donald B. Johnson in 1975. [1]

Contents

This data structure allows decrease priority operations to be performed more quickly than binary heaps, at the expense of slower delete minimum operations. This tradeoff leads to better running times for algorithms such as Dijkstra's algorithm in which decrease priority operations are more common than delete min operations. [1] [5] Additionally, d-ary heaps have better memory cache behavior than binary heaps, allowing them to run more quickly in practice despite having a theoretically larger worst-case running time. [6] Like binary heaps, d-ary heaps are an in-place data structure that use no additional storage beyond that needed to store the array of items in the heap. [2] [7]

Data structure

The d-ary heap consists of an array of n items, each of which has a priority associated with it. These items may be viewed as the nodes in a complete d-ary tree, listed in breadth first traversal order: the item at position 0 of the array (using zero-based numbering) forms the root of the tree, the items at positions 1 through d are its children, the next d2 items are its grandchildren, etc. Thus, the parent of the item at position i (for any i > 0) is the item at position (i 1)/d and its children are the items at positions di + 1 through di + d. According to the heap property, in a min-heap, each item has a priority that is at least as large as its parent; in a max-heap, each item has a priority that is no larger than its parent. [2] [3]

The minimum priority item in a min-heap (or the maximum priority item in a max-heap) may always be found at position 0 of the array. To remove this item from the priority queue, the last item x in the array is moved into its place, and the length of the array is decreased by one. Then, while item x and its children do not satisfy the heap property, item x is swapped with one of its children (the one with the smallest priority in a min-heap, or the one with the largest priority in a max-heap), moving it downward in the tree and later in the array, until eventually the heap property is satisfied. The same downward swapping procedure may be used to increase the priority of an item in a min-heap, or to decrease the priority of an item in a max-heap. [2] [3]

To insert a new item into the heap, the item is appended to the end of the array, and then while the heap property is violated it is swapped with its parent, moving it upward in the tree and earlier in the array, until eventually the heap property is satisfied. The same upward-swapping procedure may be used to decrease the priority of an item in a min-heap, or to increase the priority of an item in a max-heap. [2] [3]

To create a new heap from an array of n items, one may loop over the items in reverse order, starting from the item at position n 1 and ending at the item at position 0, applying the downward-swapping procedure for each item. [2] [3]

Analysis

In a d-ary heap containing n items, both the upward-swapping procedure and the downward-swapping procedure may perform as many as logdn = log n / log d swaps. In the upward-swapping procedure, each swap involves a single comparison of an item with its parent, and takes constant time. Therefore, the time to insert a new item into the heap, to decrease the priority of an item in a min-heap, or to increase the priority of an item in a max-heap, is O(log n / log d). In the downward-swapping procedure, each swap involves d comparisons and takes O(d) time: it takes d 1 comparisons to determine the minimum or maximum of the children and then one more comparison against the parent to determine whether a swap is needed. Therefore, the time to delete the root item, to increase the priority of an item in a min-heap, or to decrease the priority of an item in a max-heap, is O(d log n / log d). [2] [3]

When creating a d-ary heap from a set of n items, most of the items are in positions that will eventually hold leaves of the d-ary tree, and no downward swapping is performed for those items. At most n/d + 1 items are non-leaves, and may be swapped downwards at least once, at a cost of O(d) time to find the child to swap them with. At most n/d2 + 1 nodes may be swapped downward two times, incurring an additional O(d) cost for the second swap beyond the cost already counted in the first term, etc. Therefore, the total amount of time to create a heap in this way is

[2] [3]

The exact value of the above (the worst-case number of comparisons during the construction of d-ary heap) is known to be equal to:

, [8]

where sd(n) is the sum of all digits of the standard base-d representation of n and ed(n) is the exponent of d in the factorization of n. This reduces to

, [8]

for d = 2, and to

, [8]

for d = 3.

The space usage of the d-ary heap, with insert and delete-min operations, is linear, as it uses no extra storage other than an array containing a list of the items in the heap. [2] [7] If changes to the priorities of existing items need to be supported, then one must also maintain pointers from the items to their positions in the heap, which again uses only linear storage. [2]

Applications

When operating on a graph with m edges and n vertices, both Dijkstra's algorithm for shortest paths and Prim's algorithm for minimum spanning trees use a min-heap in which there are n delete-min operations and as many as m decrease-priority operations. By using a d-ary heap with d = m/n, the total times for these two types of operations may be balanced against each other, leading to a total time of O(m logm/nn) for the algorithm, an improvement over the O(m log n) running time of binary heap versions of these algorithms whenever the number of edges is significantly larger than the number of vertices. [1] [5] An alternative priority queue data structure, the Fibonacci heap, gives an even better theoretical running time of O(m + n log n), but in practice d-ary heaps are generally at least as fast, and often faster, than Fibonacci heaps for this application. [9]

4-heaps may perform better than binary heaps in practice, even for delete-min operations. [2] [3] Additionally, a d-ary heap typically runs much faster than a binary heap for heap sizes that exceed the size of the computer's cache memory; [10] this may be due to the binary heap implicating more cache misses or virtual memory page faults, which can consume more processing time than the extra work of the few additional comparisons at each level entailed by a d-ary heap. [6] [11]

Related Research Articles

<span class="mw-page-title-main">Binary search tree</span> Rooted binary tree data structure

In computer science, a binary search tree (BST), also called an ordered or sorted binary tree, is a rooted binary tree data structure with the key of each internal node being greater than all the keys in the respective node's left subtree and less than the ones in its right subtree. The time complexity of operations on the binary search tree is linear with respect to the height of the tree.

<span class="mw-page-title-main">Heapsort</span> A sorting algorithm which uses the heap data structure

In computer science, heapsort is a comparison-based sorting algorithm which can be thought of as "an implementation of selection sort using the right data structure." Like selection sort, heapsort divides its input into a sorted and an unsorted region, and it iteratively shrinks the unsorted region by extracting the largest element from it and inserting it into the sorted region. Unlike selection sort, heapsort does not waste time with a linear-time scan of the unsorted region; rather, heap sort maintains the unsorted region in a heap data structure to efficiently find the largest element in each step.

<span class="mw-page-title-main">Heap (data structure)</span> Computer science data structure

In computer science, a heap is a tree-based data structure that satisfies the heap property: In a max heap, for any given node C, if P is a parent node of C, then the key of P is greater than or equal to the key of C. In a min heap, the key of P is less than or equal to the key of C. The node at the "top" of the heap is called the root node.

In computer science, a priority queue is an abstract data type similar to a regular queue or stack abstract data type. Each element in a priority queue has an associated priority. In a priority queue, elements with high priority are served before elements with low priority. In some implementations, if two elements have the same priority, they are served in the same order in which they were enqueued. In other implementations, the order of elements with the same priority is undefined.

<span class="mw-page-title-main">Dijkstra's algorithm</span> Algorithm for finding shortest paths

Dijkstra's algorithm is an algorithm for finding the shortest paths between nodes in a weighted graph, which may represent, for example, road networks. It was conceived by computer scientist Edsger W. Dijkstra in 1956 and published three years later.

<span class="mw-page-title-main">Binary heap</span> Variant of heap data structure

A binary heap is a heap data structure that takes the form of a binary tree. Binary heaps are a common way of implementing priority queues. The binary heap was introduced by J. W. J. Williams in 1964 as a data structure for implementing heapsort.

<span class="mw-page-title-main">Smoothsort</span> Comparison-based sorting algorithm

In computer science, smoothsort is a comparison-based sorting algorithm. A variant of heapsort, it was invented and published by Edsger Dijkstra in 1981. Like heapsort, smoothsort is an in-place algorithm with an upper bound of O(n log n) operations (see big O notation), but it is not a stable sort. The advantage of smoothsort is that it comes closer to O(n) time if the input is already sorted to some degree, whereas heapsort averages O(n log n) regardless of the initial sorted state.

<span class="mw-page-title-main">Treap</span> Random search tree data structure

In computer science, the treap and the randomized binary search tree are two closely related forms of binary search tree data structures that maintain a dynamic set of ordered keys and allow binary searches among the keys. After any sequence of insertions and deletions of keys, the shape of the tree is a random variable with the same probability distribution as a random binary tree; in particular, with high probability its height is proportional to the logarithm of the number of keys, so that each search, insertion, or deletion operation takes logarithmic time to perform.

In computer science, a binomial heap is a data structure that acts as a priority queue. It is an example of a mergeable heap, as it supports merging two heaps in logarithmic time. It is implemented as a heap similar to a binary heap but using a special tree structure that is different from the complete binary trees used by binary heaps. Binomial heaps were invented in 1978 by Jean Vuillemin.

In computer science, a Fibonacci heap is a data structure for priority queue operations, consisting of a collection of heap-ordered trees. It has a better amortized running time than many other priority queue data structures including the binary heap and binomial heap. Michael L. Fredman and Robert E. Tarjan developed Fibonacci heaps in 1984 and published them in a scientific journal in 1987. Fibonacci heaps are named after the Fibonacci numbers, which are used in their running time analysis.

In computer science, a leftist tree or leftist heap is a priority queue implemented with a variant of a binary heap. Every node x has an s-value which is the distance to the nearest leaf in subtree rooted at x. In contrast to a binary heap, a leftist tree attempts to be very unbalanced. In addition to the heap property, leftist trees are maintained so the right descendant of each node has the lower s-value.

A pairing heap is a type of heap data structure with relatively simple implementation and excellent practical amortized performance, introduced by Michael Fredman, Robert Sedgewick, Daniel Sleator, and Robert Tarjan in 1986. Pairing heaps are heap-ordered multiway tree structures, and can be considered simplified Fibonacci heaps. They are considered a "robust choice" for implementing such algorithms as Prim's MST algorithm, and support the following operations :

In computer science, a search data structure is any data structure that allows the efficient retrieval of specific items from a set of items, such as a specific record from a database.

In computer science, a min-max heap is a complete binary tree data structure which combines the usefulness of both a min-heap and a max-heap, that is, it provides constant time retrieval and logarithmic time removal of both the minimum and maximum elements in it. This makes the min-max heap a very useful data structure to implement a double-ended priority queue. Like binary min-heaps and max-heaps, min-max heaps support logarithmic insertion and deletion and can be built in linear time. Min-max heaps are often represented implicitly in an array; hence it's referred to as an implicit data structure.

In computer science, a skew binomial heap is a data structure for priority queue operations. It is a variant of the binomial heap that supports constant-time insertion operations in the worst case, rather than amortized time.

In computer science, the Brodal queue is a heap/priority queue structure with very low worst case time bounds: for insertion, find-minimum, meld and decrease-key and for delete-minimum and general deletion. They are the first heap variant to achieve these bounds without resorting to amortization of operational costs. Brodal queues are named after their inventor Gerth Stølting Brodal.

<span class="mw-page-title-main">Kinetic heap</span>

A Kinetic Heap is a kinetic data structure, obtained by the kinetization of a heap. It is designed to store elements where the priority is changing as a continuous function of time. As a type of kinetic priority queue, it maintains the maximum priority element stored in it. The kinetic heap data structure works by storing the elements as a tree that satisfies the following heap property – if B is a child node of A, then the priority of the element in A must be higher than the priority of the element in B. This heap property is enforced using certificates along every edge so, like other kinetic data structures, a kinetic heap also contains a priority queue to maintain certificate failure times.

In computer science, a monotone priority queue is a variant of the priority queue abstract data type in which the priorities of extracted items are required to form a monotonic sequence. That is, for a priority queue in which each successively extracted item is the one with the minimum priority, the minimum priority should be monotonically increasing. Conversely for a max-heap the maximum priority should be monotonically decreasing. The assumption of monotonicity arises naturally in several applications of priority queues, and can be used as a simplifying assumption to speed up certain types of priority queues.

In computer science, a weak heap is a data structure for priority queues, combining features of the binary heap and binomial heap. It can be stored in an array as an implicit binary tree like a binary heap, and has the efficiency guarantees of binomial heaps.

This is a comparison of the performance of notable data structures, as measured by the complexity of their logical operations. For a more comprehensive listing of data structures, see List of data structures.

References

  1. 1 2 3 4 Johnson, D. B. (1975), "Priority queues with update and finding minimum spanning trees", Information Processing Letters, 4 (3): 53–57, doi:10.1016/0020-0190(75)90001-0 .
  2. 1 2 3 4 5 6 7 8 9 10 11 12 Tarjan, R. E. (1983), "3.2. d-heaps", Data Structures and Network Algorithms, CBMS-NSF Regional Conference Series in Applied Mathematics, vol. 44, Society for Industrial and Applied Mathematics, pp. 34–38. Note that Tarjan uses 1-based numbering, not 0-based numbering, so his formulas for the parent and children of a node need to be adjusted when 0-based numbering is used.
  3. 1 2 3 4 5 6 7 8 Weiss, M. A. (2007), "d-heaps", Data Structures and Algorithm Analysis (2nd ed.), Addison-Wesley, p. 216, ISBN   978-0-321-37013-6 .
  4. Jensen, C.; Katajainen, J.; Vitale, F. (2004), An extended truth about heaps (PDF), archived from the original (PDF) on 2007-06-09, retrieved 2008-02-05.
  5. 1 2 Tarjan (1983), pp. 77 and 91.
  6. 1 2 Naor, D.; Martel, C. U.; Matloff, N. S. (October 1991), "Performance of priority queue structures in a virtual memory environment", Computer Journal, 34 (5): 428–437, doi: 10.1093/comjnl/34.5.428 .
  7. 1 2 Mortensen, C. W.; Pettie, S. (2005), "The complexity of implicit and space efficient priority queues", Algorithms and Data Structures: 9th International Workshop, WADS 2005, Waterloo, Canada, August 15–17, 2005, Proceedings , Lecture Notes in Computer Science, vol. 3608, Springer-Verlag, pp. 49–60, doi:10.1007/11534273_6, ISBN   978-3-540-28101-6 .
  8. 1 2 3 Suchenek, Marek A. (2012), "Elementary Yet Precise Worst-Case Analysis of Floyd's Heap-Construction Program", Fundamenta Informaticae, 120 (1), IOS Press: 75–92, doi:10.3233/FI-2012-751 .
  9. Cherkassky, Boris V.; Goldberg, Andrew V.; Radzik, Tomasz (May 1996), "Shortest paths algorithms: Theory and experimental evaluation", Mathematical Programming, 73 (2): 129–174, CiteSeerX   10.1.1.48.752 , doi:10.1007/BF02592101
  10. Larkin, Daniel; Sen, Siddhartha; Tarjan, Robert (2014). "A Back-to-Basics Empirical Study of Priority Queues". Proceedings of the Sixteenth Workshop on Algorithm Engineering and Experiments: 61–72. arXiv: 1403.0252 . Bibcode:2014arXiv1403.0252L. doi:10.1137/1.9781611973198.7. ISBN   978-1-61197-319-8. S2CID   15216766.
  11. Kamp, Poul-Henning (11 June 2010), "You're doing it wrong", ACM Queue, 8 (6).