Heap (data structure)

Example of a binary max-heap with node keys being integers between 1 and 100

In computer science, a heap is a tree-based data structure that satisfies the heap property: In a max heap, for any given node C, if P is the parent node of C, then the key (the value) of P is greater than or equal to the key of C. In a min heap, the key of P is less than or equal to the key of C. [1] The node at the "top" of the heap (with no parents) is called the root node.


The heap is one maximally efficient implementation of an abstract data type called a priority queue, and in fact, priority queues are often referred to as "heaps", regardless of how they may be implemented. In a heap, the highest (or lowest) priority element is always stored at the root. However, a heap is not a sorted structure; it can be regarded as being partially ordered. A heap is a useful data structure when it is necessary to repeatedly remove the object with the highest (or lowest) priority, or when insertions need to be interspersed with removals of the root node.

A common implementation of a heap is the binary heap, in which the tree is a complete [2] binary tree (see figure). The heap data structure, specifically the binary heap, was introduced by J. W. J. Williams in 1964, as a data structure for the heapsort sorting algorithm. [3] Heaps are also crucial in several efficient graph algorithms such as Dijkstra's algorithm. When a heap is a complete binary tree, it has the smallest possible height: a heap with N nodes and a children per node always has height log_a N.

Note that, as shown in the graphic, there is no implied ordering between siblings or cousins and no implied sequence for an in-order traversal (as there would be in, e.g., a binary search tree). The heap relation mentioned above applies only between nodes and their ancestors (parents, grandparents, and so on). The maximum number of children each node can have depends on the type of heap.

Heaps are typically constructed in-place in the same array where the elements are stored, with their structure being implicit in the access pattern of the operations. Heaps differ in this way from other data structures with similar or in some cases better theoretic bounds such as Radix trees in that they require no additional memory beyond that used for storing the keys.

Operations

The common operations involving heaps are:

Basic
  find-max (or find-min): find a maximum item of a max-heap, or a minimum item of a min-heap, respectively (a.k.a. peek)
  insert: adding a new key to the heap (a.k.a. push)
  extract-max (or extract-min): returns the node of maximum value from a max heap (or minimum value from a min heap) after removing it from the heap (a.k.a. pop)
  delete-max (or delete-min): removing the root node of a max heap (or min heap), respectively
  replace: pop the root and push a new key; more efficient than a pop followed by a push, since only one sift-down is needed

Creation
  create-heap: create an empty heap
  heapify: create a heap out of a given array of elements
  merge (union): joining two heaps to form a valid new heap containing all the elements of both, preserving the original heaps
  meld: joining two heaps to form a valid new heap containing all the elements of both, destroying the original heaps

Inspection
  size: return the number of items in the heap
  is-empty: return true if the heap is empty, false otherwise

Internal
  increase-key or decrease-key: updating a key within a max- or min-heap, respectively
  delete: delete an arbitrary node, followed by moving the last node and sifting to maintain the heap property
  sift-up: move a node up in the tree as long as needed; used to restore the heap condition after insertion
  sift-down: move a node down in the tree, similar to sift-up; used to restore the heap condition after deletion or replacement

Implementation

Heaps are usually implemented with an array, as follows:

Example of a complete binary max-heap with node keys being integers from 1 to 100 and how it would be stored in an array.

For a binary heap, in the array, the first index contains the root element. The next two indices of the array contain the root's children. The next four indices contain the four children of the root's two child nodes, and so on. Therefore, given a node at index i, its children are at indices 2i + 1 and 2i + 2, and its parent is at index ⌊(i − 1)/2⌋. This simple indexing scheme makes it efficient to move "up" or "down" the tree.
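A minimal sketch of this 0-based index arithmetic in Python; the function names are chosen here for readability and are not taken from any particular library:

```python
def parent(i: int) -> int:
    # Parent of the node stored at index i (0-based), i.e. floor((i - 1) / 2).
    return (i - 1) // 2

def left_child(i: int) -> int:
    # Left child of the node stored at index i.
    return 2 * i + 1

def right_child(i: int) -> int:
    # Right child of the node stored at index i.
    return 2 * i + 2

# The root (index 0) has children at indices 1 and 2, whose own children
# occupy indices 3..6, and so on.
assert parent(left_child(5)) == 5 and parent(right_child(5)) == 5
```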

Balancing a heap is done by sift-up or sift-down operations (swapping elements which are out of order). As we can build a heap from an array without requiring extra memory (for the nodes, for example), heapsort can be used to sort an array in-place.

After an element is inserted into or deleted from a heap, the heap property may be violated, and the heap must be re-balanced by swapping elements within the array.

Although different types of heaps implement the operations differently, the most common way is as follows:

  Insertion: Add the new element at the end of the heap, in the first available free space. If this would violate the heap property, sift up the new element until the heap property is re-established.
  Extraction: Remove the root and move the last element of the heap into the root. If this violates the heap property, sift down the new root to re-establish it.
  Replacement: Remove the root, put the new element in the root, and sift down. Compared to an extraction followed by an insertion, this avoids a sift-up step.
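As an illustration of these steps, here is a minimal max-heap sketch over a plain Python list; the function names (sift_up, sift_down, insert, extract_max) are illustrative, not from any particular library:

```python
def sift_up(heap, i):
    # Move the element at index i toward the root while it exceeds its parent.
    while i > 0:
        p = (i - 1) // 2
        if heap[i] <= heap[p]:
            break
        heap[i], heap[p] = heap[p], heap[i]
        i = p

def sift_down(heap, i):
    # Move the element at index i toward the leaves while a child exceeds it.
    n = len(heap)
    while True:
        largest, left, right = i, 2 * i + 1, 2 * i + 2
        if left < n and heap[left] > heap[largest]:
            largest = left
        if right < n and heap[right] > heap[largest]:
            largest = right
        if largest == i:
            return
        heap[i], heap[largest] = heap[largest], heap[i]
        i = largest

def insert(heap, key):
    heap.append(key)              # place the new key in the first free slot ...
    sift_up(heap, len(heap) - 1)  # ... then restore the heap property upward

def extract_max(heap):
    # Assumes a non-empty heap: move the last element to the root, pop the
    # old root (the maximum), then restore the heap property downward.
    heap[0], heap[-1] = heap[-1], heap[0]
    top = heap.pop()
    sift_down(heap, 0)
    return top

h = []
for key in [3, 9, 2, 7]:
    insert(h, key)
print(extract_max(h), extract_max(h))  # 9 7
```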

Construction of a binary (or d-ary) heap out of a given array of elements may be performed in linear time using the classic Floyd algorithm, with the worst-case number of comparisons equal to 2N − 2s2(N) − e2(N) (for a binary heap), where s2(N) is the sum of all digits of the binary representation of N and e2(N) is the exponent of 2 in the prime factorization of N. [7] This is faster than a sequence of consecutive insertions into an originally empty heap, which is log-linear. [a]
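For instance, a hedged sketch of this bottom-up (Floyd) construction for a binary max-heap; the sift_down helper is included so the snippet stands alone, and the names are illustrative:

```python
def sift_down(heap, i):
    # Push heap[i] down until neither child is larger than it.
    n = len(heap)
    while True:
        largest, left, right = i, 2 * i + 1, 2 * i + 2
        if left < n and heap[left] > heap[largest]:
            largest = left
        if right < n and heap[right] > heap[largest]:
            largest = right
        if largest == i:
            return
        heap[i], heap[largest] = heap[largest], heap[i]
        i = largest

def build_max_heap(items):
    # Floyd's algorithm: sift down every internal node, from the last one
    # (index n // 2 - 1) up to the root.  Total work is linear in n.
    heap = list(items)
    for i in range(len(heap) // 2 - 1, -1, -1):
        sift_down(heap, i)
    return heap

print(build_max_heap([3, 9, 2, 1, 4, 5]))  # [9, 4, 5, 1, 3, 2]
```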

Variants

Comparison of theoretic bounds for variants

Here are time complexities [8] of various heap data structures. The abbreviation am. indicates that the given complexity is amortized, otherwise it is a worst-case complexity. For the meaning of "O(f)" and "Θ(f)" see Big O notation. Names of operations assume a max-heap.

Operation | find-max | delete-max | increase-key | insert | meld | make-heap [b]
Binary [8] | Θ(1) | Θ(log n) | Θ(log n) | Θ(log n) | Θ(n) | Θ(n)
Skew [9] | Θ(1) | O(log n) am. | O(log n) am. | O(log n) am. | O(log n) am. | Θ(n) am.
Leftist [10] | Θ(1) | Θ(log n) | Θ(log n) | Θ(log n) | Θ(log n) | Θ(n)
Binomial [8] [12] | Θ(1) | Θ(log n) | Θ(log n) | Θ(1) am. | Θ(log n) [c] | Θ(n)
Skew binomial [13] | Θ(1) | Θ(log n) | Θ(log n) | Θ(1) | Θ(log n) [c] | Θ(n)
2–3 heap [15] | Θ(1) | O(log n) am. | Θ(1) | Θ(1) am. | O(log n) [c] | Θ(n)
Bottom-up skew [9] | Θ(1) | O(log n) am. | O(log n) am. | Θ(1) am. | Θ(1) am. | Θ(n) am.
Pairing [16] | Θ(1) | O(log n) am. | o(log n) am. [d] | Θ(1) | Θ(1) | Θ(n)
Rank-pairing [19] | Θ(1) | O(log n) am. | Θ(1) am. | Θ(1) | Θ(1) | Θ(n)
Fibonacci [8] [20] | Θ(1) | O(log n) am. | Θ(1) am. | Θ(1) | Θ(1) | Θ(n)
Strict Fibonacci [21] [e] | Θ(1) | Θ(log n) | Θ(1) | Θ(1) | Θ(1) | Θ(n)
Brodal [22] [e] | Θ(1) | Θ(log n) | Θ(1) | Θ(1) | Θ(1) | Θ(n) [23]
  1. Each insertion takes O(log k) time in the existing size k of the heap, so the total for n insertions is the sum of O(log k) for k = 1, …, n. Since log(n/2) = (log n) − 1, a constant fraction (half) of these insertions are within a constant factor of the maximum, so asymptotically we can take k = n; formally the time is n·O(log n) − O(n) = O(n log n). This can also be readily seen from Stirling's approximation.
  2. make-heap is the operation of building a heap from a sequence of n unsorted elements. It can be done in Θ(n) time whenever meld runs in O(log n) time (where both complexities can be amortized). [9] [10] Another algorithm achieves Θ(n) for binary heaps. [11]
  3. For persistent heaps (not supporting increase-key), a generic transformation reduces the cost of meld to that of insert, while the new cost of delete-max is the sum of the old costs of delete-max and meld. [14] Here, it makes meld run in Θ(1) time (amortized, if the cost of insert is amortized) while delete-max still runs in O(log n). Applied to skew binomial heaps, it yields Brodal-Okasaki queues, persistent heaps with optimal worst-case complexities. [13]
  4. Lower bound of Ω(log log n), [17] upper bound of O(2^(2√(log log n))). [18]
  5. Brodal queues and strict Fibonacci heaps achieve optimal worst-case complexities for heaps. They were first described as imperative data structures. The Brodal-Okasaki queue is a persistent data structure achieving the same optimum, except that increase-key is not supported.

Applications

The heap data structure has many applications, including:

  Heapsort: one of the best sorting methods, being in-place and having no quadratic worst-case scenario.
  Selection algorithms: a heap gives access to the minimum or maximum element in constant time, and other selections (such as the median or the kth element) can be done in sub-linear time on data held in a heap.
  Graph algorithms: by using heaps as internal traversal data structures, the run time of algorithms such as Prim's minimal-spanning-tree algorithm and Dijkstra's shortest-path algorithm is reduced.
  Priority queues: a priority queue is an abstract concept like "a list" or "a map"; just as a list can be implemented with a linked list or an array, a priority queue can be implemented with a heap or various other methods.
  K-way merge: a heap can merge many already-sorted input streams into a single sorted output stream.
  Order statistics: a heap can be used to find the kth smallest (or largest) element in an array efficiently.

Programming language implementations

The C++ Standard Library provides the make_heap, push_heap and pop_heap algorithms for binary heaps operating on iterator ranges, as well as the std::priority_queue container adaptor. Java's collections framework includes the binary-heap-based java.util.PriorityQueue class. Python provides the heapq module, which operates on ordinary lists as binary min-heaps. [4] [5] [6] Go's standard library offers the container/heap package.
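As an example of the Python interface cited in the references (heapq.heapify, heappush, heappop and heapreplace), here is a brief usage sketch; the task data is made up for illustration:

```python
import heapq

tasks = [(3, "low"), (1, "urgent"), (2, "normal")]
heapq.heapify(tasks)                    # O(n) bottom-up construction of a min-heap
heapq.heappush(tasks, (0, "critical"))  # O(log n) insertion
priority, name = heapq.heappop(tasks)   # removes the smallest key: (0, "critical")
print(priority, name)

# heapreplace pops the current minimum and pushes the new item in a single
# sift-down pass, which is cheaper than a pop followed by a push.
heapq.heapreplace(tasks, (4, "deferred"))
```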

See also

Related Research Articles

Binary search tree: Rooted binary tree data structure

In computer science, a binary search tree (BST), also called an ordered or sorted binary tree, is a rooted binary tree data structure with the key of each internal node being greater than all the keys in the respective node's left subtree and less than the ones in its right subtree. The time complexity of operations on the binary search tree is linear with respect to the height of the tree.

Heapsort: A sorting algorithm which uses the heap data structure

In computer science, heapsort is a comparison-based sorting algorithm which can be thought of as "an implementation of selection sort using the right data structure." Like selection sort, heapsort divides its input into a sorted and an unsorted region, and it iteratively shrinks the unsorted region by extracting the largest element from it and inserting it into the sorted region. Unlike selection sort, heapsort does not waste time with a linear-time scan of the unsorted region; rather, heapsort maintains the unsorted region in a heap data structure to efficiently find the largest element in each step.
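As a rough illustration of that idea, the following sketch sorts by repeatedly extracting the root of a heap; it uses Python's heapq min-heap, so it yields ascending order rather than working in place like the classic algorithm:

```python
import heapq

def heapsort(items):
    # Build a min-heap in linear time, then repeatedly remove its root
    # (the current minimum) until the heap is empty.
    heap = list(items)
    heapq.heapify(heap)
    return [heapq.heappop(heap) for _ in range(len(heap))]

print(heapsort([5, 1, 4, 2, 3]))  # [1, 2, 3, 4, 5]
```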

In computer science, a priority queue is an abstract data type similar to a regular queue or stack abstract data type. Each element in a priority queue has an associated priority. In a priority queue, elements with high priority are served before elements with low priority. In some implementations, if two elements have the same priority, they are served in the same order in which they were enqueued. In other implementations, the order of elements with the same priority is undefined.

Binary heap: Variant of heap data structure

A binary heap is a heap data structure that takes the form of a binary tree. Binary heaps are a common way of implementing priority queues. The binary heap was introduced by J. W. J. Williams in 1964 as a data structure for implementing heapsort.

Smoothsort: Comparison-based sorting algorithm

In computer science, smoothsort is a comparison-based sorting algorithm. A variant of heapsort, it was invented and published by Edsger Dijkstra in 1981. Like heapsort, smoothsort is an in-place algorithm with an upper bound of O(n log n) operations (see big O notation), but it is not a stable sort. The advantage of smoothsort is that it comes closer to O(n) time if the input is already sorted to some degree, whereas heapsort averages O(n log n) regardless of the initial sorted state.

In computer science, a binomial heap is a data structure that acts as a priority queue. It is an example of a mergeable heap, as it supports merging two heaps in logarithmic time. It is implemented as a heap similar to a binary heap but using a special tree structure that is different from the complete binary trees used by binary heaps. Binomial heaps were invented in 1978 by Jean Vuillemin.

In computer science, a Fibonacci heap is a data structure for priority queue operations, consisting of a collection of heap-ordered trees. It has a better amortized running time than many other priority queue data structures including the binary heap and binomial heap. Michael L. Fredman and Robert E. Tarjan developed Fibonacci heaps in 1984 and published them in a scientific journal in 1987. Fibonacci heaps are named after the Fibonacci numbers, which are used in their running time analysis.

In computer science, a leftist tree or leftist heap is a priority queue implemented with a variant of a binary heap. Every node x has an s-value, which is the distance to the nearest leaf in the subtree rooted at x. In contrast to a binary heap, a leftist tree attempts to be very unbalanced. In addition to the heap property, leftist trees are maintained so the right descendant of each node has the lower s-value.

A pairing heap is a type of heap data structure with relatively simple implementation and excellent practical amortized performance, introduced by Michael Fredman, Robert Sedgewick, Daniel Sleator, and Robert Tarjan in 1986. Pairing heaps are heap-ordered multiway tree structures, and can be considered simplified Fibonacci heaps. They are considered a "robust choice" for implementing such algorithms as Prim's MST algorithm, and support the operations find-min, meld, insert, decrease-key and delete-min (assuming a min-heap).

The d-ary heap or d-heap is a priority queue data structure, a generalization of the binary heap in which the nodes have d children instead of 2. Thus, a binary heap is a 2-heap, and a ternary heap is a 3-heap. According to Tarjan and Jensen et al., d-ary heaps were invented by Donald B. Johnson in 1975.

Cartesian tree: Binary tree derived from a sequence of numbers

In computer science, a Cartesian tree is a binary tree derived from a sequence of distinct numbers. To construct the Cartesian tree, set its root to be the minimum number in the sequence, and recursively construct its left and right subtrees from the subsequences before and after this number. It is uniquely defined as a min-heap whose symmetric (in-order) traversal returns the original sequence.
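A minimal recursive sketch of that construction, assuming distinct numbers; the nested-tuple representation used here is an illustrative choice, not a standard API:

```python
def cartesian_tree(seq):
    # Root is the minimum of the sequence; the subtrees are built recursively
    # from the subsequences before and after it.  Returns (left, value, right).
    if not seq:
        return None
    i = seq.index(min(seq))
    return (cartesian_tree(seq[:i]), seq[i], cartesian_tree(seq[i + 1:]))

print(cartesian_tree([3, 1, 4, 2]))  # ((None, 3, None), 1, ((None, 4, None), 2, None))
```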

In computer science, a double-ended priority queue (DEPQ) or double-ended heap is a data structure similar to a priority queue or heap, but allows for efficient removal of both the maximum and minimum, according to some ordering on the keys (items) stored in the structure. Every element in a DEPQ has a priority or value. In a DEPQ, it is possible to remove the elements in both ascending as well as descending order.

In computer science, a min-max heap is a complete binary tree data structure which combines the usefulness of both a min-heap and a max-heap, that is, it provides constant time retrieval and logarithmic time removal of both the minimum and maximum elements in it. This makes the min-max heap a very useful data structure to implement a double-ended priority queue. Like binary min-heaps and max-heaps, min-max heaps support logarithmic insertion and deletion and can be built in linear time. Min-max heaps are often represented implicitly in an array; hence it's referred to as an implicit data structure.

In computer science, a skew binomial heap is a data structure for priority queue operations. It is a variant of the binomial heap that supports constant-time insertion operations in the worst case, rather than amortized time.

In computer science, the Brodal queue is a heap/priority queue structure with very low worst case time bounds: O(1) for insertion, find-minimum, meld and decrease-key, and O(log n) for delete-minimum and general deletion. They are the first heap variant to achieve these bounds without resorting to amortization of operational costs. Brodal queues are named after their inventor Gerth Stølting Brodal.

Kinetic heap

A Kinetic Heap is a kinetic data structure, obtained by the kinetization of a heap. It is designed to store elements where the priority is changing as a continuous function of time. As a type of kinetic priority queue, it maintains the maximum priority element stored in it. The kinetic heap data structure works by storing the elements as a tree that satisfies the following heap property – if B is a child node of A, then the priority of the element in A must be higher than the priority of the element in B. This heap property is enforced using certificates along every edge so, like other kinetic data structures, a kinetic heap also contains a priority queue to maintain certificate failure times.

In computer science, a weak heap is a data structure for priority queues, combining features of the binary heap and binomial heap. It can be stored in an array as an implicit binary tree like a binary heap, and has the efficiency guarantees of binomial heaps.

This is a comparison of the performance of notable data structures, as measured by the complexity of their logical operations. For a more comprehensive listing of data structures, see List of data structures.

In computer science, a strict Fibonacci heap is a priority queue data structure with low worst case time bounds. It matches the amortized time bounds of the Fibonacci heap in the worst case. To achieve these time bounds, strict Fibonacci heaps maintain several invariants by performing restoring transformations after every operation. These transformations can be done in constant time by using auxiliary data structures to track invariant violations, and the pigeonhole principle guarantees that these can be fixed. Strict Fibonacci heaps were invented in 2012 by Gerth S. Brodal, George Lagogiannis, and Robert E. Tarjan.

References

  1. Black, Paul E., ed. (2004-12-14). Entry for heap in Dictionary of Algorithms and Data Structures. Online version. U.S. National Institute of Standards and Technology. Retrieved 2017-10-08 from https://xlinux.nist.gov/dads/HTML/heap.html.
  2. Cormen, Thomas H. (2009). Introduction to Algorithms. Cambridge, Massachusetts; London, England: The MIT Press. pp. 151–152. ISBN 978-0-262-03384-8.
  3. Williams, J. W. J. (1964), "Algorithm 232 - Heapsort", Communications of the ACM , 7 (6): 347–348, doi:10.1145/512274.512284
  4. The Python Standard Library, 8.4. heapq — Heap queue algorithm, heapq.heappush
  5. The Python Standard Library, 8.4. heapq — Heap queue algorithm, heapq.heappop
  6. The Python Standard Library, 8.4. heapq — Heap queue algorithm, heapq.heapreplace
  7. Suchenek, Marek A. (2012), "Elementary Yet Precise Worst-Case Analysis of Floyd's Heap-Construction Program", Fundamenta Informaticae, 120 (1), IOS Press: 75–92, doi:10.3233/FI-2012-751 .
  8. Cormen, Thomas H.; Leiserson, Charles E.; Rivest, Ronald L. (1990). Introduction to Algorithms (1st ed.). MIT Press and McGraw-Hill. ISBN 0-262-03141-8.
  9. Sleator, Daniel Dominic; Tarjan, Robert Endre (February 1986). "Self-Adjusting Heaps". SIAM Journal on Computing. 15 (1): 52–69. CiteSeerX 10.1.1.93.6678. doi:10.1137/0215004. ISSN 0097-5397.
  10. Tarjan, Robert (1983). "3.3. Leftist heaps". Data Structures and Network Algorithms. pp. 38–42. doi:10.1137/1.9781611970265. ISBN 978-0-89871-187-5.
  11. Hayward, Ryan; McDiarmid, Colin (1991). "Average Case Analysis of Heap Building by Repeated Insertion" (PDF). J. Algorithms. 12: 126–153. CiteSeerX   10.1.1.353.7888 . doi:10.1016/0196-6774(91)90027-v. Archived from the original (PDF) on 2016-02-05. Retrieved 2016-01-28.
  12. "Binomial Heap | Brilliant Math & Science Wiki". brilliant.org. Retrieved 2019-09-30.
  13. Brodal, Gerth Stølting; Okasaki, Chris (November 1996), "Optimal purely functional priority queues", Journal of Functional Programming, 6 (6): 839–857, doi:10.1017/s095679680000201x
  14. Okasaki, Chris (1998). "10.2. Structural Abstraction". Purely Functional Data Structures (1st ed.). pp. 158–162. ISBN   9780521631242.
  15. Takaoka, Tadao (1999), Theory of 2–3 Heaps (PDF), p. 12
  16. Iacono, John (2000), "Improved upper bounds for pairing heaps", Proc. 7th Scandinavian Workshop on Algorithm Theory (PDF), Lecture Notes in Computer Science, vol. 1851, Springer-Verlag, pp. 63–77, arXiv: 1110.4428 , CiteSeerX   10.1.1.748.7812 , doi:10.1007/3-540-44985-X_5, ISBN   3-540-67690-2
  17. Fredman, Michael Lawrence (July 1999). "On the Efficiency of Pairing Heaps and Related Data Structures" (PDF). Journal of the Association for Computing Machinery . 46 (4): 473–501. doi:10.1145/320211.320214.
  18. Pettie, Seth (2005). Towards a Final Analysis of Pairing Heaps (PDF). FOCS '05 Proceedings of the 46th Annual IEEE Symposium on Foundations of Computer Science. pp. 174–183. CiteSeerX   10.1.1.549.471 . doi:10.1109/SFCS.2005.75. ISBN   0-7695-2468-0.
  19. Haeupler, Bernhard; Sen, Siddhartha; Tarjan, Robert E. (November 2011). "Rank-pairing heaps" (PDF). SIAM J. Computing. 40 (6): 1463–1485. doi:10.1137/100785351.
  20. Fredman, Michael Lawrence; Tarjan, Robert E. (July 1987). "Fibonacci heaps and their uses in improved network optimization algorithms" (PDF). Journal of the Association for Computing Machinery . 34 (3): 596–615. CiteSeerX   10.1.1.309.8927 . doi:10.1145/28869.28874.
  21. Brodal, Gerth Stølting; Lagogiannis, George; Tarjan, Robert E. (2012). Strict Fibonacci heaps (PDF). Proceedings of the 44th symposium on Theory of Computing - STOC '12. pp. 1177–1184. CiteSeerX   10.1.1.233.1740 . doi:10.1145/2213977.2214082. ISBN   978-1-4503-1245-5.
  22. Brodal, Gerth S. (1996), "Worst-Case Efficient Priority Queues" (PDF), Proc. 7th Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 52–58
  23. Goodrich, Michael T.; Tamassia, Roberto (2004). "7.3.6. Bottom-Up Heap Construction". Data Structures and Algorithms in Java (3rd ed.). pp. 338–341. ISBN   0-471-46983-1.
  24. Frederickson, Greg N. (1993), "An Optimal Algorithm for Selection in a Min-Heap", Information and Computation (PDF), vol. 104, Academic Press, pp. 197–214, doi: 10.1006/inco.1993.1030 , archived from the original (PDF) on 2012-12-03, retrieved 2010-10-31