Heapsort

Last updated

Heapsort
Sorting heapsort anim.gif
A run of heapsort sorting an array of randomly permuted values. In the first stage of the algorithm the array elements are reordered to satisfy the heap property. Before the actual sorting takes place, the heap tree structure is shown briefly for illustration.
Class Sorting algorithm
Data structure Array
Worst-case performance
Best-case performance (distinct keys) [1] [2]
or (equal keys)
Average performance
Worst-case space complexity total auxiliary

In computer science, heapsort is an efficient, comparison-based sorting algorithm that reorganizes an input array into a heap (a data structure where each node is greater than its children) and then repeatedly removes the largest node from that heap, placing it at the end of the array. [3]

Contents

Although somewhat slower in practice on most machines than a well-implemented quicksort, it has the advantages of very simple implementation and a more favorable worst-case O(n log n) runtime. Most real-world quicksort variants include an implementation of heapsort as a fallback should they detect that quicksort is becoming degenerate. Heapsort is an in-place algorithm, but it is not a stable sort.

Heapsort was invented by J. W. J. Williams in 1964. [4] The paper also introduced the binary heap as a useful data structure in its own right. [5] In the same year, Robert W. Floyd published an improved version that could sort an array in-place, continuing his earlier research into the treesort algorithm. [5]

Overview

The heapsort algorithm can be divided into two phases: heap construction, and heap extraction.

The heap is an implicit data structure which takes no space beyond the array of objects to be sorted; the array is interpreted as a complete binary tree where each array element is a node and each node's parent and child links are defined by simple arithmetic on the array indexes. For a zero-based array, the root node is stored at index 0, and the nodes linked to node i are

  iLeftChild(i)  = 2⋅i + 1   iRightChild(i) = 2⋅i + 2   iParent(i)     = floor((i−1) / 2) 

where the floor function rounds down to the preceding integer. For a more detailed explanation, see Binary heap § Heap implementation.

This binary tree is a max-heap when each node is greater than or equal to both of its children. Equivalently, each node is less than or equal to its parent. This rule, applied throughout the tree, results in the maximum node being located at the root of the tree.

In the first phase, a heap is built out of the data (see Binary heap § Building a heap).

In the second phase, the heap is converted to a sorted array by repeatedly removing the largest element from the heap (the root of the heap), and placing it at the end of the array. The heap is updated after each removal to maintain the heap property. Once all objects have been removed from the heap, the result is a sorted array.

Heapsort is normally performed in place. During the first phase, the array is divided into an unsorted prefix and a heap-ordered suffix (initially empty). Each step shrinks the prefix and expands the suffix. When the prefix is empty, this phase is complete. During the second phase, the array is divided into a heap-ordered prefix and a sorted suffix (initially empty). Each step shrinks the prefix and expands the suffix. When the prefix is empty, the array is sorted.

Algorithm

The heapsort algorithm begins by rearranging the array into a binary max-heap. The algorithm then repeatedly swaps the root of the heap (the greatest element remaining in the heap) with its last element, which is then declared to be part of the sorted suffix. Then the heap, which was damaged by replacing the root, is repaired so that the greatest element is again at the root. This repeats until only one value remains in the heap.

The steps are:

  1. Call the heapify() function on the array. This builds a heap from an array in O(n) operations.
  2. Swap the first element of the array (the largest element in the heap) with the final element of the heap. Decrease the considered range of the heap by one.
  3. Call the siftDown() function on the array to move the new first element to its correct place in the heap.
  4. Go back to step (2) until the remaining array is a single element.

The heapify() operation is run once, and is O(n) in performance. The siftDown() function is called n times and requires O(log n) work each time, due to its traversal starting from the root node. Therefore, the performance of this algorithm is O(n + n log n) = O(n log n).

The heart of the algorithm is the siftDown() function. This constructs binary heaps out of smaller heaps, and may be thought of in two equivalent ways:

To establish the max-heap property at the root, up to three nodes must be compared (the root and its two children), and the greatest must be made the root. This is most easily done by finding the greatest child, then comparing that child to the root. There are three cases:

  1. If there are no children (the two original heaps are empty), the heap property trivially holds, and no further action is required.
  2. If root is greater than or equal to the greatest child, the heap property holds and likewise, no further action is required.
  3. If the root is less than the greatest child, exchange the two nodes. The heap property now holds at the newly promoted node (it is greater than or equal to both of its children, and in fact greater than any descendant), but may be violated between the newly demoted ex-root and its new children. To correct this, repeat the siftDown() operation on the subtree rooted at the newly demoted ex-root.

The number of iterations in any one siftdown() call is bounded by the height of the tree, which is log2n = O(log n).

Pseudocode

The following is a simple way to implement the algorithm in pseudocode. Arrays are zero-based and swap is used to exchange two elements of the array. Movement 'down' means from the root towards the leaves, or from lower indices to higher. Note that during the sort, the largest element is at the root of the heap at a[0], while at the end of the sort, the largest element is in a[end].

procedure heapsort(a, count) isinput: an unordered array a of length count(Build the heap in array a so that largest value is at the root)     heapify(a, count)      (The following loop maintains the invariants that a[0:end−1] is a heap, andevery element a[end:count−1] beyond end is greater than everything before it,i.e. a[end:count−1] is in sorted order.)     end ← count     while end > 1 do(the heap size is reduced by one)         end ← end − 1         (a[0] is the root and largest value. The swap moves it in front of the sorted elements.)         swap(a[end], a[0])         (the swap ruined the heap property, so restore it)         siftDown(a, 0, end)

The sorting routine uses two subroutines, heapify and siftDown. The former is the common in-place heap construction routine, while the latter is a common subroutine for implementing heapify.

(Put elements of 'a' in heap order, in-place)procedure heapify(a, count) is(start is initialized to the first leaf node)(the last element in a 0-based array is at index count-1; find the parent of that element)     start ← iParent(count-1) + 1          while start > 0 do(go to the last non-heap node)         start ← start − 1         (sift down the node at index 'start' to the proper place such that all nodes below the start index are in heap order)         siftDown(a, start, count)     (after sifting down the root all nodes/elements are in heap order)(Repair the heap whose root element is at index 'start', assuming the heaps rooted at its children are valid)procedure siftDown(a, root, end) iswhile iLeftChild(root) < end do(While the root has at least one child)         child ← iLeftChild(root)       (Left child of root)(If there is a right child and that child is greater)if child+1 < end and a[child] < a[child+1] then             child ← child + 1              if a[root] < a[child] then             swap(a[root], a[child])             root ← child         (repeat to continue sifting down the child now)else(The root holds the largest element. Since we may assume the heaps rooted at the children are valid, this means that we are done.)return

The heapify procedure operates by building small heaps and repeatedly merging them using siftDown. It starts with the leaves, observing that they are trivial but valid heaps by themselves, and then adds parents. Starting with element n/2 and working backwards, each internal node is made the root of a valid heap by sifting down. The last step is sifting down the first element, after which the entire array obeys the heap property.

To see that this takes O(n) time, count the worst-case number of siftDown iterations. The last half of the array requires zero iterations, the preceding quarter requires at most one iteration, the eighth before that requires at most two iterations, the sixteenth before that requires at most three, and so on.

Looked at a different way, if we assume every siftDown call requires the maximum number of iterations, the first half of the array requires one iteration, the first quarter requires one more (total 2), the first eighth requires yet another (total 3), and so on.

This totals n/2 + n/4 + n/8 + ⋯ = n⋅(1/2 + 1/4 + 1/8 + ⋯), where the infinite sum is a well-known geometric series whose sum is 1, thus the product is simply n.

The above is an approximation. The exact worst-case number of comparisons during the heap-construction phase of heapsort is known to be equal to 2n − 2s2(n) − e2(n), where s2(n) is the number of 1 bits in the binary representation of n and e2(n) is the number of trailing 0 bits. [6] [7]

Standard implementation

Although it is convenient to think of the two phases separately, most implementations combine the two, allowing a single instance of siftDownto be expanded inline. [8] :Algorithm H Two variables (here, start and end) keep track of the bounds of the heap area. The portion of the array before start is unsorted, while the portion beginning at end is sorted. Heap construction decreases start until it is zero, after which heap extraction decreases end until it is 1 and the array is entirely sorted.

procedure heapsort(a, count) isinput: an unordered array a of length count          start ← floor(count/2)     end ← count     while end > 1 doif start > 0 then(Heap construction)             start ← start − 1         else(Heap extraction)             end ← end − 1             swap(a[end], a[0])             (The following is siftDown(a, start, end))         root ← start         while iLeftChild(root) < end do             child ← iLeftChild(root)             (If there is a right child and that child is greater)if child+1 < end and a[child] < a[child+1] then                 child ← child + 1                  if a[root] < a[child] then                 swap(a[root], a[child])                 root ← child         (repeat to continue sifting down the child now)elsebreak(return to outer loop)

Variations

Williams' heap construction

The description above uses Floyd's improved heap-construction algorithm, which operates in O(n) time and uses the same siftDown primitive as the heap-extraction phase. Although this algorithm, being both faster and simpler to program, is used by all practical heapsort implementations, Williams' original algorithm may be easier to understand, and is needed to implement a more general binary heap priority queue.

Rather than merging many small heaps, Williams' algorithm maintains one single heap at the front of the array and repeatedly appends an additional element using a siftUp primitive. Being at the end of the array, the new element is a leaf and has no children to worry about, but may violate the heap property by being greater than its parent. In this case, exchange it with its parent and repeat the test until the parent is greater or there is no parent (we have reached the root). In pseudocode, this is:

procedure siftUp(a, end) isinput:a is the array, which heap-ordered up to end-1.end is the node to sift up.while end > 0          parent := iParent(end)          if a[parent] < a[end] then(out of max-heap order)              swap(a[parent], a[end])              end := parent          (continue sifting up)elsereturnprocedure heapify(a, count) is(start with a trivial single-element heap)      end := 1            while end < count          (sift up the node at index end to the proper place such that all nodes above the end index are in heap order)          siftUp(a, end)          end := end + 1      (after sifting up the last node all nodes are in heap order)
Difference in time complexity between the "siftDown" version and the "siftUp" version. Binary heap bottomup vs topdown.svg
Difference in time complexity between the "siftDown" version and the "siftUp" version.

To understand why this algorithm can take asymptotically more time to build a heap (O(n log n) vs. O(n) worst case), note that in Floyd's algorithm, almost all the calls to siftDown operations apply to very small heaps. Half the heaps are height-1 trivial heaps and can be skipped entirely, half of the remainder are height-2, and so on. Only two calls are on heaps of size n/2, and only one siftDown operation is done on the full n-element heap. The overall average siftDown operation takes O(1) time.

In contrast, in Williams' algorithm most of the calls to siftUp are made on large heaps of height O(log n). Half of the calls are made with a heap size of n/2 or more, three-quarters are made with a heap size of n/4 or more, and so on. Although the average number of steps is similar to Floyd's technique, [9] :3 pre-sorted input will cause the worst case: each added node is sifted up to the root, so the average call to siftUp will require approximately (log2n − 1)/2 + (log2n − 2)/4 + (log2n − 3)/8 + ⋯= log2n − (1 + 1/2 + 1/4 + ⋯)= log2n − 2 iterations.

Because it is dominated by the second heap-extraction phase, the heapsort algorithm itself has O(n log n) time complexity using either version of heapify.

Bottom-up heapsort

Bottom-up heapsort is a variant that reduces the number of comparisons required by a significant factor. While ordinary "top-down" heapsort requires 2n log2n + O(n) comparisons worst-case and on average, [10] the bottom-up variant requires n log2n + O(1) comparisons on average, [10] and 1.5n log2n + O(n) in the worst case. [11]

If comparisons are cheap (e.g. integer keys) then the difference is unimportant, [12] as top-down heapsort compares values that have already been loaded from memory. If, however, comparisons require a function call or other complex logic, then bottom-up heapsort is advantageous.

This is accomplished by using a more elaborate siftDown procedure. The change improves the linear-time heap-building phase slightly, [13] but is more significant in the second phase. Like top-down heapsort, each iteration of the second phase extracts the top of the heap, a[0], and fills the gap it leaves with a[end], then sifts this latter element down the heap. But this element came from the lowest level of the heap, meaning it is one of the smallest elements in the heap, so the sift-down will likely take many steps to move it back down. [14] In top-down heapsort, each step of siftDown requires two comparisons, to find the minimum of three elements: the new node and its two children.

Bottom-up heapsort conceptually replaces the root with a value of −∞ and sifts it down using only one comparison per level (since no child can possibly be less than −∞) until the leaves are reached, then replaces the −∞ with the correct value and sifts it up (again, using one comparison per level) until the correct position is found.

This places the root in the same location as top-down siftDown, but fewer comparisons are required to find that location. For any single siftDown operation, the bottom-up technique is advantageous if the number of downward movements is at least 23 of the height of the tree (when the number of comparisons is 43 times the height for both techniques), and it turns out that this is more than true on average, even for worst-case inputs. [11]

A naïve implementation of this conceptual algorithm would cause some redundant data copying, as the sift-up portion undoes part of the sifting down. A practical implementation searches downward for a leaf where −∞ would be placed, then upward for where the root should be placed. Finally, the upward traversal continues to the root's starting position, performing no more comparisons but exchanging nodes to complete the necessary rearrangement. This optimized form performs the same number of exchanges as top-down siftDown.

Because it goes all the way to the bottom and then comes back up, it is called heapsort with bounce by some authors. [15]

function leafSearch(a, i, end) is     j ← i     while iRightChild(j) < end do(Determine which of j's two children is the greater)if a[iRightChild(j)] > a[iLeftChild(j)] then             j ← iRightChild(j)         else             j ← iLeftChild(j)     (At the last level, there might be only one child)if iLeftChild(j) < end then         j ← iLeftChild(j)     return j

The return value of the leafSearch is used in the modified siftDown routine: [11]

procedure siftDown(a, i, end) is     j ← leafSearch(a, i, end)     while a[i] > a[j] do         j ← iParent(j)     while j > i do         swap(a[i], a[j])         j ← iParent(j)

Bottom-up heapsort was announced as beating quicksort (with median-of-three pivot selection) on arrays of size ≥16000. [10]

A 2008 re-evaluation of this algorithm showed it to be no faster than top-down heapsort for integer keys, presumably because modern branch prediction nullifies the cost of the predictable comparisons that bottom-up heapsort manages to avoid. [12]

A further refinement does a binary search in the upward search, and sorts in a worst case of (n+1)(log2(n+1) + log2 log2(n+1) + 1.82) + O(log2n) comparisons, approaching the information-theoretic lower bound of n log2n − 1.4427n comparisons. [16]

A variant that uses two extra bits per internal node (n−1 bits total for an n-element heap) to cache information about which child is greater (two bits are required to store three cases: left, right, and unknown) [13] uses less than n log2n + 1.1n compares. [17]

Other variations

Comparison with other sorts

Heapsort primarily competes with quicksort, another very efficient general purpose in-place unstable comparison-based sort algorithm.

Heapsort's primary advantages are its simple, non-recursive code, minimal auxiliary storage requirement, and reliably good performance: its best and worst cases are within a small constant factor of each other, and of the theoretical lower bound on comparison sorts. While it cannot do better than O(n log n) for pre-sorted inputs, it does not suffer from quicksort's O(n2) worst case, either.

Real-world quicksort implementations use a variety of heuristics to avoid the worst case, but that makes their implementation far more complex, and implementations such as introsort and pattern-defeating quicksort [29] use heapsort as a last-resort fallback if they detect degenerate behaviour. Thus, their worst-case performance is slightly worse than if heapsort had been used from the beginning.

Heapsort's primary disadvantages are its poor locality of reference and its inherently serial nature; the accesses to the implicit tree are widely scattered and mostly random, and there is no straightforward way to convert it to a parallel algorithm.

The worst-case performance guarantees make heapsort popular in real-time computing, and systems concerned with maliciously chosen inputs [30] such as the Linux kernel. [31] The combination of small implementation and dependably "good enough" performance make it popular in embedded systems, and generally any application where sorting is not a performance bottleneck. E.g. heapsort is ideal for sorting a list of filenames for display, but a database management system would probably want a more aggressively optimized sorting algorithm.

A well-implemented quicksort is usually 2–3 times faster than heapsort. [19] [32] Although quicksort requires fewer comparisons, this is a minor factor. (Results claiming twice as many comparisons are measuring the top-down version; see § Bottom-up heapsort.) The main advantage of quicksort is its much better locality of reference: partitioning is a linear scan with good spatial locality, and the recursive subdivision has good temporal locality. With additional effort, quicksort can also be implemented in mostly branch-free code, [33] [34] and multiple CPUs can be used to sort subpartitions in parallel. Thus, quicksort is preferred when the additional performance justifies the implementation effort.

The other major O(n log n) sorting algorithm is merge sort, but that rarely competes directly with heapsort because it is not in-place. Merge sort's requirement for Ω(n) extra space (roughly half the size of the input) is usually prohibitive except in the situations where merge sort has a clear advantage:

Example

The examples sort the values { 6, 5, 3, 1, 8, 7, 2, 4 } in increasing order using both heap-construction algorithms. The elements being compared are shown in a bold font. There are typically two when sifting up, and three when sifting down, although there may be fewer when the top or bottom of the tree is reached.

Heap construction (Williams' algorithm)

An example of heapsort using Williams' heap-construction algorithm. Heapsort-example.gif
An example of heapsort using Williams' heap-construction algorithm.
HeapUnsortedSwap elements
65, 3, 1, 8, 7, 2, 4
6, 53, 1, 8, 7, 2, 4
6, 5, 31, 8, 7, 2, 4
6, 5, 3, 18, 7, 2, 4
6, 5, 3, 1, 87, 2, 45 ↔ 8
6, 8, 3, 1, 57, 2, 46 ↔ 8
8, 6, 3, 1, 57, 2, 4
8, 6, 3, 1, 5, 72, 43 ↔ 7
8, 6, 7, 1, 5, 32, 4
8, 6, 7, 1, 5, 3, 24
8, 6, 7, 1, 5, 3, 2, 41 ↔ 4
8, 6, 7, 4, 5, 3, 2, 1

Heap construction (Floyd's algorithm)

UnsortedHeapSwap elements
6, 5, 3, 18, 7, 2, 4
6, 5, 31, 8, 7, 2, 41 ↔ 4
6, 5, 34, 8, 7, 2, 1
6, 53, 4, 8, 7, 2, 13 ↔ 7
6, 57, 4, 8, 3, 2, 1
65, 7, 4, 8, 3, 2, 15 ↔ 8
68, 7, 4, 5, 3, 2, 1
6, 8, 7, 4, 5, 3, 2, 16 ↔ 8
8, 6, 7, 4, 5, 3, 2, 1

Heap extraction

HeapSorted arraySwap elementsDetails
8, 6, 7, 4, 5, 3, 2, 18 ↔ 1Add 8 to the sorted array by swapping it with 1
1, 6, 7, 4, 5, 3, 281 ↔ 7Swap 1 and 7 as they are not in order in the heap
7, 6, 1, 4, 5, 3, 281 ↔ 3Swap 1 and 3 as they are not in order in the heap
7, 6, 3, 4, 5, 1, 281 has no children; siftDown complete
7, 6, 3, 4, 5, 1, 287 ↔ 2Add 7 to the sorted array by swapping it with 2
2, 6, 3, 4, 5, 17, 82 ↔ 6Swap 2 and 6 as they are not in order in the heap
6, 2, 3, 4, 5, 17, 82 ↔ 5Swap 2 and 5 as they are not in order in the heap
6, 5, 3, 4, 2, 17, 82 has no children; siftDown complete
6, 5, 3, 4, 2, 17, 86 ↔ 1Add 6 to the sorted array by swapping it with 1
1, 5, 3, 4, 26, 7, 81 ↔ 5Swap 1 and 5 as they are not in order in the heap
5, 1, 3, 4, 26, 7, 81 ↔ 4Swap 1 and 4 as they are not in order in the heap
5, 4, 3, 1, 26, 7, 81 has no children; siftDown complete
5, 4, 3, 1, 26, 7, 85 ↔ 2Add 5 to the sorted array by swapping it with 2
2, 4, 3, 15, 6, 7, 82 ↔ 4Swap 2 and 4 as they are not in order in the heap
4, 2, 3, 15, 6, 7, 82 is greater than 1; siftDown complete
4, 2, 3, 15, 6, 7, 84 ↔ 1Add 4 to the sorted array by swapping it with 1
1, 2, 34, 5, 6, 7, 81 ↔ 3Swap 1 and 3 as they are not in order in the heap
3, 2, 14, 5, 6, 7, 81 has no children; siftDown complete
3, 2, 14, 5, 6, 7, 81 ↔ 3Add 3 to the sorted array by swapping it with 1
1, 23, 4, 5, 6, 7, 81 ↔ 2Swap 1 and 2 as they are not in order in the heap
2, 13, 4, 5, 6, 7, 81 has no children; siftDown complete
2, 13, 4, 5, 6, 7, 82 ↔ 1Add 2 to the sorted array by swapping it with 1
12, 3, 4, 5, 6, 7, 8Add 1 to the sorted array
1, 2, 3, 4, 5, 6, 7, 8Completed

Notes

  1. Bollobás, B.; Fenner, T. I.; Frieze, A. M. (1996). "On the Best Case of Heapsort" (PDF). Journal of Algorithms. 20 (11): 205–217. doi:10.1006/jagm.1996.0011.
  2. Sedgewick, Robert; Schaffer, Russel W. (October 1990). The Best Case of Heapsort (Technical report). Princeton University. TR-293-90.
  3. Cormen, Thomas H.; Leiserson, Charles Eric; Rivest, Ronald L.; Stein, Clifford (2022). Introduction to algorithms (4th ed.). Cambridge, Massachusetts: The MIT Press. p. 170. ISBN   978-0-262-04630-5.
  4. Williams 1964
  5. 1 2 Brass, Peter (2008). Advanced Data Structures. Cambridge University Press. p. 209. ISBN   978-0-521-88037-4.
  6. Suchenek, Marek A. (2012). "Elementary Yet Precise Worst-Case Analysis of Floyd's Heap-Construction Program". Fundamenta Informaticae . 120 (1): 75–92. doi:10.3233/FI-2012-751.
  7. Suchenek, Marek A. (7 April 2015). "A Complete Worst-Case Analysis of Heapsort with Experimental Verification of Its Results, A manuscript". arXiv: 1504.01459 [cs.DS].
  8. Knuth 1997
  9. 1 2 3 Bojesen, Jesper; Katajainen, Jyrki; Spork, Maz (2000). "Performance Engineering Case Study: Heap Construction" (PostScript). ACM Journal of Experimental Algorithmics. 5 (15): 15–es. CiteSeerX   10.1.1.35.3248 . doi:10.1145/351827.384257. S2CID   30995934. Alternate PDF source.
  10. 1 2 3 Wegener, Ingo (13 September 1993). "BOTTOM-UP HEAPSORT, a new variant of HEAPSORT beating, on an average, QUICKSORT (if n is not very small)" (PDF). Theoretical Computer Science. 118 (1): 81–98. doi: 10.1016/0304-3975(93)90364-y . Although this is a reprint of work first published in 1990 (at the Mathematical Foundations of Computer Science conference), the technique was published by Carlsson in 1987. [16]
  11. 1 2 3 Fleischer, Rudolf (February 1994). "A tight lower bound for the worst case of Bottom-Up-Heapsort" (PDF). Algorithmica. 11 (2): 104–115. doi:10.1007/bf01182770. hdl: 11858/00-001M-0000-0014-7B02-C . S2CID   21075180. Also available as Fleischer, Rudolf (April 1991). A tight lower bound for the worst case of Bottom-Up-Heapsort (PDF) (Technical report). MPI-INF. MPI-I-91-104.
  12. 1 2 Mehlhorn, Kurt; Sanders, Peter (2008). "Priority Queues" (PDF). Algorithms and Data Structures: The Basic Toolbox. Springer. p. 142. ISBN   978-3-540-77977-3.
  13. 1 2 McDiarmid, C. J. H.; Reed, B. A. (September 1989). "Building heaps fast" (PDF). Journal of Algorithms. 10 (3): 352–365. doi:10.1016/0196-6774(89)90033-3.
  14. 1 2 MacKay, David J. C. (December 2005). "Heapsort, Quicksort, and Entropy" . Retrieved 12 February 2021.
  15. Moret, Bernard; Shapiro, Henry D. (1991). "8.6 Heapsort". Algorithms from P to NP Volume 1: Design and Efficiency. Benjamin/Cummings. p. 528. ISBN   0-8053-8008-6. For lack of a better name we call this enhanced program 'heapsort with bounce.'
  16. 1 2 Carlsson, Scante (March 1987). "A variant of heapsort with almost optimal number of comparisons" (PDF). Information Processing Letters. 24 (4): 247–250. doi:10.1016/0020-0190(87)90142-6. S2CID   28135103. Archived from the original (PDF) on 27 December 2016.
  17. Wegener, Ingo (March 1992). "The worst-case complexity of McDiarmid and Reed's variant of BOTTOM-UP HEAPSORT is less than n log n + 1.1n". Information and Computation. 97 (1): 86–96. doi: 10.1016/0890-5401(92)90005-Z .
  18. Tenenbaum, Aaron M.; Augenstein, Moshe J. (1981). "Chapter 8: Sorting". Data Structures Using Pascal. Prentice-Hall. p. 405. ISBN   0-13-196501-8. Write a sorting routine similar to the heapsort except that it uses a ternary heap.
  19. 1 2 3 LaMarca, Anthony; Ladner, Richard E. (April 1999). "The Influence of Caches on the Performance of Sorting" (PDF). Journal of Algorithms. 31 (1): 66–104. CiteSeerX   10.1.1.456.3616 . doi:10.1006/jagm.1998.0985. S2CID   206567217. See particularly figure 9c on p. 98.
  20. Chen, Jingsen; Edelkamp, Stefan; Elmasry, Amr; Katajainen, Jyrki (27–31 August 2012). "In-place Heap Construction with Optimized Comparisons, Moves, and Cache Misses" (PDF). Mathematical Foundations of Computer Science 2012. 37th international conference on Mathematical Foundations of Computer Science. Lecture Notes in Computer Science. Vol. 7464. Bratislava, Slovakia. pp. 259–270. doi:10.1007/978-3-642-32589-2_25. ISBN   978-3-642-32588-5. S2CID   1462216. Archived from the original (PDF) on 29 December 2016. See particularly Fig. 3.
  21. Cantone, Domenico; Concotti, Gianluca (1–3 March 2000). QuickHeapsort, an efficient mix of classical sorting algorithms. 4th Italian Conference on Algorithms and Complexity. Lecture Notes in Computer Science. Vol. 1767. Rome. pp. 150–162. ISBN   3-540-67159-5.
  22. Cantone, Domenico; Concotti, Gianluca (August 2002). "QuickHeapsort, an efficient mix of classical sorting algorithms" (PDF). Theoretical Computer Science. 285 (1): 25–42. doi: 10.1016/S0304-3975(01)00288-2 . Zbl   1016.68042.
  23. Diekert, Volker; Weiß, Armin (August 2016). "QuickHeapsort: Modifications and improved analysis". Theory of Computing Systems. 59 (2): 209–230. arXiv: 1209.4214 . doi:10.1007/s00224-015-9656-y. S2CID   792585.
  24. Dijkstra, Edsger W. Smoothsort – an alternative to sorting in situ (EWD-796a) (PDF). E.W. Dijkstra Archive. Center for American History, University of Texas at Austin. ( transcription )
  25. Levcopoulos, Christos; Petersson, Ola (1989). "Heapsort—Adapted for Presorted Files". WADS '89: Proceedings of the Workshop on Algorithms and Data Structures. Lecture Notes in Computer Science. Vol. 382. London, UK: Springer-Verlag. pp. 499–509. doi:10.1007/3-540-51542-9_41. ISBN   978-3-540-51542-5. Heapsort—Adapted for presorted files (Q56049336).
  26. Schwartz, Keith (27 December 2010). "CartesianTreeSort.hh". Archive of Interesting Code. Retrieved 5 March 2019.
  27. Katajainen, Jyrki (23 September 2013). Seeking for the best priority queue: Lessons learnt. Algorithm Engineering (Seminar 13391). Dagstuhl. pp. 19–20, 24.
  28. Katajainen, Jyrki (2–3 February 1998). The Ultimate Heapsort. Computing: the 4th Australasian Theory Symposium. Australian Computer Science Communications. Vol. 20, no. 3. Perth. pp. 87–96.
  29. Peters, Orson R. L. (9 June 2021). "Pattern-defeating Quicksort". arXiv: 2106.05123 [cs.DS].
    Peters, Orson R. L. "pdqsort". Github. Retrieved 2 October 2023.
  30. Morris, John (1998). "Comparing Quick and Heap Sorts". Data Structures and Algorithms (Lecture notes). University of Western Australia. Retrieved 12 February 2021.
  31. https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/lib/sort.c#n205 Linux kernel source
  32. Maus, Arne [in Norwegian] (14 May 2014). "Sorting by generating the sorting permutation, and the effect of caching on sorting". See Fig. 1 on p. 6.
  33. Edelkamp, Stefan; Weiß, Armin (30 January 2019). "BlockQuicksort: Avoiding Branch Mispredictions in Quicksort" (PDF). Journal of Experimental Algorithmics. 24 1.4. arXiv: 1604.06697 . doi:10.1145/3274660.
  34. Aumüller, Martin; Hass, Nikolaj (7–8 January 2019). Simple and Fast BlockQuicksort using Lomuto’s Partitioning Scheme. Twenty-First Workshop on Algorithm Engineering and Experiments (ALENEX). San Diego. arXiv: 1810.12047 . doi: 10.1137/1.9781611975499.2 .

Related Research Articles

<span class="mw-page-title-main">Heap (data structure)</span> Computer science data structure

In computer science, a heap is a tree-based data structure that satisfies the heap property: In a max heap, for any given node C, if P is the parent node of C, then the key of P is greater than or equal to the key of C. In a min heap, the key of P is less than or equal to the key of C. The node at the "top" of the heap is called the root node.

<span class="mw-page-title-main">Insertion sort</span> Sorting algorithm

Insertion sort is a simple sorting algorithm that builds the final sorted array (or list) one item at a time by comparisons. It is much less efficient on large lists than more advanced algorithms such as quicksort, heapsort, or merge sort. However, insertion sort provides several advantages:

<span class="mw-page-title-main">Merge sort</span> Divide and conquer sorting algorithm

In computer science, merge sort is an efficient, general-purpose, and comparison-based sorting algorithm. Most implementations produce a stable sort, which means that the relative order of equal elements is the same in the input and output. Merge sort is a divide-and-conquer algorithm that was invented by John von Neumann in 1945. A detailed description and analysis of bottom-up merge sort appeared in a report by Goldstine and von Neumann as early as 1948.

In computer science, radix sort is a non-comparative sorting algorithm. It avoids comparison by creating and distributing elements into buckets according to their radix. For elements with more than one significant digit, this bucketing process is repeated for each digit, while preserving the ordering of the prior step, until all digits have been considered. For this reason, radix sort has also been called bucket sort and digital sort.

<span class="mw-page-title-main">Sorting algorithm</span> Algorithm that arranges lists in order

In computer science, a sorting algorithm is an algorithm that puts elements of a list into an order. The most frequently used orders are numerical order and lexicographical order, and either ascending or descending. Efficient sorting is important for optimizing the efficiency of other algorithms that require input data to be in sorted lists. Sorting is also often useful for canonicalizing data and for producing human-readable output.

<span class="mw-page-title-main">Binary heap</span> Variant of heap data structure

A binary heap is a heap data structure that takes the form of a binary tree. Binary heaps are a common way of implementing priority queues. The binary heap was introduced by J. W. J. Williams in 1964 as a data structure for implementing heapsort.

<span class="mw-page-title-main">Smoothsort</span> Comparison-based sorting algorithm

In computer science, smoothsort is a comparison-based sorting algorithm. A variant of heapsort, it was invented and published by Edsger Dijkstra in 1981. Like heapsort, smoothsort is an in-place algorithm with an upper bound of O(n log n) operations (see big O notation), but it is not a stable sort. The advantage of smoothsort is that it comes closer to O(n) time if the input is already sorted to some degree, whereas heapsort averages O(n log n) regardless of the initial sorted state.

Introsort or introspective sort is a hybrid sorting algorithm that provides both fast average performance and (asymptotically) optimal worst-case performance. It begins with quicksort, it switches to heapsort when the recursion depth exceeds a level based on (the logarithm of) the number of elements being sorted and it switches to insertion sort when the number of elements is below some threshold. This combines the good parts of the three algorithms, with practical performance comparable to quicksort on typical data sets and worst-case O(n log n) runtime due to the heap sort. Since the three algorithms it uses are comparison sorts, it is also a comparison sort.

<span class="mw-page-title-main">Self-balancing binary search tree</span> Any node-based binary search tree that automatically keeps its height the same

In computer science, a self-balancing binary search tree (BST) is any node-based binary search tree that automatically keeps its height small in the face of arbitrary item insertions and deletions. These operations when designed for a self-balancing binary search tree, contain precautionary measures against boundlessly increasing tree height, so that these abstract data structures receive the attribute "self-balancing".

<span class="mw-page-title-main">Quicksort</span> Divide and conquer sorting algorithm

Quicksort is an efficient, general-purpose sorting algorithm. Quicksort was developed by British computer scientist Tony Hoare in 1959 and published in 1961. It is still a commonly used algorithm for sorting. Overall, it is slightly faster than merge sort and heapsort for randomized data, particularly on larger distributions.

sort is a generic function in the C++ Standard Library for doing comparison sorting. The function originated in the Standard Template Library (STL).

In computer science, adaptive heap sort is a comparison-based sorting algorithm of the adaptive sort family. It is a variant of heap sort that performs better when the data contains existing order. Published by Christos Levcopoulos and Ola Petersson in 1992, the algorithm utilizes a new measure of presortedness, Osc, as the number of oscillations. Instead of putting all the data into the heap as the traditional heap sort did, adaptive heap sort only take part of the data into the heap so that the run time will reduce significantly when the presortedness of the data is high.

<span class="mw-page-title-main">Tree sort</span> Type of sorting algorithm

A tree sort is a sort algorithm that builds a binary search tree from the elements to be sorted, and then traverses the tree (in-order) so that the elements come out in sorted order. Its typical use is sorting elements online: after each insertion, the set of elements seen so far is available in sorted order.

Samplesort is a sorting algorithm that is a divide and conquer algorithm often used in parallel processing systems. Conventional divide and conquer sorting algorithms partitions the array into sub-intervals or buckets. The buckets are then sorted individually and then concatenated together. However, if the array is non-uniformly distributed, the performance of these sorting algorithms can be significantly throttled. Samplesort addresses this issue by selecting a sample of size s from the n-element sequence, and determining the range of the buckets by sorting the sample and choosing p−1 < s elements from the result. These elements then divide the array into p approximately equal-sized buckets. Samplesort is described in the 1970 paper, "Samplesort: A Sampling Approach to Minimal Storage Tree Sorting", by W. D. Frazer and A. C. McKellar.

In computer science, a min-max heap is a complete binary tree data structure which combines the usefulness of both a min-heap and a max-heap, that is, it provides constant time retrieval and logarithmic time removal of both the minimum and maximum elements in it. This makes the min-max heap a very useful data structure to implement a double-ended priority queue. Like binary min-heaps and max-heaps, min-max heaps support logarithmic insertion and deletion and can be built in linear time. Min-max heaps are often represented implicitly in an array; hence it's referred to as an implicit data structure.

In computer science, partial sorting is a relaxed variant of the sorting problem. Total sorting is the problem of returning a list of items such that its elements all appear in order, while partial sorting is returning a list of the k smallest elements in order. The other elements may also be sorted, as in an in-place partial sort, or may be discarded, which is common in streaming partial sorts. A common practical example of partial sorting is computing the "Top 100" of some list.

In computer science, k-way merge algorithms or multiway merges are a specific type of sequence merge algorithms that specialize in taking in k sorted lists and merging them into a single sorted list. These merge algorithms generally refer to merge algorithms that take in a number of sorted lists greater than two. Two-way merges are also referred to as binary merges. The k-way merge is also an external sorting algorithm.

In computer science, a weak heap is a data structure for priority queues, combining features of the binary heap and binomial heap. It can be stored in an array as an implicit binary tree like a binary heap, and has the efficiency guarantees of binomial heaps.

Proportion extend sort is an in-place, comparison-based sorting algorithm which attempts to improve on the performance, particularly the worst-case performance, of quicksort.

References