Adaptive heap sort

Last updated

In computer science, adaptive heap sort is a comparison-based sorting algorithm of the adaptive sort family. It is a variant of heap sort that performs better when the data contains existing order. Published by Christos Levcopoulos and Ola Petersson in 1992, the algorithm utilizes a new measure of presortedness, Osc, as the number of oscillations. [1] Instead of putting all the data into the heap as the traditional heap sort did, adaptive heap sort only take part of the data into the heap so that the run time will reduce significantly when the presortedness of the data is high. [1]

Contents

Heapsort

Heap sort is a sorting algorithm that utilizes binary heap data structure. The method treats an array as a complete binary tree and builds up a Max-Heap/Min-Heap to achieve sorting. [2] It usually involves the following four steps.

  1. Build a Max-Heap(Min-Heap): put all the data into the heap so that all nodes are either greater than or equal (less than or equal to for Min-Heap) to each of its child nodes.
  2. Swap the first element of the heap with the last element of the heap.
  3. Remove the last element from the heap and put it at the end of the list. Adjust the heap so that the first element ends up at the right place in the heap.
  4. Repeat Step 2 and 3 until the heap has only one element. Put this last element at the end of the list and output the list. The data in the list will be sorted.

Below is a C/C++ implementation that builds up a Max-Heap and sorts the array after the heap is built.

/*A C/C++ sample heap sort code that sort an array to an increasing order*/// A function that build up a max-heap binary treevoidheapify(intarray[],intstart,intend){intparent=start;intchild=parent*2+1;while(child<=end){if(child+1<=end)// when there are two child nodes{if(array[child+1]>array[child]){child++;//take the bigger child node}}if(array[parent]>array[child]){return;//if the parent node is greater, then it's already heapified}if(array[parent]<array[child])// when child node is greater than parent node{swap(array[parent],array[child]);// switch parent and child nodeparent=child;child=child*2+1;//continue the loop, compare the child node and its child nodes}}}// heap_sort functionvoidheap_sort(intarray[],intlen){for(inti=len/2-1;i>=0;i--)//Step 1: build up the max-heap{heapify(array,i,len);}for(inti=len-1;i>=0;i--)//Step 4: repeat step 2 and 3 till finished{swap(array[0],array[i]);// Step 2: put the max at the end of the arrayheapify(array,0,i-1);// Step 3: remove the max from the tree and heapify again}}intmain(){//the array that will be sortedintarray[]={42,1283,123,654,239847,45,97,85,763,90,770,616,328,1444,911,315,38,5040,1};intarray_len=sizeof(array)/sizeof(*array);//length of the arrayheap_sort(array,array_len);return0;}

Measures of presortedness

Measures of presortedness measures the existing order in a given sequence. [3] These measures of presortedness decides the number of data that will be put in to the heap during the sorting process as well as the lower bound of running time. [4]

Oscillations (Osc)

For sequence , Cross(xi) is defined as the number edges of the line plot of X that are intersected by a horizontal line through the point (i, xi). Mathematically, it is defined as . The oscillation(Osc) of X is just the total number of intersections, defined as . [1]

Other measures

Besides the original Osc measurement, other known measures include the number of inversions Inv, the number of runs Runs, the number of blocks Block, and the measures Max, Exc and Rem. Most of these different measurements are related for adaptive heap sort. Some measures dominate the others: every Osc-optimal algorithm is Inv optimal and Runs optimal; every Inv-optimal algorithm is Max optimal; and every Block-optimal algorithm is Exc optimal and Rem optimal. [4]

Algorithm

Adaptive heap sort is a variant of heap sort that seeks optimality (asymptotically optimal) with respect to the lower bound derived with the measure of presortedness by taking advantage of the existing order in the data. In heap sort, for a data , we put all n elements into the heap and then keep extracting the maximum (or minimum) for n times. Since the time of each max-extraction action is the logarithmic in the size of the heap, the total running time of standard heap sort is . [2] For adaptive heap sort, instead of putting all the elements into the heap, only the possible maximums of the data (max-candidates) will be put into the heap so that fewer runs are required when each time we try to locate the maximum (or minimum).

First, a Cartesian tree is built from the input in time by putting the data into a binary tree and making each node in the tree is greater(or smaller) than all its children nodes, and the root of the Cartesian tree is inserted into an empty binary heap. Then repeatedly extract the maximum from the binary heap, retrieve the maximum in the Cartesian tree, and add its left and right children (if any) which are themselves Cartesian trees, to the binary heap. If the input is already nearly sorted, the Cartesian trees will be very unbalanced, with few nodes having left and right children, resulting in the binary heap remaining small, and allowing the algorithm to sort more quickly than for inputs that are already nearly sorted. [5]

Below is an implementation in pseudo-code: [1]

Input: an array of n elements that need to be sorted  Construct the Cartesian tree l(x) Insert the root of l(x) into a heap  for i = from 1 to n {     Perform ExtractMax on the heap      if the max element extracted has any children in l(x)     {         retrieve the children in l(x)         insert the children element into the heap     } }

Drawbacks

Despite decades of research, there's still a gap between the theory of adaptive heap sort and its practical use. Because the algorithm makes use of Cartesian trees and pointer manipulation, it has low cache-efficiency and high memory requirements, both of which deteriorate the performance of implementations. [4]

See also

Related Research Articles

<span class="mw-page-title-main">Binary search tree</span> Rooted binary tree data structure

In computer science, a binary search tree (BST), also called an ordered or sorted binary tree, is a rooted binary tree data structure with the key of each internal node being greater than all the keys in the respective node's left subtree and less than the ones in its right subtree. The time complexity of operations on the binary search tree is linear with respect to the height of the tree.

<span class="mw-page-title-main">Heapsort</span> A sorting algorithm which uses the heap data structure

In computer science, heapsort is a comparison-based sorting algorithm which can be thought of as "an implementation of selection sort using the right data structure." Like selection sort, heapsort divides its input into a sorted and an unsorted region, and it iteratively shrinks the unsorted region by extracting the largest element from it and inserting it into the sorted region. Unlike selection sort, heapsort does not waste time with a linear-time scan of the unsorted region; rather, heap sort maintains the unsorted region in a heap data structure to efficiently find the largest element in each step.

<span class="mw-page-title-main">Heap (data structure)</span> Computer science data structure

In computer science, a heap is a specialized tree-based data structure that satisfies the heap property: In a max heap, for any given node C, if P is a parent node of C, then the key of P is greater than or equal to the key of C. In a min heap, the key of P is less than or equal to the key of C. The node at the "top" of the heap is called the root node.

<span class="mw-page-title-main">Merge sort</span> Divide and conquer-based sorting algorithm

In computer science, merge sort is an efficient, general-purpose, and comparison-based sorting algorithm. Most implementations produce a stable sort, which means that the relative order of equal elements is the same in the input and output. Merge sort is a divide-and-conquer algorithm that was invented by John von Neumann in 1945. A detailed description and analysis of bottom-up merge sort appeared in a report by Goldstine and von Neumann as early as 1948.

<span class="mw-page-title-main">Binary heap</span> Variant of heap data structure

A binary heap is a heap data structure that takes the form of a binary tree. Binary heaps are a common way of implementing priority queues. The binary heap was introduced by J. W. J. Williams in 1964, as a data structure for heapsort.

<span class="mw-page-title-main">Smoothsort</span> Comparison-based sorting algorithm

In computer science, smoothsort is a comparison-based sorting algorithm. A variant of heapsort, it was invented and published by Edsger Dijkstra in 1981. Like heapsort, smoothsort is an in-place algorithm with an upper bound of O(n log n) operations (see big O notation), but it is not a stable sort. The advantage of smoothsort is that it comes closer to O(n) time if the input is already sorted to some degree, whereas heapsort averages O(n log n) regardless of the initial sorted state.

<span class="mw-page-title-main">Treap</span>

In computer science, the treap and the randomized binary search tree are two closely related forms of binary search tree data structures that maintain a dynamic set of ordered keys and allow binary searches among the keys. After any sequence of insertions and deletions of keys, the shape of the tree is a random variable with the same probability distribution as a random binary tree; in particular, with high probability its height is proportional to the logarithm of the number of keys, so that each search, insertion, or deletion operation takes logarithmic time to perform.

<span class="mw-page-title-main">Self-balancing binary search tree</span> Any node-based binary search tree that automatically keeps its height the same

In computer science, a self-balancing binary search tree (BST) is any node-based binary search tree that automatically keeps its height small in the face of arbitrary item insertions and deletions. These operations when designed for a self-balancing binary search tree, contain precautionary measures against boundlessly increasing tree height, so that these abstract data structures receive the attribute "self-balancing".

In computer science, a selection algorithm is an algorithm for finding the th smallest value in a collection of ordered values, such as numbers. The value that it finds is called the th order statistic. Selection includes as special cases the problems of finding the minimum, median, and maximum element in the collection. Selection algorithms include quickselect, and the median of medians algorithm. When applied to a collection of values, these algorithms take linear time, as expressed using big O notation. For data that is already structured, faster algorithms may be possible; as an extreme case, selection in an already-sorted array takes time .

<i>k</i>-d tree Multidimensional search tree for points in k dimensional space

In computer science, a k-d tree is a space-partitioning data structure for organizing points in a k-dimensional space. K-dimensional is that which concerns exactly k orthogonal axes or a space of any number of dimensions. k-d trees are a useful data structure for several applications, such as:

<span class="mw-page-title-main">Comparison sort</span> Type of sorting algorithm that works by comparing pairs of elements

A comparison sort is a type of sorting algorithm that only reads the list elements through a single abstract comparison operation that determines which of two elements should occur first in the final sorted list. The only requirement is that the operator forms a total preorder over the data, with:

  1. if ab and bc then ac (transitivity)
  2. for all a and b, ab or ba (connexity).

A sorting algorithm falls into the adaptive sort family if it takes advantage of existing order in its input. It benefits from the presortedness in the input sequence – or a limited amount of disorder for various definitions of measures of disorder – and sorts faster. Adaptive sorting is usually performed by modifying existing sorting algorithms.

<span class="mw-page-title-main">Tree sort</span> Type of sorting algorithm

A tree sort is a sort algorithm that builds a binary search tree from the elements to be sorted, and then traverses the tree (in-order) so that the elements come out in sorted order. Its typical use is sorting elements online: after each insertion, the set of elements seen so far is available in sorted order.

The d-ary heap or d-heap is a priority queue data structure, a generalization of the binary heap in which the nodes have d children instead of 2. Thus, a binary heap is a 2-heap, and a ternary heap is a 3-heap. According to Tarjan and Jensen et al., d-ary heaps were invented by Donald B. Johnson in 1975.

<span class="mw-page-title-main">Segment tree</span> Computer science data structure

In computer science, the segment tree is a data structure used for storing information about intervals or segments. It allows querying which of the stored segments contain a given point. A similar data structure is the interval tree.

<span class="mw-page-title-main">Cartesian tree</span> Binary tree derived from a sequence of numbers

In computer science, a Cartesian tree is a binary tree derived from a sequence of distinct numbers. To construct the Cartesian tree, set its root to be the minimum number in the sequence, and recursively construct its left and right subtrees from the subsequences before and after this number. It is uniquely defined as a min-heap whose symmetric (in-order) traversal returns the original sequence.

In computer science, a min-max heap is a complete binary tree data structure which combines the usefulness of both a min-heap and a max-heap, that is, it provides constant time retrieval and logarithmic time removal of both the minimum and maximum elements in it. This makes the min-max heap a very useful data structure to implement a double-ended priority queue. Like binary min-heaps and max-heaps, min-max heaps support logarithmic insertion and deletion and can be built in linear time. Min-max heaps are often represented implicitly in an array; hence it's referred to as an implicit data structure.

In computer science, an optimal binary search tree (Optimal BST), sometimes called a weight-balanced binary tree, is a binary search tree which provides the smallest possible search time (or expected search time) for a given sequence of accesses (or access probabilities). Optimal BSTs are generally divided into two types: static and dynamic.

In computer science, a weak heap is a data structure for priority queues, combining features of the binary heap and binomial heap. It can be stored in an array as an implicit binary tree like a binary heap, and has the efficiency guarantees of binomial heaps.

References

  1. 1 2 3 4 Levcopoulos, C.; Petersson, O. (1993-05-01). "Adaptive Heapsort". Journal of Algorithms. 14 (3): 395–413. doi:10.1006/jagm.1993.1021. ISSN   0196-6774.
  2. 1 2 Schaffer, R.; Sedgewick, R. (1993-07-01). "The Analysis of Heapsort". Journal of Algorithms. 15 (1): 76–100. doi:10.1006/jagm.1993.1031. ISSN   0196-6774.
  3. Mannila, Heikki (April 1985). "Measures of Presortedness and Optimal Sorting Algorithms". IEEE Transactions on Computers. C-34 (4): 318–325. doi:10.1109/TC.1985.5009382. ISSN   0018-9340.
  4. 1 2 3 Edelkamp, Stefan; Elmasry, Amr; Katajainen, Jyrki (2011). Iliopoulos, Costas S.; Smyth, William F. (eds.). "Two Constant-Factor-Optimal Realizations of Adaptive Heapsort". Combinatorial Algorithms. Lecture Notes in Computer Science. Springer Berlin Heidelberg. 7056: 195–208. doi:10.1007/978-3-642-25011-8_16. ISBN   9783642250118. S2CID   10325857.
  5. "Archive of Interesting Code". www.keithschwarz.com. Retrieved 2019-10-31.