Spreadsort

Spreadsort is a sorting algorithm invented by Steven J. Ross in 2002. [1] It combines concepts from distribution-based sorts, such as radix sort and bucket sort, with partitioning concepts from comparison sorts such as quicksort and mergesort. In experimental results it was shown to be highly efficient, often outperforming traditional algorithms such as quicksort, particularly on distributions exhibiting structure and on string sorting. There is an open-source implementation with performance analysis and benchmarks, [2] and HTML documentation. [3]

Quicksort identifies a pivot element in the list and then partitions the list into two sublists, those elements less than the pivot and those greater than the pivot. Spreadsort generalizes this idea by partitioning the list into n/c partitions at each step, where n is the total number of elements in the list and c is a small constant (in practice usually between 4 and 8 when comparisons are slow, or much larger in situations where they are fast). It uses distribution-based techniques to accomplish this, first locating the minimum and maximum value in the list, and then dividing the region between them into n/c equal-sized bins. Where caching is an issue, it can help to have a maximum number of bins in each recursive division step, causing this division process to take multiple steps. Though this causes more iterations, it reduces cache misses and can make the algorithm run faster overall.
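
As a sketch of the arithmetic involved (illustrative names and types only; the actual code appears under Implementation below), the right-shift that produces roughly n/c bins can be derived from the key range and the list size:

    // Illustrative sketch, not the reference implementation.
    // Number of significant bits in v (a rough integer log2).
    unsigned significant_bits(unsigned long v)
    {
        unsigned r = 0;
        while (r < 8 * sizeof v && (v >> r))
            ++r;
        return r;
    }

    // Choose a right-shift so that [min, max] maps onto about n / c bins,
    // where log2(c) is passed as log_mean_bin_size.
    unsigned choose_log_divisor(long min, long max, unsigned long n,
                                unsigned log_mean_bin_size)
    {
        int d = (int)significant_bits((unsigned long)(max - min))
              - (int)significant_bits(n) + (int)log_mean_bin_size;
        return d > 0 ? (unsigned)d : 0;  // a shift of 0 gives unit-width bins (pure bucket sort)
    }

    // An element x then falls into bin (x >> log_divisor) - (min >> log_divisor).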

In the case where the number of bins is at least the number of elements, spreadsort degenerates to bucket sort and the sort completes. Otherwise, each bin is sorted recursively: the algorithm uses heuristic tests to determine whether the bin would be more efficiently sorted by spreadsort or by some classical comparison sort, and applies the better choice.

Like other distribution-based sorts, spreadsort has the weakness that the programmer is required to provide a means of converting each element into a numeric key, for the purpose of identifying which bin it falls in. Although it is possible to do this for arbitrary-length elements such as strings by considering each element to be followed by an infinite number of minimum values, and indeed for any datatype possessing a total order, this can be more difficult to implement correctly than a simple comparison function, especially on complex structures. Poor implementation of this value function can result in clustering that harms the algorithm's relative performance.
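
For example, IEEE 754 floating-point keys can be mapped to unsigned integers whose bit order matches numeric order; a similar idea is used by Boost.Sort's float_sort. This sketch (a hypothetical helper, assuming 32-bit floats and no NaNs) shows the conversion:

    #include <cstdint>
    #include <cstring>

    // Hypothetical key function: returns a uint32_t whose unsigned ordering
    // matches the numeric ordering of the input floats (NaNs not handled).
    uint32_t FloatToKey(float f)
    {
        uint32_t bits;
        std::memcpy(&bits, &f, sizeof bits);  // reinterpret the float's bits safely
        // Negative floats compare in reverse bit order, so flip all their bits;
        // non-negative floats only need the sign bit set so they sort above negatives.
        return (bits & 0x80000000u) ? ~bits : (bits | 0x80000000u);
    }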

Performance

The worst-case performance of spreadsort is O(n log n) for small data sets, as it uses introsort as a fallback. In the case of distributions where the size of the key in bits, k, times 2 is roughly the square of the log of the list size, n, or smaller (2k < (log n)²), it does better in the worst case, achieving O(n·√(k − log n)) worst-case time for the originally published version, and O(n·((k/s) + s)) for the cache-aware version, where s is the maximum number of bits processed per radix pass (MAX_SPLITS in the implementation below). For many real sorting problems with over 1000 items, including string sorting, this asymptotic worst case is better than O(n log n).
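
For a rough illustration of the condition: sorting n = 2^20 ≈ 10^6 items with k = 32-bit keys gives 2k = 64, well under (log₂ n)² = 400, so the bound applies; √(k − log n) = √12 ≈ 3.5, versus log₂ n = 20 comparisons per element for an O(n log n) sort.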

Experiments were done comparing an optimized version of spreadsort to the highly optimized C++ std::sort, implemented with introsort. On lists of integers and floats spreadsort shows a roughly 2–7× runtime improvement for random data on various operating systems.

In space performance, spreadsort is worse than most in-place algorithms: in its simplest form it is not an in-place algorithm, using O(n) extra space; in experiments, about 20% more than quicksort using a c of 4–8. With the cache-aware form (as included in Boost.Sort), less memory is used, and memory usage is bounded above by the maximum bin count times the maximum number of recursions, which comes to a few kilobytes times the size of the key in bytes. Although it asymptotically uses more space than the O(log n) overhead of quicksort or the O(1) overhead of heapsort, it uses considerably less space than the basic form of mergesort, whose auxiliary space equals the space occupied by the list.
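
For a rough illustration using the hypothetical tuning values from the listing below: with MAX_SPLITS = 11 there are at most 2^11 = 2048 bins per recursion level, about 32 KB of Bin records at 16 bytes each; a 32-bit key is exhausted in at most ⌈32/11⌉ = 3 levels, so bin bookkeeping stays on the order of 100 KB regardless of n.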

Implementation
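
The listing below relies on supporting types, tuning constants, and a swap helper that are not shown in the article. A minimal, hypothetical set of definitions, sufficient to compile the code for 32-bit signed integer keys (the names are dictated by the listing, but the concrete types and tuning values here are illustrative assumptions, not those of the original benchmarks), might look like this:

    #include <algorithm>  // std::sort
    #include <cstdio>     // printf
    #include <cstdlib>    // calloc, free

    // Hypothetical supporting definitions (illustrative values).
    #define DATATYPE int            // type of the keys being sorted
    #define SIZETYPE unsigned       // type used for counts and indices
    #define DATA_SIZE 32            // bits in DATATYPE
    #define MAX_SPLITS 11           // at most 2^11 bins per pass, to limit cache misses
    #define LOG_MEAN_BIN_SIZE 2     // aim for a mean bin size of 2^2 = 4 elements
    #define LOG_MIN_SPLIT_COUNT 3   // use at least 2^3 bins when recursing
    #define LOG_CONST 4             // weighting constant used by GetMaxCount

    // Swap helper; a function (rather than a macro) so that arguments such
    // as CurrentPosition++ are evaluated exactly once.
    static inline void SWAP(DATATYPE *a, DATATYPE *b)
    {
        DATATYPE tmp = *a;
        *a = *b;
        *b = tmp;
    }

    // One bin: its element count and, after the prefix pass, the position
    // where the next element belonging to it should be placed.
    struct Bin
    {
        SIZETYPE uCount;
        DATATYPE *CurrentPosition;
    };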

    unsigned RoughLog2Size(DATATYPE input)
    {
        unsigned char cResult = 0;
        // The && is necessary on some compilers to avoid infinite loops; it doesn't
        // significantly impair performance
        if (input >= 0) {
            while ((input >> cResult) && (cResult < DATA_SIZE))
                cResult++;
        } else {
            while (((input >> cResult) < -1) && (cResult < DATA_SIZE))
                cResult++;
        }
        return cResult;
    }

    SIZETYPE GetMaxCount(unsigned logRange, unsigned uCount)
    {
        unsigned logSize = RoughLog2Size(uCount);
        unsigned uRelativeWidth = (LOG_CONST * logRange) /
                                  ((logSize > MAX_SPLITS) ? MAX_SPLITS : logSize);
        // Don't try to bitshift more than the size of an element
        if (DATA_SIZE <= uRelativeWidth)
            uRelativeWidth = DATA_SIZE - 1;
        return 1 << ((uRelativeWidth < (LOG_MEAN_BIN_SIZE + LOG_MIN_SPLIT_COUNT)) ?
                     (LOG_MEAN_BIN_SIZE + LOG_MIN_SPLIT_COUNT) : uRelativeWidth);
    }

    void FindExtremes(DATATYPE *Array, SIZETYPE uCount, DATATYPE &piMax, DATATYPE &piMin)
    {
        SIZETYPE u;
        piMin = piMax = Array[0];
        for (u = 1; u < uCount; ++u) {
            if (Array[u] > piMax)
                piMax = Array[u];
            else if (Array[u] < piMin)
                piMin = Array[u];
        }
    }

    void SpreadSortRec(DATATYPE *Array, SIZETYPE uCount);  // forward declaration

    //---------------------SpreadSort Source-----------------
    Bin *SpreadSortCore(DATATYPE *Array, SIZETYPE uCount, SIZETYPE &uBinCount,
                        DATATYPE &iMax, DATATYPE &iMin)
    {
        // This step is roughly 10% of runtime but it helps avoid worst-case
        // behavior and improves behavior with real data.  If you know the
        // maximum and minimum ahead of time, you can pass those values in
        // and skip this step for the first iteration
        FindExtremes(Array, uCount, iMax, iMin);
        if (iMax == iMin)
            return NULL;
        DATATYPE divMin, divMax;
        SIZETYPE u;
        int LogDivisor;
        Bin *BinArray;
        Bin *CurrentBin;
        unsigned logRange;
        logRange = RoughLog2Size((SIZETYPE)iMax - iMin);
        if ((LogDivisor = logRange - RoughLog2Size(uCount) + LOG_MEAN_BIN_SIZE) < 0)
            LogDivisor = 0;
        // The below if statement is only necessary on systems with high memory
        // latency relative to processor speed (most modern processors)
        if ((logRange - LogDivisor) > MAX_SPLITS)
            LogDivisor = logRange - MAX_SPLITS;
        divMin = iMin >> LogDivisor;
        divMax = iMax >> LogDivisor;
        uBinCount = divMax - divMin + 1;

        // Allocate the bins and determine their sizes
        BinArray = (Bin *)calloc(uBinCount, sizeof(Bin));
        // Memory allocation failure check and clean return with sorted results
        if (!BinArray) {
            printf("Using std::sort because of memory allocation failure\n");
            std::sort(Array, Array + uCount);
            return NULL;
        }

        // Calculating the size of each bin; this takes roughly 10% of runtime
        for (u = 0; u < uCount; ++u)
            BinArray[(Array[u] >> LogDivisor) - divMin].uCount++;
        // Assign the bin positions
        BinArray[0].CurrentPosition = Array;
        for (u = 0; u < uBinCount - 1; u++) {
            BinArray[u + 1].CurrentPosition = BinArray[u].CurrentPosition + BinArray[u].uCount;
            BinArray[u].uCount = BinArray[u].CurrentPosition - Array;
        }
        BinArray[u].uCount = BinArray[u].CurrentPosition - Array;

        // Swap into place.  This dominates runtime, especially in the swap;
        // std::sort calls are the other main time-user.
        for (u = 0; u < uCount; ++u) {
            for (CurrentBin = BinArray + ((Array[u] >> LogDivisor) - divMin);
                 CurrentBin->uCount > u;
                 CurrentBin = BinArray + ((Array[u] >> LogDivisor) - divMin))
                SWAP(Array + u, CurrentBin->CurrentPosition++);
            // Now that we've found the item belonging in this position,
            // increment the bucket count
            if (CurrentBin->CurrentPosition == Array + u)
                ++(CurrentBin->CurrentPosition);
        }

        // If we've bucketsorted, the array is sorted and we should skip recursion
        if (!LogDivisor) {
            free(BinArray);
            return NULL;
        }
        return BinArray;
    }

    void SpreadSortBins(DATATYPE *Array, SIZETYPE uCount, SIZETYPE uBinCount,
                        const DATATYPE &iMax, const DATATYPE &iMin,
                        Bin *BinArray, SIZETYPE uMaxCount)
    {
        SIZETYPE u;
        for (u = 0; u < uBinCount; u++) {
            SIZETYPE count = (BinArray[u].CurrentPosition - Array) - BinArray[u].uCount;
            // Don't sort unless there are at least two items to compare
            if (count < 2)
                continue;
            if (count < uMaxCount)
                std::sort(Array + BinArray[u].uCount, BinArray[u].CurrentPosition);
            else
                SpreadSortRec(Array + BinArray[u].uCount, count);
        }
        free(BinArray);
    }

    void SpreadSortRec(DATATYPE *Array, SIZETYPE uCount)
    {
        if (uCount < 2)
            return;
        DATATYPE iMax, iMin;
        SIZETYPE uBinCount;
        Bin *BinArray = SpreadSortCore(Array, uCount, uBinCount, iMax, iMin);
        if (!BinArray)
            return;
        SpreadSortBins(Array, uCount, uBinCount, iMax, iMin, BinArray,
                       GetMaxCount(RoughLog2Size((SIZETYPE)iMax - iMin), uCount));
    }
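
A short driver illustrates the entry point; SpreadSortRec sorts in place. This assumes the hypothetical definitions given before the listing:

    // Example driver, assuming the hypothetical definitions above.
    int main()
    {
        DATATYPE data[] = {170, 45, 75, -90, -802, 24, 2, 66};
        SIZETYPE n = sizeof data / sizeof data[0];
        SpreadSortRec(data, n);
        for (SIZETYPE i = 0; i < n; ++i)
            printf("%d ", data[i]);  // prints: -802 -90 2 24 45 66 75 170
        printf("\n");
        return 0;
    }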

Two Levels are as Good as Any

An interesting result for algorithms of this general type (splitting based on the radix, then comparison-based sorting) is that they are O(n) for any bounded and integrable probability density function. [4] This result can be obtained by forcing Spreadsort to always iterate one more time if the bin size after the first iteration is above some constant value. If the key density function is known to be Riemann integrable and bounded, this modification of Spreadsort can attain some performance improvement over the basic algorithm and has better worst-case performance. If this restriction cannot be depended on, the change adds a little extra runtime overhead while gaining little. Other similar algorithms are Flashsort (which is simpler) and Adaptive Left Radix (ALR). [5] ALR is quite similar; the main difference is recursive behavior: Spreadsort checks for worst-case situations and falls back on std::sort to avoid performance problems where necessary, while ALR recurses continuously until done or until the data is small enough to use insertion sort.

References

  1. Steven J. Ross. The Spreadsort High-performance General-case Sorting Algorithm. Parallel and Distributed Processing Techniques and Applications, Volume 3, pp. 1100–1106. Las Vegas, Nevada. 2002.
  2. "Boost.Sort github repository". boostorg/sort.
  3. "HTML Spreadsort Documentation" . Retrieved 30 August 2017.
  4. Tamminen, Markku (March 1985). "Two Levels are as Good as Any". J. Algorithms. 6 (1): 138–144. doi:10.1016/0196-6774(85)90024-0.
  5. Maus, Arne (2002). ARL, a faster in-place, cache friendly sorting algorithm (PDF) (Technical report). CiteSeerX 10.1.1.399.8357.