Radix sort

Class: Sorting algorithm
Data structure: Array
Worst-case performance: O(w·n), where n is the number of keys and w is the key length
Worst-case space complexity: O(w + n)
Optimal: exactly correct

In computer science, radix sort is a non-comparative sorting algorithm. It avoids comparison by creating and distributing elements into buckets according to their radix. For elements with more than one significant digit, this bucketing process is repeated for each digit, while preserving the ordering of the prior step, until all digits have been considered. For this reason, radix sort has also been called bucket sort and digital sort.

Radix sort can be applied to data that can be sorted lexicographically, be they integers, words, punch cards, playing cards, or the mail.

History

Radix sort dates back as far as 1887 to the work of Herman Hollerith on tabulating machines. [1] Radix sorting algorithms came into common use as a way to sort punched cards as early as 1923. [2]

The first memory-efficient computer algorithm for this sorting method was developed in 1954 at MIT by Harold H. Seward. Computerized radix sorts had previously been dismissed as impractical because of the perceived need for variable allocation of buckets of unknown size. Seward's innovation was to use a linear scan to determine the required bucket sizes and offsets beforehand, allowing for a single static allocation of auxiliary memory. The linear scan is closely related to Seward's other algorithm — counting sort.

In the modern era, radix sorts are most commonly applied to collections of binary strings and integers. They have been shown in some benchmarks to be faster than other, more general-purpose sorting algorithms, sometimes 50% to three times as fast. [3] [4] [5]

An IBM card sorter performing a radix sort on a large set of punched cards. Cards are fed into a hopper below the operator's chin and are sorted into one of the machine's 13 output baskets, based on the data punched into one column on the cards. The crank near the input hopper is used to move the read head to the next column as the sort progresses. The rack in back holds cards from the previous sorting pass.

Digit order

Radix sorts can be implemented to start at either the most significant digit (MSD) or least significant digit (LSD). For example, with 1234, one could start with 1 (MSD) or 4 (LSD).

LSD radix sorts typically use the following sorting order: short keys come before longer keys, and then keys of the same length are sorted lexicographically. This coincides with the normal order of integer representations, like the sequence [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]. LSD sorts are generally stable sorts.

MSD radix sorts are most suitable for sorting strings or fixed-length integer representations. A sequence like [b, c, e, d, f, g, ba] would be sorted as [b, ba, c, d, e, f, g]. If lexicographic ordering is used to sort variable-length integers in base 10, then numbers from 1 to 10 would be output as [1, 10, 2, 3, 4, 5, 6, 7, 8, 9], as if the shorter keys were left-justified and padded on the right with blank characters to make the shorter keys as long as the longest key. MSD sorts are not necessarily stable if the original ordering of duplicate keys must always be maintained.

Other than the traversal order, MSD and LSD sorts differ in their handling of variable length input. LSD sorts can group by length, radix sort each group, then concatenate the groups in size order. MSD sorts must effectively 'extend' all shorter keys to the size of the largest key and sort them accordingly, which can be more complicated than the grouping required by LSD.

However, MSD sorts are more amenable to subdivision and recursion. Each bucket created by an MSD step can itself be radix sorted using the next most significant digit, without reference to any other buckets created in the previous step. Once the last digit is reached, concatenating the buckets is all that is required to complete the sort.

Examples

Least significant digit

Input list:

[170, 45, 75, 90, 2, 802, 2, 66]

Starting from the rightmost (last) digit, sort the numbers based on that digit:

[{170, 90}, {2, 802, 2}, {45, 75}, {66}]

Sorting by the next left digit:

[{02, 802, 02}, {45}, {66}, {170, 75}, {90}]
Notice that an implicit digit 0 is prepended for the two 2s so that 802 maintains its position between them.

And finally by the leftmost digit:

[{002, 002, 045, 066, 075, 090}, {170}, {802}]
Notice that a 0 is prepended to all of the 1- or 2-digit numbers.

Each step requires just a single pass over the data, since each item can be placed in its bucket without comparison with any other element.
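
The following is a minimal sketch of this LSD process in Python (all identifiers are illustrative, not from any particular library), using one bucket per decimal digit and assuming non-negative integer keys:

def lsd_radix_sort(keys):
    # LSD radix sort for non-negative integers, base 10.
    if not keys:
        return keys
    place = 1  # place value of the current digit: 1, 10, 100, ...
    while max(keys) // place > 0:
        # One bucket per possible digit value.
        buckets = [[] for _ in range(10)]
        for key in keys:
            buckets[(key // place) % 10].append(key)  # stable placement
        # Concatenating the buckets yields the input of the next pass.
        keys = [key for bucket in buckets for key in bucket]
        place *= 10
    return keys

print(lsd_radix_sort([170, 45, 75, 90, 2, 802, 2, 66]))
# [2, 2, 45, 66, 75, 90, 170, 802]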

Some radix sort implementations allocate space for buckets by first counting the number of keys that belong in each bucket before moving keys into those buckets. The number of times that each digit occurs is stored in an array.

Although it is always possible to pre-determine the bucket boundaries using counts, some implementations opt to use dynamic memory allocation instead.
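
A sketch of one such counting-based digit pass follows (again with illustrative names, assuming non-negative integer keys): the counts are converted to bucket start offsets with a prefix sum, so a single output array of known size can be allocated up front.

def counting_pass(keys, place, base=10):
    # Count how many keys have each digit value at this place.
    counts = [0] * base
    for key in keys:
        counts[(key // place) % base] += 1
    # Prefix sum: starting offset of each bucket in the output.
    offsets = [0] * base
    for d in range(1, base):
        offsets[d] = offsets[d - 1] + counts[d - 1]
    # Move each key to its bucket's next free slot (stable).
    output = [None] * len(keys)
    for key in keys:
        d = (key // place) % base
        output[offsets[d]] = key
        offsets[d] += 1
    return output

Calling counting_pass once per digit position, least significant first, reproduces the passes shown above.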

Most significant digit, forward recursive

Input list, fixed width numeric strings with leading zeros:

[170, 045, 075, 025, 002, 024, 802, 066]

First digit, with brackets indicating buckets:

[{045, 075, 025, 002, 024, 066}, {170}, {802}]
Notice that 170 and 802 are already complete because they are all that remain in their buckets, so no further recursion is needed.

Next digit:

[{ {002}, {025, 024}, {045}, {066}, {075} }, 170, 802]

Final digit:

[ 002, { {024}, {025} }, 045, 066, 075 , 170, 802]

All that remains is concatenation:

[002, 024, 025, 045, 066, 075, 170, 802]
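
A recursive sketch of these MSD steps in Python (illustrative names only; it assumes fixed-width numeric strings, as in the input above):

def msd_radix_sort(keys, digit=0):
    # Base case: nothing left to distinguish.
    if len(keys) <= 1 or digit >= len(keys[0]):
        return keys
    # Distribute into one bucket per digit value at this position.
    buckets = [[] for _ in range(10)]
    for key in keys:
        buckets[int(key[digit])].append(key)
    # Each bucket is sorted recursively on the next digit, independently
    # of the other buckets; concatenating the results completes the sort.
    result = []
    for bucket in buckets:
        result.extend(msd_radix_sort(bucket, digit + 1))
    return result

print(msd_radix_sort(["170", "045", "075", "025", "002", "024", "802", "066"]))
# ['002', '024', '025', '045', '066', '075', '170', '802']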

Complexity and performance

Radix sort operates in O(n·w) time, where n is the number of keys and w is the key length. LSD variants can achieve a lower bound for w of the 'average key length' when splitting variable-length keys into groups as discussed above.

Optimized radix sorts can be very fast when working in a domain that suits them. [6] They are constrained to lexicographic data, but for many practical applications this is not a limitation. Large key sizes can hinder LSD implementations when the induced number of passes becomes the bottleneck. [2]

Specialized variants

In-place MSD radix sort implementations

Binary MSD radix sort, also called binary quicksort, can be implemented in-place by splitting the input array into two bins - the 0s bin and the 1s bin. The 0s bin is grown from the beginning of the array, whereas the 1s bin is grown from the end of the array. The 0s bin boundary is placed before the first array element. The 1s bin boundary is placed after the last array element. The most significant bit of the first array element is examined. If this bit is a 1, then the first element is swapped with the element in front of the 1s bin boundary (the last element of the array), and the 1s bin is grown by one element by decrementing the 1s boundary array index. If this bit is a 0, then the first element remains at its current location, and the 0s bin is grown by one element. The next array element examined is the one in front of the 0s bin boundary (i.e. the first element that is not in the 0s bin or the 1s bin). This process continues until the 0s bin and the 1s bin reach each other. The 0s bin and the 1s bin are then sorted recursively based on the next bit of each array element. Recursive processing continues until the least significant bit has been used for sorting. [7] [8] Handling signed two's complement integers requires treating the most significant bit with the opposite sense, followed by unsigned treatment of the rest of the bits.
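
A compact sketch of this binary partitioning scheme (Python; identifiers are illustrative, and it assumes unsigned integers with a known maximum bit width):

def binary_quicksort(a, lo, hi, bit):
    # In-place MSD binary radix sort of a[lo:hi] on the given bit,
    # counting down from the most significant bit toward bit 0.
    if hi - lo <= 1 or bit < 0:
        return
    i, j = lo, hi  # 0s bin grows up from lo; 1s bin grows down from hi
    while i < j:
        if (a[i] >> bit) & 1:
            j -= 1
            a[i], a[j] = a[j], a[i]  # swap into the 1s bin; re-examine a[i]
        else:
            i += 1  # a[i] already belongs to the 0s bin
    binary_quicksort(a, lo, i, bit - 1)  # recurse on the 0s bin
    binary_quicksort(a, i, hi, bit - 1)  # recurse on the 1s bin

data = [170, 45, 75, 90, 2, 802, 2, 66]
binary_quicksort(data, 0, len(data), 9)  # these keys fit in 10 bits
print(data)  # [2, 2, 45, 66, 75, 90, 170, 802]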

In-place MSD binary-radix sort can be extended to a larger radix while retaining the in-place property. Counting sort is used to determine the size of each bin and its starting index. Swapping is used to place the current element into its bin, after which the bin boundary is expanded. As the array elements are scanned, the bins are skipped over and only elements between bins are processed, until the entire array has been processed and all elements end up in their respective bins. The number of bins is the same as the radix used, e.g. 16 bins for radix 16. Each pass is based on a single digit (e.g. 4 bits per digit in the case of radix 16), starting from the most significant digit. Each bin is then processed recursively using the next digit, until all digits have been used for sorting. [9] [10]
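
The following sketch illustrates this larger-radix in-place scheme (Python; all names are illustrative, radix 16 via 4-bit digits, non-negative keys). A counting pass fixes each bin's start, and elements are then swapped directly into their bins, in the style of American flag sort:

def inplace_radix_sort(a, lo, hi, shift, bits=4):
    # In-place MSD radix sort of a[lo:hi] with a 2**bits radix.
    # `shift` is the bit offset of the current digit (top digit first).
    if hi - lo <= 1 or shift < 0:
        return
    radix, mask = 1 << bits, (1 << bits) - 1
    # Counting pass: how many elements fall in each bin.
    counts = [0] * radix
    for k in range(lo, hi):
        counts[(a[k] >> shift) & mask] += 1
    # Prefix sums give each bin's start; nxt[d] is bin d's next free slot.
    starts, s = [0] * radix, lo
    for d in range(radix):
        starts[d] = s
        s += counts[d]
    nxt = starts[:]
    # Permute: skip elements already in their bin, swap the rest home.
    for d in range(radix):
        end = starts[d] + counts[d]
        while nxt[d] < end:
            digit = (a[nxt[d]] >> shift) & mask
            if digit == d:
                nxt[d] += 1  # already inside its own bin
            else:
                a[nxt[d]], a[nxt[digit]] = a[nxt[digit]], a[nxt[d]]
                nxt[digit] += 1  # grow the destination bin
    # Recurse into each bin on the next digit.
    for d in range(radix):
        inplace_radix_sort(a, starts[d], starts[d] + counts[d], shift - bits, bits)

data = [170, 45, 75, 90, 2, 802, 2, 66]
inplace_radix_sort(data, 0, len(data), shift=8)  # 12-bit keys: three 4-bit digits
print(data)  # [2, 2, 45, 66, 75, 90, 170, 802]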

Neither the in-place binary-radix sort nor the n-bit-radix sort discussed in the paragraphs above is a stable algorithm.

Stable MSD radix sort implementations

MSD radix sort can be implemented as a stable algorithm, but this requires a memory buffer of the same size as the input array. The extra memory allows the input buffer to be scanned from the first array element to the last, with the elements moved to the destination bins in the same order, so equal elements are placed in the memory buffer in the same order they appeared in the input array. The MSD-based algorithm uses the extra memory buffer as the output on the first level of recursion, but swaps the input and output on the next level of recursion, to avoid the overhead of copying the output back to the input buffer. Each of the bins is recursively processed, as is done for the in-place MSD radix sort. After the sort by the last digit has been completed, the output buffer is checked to see whether it is the original input array, and if it is not, a single copy is performed. If the digit size is chosen such that the key size divided by the digit size is an even number, the copy at the end is avoided. [11]
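
A sketch of this buffer-swapping scheme (Python; identifiers are illustrative, and it assumes non-negative keys whose bit width is a known multiple of the digit size):

def stable_msd_radix_sort(a, key_bits=12, bits=4):
    # Stable MSD radix sort using one auxiliary buffer of the same size.
    if len(a) <= 1:
        return
    buf = [0] * len(a)
    _msd(a, buf, 0, len(a), key_bits - bits, bits)
    # An odd number of passes leaves the result in the buffer, costing one
    # final copy; an even number of passes avoids it.
    if (key_bits // bits) % 2 == 1:
        a[:] = buf

def _msd(src, dst, lo, hi, shift, bits):
    radix, mask = 1 << bits, (1 << bits) - 1
    counts = [0] * radix
    for k in range(lo, hi):
        counts[(src[k] >> shift) & mask] += 1
    pos = [lo] * radix  # prefix sums: start of each bin
    for d in range(1, radix):
        pos[d] = pos[d - 1] + counts[d - 1]
    starts = pos[:]
    # Scan src first-to-last, writing into dst: equal keys keep their order.
    for k in range(lo, hi):
        d = (src[k] >> shift) & mask
        dst[pos[d]] = src[k]
        pos[d] += 1
    if shift == 0:
        return
    # Recurse per bin with src and dst swapped, avoiding any copy-back.
    for d in range(radix):
        if counts[d] > 0:
            _msd(dst, src, starts[d], starts[d] + counts[d], shift - bits, bits)

data = [170, 45, 75, 90, 2, 802, 2, 66]
stable_msd_radix_sort(data)  # 12-bit keys: three 4-bit passes, one final copy
print(data)  # [2, 2, 45, 66, 75, 90, 170, 802]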

Hybrid approaches

Radix sort implementations, such as the two-pass method in which counting sort is used during the first pass of each level of recursion, have a large constant overhead. Thus, when the bins get small, other sorting algorithms should be used, such as insertion sort. A good implementation of insertion sort is fast for small arrays, stable, in-place, and can significantly speed up radix sort.
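
For instance, the recursive sketches above could fall back to an insertion sort once a bin shrinks below a small, empirically chosen threshold (the value 32 here is purely illustrative):

def insertion_sort(a, lo, hi):
    # Stable, in-place, and fast on the small ranges left by the recursion.
    for i in range(lo + 1, hi):
        x, j = a[i], i - 1
        while j >= lo and a[j] > x:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = x

# Used by replacing the base case of a recursive radix sort, e.g.:
#     if hi - lo < 32 or shift < 0:
#         insertion_sort(a, lo, hi)
#         return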

Application to parallel computing

This recursive sorting algorithm has particular application to parallel computing, as each of the bins can be sorted independently. In this case, each bin is passed to the next available processor. A single processor would be used at the start (the most significant digit). By the second or third digit, all available processors would likely be engaged. Ideally, as each subdivision is fully sorted, fewer and fewer processors would be utilized. In the worst case, all of the keys will be identical or nearly identical to each other, with the result that there will be little to no advantage to using parallel computing to sort the keys.

In the top level of recursion, the opportunity for parallelism is in the counting sort portion of the algorithm. Counting is highly parallel, amenable to the parallel_reduce pattern, and splits the work well across multiple cores until reaching the memory bandwidth limit. This portion of the algorithm has data-independent parallelism. Processing each bin in subsequent recursion levels is data-dependent, however. For example, if all keys were of the same value, then there would be only a single bin with any elements in it, and no parallelism would be available. For random inputs, all bins would be nearly equally populated and a large amount of parallelism would be available. [12]

Faster parallel sorting algorithms are available: for example, the sorting network of Ajtai, Komlós, and Szemerédi (the "three Hungarians") and Richard Cole's parallel merge sort achieve optimal O(log(n)) complexity, [13] [14] and Batcher's bitonic merge sort has an algorithmic complexity of O(log²(n)); all of these have a lower algorithmic time complexity than radix sort on a CREW-PRAM. The fastest known PRAM sorts were described in 1991 by David M. W. Powers: a parallelized quicksort that can operate in O(log(n)) time on a CRCW-PRAM with n processors by performing partitioning implicitly, and a radix sort that operates using the same trick in O(k), where k is the maximum key length. [15] However, neither the PRAM architecture nor a single sequential processor can actually be built in a way that scales without the number of constant fan-out gate delays per cycle increasing as O(log(n)), so in effect a pipelined version of Batcher's bitonic merge sort and the O(log(n)) PRAM sorts are all O(log²(n)) in terms of clock cycles, with Powers acknowledging that Batcher's would have a lower constant in terms of gate delays than his parallel quicksort and radix sort, or Cole's merge sort, for a keylength-independent sorting network of O(n log²(n)). [16]

Tree-based radix sort

Radix sorting can also be accomplished by building a tree (or radix tree) from the input set and doing a pre-order traversal. This is similar to the relationship between heapsort and the heap data structure. This can be useful for certain data types; see burstsort.
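
A toy sketch of the idea (Python; names are illustrative, and a practical implementation such as burstsort uses cache-conscious nodes rather than dictionaries): keys are inserted into a trie, and a pre-order traversal emits them in sorted order.

def trie_sort(words):
    END = "$"  # marker key counting how many input keys end at this node
    root = {}
    for w in words:
        node = root
        for ch in w:
            node = node.setdefault(ch, {})
        node[END] = node.get(END, 0) + 1  # handle duplicate keys
    out = []
    def preorder(node, prefix):
        if END in node:
            out.extend([prefix] * node[END])  # shorter keys come first
        # A real radix tree indexes children by character code; sorting the
        # few children of each node is a shortcut for this sketch.
        for ch in sorted(k for k in node if k != END):
            preorder(node[ch], prefix + ch)
    preorder(root, "")
    return out

print(trie_sort(["b", "c", "e", "d", "f", "g", "ba"]))
# ['b', 'ba', 'c', 'd', 'e', 'f', 'g']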

See also

Heapsort

In computer science, heapsort is a comparison-based sorting algorithm which can be thought of as "an implementation of selection sort using the right data structure." Like selection sort, heapsort divides its input into a sorted and an unsorted region, and it iteratively shrinks the unsorted region by extracting the largest element from it and inserting it into the sorted region. Unlike selection sort, heapsort does not waste time with a linear-time scan of the unsorted region; rather, heap sort maintains the unsorted region in a heap data structure to efficiently find the largest element in each step.

Insertion sort

Insertion sort is a simple sorting algorithm that builds the final sorted array (or list) one item at a time by comparisons. It is much less efficient on large lists than more advanced algorithms such as quicksort, heapsort, or merge sort. However, insertion sort provides several advantages: it is simple, adaptive, stable, in-place, and efficient for small data sets.

Merge sort

In computer science, merge sort is an efficient, general-purpose, and comparison-based sorting algorithm. Most implementations produce a stable sort, which means that the relative order of equal elements is the same in the input and output. Merge sort is a divide-and-conquer algorithm that was invented by John von Neumann in 1945. A detailed description and analysis of bottom-up merge sort appeared in a report by Goldstine and von Neumann as early as 1948.

Merge algorithms are a family of algorithms that take multiple sorted lists as input and produce a single list as output, containing all the elements of the input lists in sorted order. These algorithms are used as subroutines in various sorting algorithms, most famously merge sort.

Sorting algorithm

In computer science, a sorting algorithm is an algorithm that puts elements of a list into an order. The most frequently used orders are numerical order and lexicographical order, and either ascending or descending. Efficient sorting is important for optimizing the efficiency of other algorithms that require input data to be in sorted lists. Sorting is also often useful for canonicalizing data and for producing human-readable output.

Bucket sort

Bucket sort, or bin sort, is a sorting algorithm that works by distributing the elements of an array into a number of buckets. Each bucket is then sorted individually, either using a different sorting algorithm, or by recursively applying the bucket sorting algorithm. It is a distribution sort, a generalization of pigeonhole sort that allows multiple keys per bucket, and is a cousin of radix sort in the most-to-least significant digit flavor. Bucket sort can be implemented with comparisons and therefore can also be considered a comparison sort algorithm. The computational complexity depends on the algorithm used to sort each bucket, the number of buckets to use, and whether the input is uniformly distributed.

In computer science, counting sort is an algorithm for sorting a collection of objects according to keys that are small positive integers; that is, it is an integer sorting algorithm. It operates by counting the number of objects that possess distinct key values, and applying prefix sum on those counts to determine the positions of each key value in the output sequence. Its running time is linear in the number of items and the difference between the maximum key value and the minimum key value, so it is only suitable for direct use in situations where the variation in keys is not significantly greater than the number of items. It is often used as a subroutine in radix sort, another sorting algorithm, which can handle larger keys more efficiently.

In computer science, divide and conquer is an algorithm design paradigm. A divide-and-conquer algorithm recursively breaks down a problem into two or more sub-problems of the same or related type, until these become simple enough to be solved directly. The solutions to the sub-problems are then combined to give a solution to the original problem.

The Cooley–Tukey algorithm, named after J. W. Cooley and John Tukey, is the most common fast Fourier transform (FFT) algorithm. It re-expresses the discrete Fourier transform (DFT) of an arbitrary composite size N = N₁N₂ in terms of N₁ smaller DFTs of sizes N₂, recursively, to reduce the computation time to O(N log N) for highly composite N (smooth numbers). Because of the algorithm's importance, specific variants and implementation styles have become known by their own names, as described below.

Introsort or introspective sort is a hybrid sorting algorithm that provides both fast average performance and (asymptotically) optimal worst-case performance. It begins with quicksort, switches to heapsort when the recursion depth exceeds a level based on (the logarithm of) the number of elements being sorted, and switches to insertion sort when the number of elements is below some threshold. This combines the good parts of the three algorithms, with practical performance comparable to quicksort on typical data sets and worst-case O(n log n) runtime due to the heapsort. Since the three algorithms it uses are comparison sorts, it is also a comparison sort.

Quicksort

Quicksort is an efficient, general-purpose sorting algorithm. Quicksort was developed by British computer scientist Tony Hoare in 1959 and published in 1961. It is still a commonly used algorithm for sorting. Overall, it is slightly faster than merge sort and heapsort for randomized data, particularly on larger distributions.

Spreadsort is a sorting algorithm invented by Steven J. Ross in 2002. It combines concepts from distribution-based sorts, such as radix sort and bucket sort, with partitioning concepts from comparison sorts such as quicksort and mergesort. In experimental results it was shown to be highly efficient, often outperforming traditional algorithms such as quicksort, particularly on distributions exhibiting structure and string sorting. There is an open-source implementation with performance analysis and benchmarks, and HTML documentation.

An American flag sort is an efficient, in-place variant of radix sort that distributes items into buckets. Non-comparative sorting algorithms such as radix sort and American flag sort are typically used to sort large objects such as strings, for which comparison is not a unit-time operation. American flag sort iterates through the bits of the objects, considering several bits of each object at a time. For each set of bits, American flag sort makes two passes through the array of objects: first to count the number of objects that will fall in each bin, and second to place each object in its bucket. This works especially well when sorting a byte at a time, using 256 buckets. With some optimizations, it is twice as fast as quicksort for large sets of strings.

Burstsort and its variants are cache-efficient algorithms for sorting strings. They are variants of the traditional radix sort but faster for large data sets of common strings, first published in 2003, with some optimizing versions published in later years.

Flashsort is a distribution sorting algorithm showing linear computational complexity O(n) for uniformly distributed data sets and relatively little additional memory requirement. The original work was published in 1998 by Karl-Dietrich Neubert.

Samplesort is a divide-and-conquer sorting algorithm often used in parallel processing systems. Conventional divide-and-conquer sorting algorithms partition the array into sub-intervals or buckets. The buckets are then sorted individually and concatenated together. However, if the array is non-uniformly distributed, the performance of these sorting algorithms can be significantly throttled. Samplesort addresses this issue by selecting a sample of size s from the n-element sequence, and determining the range of the buckets by sorting the sample and choosing p−1 < s elements from the result. These elements then divide the array into p approximately equal-sized buckets. Samplesort is described in the 1970 paper, "Samplesort: A Sampling Approach to Minimal Storage Tree Sorting", by W. D. Frazer and A. C. McKellar.

In computer science, integer sorting is the algorithmic problem of sorting a collection of data values by integer keys. Algorithms designed for integer sorting may also often be applied to sorting problems in which the keys are floating point numbers, rational numbers, or text strings. The ability to perform integer arithmetic on the keys allows integer sorting algorithms to be faster than comparison sorting algorithms in many cases, depending on the details of which operations are allowed in the model of computing and how large the integers to be sorted are.

A hybrid algorithm is an algorithm that combines two or more other algorithms that solve the same problem, either choosing one based on some characteristic of the data, or switching between them over the course of the algorithm. This is generally done to combine desired features of each, so that the overall algorithm is better than the individual components.

Multi-key quicksort, also known as three-way radix quicksort, is an algorithm for sorting strings. This hybrid of quicksort and radix sort was originally suggested by P. Shackleton, as reported in one of C.A.R. Hoare's seminal papers on quicksort; its modern incarnation was developed by Jon Bentley and Robert Sedgewick in the mid-1990s. The algorithm is designed to exploit the property that in many problems, strings tend to have shared prefixes.

References

  1. US 395781 and UK 327
  2. Donald Knuth. The Art of Computer Programming, Volume 3: Sorting and Searching, Third Edition. Addison-Wesley, 1997. ISBN 0-201-89685-0. Section 5.2.5: Sorting by Distribution, pp. 168–179.
  3. "I Wrote a Faster Sorting Algorithm". 28 December 2016.
  4. "Is radix sort faster than quicksort for integer arrays?". erik.gorset.no.
  5. "Function template integer_sort - 1.62.0". www.boost.org.
  6. Sinha, Ranjan; Zobel, Justin. "Efficient Trie-Based Sorting of Large Sets of Strings". CiteSeerX 10.1.1.12.2367. Retrieved 24 August 2023.
  7. R. Sedgewick, Algorithms in C++, third edition, 1998, pp. 424–427.
  8. Duvanenko, Victor J. "Algorithm Improvement through Performance Measurement: Part 2". Dr. Dobb's.
  9. Duvanenko, Victor J. "Algorithm Improvement through Performance Measurement: Part 3". Dr. Dobb's.
  10. Duvanenko, Victor J. "Parallel In-Place Radix Sort Simplified". Dr. Dobb's.
  11. Duvanenko, Victor J. "Algorithm Improvement through Performance Measurement: Part 4". Dr. Dobb's.
  12. Duvanenko, Victor J. "Parallel In-Place N-bit-Radix Sort". Dr. Dobb's.
  13. A. Gibbons and W. Rytter, Efficient Parallel Algorithms. Cambridge University Press, 1988.
  14. H. Casanova et al., Parallel Algorithms. Chapman & Hall, 2008.
  15. David M. W. Powers, "Parallelized Quicksort and Radixsort with Optimal Speedup", Proceedings of the International Conference on Parallel Computing Technologies, Novosibirsk, 1991.
  16. David M. W. Powers, "Parallel Unification: Practical Complexity", Australasian Computer Architecture Workshop, Flinders University, January 1995.