Sorting network

Last updated
A simple sorting network consisting of four wires and five connectors SimpleSortingNetwork2.svg
A simple sorting network consisting of four wires and five connectors

In computer science, comparator networks are abstract devices built up of a fixed number of "wires", carrying values, and comparator modules that connect pairs of wires, swapping the values on the wires if they are not in a desired order. Such networks are typically designed to perform sorting on fixed numbers of values, in which case they are called sorting networks.

Contents

Sorting networks differ from general comparison sorts in that they are not capable of handling arbitrarily large inputs, and in that their sequence of comparisons is set in advance, regardless of the outcome of previous comparisons. In order to sort larger amounts of inputs, new sorting networks must be constructed. This independence of comparison sequences is useful for parallel execution and for implementation in hardware. Despite the simplicity of sorting nets, their theory is surprisingly deep and complex. Sorting networks were first studied circa 1954 by Armstrong, Nelson and O'Connor, [1] who subsequently patented the idea. [2]

Sorting networks can be implemented either in hardware or in software. Donald Knuth describes how the comparators for binary integers can be implemented as simple, three-state electronic devices. [1] Batcher, in 1968, suggested using them to construct switching networks for computer hardware, replacing both buses and the faster, but more expensive, crossbar switches. [3] Since the 2000s, sorting nets (especially bitonic mergesort) are used by the GPGPU community for constructing sorting algorithms to run on graphics processing units. [4]

Introduction

Demonstration of a comparator in a sorting network. Sorting-network-comparator-demonstration.svg
Demonstration of a comparator in a sorting network.

A sorting network consists of two types of items: comparators and wires. The wires are thought of as running from left to right, carrying values (one per wire) that traverse the network all at the same time. Each comparator connects two wires. When a pair of values, traveling through a pair of wires, encounter a comparator, the comparator swaps the values if and only if the top wire's value is greater or equal to the bottom wire's value.

In a formula, if the top wire carries x and the bottom wire carries y, then after hitting a comparator the wires carry and , respectively, so the pair of values is sorted. [5] :635 A network of wires and comparators that will correctly sort all possible inputs into ascending order is called a sorting network or Kruskal hub. By reflecting the network, it is also possible to sort all inputs into descending order.

The full operation of a simple sorting network is shown below. It is evident why this sorting network will correctly sort the inputs; note that the first four comparators will "sink" the largest value to the bottom and "float" the smallest value to the top. The final comparator sorts out the middle two wires.

SimpleSortingNetworkFullOperation.svg

Depth and efficiency

The efficiency of a sorting network can be measured by its total size, meaning the number of comparators in the network, or by its depth, defined (informally) as the largest number of comparators that any input value can encounter on its way through the network. Noting that sorting networks can perform certain comparisons in parallel (represented in the graphical notation by comparators that lie on the same vertical line), and assuming all comparisons to take unit time, it can be seen that the depth of the network is equal to the number of time steps required to execute it. [5] :636–637

Insertion and Bubble networks

We can easily construct a network of any size recursively using the principles of insertion and selection. Assuming we have a sorting network of size n, we can construct a network of size n + 1 by "inserting" an additional number into the already sorted subnet (using the principle behind insertion sort). We can also accomplish the same thing by first "selecting" the lowest value from the inputs and then sort the remaining values recursively (using the principle behind bubble sort).

A sorting network constructed recursively that first sinks the largest value to the bottom and then sorts the remaining wires. Based on bubble sort Recursive-bubble-sorting-network.svg
A sorting network constructed recursively that first sinks the largest value to the bottom and then sorts the remaining wires. Based on bubble sort
A sorting network constructed recursively that first sorts the first n wires, and then inserts the remaining value. Based on insertion sort Recursive-insertion-sorting-network.svg
A sorting network constructed recursively that first sorts the first n wires, and then inserts the remaining value. Based on insertion sort

The structure of these two sorting networks are very similar. A construction of the two different variants, which collapses together comparators that can be performed simultaneously shows that, in fact, they are identical. [1]

Bubble sorting network Six-wire-bubble-sorting-network.svg
Bubble sorting network
Insertion sorting network Six-wire-insertion-sorting-network.svg
Insertion sorting network
When allowing for parallel comparators, bubble sort and insertion sort are identical Six-wire-pyramid-sorting-network.svg
When allowing for parallel comparators, bubble sort and insertion sort are identical

The insertion network (or equivalently, bubble network) has a depth of 2n - 3, [1] where n is the number of values. This is better than the O(n log n) time needed by random-access machines, but it turns out that there are much more efficient sorting networks with a depth of just O(log2n), as described below.

Zero-one principle

While it is easy to prove the validity of some sorting networks (like the insertion/bubble sorter), it is not always so easy. There are n! permutations of numbers in an n-wire network, and to test all of them would take a significant amount of time, especially when n is large. The number of test cases can be reduced significantly, to 2n, using the so-called zero-one principle. While still exponential, this is smaller than n! for all n ≥ 4, and the difference grows quite quickly with increasing n.

The zero-one principle states that, if a sorting network can correctly sort all 2n sequences of zeros and ones, then it is also valid for arbitrary ordered inputs. This not only drastically cuts down on the number of tests needed to ascertain the validity of a network, it is of great use in creating many constructions of sorting networks as well.

The principle can be proven by first observing the following fact about comparators: when a monotonically increasing function f is applied to the inputs, i.e., x and y are replaced by f(x) and f(y), then the comparator produces min(f(x), f(y)) = f(min(x, y)) and max(f(x), f(y)) = f(max(x, y)). By induction on the depth of the network, this result can be extended to a lemma stating that if the network transforms the sequence a1, ..., an into b1, ..., bn, it will transform f(a1), ..., f(an) into f(b1), ..., f(bn). Suppose that some input a1, ..., an contains two items ai < aj, and the network incorrectly swaps these in the output. Then it will also incorrectly sort f(a1), ..., f(an) for the function

This function is monotonic, so we have the zero-one principle as the contrapositive. [5] :640–641

Constructing sorting networks

Various algorithms exist to construct sorting networks of depth O(log2n) (hence size O(n log2n)) such as Batcher odd–even mergesort, bitonic sort, Shell sort, and the Pairwise sorting network. These networks are often used in practice.

It is also possible to construct networks of depth O(log n) (hence size O(n log n)) using a construction called the AKS network, after its discoverers Ajtai, Komlós, and Szemerédi. [6] While an important theoretical discovery, the AKS network has very limited practical application because of the large linear constant hidden by the Big-O notation. [5] :653 These are partly due to a construction of an expander graph.

A simplified version of the AKS network was described by Paterson in 1990, who noted that "the constants obtained for the depth bound still prevent the construction being of practical value". [7]

A more recent construction called the zig-zag sorting network of size O(n log n) was discovered by Goodrich in 2014. [8] While its size is much smaller than that of AKS networks, its depth O(n log n) makes it unsuitable for a parallel implementation.

Optimal sorting networks

For small, fixed numbers of inputs n, optimal sorting networks can be constructed, with either minimal depth (for maximally parallel execution) or minimal size (number of comparators). These networks can be used to increase the performance of larger sorting networks resulting from the recursive constructions of, e.g., Batcher, by halting the recursion early and inserting optimal nets as base cases. [9] The following table summarizes the optimality results for small networks for which the optimal depth is known:

n1234567891011121314151617
Depth [10] 013355667788999910
Size, upper bound [11] 01359121619252935394551566071
Size, lower bound (if different) [12] 4347515560

For larger networks neither the optimal depth nor the optimal size are currently known. The bounds known so far are provided in the table below:

n181920212223242526272829303132
Depth, upper bound [10] [13] [14] 111111121212121313141414141414
Depth, lower bound [10] 101010101010101010101010101010
Size, upper bound [14] 77859199107114120131139148155165172180185
Size, lower bound [12] 65707580859095100105110115120125130135

The first sixteen depth-optimal networks are listed in Knuth's Art of Computer Programming , [1] and have been since the 1973 edition; however, while the optimality of the first eight was established by Floyd and Knuth in the 1960s, this property wasn't proven for the final six until 2014 [15] (the cases nine and ten having been decided in 1991 [9] ).

For one to twelve inputs, minimal (i.e. size-optimal) sorting networks are known, and for higher values, lower bounds on their sizes S(n) can be derived inductively using a lemma due to Van Voorhis [1] (p. 240): S(n) ≥ S(n − 1) + ⌈log2n. The first ten optimal networks have been known since 1969, with the first eight again being known as optimal since the work of Floyd and Knuth, but optimality of the cases n = 9 and n = 10 took until 2014 to be resolved. [11] The optimality of the smallest known sorting networks for n = 11 and n = 12 was resolved in 2020. [16] [1]

Some work in designing optimal sorting network has been done using genetic algorithms: D. Knuth mentions that the smallest known sorting network for n = 13 was found by Hugues Juillé in 1995 "by simulating an evolutionary process of genetic breeding" [1] (p. 226), and that the minimum depth sorting networks for n = 9 and n = 11 were found by Loren Schwiebert in 2001 "using genetic methods" [1] (p. 229).

Complexity of testing sorting networks

Unless P=NP, the problem of testing whether a candidate network is a sorting network is likely to remain difficult for networks of large sizes, due to the problem being co-NP-complete. [17]

Related Research Articles

<span class="mw-page-title-main">Binary search algorithm</span> Search algorithm finding the position of a target value within a sorted array

In computer science, binary search, also known as half-interval search, logarithmic search, or binary chop, is a search algorithm that finds the position of a target value within a sorted array. Binary search compares the target value to the middle element of the array. If they are not equal, the half in which the target cannot lie is eliminated and the search continues on the remaining half, again taking the middle element to compare to the target value, and repeating this until the target value is found. If the search ends with the remaining half being empty, the target is not in the array.

In Boolean logic, the majority function is the Boolean function that evaluates to false when half or more arguments are false and true otherwise, i.e. the value of the function equals the value of the majority of the inputs. Representing true values as 1 and false values as 0, we may use the (real-valued) formula:

<span class="mw-page-title-main">Search algorithm</span> Any algorithm which solves the search problem

In computer science, a search algorithm is an algorithm designed to solve a search problem. Search algorithms work to retrieve information stored within particular data structure, or calculated in the search space of a problem domain, with either discrete or continuous values.

A splay tree is a binary search tree with the additional property that recently accessed elements are quick to access again. Like self-balancing binary search trees, a splay tree performs basic operations such as insertion, look-up and removal in O(log n) amortized time. For random access patterns drawn from a non-uniform random distribution, their amortized time can be faster than logarithmic, proportional to the entropy of the access pattern. For many patterns of non-random operations, also, splay trees can take better than logarithmic time, without requiring advance knowledge of the pattern. According to the unproven dynamic optimality conjecture, their performance on all access patterns is within a constant factor of the best possible performance that could be achieved by any other self-adjusting binary search tree, even one selected to fit that pattern. The splay tree was invented by Daniel Sleator and Robert Tarjan in 1985.

<span class="mw-page-title-main">Sorting algorithm</span> Algorithm that arranges lists in order

In computer science, a sorting algorithm is an algorithm that puts elements of a list into an order. The most frequently used orders are numerical order and lexicographical order, and either ascending or descending. Efficient sorting is important for optimizing the efficiency of other algorithms that require input data to be in sorted lists. Sorting is also often useful for canonicalizing data and for producing human-readable output.

<span class="mw-page-title-main">Shellsort</span> Sorting algorithm which uses multiple comparison intervals

Shellsort, also known as Shell sort or Shell's method, is an in-place comparison sort. It can be seen as either a generalization of sorting by exchange or sorting by insertion. The method starts by sorting pairs of elements far apart from each other, then progressively reducing the gap between elements to be compared. By starting with far apart elements, it can move some out-of-place elements into position faster than a simple nearest neighbor exchange. Donald Shell published the first version of this sort in 1959. The running time of Shellsort is heavily dependent on the gap sequence it uses. For many practical variants, determining their time complexity remains an open problem.

<span class="mw-page-title-main">Time complexity</span> Estimate of time taken for running an algorithm

In computer science, the time complexity is the computational complexity that describes the amount of computer time it takes to run an algorithm. Time complexity is commonly estimated by counting the number of elementary operations performed by the algorithm, supposing that each elementary operation takes a fixed amount of time to perform. Thus, the amount of time taken and the number of elementary operations performed by the algorithm are taken to be related by a constant factor.

In computer science, a selection algorithm is an algorithm for finding the th smallest value in a collection of ordered values, such as numbers. The value that it finds is called the th order statistic. Selection includes as special cases the problems of finding the minimum, median, and maximum element in the collection. Selection algorithms include quickselect, and the median of medians algorithm. When applied to a collection of values, these algorithms take linear time, as expressed using big O notation. For data that is already structured, faster algorithms may be possible; as an extreme case, selection in an already-sorted array takes time .

<span class="mw-page-title-main">Schönhage–Strassen algorithm</span> Multiplication algorithm

The Schönhage–Strassen algorithm is an asymptotically fast multiplication algorithm for large integers, published by Arnold Schönhage and Volker Strassen in 1971. It works by recursively applying number-theoretic transforms over the integers modulo 2n+1. The run-time bit complexity to multiply two n-digit numbers using the algorithm is in big O notation.

<span class="mw-page-title-main">Quicksort</span> Divide and conquer sorting algorithm

Quicksort is an efficient, general-purpose sorting algorithm. Quicksort was developed by British computer scientist Tony Hoare in 1959 and published in 1961. It is still a commonly used algorithm for sorting. Overall, it is slightly faster than merge sort and heapsort for randomized data, particularly on larger distributions.

In number theory and computer science, the partition problem, or number partitioning, is the task of deciding whether a given multiset S of positive integers can be partitioned into two subsets S1 and S2 such that the sum of the numbers in S1 equals the sum of the numbers in S2. Although the partition problem is NP-complete, there is a pseudo-polynomial time dynamic programming solution, and there are heuristics that solve the problem in many instances, either optimally or approximately. For this reason, it has been called "the easiest hard problem".

In computer science, the longest increasing subsequence problem aims to find a subsequence of a given sequence in which the subsequence's elements are sorted in an ascending order and in which the subsequence is as long as possible. This subsequence is not necessarily contiguous or unique. The longest increasing subsequences are studied in the context of various disciplines related to mathematics, including algorithmics, random matrix theory, representation theory, and physics. The longest increasing subsequence problem is solvable in time where denotes the length of the input sequence.

In computer science, the all nearest smaller values problem is the following task: for each position in a sequence of numbers, search among the previous positions for the last position that contains a smaller value. This problem can be solved efficiently both by parallel and non-parallel algorithms: Berkman, Schieber & Vishkin (1993), who first identified the procedure as a useful subroutine for other parallel programs, developed efficient algorithms to solve it in the Parallel Random Access Machine model; it may also be solved in linear time on a non-parallel computer using a stack-based algorithm. Later researchers have studied algorithms to solve it in other models of parallel computation.

In computational complexity the decision tree model is the model of computation in which an algorithm is considered to be basically a decision tree, i.e., a sequence of queries or tests that are done adaptively, so the outcome of previous tests can influence the tests performed next.

In computational complexity theory, CC (Comparator Circuits) is the complexity class containing decision problems which can be solved by comparator circuits of polynomial size.

In computer science, an optimal binary search tree (Optimal BST), sometimes called a weight-balanced binary tree, is a binary search tree which provides the smallest possible search time (or expected search time) for a given sequence of accesses (or access probabilities). Optimal BSTs are generally divided into two types: static and dynamic.

<i>X</i> + <i>Y</i> sorting Problem of sorting pairs of numbers by their sum

In computer science, sorting is the problem of sorting pairs of numbers by their sums. Applications of the problem include transit fare minimisation, VLSI design, and sparse polynomial multiplication. As with comparison sorting and integer sorting more generally, algorithms for this problem can be based only on comparisons of these sums, or on other operations that work only when the inputs are small integers.

In the design and analysis of algorithms for combinatorial optimization, parametric search is a technique invented by Nimrod Megiddo (1983) for transforming a decision algorithm into an optimization algorithm. It is frequently used for solving optimization problems in computational geometry.

In computer science, merge-insertion sort or the Ford–Johnson algorithm is a comparison sorting algorithm published in 1959 by L. R. Ford Jr. and Selmer M. Johnson. It uses fewer comparisons in the worst case than the best previously known algorithms, binary insertion sort and merge sort, and for 20 years it was the sorting algorithm with the fewest known comparisons. Although not of practical significance, it remains of theoretical interest in connection with the problem of sorting with a minimum number of comparisons. The same algorithm may have also been independently discovered by Stanisław Trybuła and Czen Ping.

The Garsia–Wachs algorithm is an efficient method for computers to construct optimal binary search trees and alphabetic Huffman codes, in linearithmic time. It is named after Adriano Garsia and Michelle L. Wachs.

References

  1. 1 2 3 4 5 6 7 8 9 Knuth, D. E. (1997). The Art of Computer Programming, Volume 3: Sorting and Searching (Second ed.). Addison–Wesley. pp. 219–247. ISBN   978-0-201-89685-5. Section 5.3.4: Networks for Sorting.
  2. US 3029413,O'Connor, Daniel G.&Nelson, Raymond J.,"Sorting system with n-line sorting switch",published 10 April 1962
  3. Batcher, K. E. (1968). Sorting networks and their applications. Proc. AFIPS Spring Joint Computer Conference. pp. 307–314.
  4. Owens, J. D.; Houston, M.; Luebke, D.; Green, S.; Stone, J. E.; Phillips, J. C. (2008). "GPU Computing". Proceedings of the IEEE. 96 (5): 879–899. doi:10.1109/JPROC.2008.917757. S2CID   17091128.
  5. 1 2 3 4 Cormen, Thomas H.; Leiserson, Charles E.; Rivest, Ronald L. (1990). Introduction to Algorithms (1st ed.). MIT Press and McGraw-Hill. ISBN   0-262-03141-8.
  6. Ajtai, M.; Komlós, J.; Szemerédi, E. (1983). An O(n log n) sorting network. STOC '83. Proceedings of the fifteenth annual ACM symposium on Theory of computing. pp. 1–9. doi:10.1145/800061.808726. ISBN   0-89791-099-0.
  7. Paterson, M. S. (1990). "Improved sorting networks with O(log N) depth". Algorithmica. 5 (1–4): 75–92. doi:10.1007/BF01840378. S2CID   2064561.
  8. Goodrich, Michael (March 2014). Zig-zag Sort: A Simple Deterministic Data-Oblivious Sorting Algorithm Running in O(n log n) Time. Proceedings of the 46th Annual ACM Symposium on Theory of Computing - STOC '14. pp. 684–693. arXiv: 1403.2777 . doi:10.1145/2591796.2591830. ISBN   9781450327107. S2CID   947950.
  9. 1 2 Parberry, Ian (1991). "A Computer Assisted Optimal Depth Lower Bound for Nine-Input Sorting Networks" (PDF). Mathematical Systems Theory. 24: 101–116. CiteSeerX   10.1.1.712.219 . doi:10.1007/bf02090393. S2CID   7077160.
  10. 1 2 3 Codish, Michael; Cruz-Filipe, Luís; Ehlers, Thorsten; Müller, Mike; Schneider-Kamp, Peter (2015). Sorting Networks: to the End and Back Again. arXiv: 1507.01428 . Bibcode:2015arXiv150701428C.
  11. 1 2 Codish, Michael; Cruz-Filipe, Luís; Frank, Michael; Schneider-Kamp, Peter (2014). Twenty-Five Comparators is Optimal when Sorting Nine Inputs (and Twenty-Nine for Ten). Proc. Int'l Conf. Tools with AI (ICTAI). pp. 186–193. arXiv: 1405.5754 . Bibcode:2014arXiv1405.5754C.
  12. 1 2 Obtained by Van Voorhis lemma and the value S(11) = 35
  13. Ehlers, Thorsten (February 2017). "Merging almost sorted sequences yields a 24-sorter". Information Processing Letters. 118: 17–20. doi:10.1016/j.ipl.2016.08.005.
  14. 1 2 Dobbelaere, Bert. "SorterHunter". GitHub. Retrieved 2 Jan 2022.
  15. Bundala, D.; Závodný, J. (2014). Optimal Sorting Networks. Language and Automata Theory and Applications. Lecture Notes in Computer Science. Vol. 8370. pp. 236–247. arXiv: 1310.6271 . doi:10.1007/978-3-319-04921-2_19. ISBN   978-3-319-04920-5. S2CID   16860013.
  16. Harder, Jannis (2020). "An Answer to the Bose-Nelson Sorting Problem for 11 and 12 Channels". arXiv: 2012.04400 [cs.DS].
  17. Parberry, Ian (1991). On the Computational Complexity of Optimal Sorting Network Verification. Proc. PARLE '91: Parallel Architectures and Languages Europe, Volume I: Parallel Architectures and Algorithms, Eindhoven, the Netherlands. pp. 252–269.