Selection sort

Last updated
Selection sort
Class Sorting algorithm
Data structure Array
Worst-case performance comparisons, swaps
Best-case performance comparisons, swap
Average performance comparisons, swaps
Worst-case space complexity auxiliary
OptimalNo

In computer science, selection sort is an in-place comparison sorting algorithm. It has an O(n2) time complexity, which makes it inefficient on large lists, and generally performs worse than the similar insertion sort. Selection sort is noted for its simplicity and has performance advantages over more complicated algorithms in certain situations, particularly where auxiliary memory is limited.

Contents

The algorithm divides the input list into two parts: a sorted sublist of items which is built up from left to right at the front (left) of the list and a sublist of the remaining unsorted items that occupy the rest of the list. Initially, the sorted sublist is empty and the unsorted sublist is the entire input list. The algorithm proceeds by finding the smallest (or largest, depending on sorting order) element in the unsorted sublist, exchanging (swapping) it with the leftmost unsorted element (putting it in sorted order), and moving the sublist boundaries one element to the right.

The time efficiency of selection sort is quadratic, so there are a number of sorting techniques which have better time complexity than selection sort.

Example

Here is an example of this sort algorithm sorting five elements:

Sorted sublistUnsorted sublistLeast element in unsorted list
()(11, 25, 12, 22, 64)11
(11)(25, 12, 22, 64)12
(11, 12)(25, 22, 64)22
(11, 12, 22)(25, 64)25
(11, 12, 22, 25)(64)64
(11, 12, 22, 25, 64)()
Selection sort animation. Red is current min. Yellow is sorted list. Blue is current item. Selection-Sort-Animation.gif
Selection sort animation. Red is current min. Yellow is sorted list. Blue is current item.

(Nothing appears changed on these last two lines because the last two numbers were already in order.)

Selection sort can also be used on list structures that make add and remove efficient, such as a linked list. In this case it is more common to remove the minimum element from the remainder of the list, and then insert it at the end of the values sorted so far. For example:

arr[] = 64 25 12 22 11  // Find the minimum element in arr[0...4] // and place it at beginning 11 25 12 22 64  // Find the minimum element in arr[1...4] // and place it at beginning of arr[1...4] 11 12 25 22 64  // Find the minimum element in arr[2...4] // and place it at beginning of arr[2...4] 11 12 22 25 64  // Find the minimum element in arr[3...4] // and place it at beginning of arr[3...4] 11 12 22 25 64  

Implementations

Below is an implementation in C.

/* a[0] to a[aLength-1] is the array to sort */inti,j;intaLength;// initialise to a's length/* advance the position through the entire array *//*   (could do i < aLength-1 because single element is also min element) */for(i=0;i<aLength-1;i++){/* find the min element in the unsorted a[i .. aLength-1] *//* assume the min is the first element */intjMin=i;/* test against elements after i to find the smallest */for(j=i+1;j<aLength;j++){/* if this element is less, then it is the new minimum */if(a[j]<a[jMin]){/* found new minimum; remember its index */jMin=j;}}if(jMin!=i){swap(&a[i],&a[jMin]);}}

Complexity

Selection sort is not difficult to analyze compared to other sorting algorithms, since none of the loops depend on the data in the array. Selecting the minimum requires scanning elements (taking comparisons) and then swapping it into the first position. Finding the next lowest element requires scanning the remaining elements (taking comparisons) and so on. Therefore, the total number of comparisons is

By arithmetic progression,

which is of complexity in terms of number of comparisons.

Comparison to other sorting algorithms

Among quadratic sorting algorithms (sorting algorithms with a simple average-case of Θ(n2)), selection sort almost always outperforms bubble sort and gnome sort. Insertion sort is very similar in that after the kth iteration, the first elements in the array are in sorted order. Insertion sort's advantage is that it only scans as many elements as it needs in order to place the st element, while selection sort must scan all remaining elements to find the st element.

Simple calculation shows that insertion sort will therefore usually perform about half as many comparisons as selection sort, although it can perform just as many or far fewer depending on the order the array was in prior to sorting. It can be seen as an advantage for some real-time applications that selection sort will perform identically regardless of the order of the array, while insertion sort's running time can vary considerably. However, this is more often an advantage for insertion sort in that it runs much more efficiently if the array is already sorted or "close to sorted."

While selection sort is preferable to insertion sort in terms of number of writes ( swaps versus up to swaps, with each swap being two writes), this is roughly twice the theoretical minimum achieved by cycle sort, which performs at most n writes. This can be important if writes are significantly more expensive than reads, such as with EEPROM or Flash memory, where every write lessens the lifespan of the memory.

Selection sort can be implemented without unpredictable branches for the benefit of CPU branch predictors, by finding the location of the minimum with branch-free code and then performing the swap unconditionally.

Finally, selection sort is greatly outperformed on larger arrays by divide-and-conquer algorithms such as mergesort. However, insertion sort or selection sort are both typically faster for small arrays (i.e. fewer than 10–20 elements). A useful optimization in practice for the recursive algorithms is to switch to insertion sort or selection sort for "small enough" sublists.

Variants

Heapsort greatly improves the basic algorithm by using an implicit heap data structure to speed up finding and removing the lowest datum. If implemented correctly, the heap will allow finding the next lowest element in time instead of for the inner loop in normal selection sort, reducing the total running time to .

A bidirectional variant of selection sort (called double selection sort or sometimes cocktail sort due to its similarity to cocktail shaker sort) finds both the minimum and maximum values in the list in every pass. This requires three comparisons per two items (a pair of elements is compared, then the greater is compared to the maximum and the lesser is compared to the minimum) rather than regular selection sort's one comparison per item, but requires only half as many passes, a net 25% savings.

Selection sort can be implemented as a stable sort if, rather than swapping in step 2, the minimum value is inserted into the first position and the intervening values shifted up. However, this modification either requires a data structure that supports efficient insertions or deletions, such as a linked list, or it leads to performing writes.

In the bingo sort variant, items are sorted by repeatedly looking through the remaining items to find the greatest value and moving all items with that value to their final location. [1] Like counting sort, this is an efficient variant if there are many duplicate values: selection sort does one pass through the remaining items for each item moved, while Bingo sort does one pass for each value. After an initial pass to find the greatest value, subsequent passes move every item with that value to its final location while finding the next value as in the following pseudocode (arrays are zero-based and the for-loop includes both the top and bottom limits, as in Pascal):

bingo(arrayA){ This procedure sorts in ascending order by  repeatedly moving maximal items to the end. }beginlast:=length(A)-1;{ The first iteration is written to look very similar to the subsequent ones,      but without swaps. }nextMax:=A[last];fori:=last-1downto0doifA[i]>nextMaxthennextMax:=A[i];while(last>0)and(A[last]=nextMax)dolast:=last-1;whilelast>0dobeginprevMax:=nextMax;nextMax:=A[last];fori:=last-1downto0doifA[i]>nextMaxthenifA[i]<>prevMaxthennextMax:=A[i];elsebeginswap(A[i],A[last]);last:=last-1;endwhile(last>0)and(A[last]=nextMax)dolast:=last-1;end;end;

Thus, if on average there are more than two items with the same value, bingo sort can be expected to be faster because it executes the inner loop fewer times than selection sort.

See also

Related Research Articles

<span class="mw-page-title-main">Heapsort</span> A sorting algorithm which uses the heap data structure

In computer science, heapsort is a comparison-based sorting algorithm which can be thought of as "an implementation of selection sort using the right data structure." Like selection sort, heapsort divides its input into a sorted and an unsorted region, and it iteratively shrinks the unsorted region by extracting the largest element from it and inserting it into the sorted region. Unlike selection sort, heapsort does not waste time with a linear-time scan of the unsorted region; rather, heap sort maintains the unsorted region in a heap data structure to efficiently find the largest element in each step.

<span class="mw-page-title-main">Heap (data structure)</span> Computer science data structure

In computer science, a heap is a tree-based data structure that satisfies the heap property: In a max heap, for any given node C, if P is a parent node of C, then the key of P is greater than or equal to the key of C. In a min heap, the key of P is less than or equal to the key of C. The node at the "top" of the heap is called the root node.

<span class="mw-page-title-main">Insertion sort</span> Sorting algorithm

Insertion sort is a simple sorting algorithm that builds the final sorted array (or list) one item at a time by comparisons. It is much less efficient on large lists than more advanced algorithms such as quicksort, heapsort, or merge sort. However, insertion sort provides several advantages:

<span class="mw-page-title-main">Merge sort</span> Divide and conquer sorting algorithm

In computer science, merge sort is an efficient, general-purpose, and comparison-based sorting algorithm. Most implementations produce a stable sort, which means that the relative order of equal elements is the same in the input and output. Merge sort is a divide-and-conquer algorithm that was invented by John von Neumann in 1945. A detailed description and analysis of bottom-up merge sort appeared in a report by Goldstine and von Neumann as early as 1948.

Merge algorithms are a family of algorithms that take multiple sorted lists as input and produce a single list as output, containing all the elements of the inputs lists in sorted order. These algorithms are used as subroutines in various sorting algorithms, most famously merge sort.

<span class="mw-page-title-main">Sorting algorithm</span> Algorithm that arranges lists in order

In computer science, a sorting algorithm is an algorithm that puts elements of a list into an order. The most frequently used orders are numerical order and lexicographical order, and either ascending or descending. Efficient sorting is important for optimizing the efficiency of other algorithms that require input data to be in sorted lists. Sorting is also often useful for canonicalizing data and for producing human-readable output.

<span class="mw-page-title-main">Binary heap</span> Variant of heap data structure

A binary heap is a heap data structure that takes the form of a binary tree. Binary heaps are a common way of implementing priority queues. The binary heap was introduced by J. W. J. Williams in 1964, as a data structure for heapsort.

<span class="mw-page-title-main">Bucket sort</span> Sorting algorithm

Bucket sort, or bin sort, is a sorting algorithm that works by distributing the elements of an array into a number of buckets. Each bucket is then sorted individually, either using a different sorting algorithm, or by recursively applying the bucket sorting algorithm. It is a distribution sort, a generalization of pigeonhole sort that allows multiple keys per bucket, and is a cousin of radix sort in the most-to-least significant digit flavor. Bucket sort can be implemented with comparisons and therefore can also be considered a comparison sort algorithm. The computational complexity depends on the algorithm used to sort each bucket, the number of buckets to use, and whether the input is uniformly distributed.

In computer science, a selection algorithm is an algorithm for finding the th smallest value in a collection of ordered values, such as numbers. The value that it finds is called the th order statistic. Selection includes as special cases the problems of finding the minimum, median, and maximum element in the collection. Selection algorithms include quickselect, and the median of medians algorithm. When applied to a collection of values, these algorithms take linear time, as expressed using big O notation. For data that is already structured, faster algorithms may be possible; as an extreme case, selection in an already-sorted array takes time .

<span class="mw-page-title-main">Comparison sort</span> Type of sorting algorithm that works by comparing pairs of elements

A comparison sort is a type of sorting algorithm that only reads the list elements through a single abstract comparison operation that determines which of two elements should occur first in the final sorted list. The only requirement is that the operator forms a total preorder over the data, with:

  1. if ab and bc then ac (transitivity)
  2. for all a and b, ab or ba (connexity).
<span class="mw-page-title-main">Quicksort</span> Divide and conquer sorting algorithm

Quicksort is an efficient, general-purpose sorting algorithm. Quicksort was developed by British computer scientist Tony Hoare in 1959 and published in 1961. It is still a commonly used algorithm for sorting. Overall, it is slightly faster than merge sort and heapsort for randomized data, particularly on larger distributions.

Spreadsort is a sorting algorithm invented by Steven J. Ross in 2002. It combines concepts from distribution-based sorts, such as radix sort and bucket sort, with partitioning concepts from comparison sorts such as quicksort and mergesort. In experimental results it was shown to be highly efficient, often outperforming traditional algorithms such as quicksort, particularly on distributions exhibiting structure and string sorting. There is an open-source implementation with performance analysis and benchmarks, and HTML documentation .

In computer science, a search data structure is any data structure that allows the efficient retrieval of specific items from a set of items, such as a specific record from a database.

<span class="mw-page-title-main">Bubble sort</span> Simple comparison sorting algorithm

Bubble sort, sometimes referred to as sinking sort, is a simple sorting algorithm that repeatedly steps through the input list element by element, comparing the current element with the one after it, swapping their values if needed. These passes through the list are repeated until no swaps have to be performed during a pass, meaning that the list has become fully sorted. The algorithm, which is a comparison sort, is named for the way the larger elements "bubble" up to the top of the list.

<span class="mw-page-title-main">Cycle sort</span> Comparison sorting algorithm

Cycle sort is an in-place, unstable sorting algorithm, a comparison sort that is theoretically optimal in terms of the total number of writes to the original array, unlike any other in-place sorting algorithm. It is based on the idea that the permutation to be sorted can be factored into cycles, which can individually be rotated to give a sorted result.

In computer science, a min-max heap is a complete binary tree data structure which combines the usefulness of both a min-heap and a max-heap, that is, it provides constant time retrieval and logarithmic time removal of both the minimum and maximum elements in it. This makes the min-max heap a very useful data structure to implement a double-ended priority queue. Like binary min-heaps and max-heaps, min-max heaps support logarithmic insertion and deletion and can be built in linear time. Min-max heaps are often represented implicitly in an array; hence it's referred to as an implicit data structure.

In computer science, the Floyd-Rivest algorithm is a selection algorithm developed by Robert W. Floyd and Ronald L. Rivest that has an optimal expected number of comparisons within lower-order terms. It is functionally equivalent to quickselect, but runs faster in practice on average. It has an expected running time of O(n) and an expected number of comparisons of n + min(k, nk) + O(n1/2 log1/2n).

<span class="mw-page-title-main">Block sort</span> Efficient sorting algorithm that combines insert and merge operations

Block sort, or block merge sort, is a sorting algorithm combining at least two merge operations with an insertion sort to arrive at O(n log n) (see Big O notation) in-place stable sorting time. It gets its name from the observation that merging two sorted lists, A and B, is equivalent to breaking A into evenly sized blocks, inserting each A block into B under special rules, and merging AB pairs.

In computer science, k-way merge algorithms or multiway merges are a specific type of sequence merge algorithms that specialize in taking in k sorted lists and merging them into a single sorted list. These merge algorithms generally refer to merge algorithms that take in a number of sorted lists greater than two. Two-way merges are also referred to as binary merges.The k- way merge is also an external sorting algorithm.

Interpolation sort is a sorting algorithm that is a kind of bucket sort. It uses an interpolation formula to assign data to the bucket. A general interpolation formula is:

References

  1. PD-icon.svg This article incorporates public domain material from Paul E. Black. "Bingo sort". Dictionary of Algorithms and Data Structures . NIST.