Patience sorting

Class: Sorting algorithm
Data structure: Array
Worst-case performance: O(n log n)
Best-case performance: O(n); occurs when the input is pre-sorted [1]

In computer science, patience sorting is a sorting algorithm inspired by, and named after, the card game patience. A variant of the algorithm efficiently computes the length of a longest increasing subsequence in a given array.

Overview

The algorithm's name derives from a simplified variant of the patience card game. The game begins with a shuffled deck of cards. The cards are dealt one by one into a sequence of piles on the table, according to the following rules. [2]

  1. Initially, there are no piles. The first card dealt forms a new pile consisting of the single card.
  2. Each subsequent card is placed on the leftmost existing pile whose top card has a value greater than or equal to the new card's value, or to the right of all of the existing piles, thus forming a new pile.
  3. When there are no more cards remaining to deal, the game ends.

This card game is turned into a two-phase sorting algorithm, as follows. Given an array of n elements from some totally ordered domain, consider this array as a collection of cards and simulate the patience sorting game. When the game is over, recover the sorted sequence by repeatedly picking off the minimum visible card; in other words, perform a k-way merge of the p piles, each of which is internally sorted.
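For illustration, the following is a minimal Python sketch of both phases, assuming the elements are mutually comparable; the function name patience_sort and the use of the standard bisect and heapq modules are choices made for this sketch, not part of the original description:

```python
import bisect
import heapq

def patience_sort(values):
    """Sort values by simulating the patience game, then merging the piles."""
    piles = []  # each pile is a list, bottom-to-top; the top card is the last element
    tops = []   # tops[i] == piles[i][-1]; increasing left to right, so binary search works
    for v in values:
        # Find the leftmost pile whose top card is >= v.
        i = bisect.bisect_left(tops, v)
        if i == len(piles):
            piles.append([v])  # no such pile exists: start a new pile on the right
            tops.append(v)
        else:
            piles[i].append(v)
            tops[i] = v
    # Phase two: k-way merge of the piles. Each pile is non-increasing from
    # bottom to top, so reversing it yields a sorted run.
    return list(heapq.merge(*(reversed(pile) for pile in piles)))
```

For example, patience_sort([6, 3, 5, 10, 11, 4, 8]) deals the cards into the piles [6, 3], [5, 4], [10, 8], [11] and merges them to give [3, 4, 5, 6, 8, 10, 11].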

Analysis

The first phase of patience sort, the card game simulation, can be implemented to take O(n log n) comparisons in the worst case for an n-element input array: there will be at most n piles, and by construction, the top cards of the piles form an increasing sequence from left to right, so the desired pile can be found by binary search. [1] The second phase, the merging of piles, can be done in O(n log n) time as well using a priority queue. [1]

When the input data contain natural "runs", i.e., non-decreasing subarrays, then performance can be strictly better. In fact, when the input array is already sorted, all values form a single pile and both phases run in O(n) time. The average-case complexity is still O(n log n): a uniformly random sequence of values will produce an expected number of O(√n) piles, [3] which take O(n log n) time to produce and merge. [1]
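A back-of-the-envelope check of this claim, hedged on the classical fact that the expected length of the longest increasing subsequence of a uniformly random permutation (and hence the expected number of piles p) grows asymptotically as 2√n:

```latex
\mathbb{E}[p] \sim 2\sqrt{n}
\quad\Longrightarrow\quad
\underbrace{O(n \log p)}_{\text{dealing (binary searches)}}
+ \underbrace{O(n \log p)}_{\text{merging (heap operations)}}
= O\!\bigl(n \log \sqrt{n}\bigr) = O(n \log n).
```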

An evaluation of the practical performance of patience sort is given by Chandramouli and Goldstein, who show that a naive version is about ten to twenty times slower than a state-of-the-art quicksort on their benchmark problem. They attribute this to the relatively small amount of research put into patience sort, and develop several optimizations that bring its performance to within a factor of two of that of quicksort. [1]

If the values of the cards are in the range 1, . . . , n, there is an efficient implementation with O(n log log n) worst-case running time for putting the cards into piles, relying on a van Emde Boas tree. [3]

Relations to other problems

Patience sorting is closely related to a card game called Floyd's game. This game is very similar to the game sketched earlier: [2]

  1. The first card dealt forms a new pile consisting of the single card.
  2. Each subsequent card is placed on some existing pile whose top card has a value no less than the new card's value, or to the right of all of the existing piles, thus forming a new pile.
  3. When there are no more cards remaining to deal, the game ends.

The object of the game is to finish with as few piles as possible. The difference from the patience sorting algorithm is that there is no requirement to place a new card on the leftmost pile where it is allowed. Patience sorting constitutes a greedy strategy for playing this game.

Aldous and Diaconis suggest defining 9 or fewer piles as a winning outcome for n = 52, which happens with approximately 5% probability. [4]

Algorithm for finding a longest increasing subsequence

First, execute the sorting algorithm as described above. The number of piles is the length of a longest increasing subsequence. Whenever a card is placed on top of a pile, put a back-pointer to the top card in the previous pile (which, by construction, has a lower value than the new card). In the end, follow the back-pointers from the top card in the last pile to recover a decreasing subsequence of the longest length; its reverse is an answer to the longest increasing subsequence problem.
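A hedged Python sketch of this construction follows; the names longest_increasing_subsequence, tops, top_idx, and back are illustrative, and only each pile's top card is stored rather than the full piles:

```python
import bisect

def longest_increasing_subsequence(seq):
    """Return one longest strictly increasing subsequence of seq."""
    tops = []               # top-card values of the piles, increasing left to right
    top_idx = []            # index into seq of each pile's current top card
    back = [-1] * len(seq)  # back[i]: index that card i's back-pointer targets
    for i, v in enumerate(seq):
        # Leftmost pile whose top card is >= v (bisect_left keeps the result strict).
        j = bisect.bisect_left(tops, v)
        if j > 0:
            back[i] = top_idx[j - 1]  # point to the top of the previous pile
        if j == len(tops):
            tops.append(v)            # new pile on the right
            top_idx.append(i)
        else:
            tops[j] = v               # v becomes the new top card of pile j
            top_idx[j] = i
    # Walk the back-pointers from the top of the last pile, then reverse.
    out = []
    i = top_idx[-1] if top_idx else -1
    while i != -1:
        out.append(seq[i])
        i = back[i]
    return out[::-1]
```

Using bisect_right in place of bisect_left would admit equal values, yielding a longest non-decreasing subsequence instead.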

S. Bespamyatnikh and M. Segal [3] describe an efficient implementation of this algorithm that incurs no additional asymptotic cost over the sorting itself, since storing, creating, and traversing the back-pointers requires only linear time and space. They further show how to report all longest increasing subsequences from the same resulting data structures.

History

Patience sorting was named by C. L. Mallows, who attributed its invention to A.S.C. Ross in the early 1960s. [1] According to Aldous and Diaconis, [4] patience sorting was first recognized as an algorithm to compute the longest increasing subsequence length by Hammersley. [5] A.S.C. Ross and independently Robert W. Floyd recognized it as a sorting algorithm. Initial analysis was done by Mallows. [6] Floyd's game was developed by Floyd in correspondence with Donald Knuth. [2]

Use

The patience sorting algorithm can be applied to process control. Within a series of measurements, the existence of a long increasing subsequence can be used as a trend marker. A 2002 article in SQL Server magazine includes an SQL implementation, in this context, of the patience sorting algorithm for computing the length of the longest increasing subsequence. [7]

References

  1. Chandramouli, Badrish; Goldstein, Jonathan (2014). Patience is a Virtue: Revisiting Merge and Sort on Modern Processors (PDF). SIGMOD/PODS.
  2. Burstein, Alexander; Lankham, Isaiah (2006). "Combinatorics of patience sorting piles" (PDF). Séminaire Lotharingien de Combinatoire. 54A. arXiv:math/0506358. Bibcode:2005math......6358B.
  3. Bespamyatnikh, Sergei; Segal, Michael (2000). "Enumerating Longest Increasing Subsequences and Patience Sorting". Information Processing Letters. 76 (1–2): 7–11. CiteSeerX 10.1.1.40.5912. doi:10.1016/s0020-0190(00)00124-1.
  4. Aldous, David; Diaconis, Persi (1999). "Longest increasing subsequences: from patience sorting to the Baik-Deift-Johansson theorem". Bulletin of the American Mathematical Society. New Series. 36 (4): 413–432. doi:10.1090/s0273-0979-99-00796-x.
  5. Hammersley, John (1972). A few seedlings of research. Proc. Sixth Berkeley Symp. Math. Statist. and Probability. Vol. 1. University of California Press. pp. 345–394.
  6. Mallows, C. L. (1973). "Patience sorting". Bull. Inst. Math. Appl. 9: 216–224.
  7. Kass, Steve (April 30, 2002). "Statistical Process Control". SQL Server Pro. Retrieved 23 April 2014.