Splaysort

In computer science, splaysort is an adaptive comparison sorting algorithm based on the splay tree data structure. [1]

Algorithm

The steps of the algorithm are:

  1. Initialize an empty splay tree
  2. For each data item in the input order, insert it into the splay tree
  3. Perform an in-order traversal of the splay tree to obtain the data in sorted order

Thus, the algorithm may be seen as a form of insertion sort or tree sort, using a splay tree to speed up each insertion.
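The three steps above can be sketched in Python as follows. This is a minimal illustration, not the implementation from Moffat, Eddy & Petersson (1996). It assumes top-down splaying (the variant described by Sleator and Tarjan), and the names `Node`, `splay`, `insert` and `splaysort` are hypothetical. Equal keys are inserted to the right of a matching root, so the output is sorted, but this sketch makes no stability guarantee. Both the splay and the traversal are iterative, because presorted input can turn the tree into a single long path.

```python
class Node:
    """A splay-tree node holding one input item."""
    __slots__ = ("key", "left", "right")

    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def splay(root, key):
    """Top-down splay: move the node equal to `key`, or the last node on its
    search path, to the root, and return the new root."""
    if root is None:
        return None
    header = Node(None)            # collects the left and right trees
    left_max = right_min = header  # last nodes linked into each tree
    t = root
    while True:
        if key < t.key:
            if t.left is None:
                break
            if key < t.left.key:   # zig-zig: rotate right
                y = t.left
                t.left = y.right
                y.right = t
                t = y
                if t.left is None:
                    break
            right_min.left = t     # link t into the right tree
            right_min = t
            t = t.left
        elif key > t.key:
            if t.right is None:
                break
            if key > t.right.key:  # zig-zig: rotate left
                y = t.right
                t.right = y.left
                y.left = t
                t = y
                if t.right is None:
                    break
            left_max.right = t     # link t into the left tree
            left_max = t
            t = t.right
        else:
            break
    # Reassemble: t becomes the root over (left tree, t, right tree).
    left_max.right = t.left
    right_min.left = t.right
    t.left = header.right
    t.right = header.left
    return t


def insert(root, key):
    """Splay at `key`, then make a new node holding `key` the root."""
    node = Node(key)
    if root is None:
        return node
    root = splay(root, key)
    if key < root.key:
        node.left, node.right = root.left, root
        root.left = None
    else:                          # greater or equal keys go to the right
        node.left, node.right = root, root.right
        root.right = None
    return node


def splaysort(items):
    """Insert every item into a splay tree, then read it back in order."""
    root = None                    # step 1: empty tree
    for x in items:                # step 2: insert in input order
        root = insert(root, x)
    out, stack, node = [], [], root
    while stack or node is not None:   # step 3: iterative in-order traversal
        while node is not None:
            stack.append(node)
            node = node.left
        node = stack.pop()
        out.append(node.key)
        node = node.right
    return out


print(splaysort([5, 2, 9, 1, 5, 6]))  # [1, 2, 5, 5, 6, 9]
```

On already sorted or reverse-sorted input, each new item is compared only against the current root before becoming the new root, which gives a first glimpse of the adaptivity discussed below.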

Analysis

Based on the amortized analysis of splay trees, the worst-case running time of splaysort on an input with n data items is O(n log n). This matches the worst-case bounds of efficient non-adaptive algorithms such as heapsort and merge sort, and the expected running time of quicksort.

For an input sequence in which most items are placed close to their predecessor in the sorted order, or are out of order with only a small number of other items, splaysort can be faster than O(n log n), showing that it is an adaptive sort. To quantify this, let d_x be the number of positions in the input that separate x from its predecessor in the sorted order, and let i_x be the number of items that appear on one side of x in the input and on the other side of x in the output (the number of inversions that involve x). Then it follows from the dynamic finger theorem for splay trees that the total time for splaysort is bounded by

$$O\left(n + \sum_x \log(d_x + 1)\right)$$

and by

$$O\left(n + \sum_x \log(i_x + 1)\right).$$ [2]
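The quantities d_x and i_x can be computed directly from a sequence, which makes these bounds concrete. The sketch below is a quadratic-time illustration under stated assumptions: items are distinct, d_x is taken as the absolute difference between the input positions of x and of its predecessor in sorted order, and the smallest item (which has no predecessor) is assigned 0. The function names are hypothetical.

```python
import math


def inversion_counts(seq):
    """Return i_x for each input position: the number of items that lie on
    the opposite side of x in sorted order from where they lie in the input."""
    n = len(seq)
    counts = []
    for p, x in enumerate(seq):
        larger_before = sum(1 for j in range(p) if seq[j] > x)
        smaller_after = sum(1 for j in range(p + 1, n) if seq[j] < x)
        counts.append(larger_before + smaller_after)
    return counts


def predecessor_distances(seq):
    """Return d_x for each input position, assuming distinct items.
    Assumption: d_x = |pos(x) - pos(predecessor of x)|, and 0 for the minimum."""
    pos = {x: p for p, x in enumerate(seq)}
    ordered = sorted(seq)
    dist = {ordered[0]: 0} if ordered else {}
    for prev, x in zip(ordered, ordered[1:]):
        dist[x] = abs(pos[x] - pos[prev])
    return [dist[x] for x in seq]


def log_sum(values):
    """Sum of log2(v + 1): the non-linear term of the bounds above."""
    return sum(math.log2(v + 1) for v in values)


# A nearly sorted input with one adjacent swap.
data = [1, 2, 4, 3, 5]
print(inversion_counts(data))            # [0, 0, 1, 1, 0]
print(predecessor_distances(data))       # [0, 1, 1, 2, 2]
print(log_sum(inversion_counts(data)))   # 2.0
```

Each inversion involves two items, so the i_x values sum to twice the total number of inversions. For a sorted input every i_x is 0, and the inversion bound collapses to O(n).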

Splaysort can also be shown to be adaptive to the entropy of the input sequence. [3]

Experimental results

In experiments by Moffat, Eddy & Petersson (1996), splaysort was slower than quicksort on tables of random numbers by a factor of 1.5 to 2, and slower than mergesort by smaller factors. For data consisting of larger records, again in a random order, the additional amount of data movement performed by quicksort significantly slowed it down compared to pointer-based algorithms, and the times for splaysort and mergesort were very close to each other. However, for nearly presorted input sequences (measured in terms of the number of contiguous monotone subsequences in the data, the number of inversions, the number of items that must be removed to make a sorted subsequence, or the number of non-contiguous monotone subsequences into which the input can be partitioned) splaysort became significantly more efficient than the other algorithms. [1]
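Three of the presortedness measures listed above can be computed as follows: the number of ascending contiguous runs, the number of inversions, and the number of items that must be removed to leave a sorted subsequence (the input length minus the length of a longest non-decreasing subsequence). This is a sketch, and the exact conventions used in the cited experiments may differ (for example, some authors count descents rather than runs). The function names are hypothetical, and the fourth measure, partitioning into non-contiguous monotone subsequences, is omitted.

```python
from bisect import bisect_right


def runs(seq):
    """Number of maximal ascending contiguous runs (1 for a sorted, non-empty input)."""
    if not seq:
        return 0
    return 1 + sum(1 for a, b in zip(seq, seq[1:]) if b < a)


def inversions(seq):
    """Number of pairs i < j with seq[i] > seq[j], counted during a merge sort."""
    def sort_count(a):
        if len(a) <= 1:
            return a, 0
        mid = len(a) // 2
        left, count_left = sort_count(a[:mid])
        right, count_right = sort_count(a[mid:])
        merged, count = [], count_left + count_right
        i = j = 0
        while i < len(left) and j < len(right):
            if right[j] < left[i]:
                merged.append(right[j])
                count += len(left) - i  # right[j] is inverted with every remaining left item
                j += 1
            else:
                merged.append(left[i])
                i += 1
        merged.extend(left[i:])
        merged.extend(right[j:])
        return merged, count
    return sort_count(list(seq))[1]


def rem(seq):
    """Fewest items to delete so the rest is sorted: n minus the longest
    non-decreasing subsequence."""
    tails = []  # tails[k]: smallest last item of a non-decreasing subsequence of length k + 1
    for x in seq:
        k = bisect_right(tails, x)
        if k == len(tails):
            tails.append(x)
        else:
            tails[k] = x
    return len(seq) - len(tails)


data = [2, 1, 3, 5, 4]
print(runs(data), inversions(data), rem(data))  # 3 2 2
```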

Elmasry & Hammad (2005) compared splaysort to several other algorithms that are adaptive to the total number of inversions in the input, as well as to quicksort. They found that, on the inputs that had few enough inversions to make an adaptive algorithm faster than quicksort, splaysort was the fastest algorithm. [4]

Variations

Saikkonen & Soisalon-Soininen (2012) modify splaysort to be more strongly adaptive to the number of contiguous monotone subsequences in the input, and report on experiments showing that the resulting algorithm is faster on inputs that are nearly presorted according to this measure. [5]

References

  1. Moffat, Alistair; Eddy, Gary; Petersson, Ola (July 1996), "Splaysort: Fast, Versatile, Practical", Software: Practice and Experience, 26 (7): 781–797, doi:10.1002/(SICI)1097-024X(199607)26:7<781::AID-SPE35>3.3.CO;2-2.
  2. Cole, Richard (2000), "On the dynamic finger conjecture for splay trees. II. The proof", SIAM Journal on Computing, 30 (1): 44–85, CiteSeerX 10.1.1.36.2713, doi:10.1137/S009753979732699X, MR 1762706.
  3. Gagie, Travis (2005), Sorting a low-entropy sequence, arXiv:cs/0506027, Bibcode:2005cs........6027G.
  4. Elmasry, Amr; Hammad, Abdelrahman (2005), "An empirical study for inversions-sensitive sorting algorithms", Experimental and Efficient Algorithms: 4th International Workshop, WEA 2005, Santorini Island, Greece, May 10-13, 2005, Proceedings, Lecture Notes in Computer Science, vol. 3503, Springer, pp. 597–601, doi:10.1007/11427186_52, ISBN 978-3-540-25920-6.
  5. Saikkonen, Riku; Soisalon-Soininen, Eljas (2012), "A general method for improving insertion-based adaptive sorting", Algorithms and Computation: 23rd International Symposium, ISAAC 2012, Taipei, Taiwan, December 19-21, 2012, Proceedings, Lecture Notes in Computer Science, vol. 7676, Springer, pp. 217–226, doi:10.1007/978-3-642-35261-4_25, ISBN 978-3-642-35260-7.