Interpolation sort

Last updated

Interpolation sort is a sorting algorithm that is a kind of bucket sort. It uses an interpolation formula to assign data to the bucket. A general interpolation formula is:

Contents

Interpolation = INT(((Array[i] - min) / (max - min)) * (ArraySize - 1))

Algorithm

Interpolation Sort
Class Sorting Algorithm
Data structure Array
Worst-case performance
Best-case performance
Average performance
Worst-case space complexity
Optimal

Interpolation sort (or histogram sort). [1] It is a sorting algorithm that uses the interpolation formula to disperse data divide and conquer. Interpolation sort is also a variant of bucket sort algorithm. [2] The interpolation sort method uses an array of record bucket lengths corresponding to the original number column. By operating the maintenance length array, the recursive algorithm can be prevented from changing the space complexity to due to memory stacking. The segmentation record of the length array can using secondary function dynamically declare and delete the memory space of the array. The space complexity required to control the recursive program is . Contains a two-dimensional array of dynamically allocated memories and an array of record lengths. However the execution complexity can still be maintained as an efficient sorting method of . [3] Array of dynamically allocated memory can be implemented by linked list, stack, queue, associative array, tree structure, etc. An array object such as JavaScript is applicable. The difference in data structure is related to the speed of data access and thus the time required for sorting.When the values in the ordered array are uniformly distributed approximately the arithmetic progression, the linear time of interpolation sort ordering is . [4]

Interpolation sort algorithm

  1. Set a bucket length array to record the length of the unsorted bucket. Initialize into the original array length.
  2. [Main Sort] If the bucket length array is cleared and sorted is completed. Execute [Divide function] if it is not cleared.
  3. [Divide function] Execute Divide by pop a bucket length from the end of the bucket length array. Find the maximum and minimum values in the bucket. If the maximum value is equal to the minimum value, the sorting is completed to stop Divide.
  4. Set up a two-dimensional array as all empty buckets. Divide into the bucket according to the interpolation number.
  5. After dividing into the buckets, push the length of the buckets into the array of bucket length. And put the items back into the original array one by one from all the buckets that are not empty.
  6. Return to [Main Sort].

Histogram sort algorithm

The NIST definition: An efficient 3-pass refinement of a bucket sort algorithm. [5]

  1. The first pass counts the number of items for each bucket in an auxiliary array, and then makes a running total so each auxiliary entry is the number of preceding items.
  2. The second pass puts each item in its proper bucket according to the auxiliary entry for the key of that item.
  3. The last pass sorts each bucket.

Practice

Interpolation sort implementation

JavaScript code:

Array.prototype.interpolationSort=function(){vardivideSize=newArray();varend=this.length;divideSize[0]=end;while(divideSize.length>0){divide(this);}// Repeat function divide to ArrayListfunctiondivide(A){varsize=divideSize.pop();varstart=end-size;varmin=A[start];varmax=A[start];for(vari=start+1;i<end;i++){if(A[i]<min){min=A[i];}else{if(A[i]>max){max=A[i];}}}if(min==max){end=end-size;}else{varp=0;varbucket=newArray(size);for(vari=0;i<size;i++){bucket[i]=newArray();}for(vari=start;i<end;i++){p=Math.floor(((A[i]-min)/(max-min))*(size-1));bucket[p].push(A[i]);}for(vari=0;i<size;i++){if(bucket[i].length>0){for(varj=0;j<bucket[i].length;j++){A[start++]=bucket[i][j];}divideSize.push(bucket[i].length);}}}}};

Interpolation sort recursive method

Worst-case space complexity :

Array.prototype.interpolationSort=function(){//-- Edit date:2019/08/31 --//varstart=0;varsize=this.length;varmin=this[0];varmax=this[0];for(vari=1;i<size;i++){if(this[i]<min){min=this[i];}else{if(this[i]>max){max=this[i];}}}if(min!=max){varbucket=newArray(size);for(vari=0;i<size;i++){bucket[i]=newArray();}varinterpolation=0;for(vari=0;i<size;i++){interpolation=Math.floor(((this[i]-min)/(max-min))*(size-1));bucket[interpolation].push(this[i]);}for(vari=0;i<size;i++){if(bucket[i].length>1){bucket[i].interpolationSort();}// Recursionfor(varj=0;j<bucket[i].length;j++){this[start++]=bucket[i][j];}}}};

Histogram sort implementation

Array.prototype.histogramSort=function(){//-- Edit date:2019/11/14 --//varend=this.length;varsortedArray=newArray(end);varinterpolation=newArray(end);varhitCount=newArray(end);vardivideSize=newArray();divideSize[0]=end;while(divideSize.length>0){distribute(this);}// Repeat function distribute to Arrayfunctiondistribute(A){varsize=divideSize.pop();varstart=end-size;varmin=A[start];varmax=A[start];for(vari=start+1;i<end;i++){if(A[i]<min){min=A[i];}else{if(A[i]>max){max=A[i];}}}if(min==max){end=end-size;}else{for(vari=start;i<end;i++){hitCount[i]=0;}for(vari=start;i<end;i++){interpolation[i]=start+Math.floor(((A[i]-min)/(max-min))*(size-1));hitCount[interpolation[i]]++;}for(vari=start;i<end;i++){if(hitCount[i]>0){divideSize.push(hitCount[i]);}}hitCount[end-1]=end-hitCount[end-1];for(vari=end-1;i>start;i--){hitCount[i-1]=hitCount[i]-hitCount[i-1];}for(vari=start;i<end;i++){sortedArray[hitCount[interpolation[i]]]=A[i];hitCount[interpolation[i]]++;}for(vari=start;i<end;i++){A[i]=sortedArray[i];}}}};

Variant

Interpolation tag sort

Interpolation Tag Sort
Class Sorting Algorithm
Data structure Array
Worst-case performance
Best-case performance
Average performance
Worst-case space complexity
Optimal

Interpolation Tag Sort is a variant of Interpolation Sort. Applying the bucket sorting and dividing method, the array data is distributed into a limited number of buckets by mathematical interpolation formula, and the bucket then recursively the original processing program until the sorting is completed.

Interpolation tag sort is a recursive sorting method for interpolation sorting. To avoid stacking overflow caused by recursion, the memory crashes. Instead, use a Boolean data type tag array to operate the recursive function to release the memory. The extra memory space required is close to . Contains a two-dimensional array of dynamically allocated memory and a Boolean data type tag array. Stack, queue, associative array, and tree structure can be implemented as buckets.

As the JavaScript array object is suitable for this sorting method, the difference in data structure is related to the speed of data access and thus the time required for sorting. The linear time Θ(n) is used when the values in the array to be sorted are evenly distributed. The bucket sort algorithm does not limit the sorting to the lower limit of . Interpolation tag sort average performance complexity is . [3]

Interpolation tag sort algorithm

  1. Set a tag array equal to the original array size and initialize to a false value.
  2. [Main Sort] Determines whether all buckets of the original array have been sorted. If the sorting is not completed, the [Divide function] is executed.
  3. [Divide function] Find the maximum and minimum values in the bucket. If the maximum value is equal to the minimum value, the sorting is completed and the division is stopped.
  4. Set up a two-dimensional array as all the empty buckets. Divide into the bucket according to the interpolation number.
  5. After dividing into the bucket, mark the starting position of the bucket as a true value in the tag array. And put the items back into the original array one by one from all the buckets that are not empty.
  6. Return to [Main Sort].

Practice

JavaScript code:

Array.prototype.InterpolaionTagSort=function(){// Whale Chen agrees to "Wikipedia CC BY-SA 3.0 License". Sign date: 2019-06-21 //varend=this.length;if(end>1){varstart=0;varTag=newArray(end);// Algorithm step-1 for(vari=0;i<end;i++){Tag[i]=false;}Divide(this);}while(end>1){// Algorithm step-2  while(Tag[--start]==false){}// Find the next bucket's startDivide(this);}functionDivide(A){varmin=A[start];varmax=A[start];for(vari=start+1;i<end;i++){if(A[i]<min){min=A[i];}else{if(A[i]>max){max=A[i];}}}if(min==max){end=start;}// Algorithm step-3 Start to be the next bucket's endelse{varinterpolation=0;varsize=end-start;varBucket=newArray(size);// Algorithm step-4         for(vari=0;i<size;i++){Bucket[i]=newArray();}for(vari=start;i<end;i++){interpolation=Math.floor(((A[i]-min)/(max-min))*(size-1));Bucket[interpolation].push(A[i]);}for(vari=0;i<size;i++){if(Bucket[i].length>0){// Algorithm step-5      Tag[start]=true;for(varj=0;j<Bucket[i].length;j++){A[start++]=Bucket[i][j];}}}}}// Algorithm step-6 };

In-place Interpolation Tag Sort

In-Place Interpolation Tag Sort
Class Sorting Algorithm
Data structure Array
Worst-case performance
Best-case performance
Average performance
Worst-case space complexity
Optimal

The in-place interpolation tag sort is an in-place algorithm of interpolation sort. In-place Interpolation Tag Sort can achieve sorting by only N times of swapping by maintaining N bit tags; however, the array to be sorted must be a continuous integer sequence and not repeated, or the series is completely evenly distributed to approximate The number of arithmetical progression.

The factor column data must not be repeated. For example, sorting 0~100 can be sorted in one step. The number of exchanges is: , the calculation time complexity is: , and the worst space complexity is . If the characteristics of the series meet the conditional requirements of this sorting method: "The array is a continuous integer or an arithmetical progression that does not repeat", the in-place interpolation tag sort will be an excellent sorting method that is extremely fast and saves memory space.

In-place Interpolation Tag Sort Algorithm

In-place Interpolation Tag Sort sorts non-repeating consecutive integer series, only one Boolean data type tag array with the same length as the original array, the array calculates the interpolation of the data from the beginning, and the interpolation points to a new position of the array. Position, the position that has been swapped is marked as true in the corresponding position of the tag array, and is incremented until the end of the array is sorted.

Algorithm process:

  1. Set an equal number of tag arrays to initialize to false values.
  2. Visit the array when tag[i] is false, calculate the position corresponding to the interpolation=p.
  3. Swap a[i] and a[p], let tag[p] = true.
  4. The tour array is completed and the sorting is completed.

Practice

JavaScript code:

Array.prototype.InPlaceTagSort=function(){// Edit Date: 2019-07-02varn=this.length;varTag=newArray(n);for(i=0;i<n;i++){Tag[i]=false;}varmin=this[0];varmax=this[0];for(i=1;i<n;i++){if(this[i]<min){min=this[i];}else{if(this[i]>max){max=this[i];}}}varp=0;vartemp=0;for(i=0;i<n;i++){while(Tag[i]==false){p=Math.floor(((this[i]-min)/(max-min))*(n-1));temp=this[i];this[i]=this[p];this[p]=temp;Tag[p]=true;}}};needSortArray.InPlaceTagSort();

The origin of In-place sorting performed in time

In "Mathematical Analysis of Algorithms", Donald Knuth remarked "... that research on computational complexity is an interesting way to sharpen our tools for more routine problems we face from day to day." [6]

Knuth further pointed out that, with respect to the sorting problem, time effective in-situ permutation is inherently connected with the problem of finding the cycle leaders, and in-situ permutations could easily be performed in time if we would be allowed to manipulate extra "tag" bits specifying how much of the permutation has been carried out at any time. Without such tag bits, he concludes "it seems reasonable to conjecture that every algorithm will require for in-situ permutation at least steps on the average." [6]

The In-place Interpolation Tag Sort is one of the sorting algorithms that prof. Donald Knuth said: "manipulate extra "tag" bits...finding the cycle leaders, and in-situ permutations could easily be performed in time".

Similar sorting method

  1. Flashsort
  2. Proxmap sort
  3. American flag sort

Bucket sort mixing other sorting methods and recursive algorithm

Bucket sort can be mixed with other sorting methods to complete sorting. If it is sorted by bucket sort and insert sort, also is a fairly efficient sorting method. But when the series appears a large deviation from the value: For example, when the maximum value of the series is greater than N times the next largest value. After the series of columns are processed, the distribution is that all the elements except the maximum value fall into the same bucket. The second sorting method uses insert sort. May cause execution complexity to fall into . This has lost the meaning and high-speed performance of using bucket sort.

Interpolation sort is a way of recursively using bucket sort. After performing recursion, still use bucket sort to disperse the series. This can avoid the above situation. If you want to make the recursive interpolation sort execution complexity fall into , it is necessary to present a factorial amplification in the entire series. In fact, there is very little chance that a series of special distributions will occur.

Related Research Articles

<span class="mw-page-title-main">Binary search algorithm</span> Search algorithm finding the position of a target value within a sorted array

In computer science, binary search, also known as half-interval search, logarithmic search, or binary chop, is a search algorithm that finds the position of a target value within a sorted array. Binary search compares the target value to the middle element of the array. If they are not equal, the half in which the target cannot lie is eliminated and the search continues on the remaining half, again taking the middle element to compare to the target value, and repeating this until the target value is found. If the search ends with the remaining half being empty, the target is not in the array.

<span class="mw-page-title-main">Merge sort</span> Divide and conquer sorting algorithm

In computer science, merge sort is an efficient, general-purpose, and comparison-based sorting algorithm. Most implementations produce a stable sort, which means that the relative order of equal elements is the same in the input and output. Merge sort is a divide-and-conquer algorithm that was invented by John von Neumann in 1945. A detailed description and analysis of bottom-up merge sort appeared in a report by Goldstine and von Neumann as early as 1948.

In computer science, radix sort is a non-comparative sorting algorithm. It avoids comparison by creating and distributing elements into buckets according to their radix. For elements with more than one significant digit, this bucketing process is repeated for each digit, while preserving the ordering of the prior step, until all digits have been considered. For this reason, radix sort has also been called bucket sort and digital sort.

<span class="mw-page-title-main">Sorting algorithm</span> Algorithm that arranges lists in order

In computer science, a sorting algorithm is an algorithm that puts elements of a list into an order. The most frequently used orders are numerical order and lexicographical order, and either ascending or descending. Efficient sorting is important for optimizing the efficiency of other algorithms that require input data to be in sorted lists. Sorting is also often useful for canonicalizing data and for producing human-readable output.

In computer science, selection sort is an in-place comparison sorting algorithm. It has an O(n2) time complexity, which makes it inefficient on large lists, and generally performs worse than the similar insertion sort. Selection sort is noted for its simplicity and has performance advantages over more complicated algorithms in certain situations, particularly where auxiliary memory is limited.

<span class="mw-page-title-main">Binary heap</span> Variant of heap data structure

A binary heap is a heap data structure that takes the form of a binary tree. Binary heaps are a common way of implementing priority queues. The binary heap was introduced by J. W. J. Williams in 1964, as a data structure for heapsort.

<span class="mw-page-title-main">Bucket sort</span> Sorting algorithm

Bucket sort, or bin sort, is a sorting algorithm that works by distributing the elements of an array into a number of buckets. Each bucket is then sorted individually, either using a different sorting algorithm, or by recursively applying the bucket sorting algorithm. It is a distribution sort, a generalization of pigeonhole sort that allows multiple keys per bucket, and is a cousin of radix sort in the most-to-least significant digit flavor. Bucket sort can be implemented with comparisons and therefore can also be considered a comparison sort algorithm. The computational complexity depends on the algorithm used to sort each bucket, the number of buckets to use, and whether the input is uniformly distributed.

In computer science, counting sort is an algorithm for sorting a collection of objects according to keys that are small positive integers; that is, it is an integer sorting algorithm. It operates by counting the number of objects that possess distinct key values, and applying prefix sum on those counts to determine the positions of each key value in the output sequence. Its running time is linear in the number of items and the difference between the maximum key value and the minimum key value, so it is only suitable for direct use in situations where the variation in keys is not significantly greater than the number of items. It is often used as a subroutine in radix sort, another sorting algorithm, which can handle larger keys more efficiently.

<span class="mw-page-title-main">Stooge sort</span> Inefficient recursive sorting algorithm

Stooge sort is a recursive sorting algorithm. It is notable for its exceptionally bad time complexity of = The running time of the algorithm is thus slower compared to reasonable sorting algorithms, and is slower than bubble sort, a canonical example of a fairly inefficient sort. It is however more efficient than Slowsort. The name comes from The Three Stooges.

<span class="mw-page-title-main">Quicksort</span> Divide and conquer sorting algorithm

Quicksort is an efficient, general-purpose sorting algorithm. Quicksort was developed by British computer scientist Tony Hoare in 1959 and published in 1961. It is still a commonly used algorithm for sorting. Overall, it is slightly faster than merge sort and heapsort for randomized data, particularly on larger distributions.

<span class="mw-page-title-main">Recursion (computer science)</span> Use of functions that call themselves

In computer science, recursion is a method of solving a computational problem where the solution depends on solutions to smaller instances of the same problem. Recursion solves such recursive problems by using functions that call themselves from within their own code. The approach can be applied to many types of problems, and recursion is one of the central ideas of computer science.

The power of recursion evidently lies in the possibility of defining an infinite set of objects by a finite statement. In the same manner, an infinite number of computations can be described by a finite recursive program, even if this program contains no explicit repetitions.

Spreadsort is a sorting algorithm invented by Steven J. Ross in 2002. It combines concepts from distribution-based sorts, such as radix sort and bucket sort, with partitioning concepts from comparison sorts such as quicksort and mergesort. In experimental results it was shown to be highly efficient, often outperforming traditional algorithms such as quicksort, particularly on distributions exhibiting structure and string sorting. There is an open-source implementation with performance analysis and benchmarks, and HTML documentation .

An American flag sort is an efficient, in-place variant of radix sort that distributes items into buckets. Non-comparative sorting algorithms such as radix sort and American flag sort are typically used to sort large objects such as strings, for which comparison is not a unit-time operation. American flag sort iterates through the bits of the objects, considering several bits of each object at a time. For each set of bits, American flag sort makes two passes through the array of objects: first to count the number of objects that will fall in each bin, and second to place each object in its bucket. This works especially well when sorting a byte at a time, using 256 buckets. With some optimizations, it is twice as fast as quicksort for large sets of strings.

Flashsort is a distribution sorting algorithm showing linear computational complexity O(n) for uniformly distributed data sets and relatively little additional memory requirement. The original work was published in 1998 by Karl-Dietrich Neubert.

Samplesort is a sorting algorithm that is a divide and conquer algorithm often used in parallel processing systems. Conventional divide and conquer sorting algorithms partitions the array into sub-intervals or buckets. The buckets are then sorted individually and then concatenated together. However, if the array is non-uniformly distributed, the performance of these sorting algorithms can be significantly throttled. Samplesort addresses this issue by selecting a sample of size s from the n-element sequence, and determining the range of the buckets by sorting the sample and choosing p−1 < s elements from the result. These elements then divide the array into p approximately equal-sized buckets. Samplesort is described in the 1970 paper, "Samplesort: A Sampling Approach to Minimal Storage Tree Sorting", by W. D. Frazer and A. C. McKellar.

<span class="mw-page-title-main">Proxmap sort</span>

ProxmapSort, or Proxmap sort, is a sorting algorithm that works by partitioning an array of data items, or keys, into a number of "subarrays". The name is short for computing a "proximity map," which indicates for each key K the beginning of a subarray where K will reside in the final sorted order. Keys are placed into each subarray using insertion sort. If keys are "well distributed" among the subarrays, sorting occurs in linear time. The computational complexity estimates involve the number of subarrays and the proximity mapping function, the "map key," used. It is a form of bucket and radix sort.

Slowsort is a sorting algorithm. It is of humorous nature and not useful. It is a reluctant algorithm based on the principle of multiply and surrender. It was published in 1984 by Andrei Broder and Jorge Stolfi in their paper Pessimal Algorithms and Simplexity Analysis.

<span class="mw-page-title-main">Block sort</span> Efficient sorting algorithm that combines insert and merge operations

Block sort, or block merge sort, is a sorting algorithm combining at least two merge operations with an insertion sort to arrive at O(n log n) (see Big O notation) in-place stable sorting time. It gets its name from the observation that merging two sorted lists, A and B, is equivalent to breaking A into evenly sized blocks, inserting each A block into B under special rules, and merging AB pairs.

The cache-oblivious distribution sort is a comparison-based sorting algorithm. It is similar to quicksort, but it is a cache-oblivious algorithm, designed for a setting where the number of elements to sort is too large to fit in a cache where operations are done. In the external memory model, the number of memory transfers it needs to perform a sort of items on a machine with cache of size and cache lines of length is , under the tall cache assumption that . This number of memory transfers has been shown to be asymptotically optimal for comparison sorts. This distribution sort also achieves the asymptotically optimal runtime complexity of .

References

  1. NIST Algorithm. "interpolation sort". Definition: See histogram sort.
  2. "Histogram sort". Another variant of bucket sort known as histogram sort or counting sort adds an initial pass that counts the number of elements that will fall into each bucket using a count array. Using this information, the array values can be arranged into a sequence of buckets in-place by a sequence of exchanges, leaving no space overhead for bucket storage.[ circular reference ]
  3. 1 2 "桶排序(Bucket sort)" (in Chinese). 平均時間複雜度(Average performance)[ circular reference ]
  4. "Bucket sort Average-case analysis". en.wikipedia. Note that if k is chosen to be , then bucket sort runs in average time, given a uniformly distributed input.[ circular reference ]
  5. NIST Algorithm. "histogramSort sort". Definition: An efficient 3-pass refinement of a bucket sort algorithm.
  6. 1 2 Karl-Dietrich Neubert (1998). "The FlashSort Algorithm" . Retrieved 2007-11-06.