Exponential search

Class: Search algorithm
Data structure: Array
Worst-case performance: O(log i)
Best-case performance: O(1)
Average performance: O(log i)
Worst-case space complexity: O(1)
Optimal: Yes

In computer science, an exponential search (also called doubling search, galloping search, or Struzik search) [1] is an algorithm, created by Jon Bentley and Andrew Chi-Chih Yao in 1976, for searching sorted, unbounded/infinite lists. [2] There are numerous ways to implement it, the most common being to determine a range in which the search key resides and to perform a binary search within that range. This takes O(log i) time, where i is the position of the search key in the list (if the search key is in the list) or the position where the search key would be (if it is not).


Exponential search can also be used to search in bounded lists, where it can even outperform traditional searches such as binary search when the element being searched for is near the beginning of the array. This is because exponential search runs in O(log i) time, where i is the index of the element being searched for in the list, whereas binary search runs in O(log n) time, where n is the number of elements in the list.
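
As a rough, illustrative comparison (the numbers here are ours, chosen only to make the point concrete): with n = 1,000,000 elements and the key at index i = 4, exponential search doubles its bound about log 4 = 2 times and then binary-searches a constant-size range, a handful of comparisons in total, whereas a plain binary search performs about log 1,000,000 ≈ 20 comparisons.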

Algorithm

Exponential search allows for searching through a sorted, unbounded list for a specified input value (the search "key"). The algorithm consists of two stages. The first stage determines a range in which the search key would reside if it were in the list. In the second stage, a binary search is performed on this range. In the first stage, assuming that the list is sorted in ascending order, the algorithm looks for the first exponent, j, where the value 2^j is greater than the search key. This value, 2^j, becomes the upper bound for the binary search, with the previous power of 2, 2^(j-1), being the lower bound for the binary search. [3]

// Returns the position of key in the array arr of length size.
template <typename T>
int exponential_search(T arr[], int size, T key)
{
    if (size == 0) {
        return NOT_FOUND;
    }
    int bound = 1;
    while (bound < size && arr[bound] < key) {
        bound *= 2;
    }
    return binary_search(arr, key, bound / 2, min(bound + 1, size));
}

In each step, the algorithm compares the search key value with the key value at the current search index. If the element at the current index is smaller than the search key, the algorithm repeats, skipping to the next search index by doubling it, calculating the next power of 2. [3] If the element at the current index is larger than the search key, the algorithm now knows that the search key, if it is contained in the list at all, is located in the interval formed by the previous search index, 2^(j-1), and the current search index, 2^j. The binary search is then performed with the result of either a failure, if the search key is not in the list, or the position of the search key in the list.
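
For concreteness, the following is a self-contained, runnable version of the routine above. The helper name binary_search_range, the sentinel NOT_FOUND = -1, and the demo values in main are our additions for illustration, not part of the cited sources.

#include <algorithm>
#include <iostream>

constexpr int NOT_FOUND = -1;

// Standard binary search on the half-open range arr[lo, hi):
// returns the index of key, or NOT_FOUND if it is absent.
template <typename T>
int binary_search_range(const T arr[], const T& key, int lo, int hi)
{
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        if (arr[mid] < key)      lo = mid + 1;
        else if (key < arr[mid]) hi = mid;
        else                     return mid;
    }
    return NOT_FOUND;
}

// Exponential search: gallop to find a range containing key,
// then binary search within that range.
template <typename T>
int exponential_search(const T arr[], int size, const T& key)
{
    if (size == 0) {
        return NOT_FOUND;
    }
    int bound = 1;
    while (bound < size && arr[bound] < key) {
        bound *= 2;                      // stage 1: double the search index
    }
    // stage 2: key, if present, lies in [bound/2, min(bound + 1, size))
    return binary_search_range(arr, key, bound / 2, std::min(bound + 1, size));
}

int main()
{
    int arr[] = {2, 3, 5, 7, 11, 13, 17, 19, 23, 29};
    std::cout << exponential_search(arr, 10, 13) << '\n';  // prints 5
    std::cout << exponential_search(arr, 10, 4) << '\n';   // prints -1
}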

Performance

The first stage of the algorithm takes O(log i) time, where i is the index where the search key would be in the list. This is because, in determining the upper bound for the binary search, the while loop is executed exactly ⌈log i⌉ times. Since the list is sorted, after doubling the search index ⌈log i⌉ times, the algorithm will be at a search index that is greater than or equal to i, as 2^⌈log i⌉ ≥ i. As such, the first stage of the algorithm takes O(log i) time.
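
For instance (an illustrative value, not from the source), for i = 100 the bound is doubled ⌈log 100⌉ = 7 times:

bound: 1 → 2 → 4 → 8 → 16 → 32 → 64 → 128,

and indeed 2^7 = 128 ≥ 100, so the galloping stage stops at or beyond the key's position.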

The second part of the algorithm also takes O(log i) time. As the second stage is simply a binary search, it takes O(log n), where n is the size of the interval being searched. The size of this interval would be 2^j - 2^(j-1), where, as seen above, j = ⌈log i⌉. This means that the size of the interval being searched is 2^(log i) - 2^(log i - 1) = 2^(log i - 1). This gives us a run time of log(2^(log i - 1)) = log(i) - 1 = O(log i).

This gives the algorithm a total runtime, calculated by summing the runtimes of the two stages, of O(log i) + O(log i) = 2 O(log i) = O(log i).

Alternatives

Bentley and Yao suggested several variations for exponential search. [2] These variations consist of performing a binary search, as opposed to a unary search, when determining the upper bound for the binary search in the second stage of the algorithm. This splits the first stage of the algorithm into two parts, making the algorithm a three-stage algorithm overall. The new first stage determines a value j′, much like before, such that 2^j′ is larger than the search key and 2^(j′/2) is lower than the search key. Previously, j′ was determined in a unary fashion by calculating the next power of 2 (i.e., adding 1 to j). In the variation, it is proposed that j′ be doubled instead (e.g., jumping from 2^2 to 2^4, as opposed to 2^3). The first j′ such that 2^j′ is greater than the search key forms a much rougher upper bound than before. Once this j′ is found, the algorithm moves to its second stage and a binary search is performed on the interval of exponents formed by j′/2 and j′, giving the more accurate upper bound exponent j. From here, the third stage of the algorithm performs the binary search on the interval between 2^(j-1) and 2^j, as before. The performance of this variation remains O(log i).
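
The following is a hedged sketch of this three-stage variation. The structure follows the description above, but the function and variable names, and the use of std::lower_bound for the final stage, are our choices rather than anything taken from Bentley and Yao's paper.

#include <algorithm>

// Three-stage exponential search sketch: gallop on the exponent, binary search
// the exponents, then binary search the final interval. Returns the index of
// key in the sorted array arr of length size, or -1 if it is absent.
template <typename T>
int exponential_search_three_stage(const T arr[], int size, const T& key)
{
    if (size == 0) return -1;
    // below(e): the probe at index 2^e exists and is still smaller than key.
    auto below = [&](int e) {
        long long idx = 1LL << e;
        return idx < size && arr[idx] < key;
    };
    int j;  // smallest exponent whose probe is not below the key
    if (!below(0)) {
        j = 0;
    } else {
        // Stage 1: double the exponent itself (1, 2, 4, ...) -> rough bound e.
        int e = 1;
        while (below(e)) e *= 2;
        // Stage 2: binary search over the exponents in (e/2, e] for the tight j.
        int lo = e / 2 + 1, hi = e;
        while (lo < hi) {
            int mid = lo + (hi - lo) / 2;
            if (below(mid)) lo = mid + 1; else hi = mid;
        }
        j = lo;
    }
    // Stage 3: ordinary binary search on the interval [2^(j-1), 2^j].
    long long bound = 1LL << j;
    long long lower = bound / 2;
    long long upper = std::min(bound + 1, (long long)size);
    const T* pos = std::lower_bound(arr + lower, arr + upper, key);
    return (pos != arr + upper && *pos == key) ? (int)(pos - arr) : -1;
}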

Bentley and Yao generalize this variation into one where any number, k, of binary searches are performed during the first stage of the algorithm, giving the k-nested binary search variation. The asymptotic runtime does not change for the variations, running in O(log i) time, as with the original exponential search algorithm.

Also, a data structure with a tight version of the dynamic finger property can be given when the above result of the k-nested binary search is used on a sorted array. [4] Using this, the number of comparisons done during a search is log d + log log d + ... + O(log* d), where d is the difference in rank between the last element that was accessed and the current element being accessed.
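
For illustration (a value chosen by us): for d = 65,536 = 2^16, this sum is 16 + 4 + 2 + 1 = 23 comparisons, up to the O(log* d) term, since log* 65,536 = 4 (the iterated logarithm chain is 16 → 4 → 2 → 1).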

Applications

An algorithm based on exponentially increasing the search band solves global pairwise alignment in O(ns) time, where n is the length of the sequences and s is the edit distance between them. [5] [6]
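
A minimal sketch of that band-doubling idea follows, using the standard dynamic-programming recurrence for edit distance restricted to a diagonal band. The function names and details here are ours, intended only to illustrate the O(ns) behaviour, not to reproduce the cited implementations.

#include <algorithm>
#include <cstdlib>
#include <iostream>
#include <string>
#include <vector>

// Edit distance restricted to a diagonal band of half-width t, in O(n*t) time.
// Returns the exact distance if it is <= t, and a value > t otherwise.
int banded_edit_distance(const std::string& a, const std::string& b, int t)
{
    int n = a.size(), m = b.size();
    if (std::abs(n - m) > t) return t + 1;   // band cannot reach the corner
    const int INF = t + 1;
    std::vector<int> prev(m + 1, INF), cur(m + 1, INF);
    for (int j = 0; j <= std::min(m, t); ++j) prev[j] = j;
    for (int i = 1; i <= n; ++i) {
        std::fill(cur.begin(), cur.end(), INF);
        if (i <= t) cur[0] = i;
        int lo = std::max(1, i - t), hi = std::min(m, i + t);
        for (int j = lo; j <= hi; ++j) {
            int sub = prev[j - 1] + (a[i - 1] != b[j - 1]);
            int del = prev[j] + 1;           // INF + 1 at the band edge is harmless
            int ins = cur[j - 1] + 1;
            cur[j] = std::min({sub, del, ins});
        }
        std::swap(prev, cur);
    }
    return std::min(prev[m], INF);
}

// Band doubling: try t = 1, 2, 4, ... until the band is wide enough to hold
// the true distance s; total work is O(n*1 + n*2 + ... + n*s) = O(ns).
int edit_distance(const std::string& a, const std::string& b)
{
    for (int t = 1; ; t *= 2) {
        int d = banded_edit_distance(a, b, t);
        if (d <= t) return d;                // band was wide enough: d is exact
    }
}

int main()
{
    std::cout << edit_distance("kitten", "sitting") << '\n';  // prints 3
}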

Related Research Articles

<span class="mw-page-title-main">Binary search algorithm</span> Search algorithm finding the position of a target value within a sorted array

In computer science, binary search, also known as half-interval search, logarithmic search, or binary chop, is a search algorithm that finds the position of a target value within a sorted array. Binary search compares the target value to the middle element of the array. If they are not equal, the half in which the target cannot lie is eliminated and the search continues on the remaining half, again taking the middle element to compare to the target value, and repeating this until the target value is found. If the search ends with the remaining half being empty, the target is not in the array.

<span class="mw-page-title-main">Merge sort</span> Divide and conquer sorting algorithm

In computer science, merge sort is an efficient, general-purpose, and comparison-based sorting algorithm. Most implementations produce a stable sort, which means that the relative order of equal elements is the same in the input and output. Merge sort is a divide-and-conquer algorithm that was invented by John von Neumann in 1945. A detailed description and analysis of bottom-up merge sort appeared in a report by Goldstine and von Neumann as early as 1948.

In computer science, a red–black tree is a specialised binary search tree data structure noted for fast storage and retrieval of ordered information, and a guarantee that operations will complete within a known time. Compared to other self-balancing binary search trees, the nodes in a red-black tree hold an extra bit called "color" representing "red" and "black" which is used when re-organising the tree to ensure that it is always approximately balanced.

<span class="mw-page-title-main">Floor and ceiling functions</span> Nearest integers from a number

In mathematics, the floor function (or greatest integer function) is the function that takes as input a real number x, and gives as output the greatest integer less than or equal to x, denoted ⌊x⌋ or floor(x). Similarly, the ceiling function maps x to the smallest integer greater than or equal to x, denoted ⌈x⌉ or ceil(x).

<span class="mw-page-title-main">Binary heap</span> Variant of heap data structure

A binary heap is a heap data structure that takes the form of a binary tree. Binary heaps are a common way of implementing priority queues. The binary heap was introduced by J. W. J. Williams in 1964, as a data structure for heapsort.

<span class="mw-page-title-main">Interpolation search</span> Searching algorithm

Interpolation search is an algorithm for searching for a key in an array that has been ordered by numerical values assigned to the keys. It was first described by W. W. Peterson in 1957. Interpolation search resembles the method by which people search a telephone directory for a name: in each step the algorithm calculates where in the remaining search space the sought item might be, based on the key values at the bounds of the search space and the value of the sought key, usually via a linear interpolation. The key value actually found at this estimated position is then compared to the key value being sought. If it is not equal, then depending on the comparison, the remaining search space is reduced to the part before or after the estimated position. This method will only work if calculations on the size of differences between key values are sensible.

<span class="mw-page-title-main">Self-balancing binary search tree</span> Any node-based binary search tree that automatically keeps its height the same

In computer science, a self-balancing binary search tree (BST) is any node-based binary search tree that automatically keeps its height small in the face of arbitrary item insertions and deletions. These operations when designed for a self-balancing binary search tree, contain precautionary measures against boundlessly increasing tree height, so that these abstract data structures receive the attribute "self-balancing".

<span class="mw-page-title-main">Time complexity</span> Estimate of time taken for running an algorithm

In theoretical computer science, the time complexity is the computational complexity that describes the amount of computer time it takes to run an algorithm. Time complexity is commonly estimated by counting the number of elementary operations performed by the algorithm, supposing that each elementary operation takes a fixed amount of time to perform. Thus, the amount of time taken and the number of elementary operations performed by the algorithm are taken to be related by a constant factor.

In computer science, a selection algorithm is an algorithm for finding the kth smallest value in a collection of ordered values, such as numbers. The value that it finds is called the kth order statistic. Selection includes as special cases the problems of finding the minimum, median, and maximum element in the collection. Selection algorithms include quickselect and the median of medians algorithm. When applied to a collection of n values, these algorithms take linear time, O(n), as expressed using big O notation. For data that is already structured, faster algorithms may be possible; as an extreme case, selection in an already-sorted array takes time O(1).

<span class="mw-page-title-main">Iterated logarithm</span> Inverse function to a tower of powers

In computer science, the iterated logarithm of n, written log* n, is the number of times the logarithm function must be iteratively applied before the result is less than or equal to 1. The simplest formal definition is the result of this recurrence relation: log* n = 0 if n ≤ 1, and log* n = 1 + log*(log n) if n > 1.

In computer science, a suffix array is a sorted array of all suffixes of a string. It is a data structure used in, among others, full-text indices, data-compression algorithms, and the field of bibliometrics.

A B+ tree is an m-ary tree with a variable but often large number of children per node. A B+ tree consists of a root, internal nodes and leaves. The root may be either a leaf or a node with two or more children.

m-ary tree: Tree data structure in which each node has at most m children

In graph theory, an m-ary tree is an arborescence in which each node has no more than m children. A binary tree is the special case where m = 2, and a ternary tree is another case with m = 3 that limits its children to three.

<span class="mw-page-title-main">Bitonic sorter</span> Parallel sorting algorithm

Bitonic mergesort is a parallel algorithm for sorting. It is also used as a construction method for building a sorting network. The algorithm was devised by Ken Batcher. The resulting sorting networks consist of O(n log^2(n)) comparators and have a delay of O(log^2(n)), where n is the number of items to be sorted. This makes it a popular choice for sorting large numbers of elements on an architecture which itself contains a large number of parallel execution units running in lockstep, such as a typical GPU.

<span class="mw-page-title-main">Comparison sort</span> Type of sorting algorithm that works by comparing pairs of elements

A comparison sort is a type of sorting algorithm that only reads the list elements through a single abstract comparison operation that determines which of two elements should occur first in the final sorted list. The only requirement is that the operator forms a total preorder over the data, with:

  1. if a ≤ b and b ≤ c then a ≤ c (transitivity)
  2. for all a and b, a ≤ b or b ≤ a (connexity).
<span class="mw-page-title-main">Quicksort</span> Divide and conquer sorting algorithm

Quicksort is an efficient, general-purpose sorting algorithm. Quicksort was developed by British computer scientist Tony Hoare in 1959 and published in 1961. It is still a commonly used algorithm for sorting. Overall, it is slightly faster than merge sort and heapsort for randomized data, particularly on larger distributions.

Samplesort is a sorting algorithm that is a divide and conquer algorithm often used in parallel processing systems. Conventional divide and conquer sorting algorithms partition the array into sub-intervals or buckets. The buckets are then sorted individually and then concatenated together. However, if the array is non-uniformly distributed, the performance of these sorting algorithms can be significantly throttled. Samplesort addresses this issue by selecting a sample of size s from the n-element sequence, and determining the range of the buckets by sorting the sample and choosing p−1 < s elements from the result. These elements then divide the array into p approximately equal-sized buckets. Samplesort is described in the 1970 paper, "Samplesort: A Sampling Approach to Minimal Storage Tree Sorting", by W. D. Frazer and A. C. McKellar.

In graph theory, the graph bandwidth problem is to label the n vertices v_i of a graph G with distinct integers f(v_i) so that the quantity max{ |f(v_i) − f(v_j)| : v_i v_j ∈ E } is minimized. The problem may be visualized as placing the vertices of a graph at distinct integer points along the x-axis so that the length of the longest edge is minimized. Such placement is called linear graph arrangement, linear graph layout or linear graph placement.

In computer science, merge-insertion sort or the Ford–Johnson algorithm is a comparison sorting algorithm published in 1959 by L. R. Ford Jr. and Selmer M. Johnson. It uses fewer comparisons in the worst case than the best previously known algorithms, binary insertion sort and merge sort, and for 20 years it was the sorting algorithm with the fewest known comparisons. Although not of practical significance, it remains of theoretical interest in connection with the problem of sorting with a minimum number of comparisons. The same algorithm may have also been independently discovered by Stanisław Trybuła and Czen Ping.

<span class="mw-page-title-main">Parallel external memory</span>

In computer science, a parallel external memory (PEM) model is a cache-aware, external-memory abstract machine. It is the parallel-computing analogy to the single-processor external memory (EM) model. In a similar way, it is the cache-aware analogy to the parallel random-access machine (PRAM). The PEM model consists of a number of processors, together with their respective private caches and a shared main memory.

References

  1. Baeza-Yates, Ricardo; Salinger, Alejandro (2010). "Fast intersection algorithms for sorted sequences". In Elomaa, Tapio; Mannila, Heikki; Orponen, Pekka (eds.), Algorithms and Applications: Essays Dedicated to Esko Ukkonen on the Occasion of His 60th Birthday. Lecture Notes in Computer Science, vol. 6060. Springer. pp. 45–61. Bibcode:2010LNCS.6060...45B. doi:10.1007/978-3-642-12476-1_3. ISBN 9783642124754.
  2. Bentley, Jon L.; Yao, Andrew C. (1976). "An almost optimal algorithm for unbounded searching". Information Processing Letters. 5 (3): 82–87. doi:10.1016/0020-0190(76)90071-5. ISSN 0020-0190.
  3. Jonsson, Håkan (2011-04-19). "Exponential Binary Search". Archived from the original on 2020-06-01. Retrieved 2014-03-24.
  4. Andersson, Arne; Thorup, Mikkel (2007). "Dynamic ordered sets with exponential search trees". Journal of the ACM. 54 (3): 13. arXiv:cs/0210006. doi:10.1145/1236457.1236460. ISSN 0004-5411. S2CID 8175703.
  5. Ukkonen, Esko (March 1985). "Finding approximate patterns in strings". Journal of Algorithms. 6 (1): 132–137. doi:10.1016/0196-6774(85)90023-9. ISSN 0196-6774.
  6. Šošić, Martin; Šikić, Mile (2016-08-23). "Edlib: a C/C++ library for fast, exact sequence alignment using edit distance". doi:10.1101/070649. S2CID 3818517.