Exponential search

Class: Search algorithm
Data structure: Array
Worst-case performance: O(log i)
Best-case performance: O(1)
Average performance: O(log i)
Worst-case space complexity: O(1)
Optimal: Yes

In computer science, an exponential search (also called doubling search, galloping search, or Struzik search) [1] is an algorithm, created by Jon Bentley and Andrew Chi-Chih Yao in 1976, for searching sorted, unbounded/infinite lists. [2] There are numerous ways to implement it, the most common being to determine a range in which the search key resides and to perform a binary search within that range. This takes O(log i) time, where i is the position of the search key in the list (if the search key is in the list) or the position where the search key would be (if it is not).


Exponential search can also be used to search in bounded lists, where it can even outperform traditional searches such as binary search when the element being searched for is near the beginning of the array. This is because exponential search runs in O(log i) time, where i is the index of the element being searched for in the list, whereas binary search runs in O(log n) time, where n is the number of elements in the list.
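
As a rough, illustrative comparison (the numbers here are ours, chosen only to make the point concrete): with n = 1,000,000 elements and the key at index i = 4, exponential search doubles its bound about log 4 = 2 times and then binary-searches a constant-size range, a handful of comparisons in total, whereas a plain binary search performs about log 1,000,000 ≈ 20 comparisons.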

Algorithm

Exponential search allows for searching through a sorted, unbounded list for a specified input value (the search "key"). The algorithm consists of two stages. The first stage determines a range in which the search key would reside if it were in the list. In the second stage, a binary search is performed on this range. In the first stage, assuming that the list is sorted in ascending order, the algorithm looks for the first exponent, j, where the value 2^j is greater than the search key. This value, 2^j, becomes the upper bound for the binary search, with the previous power of 2, 2^(j-1), being the lower bound for the binary search. [3]

// Returns the position of key in the array arr of length size.
template <typename T>
int exponential_search(T arr[], int size, T key)
{
    if (size == 0) {
        return NOT_FOUND;
    }
    int bound = 1;
    while (bound < size && arr[bound] < key) {
        bound *= 2;
    }
    return binary_search(arr, key, bound / 2, min(bound + 1, size));
}

In each step, the algorithm compares the search key value with the key value at the current search index. If the element at the current index is smaller than the search key, the algorithm repeats, skipping to the next search index by doubling it, calculating the next power of 2. [3] If the element at the current index is larger than the search key, the algorithm now knows that the search key, if it is contained in the list at all, is located in the interval formed by the previous search index, 2^(j-1), and the current search index, 2^j. The binary search is then performed with the result of either a failure, if the search key is not in the list, or the position of the search key in the list.
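
For concreteness, the following is a self-contained, runnable version of the routine above. The helper name binary_search_range, the sentinel NOT_FOUND = -1, and the demo values in main are our additions for illustration, not part of the cited sources.

#include <algorithm>
#include <iostream>

constexpr int NOT_FOUND = -1;

// Standard binary search on the half-open range arr[lo, hi):
// returns the index of key, or NOT_FOUND if it is absent.
template <typename T>
int binary_search_range(const T arr[], const T& key, int lo, int hi)
{
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        if (arr[mid] < key)      lo = mid + 1;
        else if (key < arr[mid]) hi = mid;
        else                     return mid;
    }
    return NOT_FOUND;
}

// Exponential search: gallop to find a range containing key,
// then binary search within that range.
template <typename T>
int exponential_search(const T arr[], int size, const T& key)
{
    if (size == 0) {
        return NOT_FOUND;
    }
    int bound = 1;
    while (bound < size && arr[bound] < key) {
        bound *= 2;                      // stage 1: double the search index
    }
    // stage 2: key, if present, lies in [bound/2, min(bound + 1, size))
    return binary_search_range(arr, key, bound / 2, std::min(bound + 1, size));
}

int main()
{
    int arr[] = {2, 3, 5, 7, 11, 13, 17, 19, 23, 29};
    std::cout << exponential_search(arr, 10, 13) << '\n';  // prints 5
    std::cout << exponential_search(arr, 10, 4) << '\n';   // prints -1
}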

Performance

The first stage of the algorithm takes O(log i) time, where i is the index where the search key would be in the list. This is because, in determining the upper bound for the binary search, the while loop is executed exactly ⌈log i⌉ times. Since the list is sorted, after doubling the search index ⌈log i⌉ times, the algorithm will be at a search index that is greater than or equal to i, as 2^⌈log i⌉ ≥ i. As such, the first stage of the algorithm takes O(log i) time.
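
For instance (an illustrative value, not from the source), for i = 100 the bound is doubled ⌈log 100⌉ = 7 times:

bound: 1 → 2 → 4 → 8 → 16 → 32 → 64 → 128,

and indeed 2^7 = 128 ≥ 100, so the galloping stage stops at or beyond the key's position.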

The second part of the algorithm also takes O(log i) time. As the second stage is simply a binary search, it takes O(log n), where n is the size of the interval being searched. The size of this interval would be 2^j - 2^(j-1), where, as seen above, j = ⌈log i⌉. This means that the size of the interval being searched is 2^(log i) - 2^(log i - 1) = 2^(log i - 1). This gives us a run time of log(2^(log i - 1)) = log(i) - 1 = O(log i).

This gives the algorithm a total runtime, calculated by summing the runtimes of the two stages, of O(log i) + O(log i) = 2 O(log i) = O(log i).

Alternatives

Bentley and Yao suggested several variations for exponential search. [2] These variations consist of performing a binary search, as opposed to a unary search, when determining the upper bound for the binary search in the second stage of the algorithm. This splits the first stage of the algorithm into two parts, making the algorithm a three-stage algorithm overall. The new first stage determines a value j′, much like before, such that 2^j′ is larger than the search key and 2^(j′/2) is lower than the search key. Previously, j′ was determined in a unary fashion by calculating the next power of 2 (i.e., adding 1 to j). In the variation, it is proposed that j′ be doubled instead (e.g., jumping from 2^2 to 2^4, as opposed to 2^3). The first j′ such that 2^j′ is greater than the search key forms a much rougher upper bound than before. Once this j′ is found, the algorithm moves to its second stage and a binary search is performed on the interval of exponents formed by j′/2 and j′, giving the more accurate upper bound exponent j. From here, the third stage of the algorithm performs the binary search on the interval between 2^(j-1) and 2^j, as before. The performance of this variation remains O(log i).
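
The following is a hedged sketch of this three-stage variation. The structure follows the description above, but the function and variable names, and the use of std::lower_bound for the final stage, are our choices rather than anything taken from Bentley and Yao's paper.

#include <algorithm>

// Three-stage exponential search sketch: gallop on the exponent, binary search
// the exponents, then binary search the final interval. Returns the index of
// key in the sorted array arr of length size, or -1 if it is absent.
template <typename T>
int exponential_search_three_stage(const T arr[], int size, const T& key)
{
    if (size == 0) return -1;
    // below(e): the probe at index 2^e exists and is still smaller than key.
    auto below = [&](int e) {
        long long idx = 1LL << e;
        return idx < size && arr[idx] < key;
    };
    int j;  // smallest exponent whose probe is not below the key
    if (!below(0)) {
        j = 0;
    } else {
        // Stage 1: double the exponent itself (1, 2, 4, ...) -> rough bound e.
        int e = 1;
        while (below(e)) e *= 2;
        // Stage 2: binary search over the exponents in (e/2, e] for the tight j.
        int lo = e / 2 + 1, hi = e;
        while (lo < hi) {
            int mid = lo + (hi - lo) / 2;
            if (below(mid)) lo = mid + 1; else hi = mid;
        }
        j = lo;
    }
    // Stage 3: ordinary binary search on the interval [2^(j-1), 2^j].
    long long bound = 1LL << j;
    long long lower = bound / 2;
    long long upper = std::min(bound + 1, (long long)size);
    const T* pos = std::lower_bound(arr + lower, arr + upper, key);
    return (pos != arr + upper && *pos == key) ? (int)(pos - arr) : -1;
}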

Bentley and Yao generalize this variation into one where any number, k, of binary searches are performed during the first stage of the algorithm, giving the k-nested binary search variation. The asymptotic runtime does not change for the variations, running in O(log i) time, as with the original exponential search algorithm.

Also, a data structure with a tight version of the dynamic finger property can be given when the above result of the k-nested binary search is used on a sorted array. [4] Using this, the number of comparisons done during a search is log d + log log d + ... + O(log* d), where d is the difference in rank between the last element that was accessed and the current element being accessed.
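
For illustration (a value chosen by us): for d = 65,536 = 2^16, this sum is 16 + 4 + 2 + 1 = 23 comparisons, up to the O(log* d) term, since log* 65,536 = 4 (the iterated logarithm chain is 16 → 4 → 2 → 1).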

Applications

An algorithm based on exponentially increasing the search band solves global pairwise alignment in O(ns) time, where n is the length of the sequences and s is the edit distance between them. [5] [6]
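
A minimal sketch of that band-doubling idea follows, using the standard dynamic-programming recurrence for edit distance restricted to a diagonal band. The function names and details here are ours, intended only to illustrate the O(ns) behaviour, not to reproduce the cited implementations.

#include <algorithm>
#include <cstdlib>
#include <iostream>
#include <string>
#include <vector>

// Edit distance restricted to a diagonal band of half-width t, in O(n*t) time.
// Returns the exact distance if it is <= t, and a value > t otherwise.
int banded_edit_distance(const std::string& a, const std::string& b, int t)
{
    int n = a.size(), m = b.size();
    if (std::abs(n - m) > t) return t + 1;   // band cannot reach the corner
    const int INF = t + 1;
    std::vector<int> prev(m + 1, INF), cur(m + 1, INF);
    for (int j = 0; j <= std::min(m, t); ++j) prev[j] = j;
    for (int i = 1; i <= n; ++i) {
        std::fill(cur.begin(), cur.end(), INF);
        if (i <= t) cur[0] = i;
        int lo = std::max(1, i - t), hi = std::min(m, i + t);
        for (int j = lo; j <= hi; ++j) {
            int sub = prev[j - 1] + (a[i - 1] != b[j - 1]);
            int del = prev[j] + 1;           // INF + 1 at the band edge is harmless
            int ins = cur[j - 1] + 1;
            cur[j] = std::min({sub, del, ins});
        }
        std::swap(prev, cur);
    }
    return std::min(prev[m], INF);
}

// Band doubling: try t = 1, 2, 4, ... until the band is wide enough to hold
// the true distance s; total work is O(n*1 + n*2 + ... + n*s) = O(ns).
int edit_distance(const std::string& a, const std::string& b)
{
    for (int t = 1; ; t *= 2) {
        int d = banded_edit_distance(a, b, t);
        if (d <= t) return d;                // band was wide enough: d is exact
    }
}

int main()
{
    std::cout << edit_distance("kitten", "sitting") << '\n';  // prints 3
}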

Related Research Articles

<span class="mw-page-title-main">Binary search algorithm</span> Search algorithm finding the position of a target value within a sorted array

In computer science, binary search, also known as half-interval search, logarithmic search, or binary chop, is a search algorithm that finds the position of a target value within a sorted array. Binary search compares the target value to the middle element of the array. If they are not equal, the half in which the target cannot lie is eliminated and the search continues on the remaining half, again taking the middle element to compare to the target value, and repeating this until the target value is found. If the search ends with the remaining half being empty, the target is not in the array.

<span class="mw-page-title-main">Merge sort</span> Divide and conquer sorting algorithm

In computer science, merge sort is an efficient, general-purpose, and comparison-based sorting algorithm. Most implementations produce a stable sort, which means that the relative order of equal elements is the same in the input and output. Merge sort is a divide-and-conquer algorithm that was invented by John von Neumann in 1945. A detailed description and analysis of bottom-up merge sort appeared in a report by Goldstine and von Neumann as early as 1948.

In computer science, a red–black tree is a specialised binary search tree data structure noted for fast storage and retrieval of ordered information, and a guarantee that operations will complete within a known time. Compared to other self-balancing binary search trees, the nodes in a red-black tree hold an extra bit called "color" representing "red" and "black" which is used when re-organising the tree to ensure that it is always approximately balanced.

<span class="mw-page-title-main">Floor and ceiling functions</span> Nearest integers from a number

In mathematics, the floor function (or greatest integer function) is the function that takes as input a real number x, and gives as output the greatest integer less than or equal to x, denoted ⌊x⌋ or floor(x). Similarly, the ceiling function maps x to the smallest integer greater than or equal to x, denoted ⌈x⌉ or ceil(x).

<span class="mw-page-title-main">Binary heap</span> Variant of heap data structure

A binary heap is a heap data structure that takes the form of a binary tree. Binary heaps are a common way of implementing priority queues. The binary heap was introduced by J. W. J. Williams in 1964, as a data structure for heapsort.

<span class="mw-page-title-main">Interpolation search</span> Searching algorithm

Interpolation search is an algorithm for searching for a key in an array that has been ordered by numerical values assigned to the keys. It was first described by W. W. Peterson in 1957. Interpolation search resembles the method by which people search a telephone directory for a name: in each step the algorithm calculates where in the remaining search space the sought item might be, based on the key values at the bounds of the search space and the value of the sought key, usually via a linear interpolation. The key value actually found at this estimated position is then compared to the key value being sought. If it is not equal, then depending on the comparison, the remaining search space is reduced to the part before or after the estimated position. This method will only work if calculations on the size of differences between key values are sensible.

<span class="mw-page-title-main">Self-balancing binary search tree</span> Any node-based binary search tree that automatically keeps its height the same

In computer science, a self-balancing binary search tree (BST) is any node-based binary search tree that automatically keeps its height small in the face of arbitrary item insertions and deletions. These operations when designed for a self-balancing binary search tree, contain precautionary measures against boundlessly increasing tree height, so that these abstract data structures receive the attribute "self-balancing".

<span class="mw-page-title-main">Time complexity</span> Estimate of time taken for running an algorithm

In theoretical computer science, the time complexity is the computational complexity that describes the amount of computer time it takes to run an algorithm. Time complexity is commonly estimated by counting the number of elementary operations performed by the algorithm, supposing that each elementary operation takes a fixed amount of time to perform. Thus, the amount of time taken and the number of elementary operations performed by the algorithm are taken to be related by a constant factor.

In computer science, a selection algorithm is an algorithm for finding the kth smallest value in a collection of ordered values, such as numbers. The value that it finds is called the kth order statistic. Selection includes as special cases the problems of finding the minimum, median, and maximum element in the collection. Selection algorithms include quickselect and the median of medians algorithm. When applied to a collection of n values, these algorithms take linear time, O(n), as expressed using big O notation. For data that is already structured, faster algorithms may be possible; as an extreme case, selection in an already-sorted array takes time O(1).

<span class="mw-page-title-main">Iterated logarithm</span> Inverse function to a tower of powers

In computer science, the iterated logarithm of n, written log* n, is the number of times the logarithm function must be iteratively applied before the result is less than or equal to 1. The simplest formal definition is the result of this recurrence relation: log* n = 0 if n ≤ 1, and log* n = 1 + log*(log n) if n > 1.

In computer science, a suffix array is a sorted array of all suffixes of a string. It is a data structure used in, among others, full-text indices, data-compression algorithms, and the field of bibliometrics.

A B+ tree is an m-ary tree with a variable but often large number of children per node. A B+ tree consists of a root, internal nodes and leaves. The root may be either a leaf or a node with two or more children.

m-ary tree: Tree data structure in which each node has at most m children

In graph theory, an m-ary tree is an arborescence in which each node has no more than m children. A binary tree is the special case where m = 2, and a ternary tree is another case with m = 3 that limits its children to three.

<span class="mw-page-title-main">Bitonic sorter</span> Parallel sorting algorithm

Bitonic mergesort is a parallel algorithm for sorting. It is also used as a construction method for building a sorting network. The algorithm was devised by Ken Batcher. The resulting sorting networks consist of O(n log^2(n)) comparators and have a delay of O(log^2(n)), where n is the number of items to be sorted. This makes it a popular choice for sorting large numbers of elements on an architecture which itself contains a large number of parallel execution units running in lockstep, such as a typical GPU.

<span class="mw-page-title-main">Comparison sort</span> Type of sorting algorithm that works by comparing pairs of elements

A comparison sort is a type of sorting algorithm that only reads the list elements through a single abstract comparison operation that determines which of two elements should occur first in the final sorted list. The only requirement is that the operator forms a total preorder over the data, with:

  1. if a ≤ b and b ≤ c then a ≤ c (transitivity)
  2. for all a and b, a ≤ b or b ≤ a (connexity).
<span class="mw-page-title-main">Quicksort</span> Divide and conquer sorting algorithm

Quicksort is an efficient, general-purpose sorting algorithm. Quicksort was developed by British computer scientist Tony Hoare in 1959 and published in 1961. It is still a commonly used algorithm for sorting. Overall, it is slightly faster than merge sort and heapsort for randomized data, particularly on larger distributions.

Samplesort is a sorting algorithm that is a divide and conquer algorithm often used in parallel processing systems. Conventional divide and conquer sorting algorithms partition the array into sub-intervals or buckets. The buckets are then sorted individually and then concatenated together. However, if the array is non-uniformly distributed, the performance of these sorting algorithms can be significantly throttled. Samplesort addresses this issue by selecting a sample of size s from the n-element sequence, and determining the range of the buckets by sorting the sample and choosing p−1 < s elements from the result. These elements then divide the array into p approximately equal-sized buckets. Samplesort is described in the 1970 paper, "Samplesort: A Sampling Approach to Minimal Storage Tree Sorting", by W. D. Frazer and A. C. McKellar.

In graph theory, the graph bandwidth problem is to label the n vertices v_i of a graph G with distinct integers f(v_i) so that the quantity max{ |f(v_i) − f(v_j)| : v_i v_j ∈ E } is minimized. The problem may be visualized as placing the vertices of a graph at distinct integer points along the x-axis so that the length of the longest edge is minimized. Such placement is called linear graph arrangement, linear graph layout or linear graph placement.

In computer science, merge-insertion sort or the Ford–Johnson algorithm is a comparison sorting algorithm published in 1959 by L. R. Ford Jr. and Selmer M. Johnson. It uses fewer comparisons in the worst case than the best previously known algorithms, binary insertion sort and merge sort, and for 20 years it was the sorting algorithm with the fewest known comparisons. Although not of practical significance, it remains of theoretical interest in connection with the problem of sorting with a minimum number of comparisons. The same algorithm may have also been independently discovered by Stanisław Trybuła and Czen Ping.

<span class="mw-page-title-main">Parallel external memory</span>

In computer science, a parallel external memory (PEM) model is a cache-aware, external-memory abstract machine. It is the parallel-computing analogy to the single-processor external memory (EM) model. In a similar way, it is the cache-aware analogy to the parallel random-access machine (PRAM). The PEM model consists of a number of processors, together with their respective private caches and a shared main memory.

References

  1. Baeza-Yates, Ricardo; Salinger, Alejandro (2010). "Fast intersection algorithms for sorted sequences". In Elomaa, Tapio; Mannila, Heikki; Orponen, Pekka (eds.), Algorithms and Applications: Essays Dedicated to Esko Ukkonen on the Occasion of His 60th Birthday. Lecture Notes in Computer Science, vol. 6060. Springer. pp. 45–61. Bibcode:2010LNCS.6060...45B. doi:10.1007/978-3-642-12476-1_3. ISBN 9783642124754.
  2. Bentley, Jon L.; Yao, Andrew C. (1976). "An almost optimal algorithm for unbounded searching". Information Processing Letters. 5 (3): 82–87. doi:10.1016/0020-0190(76)90071-5. ISSN 0020-0190.
  3. Jonsson, Håkan (2011-04-19). "Exponential Binary Search". Archived from the original on 2020-06-01. Retrieved 2014-03-24.
  4. Andersson, Arne; Thorup, Mikkel (2007). "Dynamic ordered sets with exponential search trees". Journal of the ACM. 54 (3): 13. arXiv:cs/0210006. doi:10.1145/1236457.1236460. ISSN 0004-5411. S2CID 8175703.
  5. Ukkonen, Esko (March 1985). "Finding approximate patterns in strings". Journal of Algorithms. 6 (1): 132–137. doi:10.1016/0196-6774(85)90023-9. ISSN 0196-6774.
  6. Šošić, Martin; Šikić, Mile (2016-08-23). "Edlib: a C/C++ library for fast, exact sequence alignment using edit distance". doi:10.1101/070649. S2CID 3818517.