Finger search

Last updated
Example of finger search on treaps. Performing finger searches on treaps.svg
Example of finger search on treaps.

In computer science, a finger search on a data structure is an extension of any search operation that structure supports, where a reference (finger) to an element in the data structure is given along with the query. While the search time for an element is most frequently expressed as a function of the number of elements in a data structure, finger search times are a function of the distance between the element and the finger.

Contents

In a set of n elements, the distance d(x,y) (or simply d when unambiguous) between two elements x and y is their difference in rank. If x and y are the ith and jth largest elements in the structure, then the difference in rank is |i - j|. If a normal search in some structure would normally take O(f(n)) time, then a finger search for x with finger y should ideally take O(f(d)) time. Remark that since dn, it follows that in the worst case, finger search is only as bad as normal search. However, in practice these degenerate finger searches do more work than normal searches. For instance, if f(n) is log(n), and finger search does twice as many comparisons as normal search in the worst case, it follows that finger search is slower when d > n. Therefore, finger search should only be used when one can reasonably expect the target to actually be close to the finger.

Implementations

Some popular data structures support finger search with no additional changes to the actual structure. In structures where searching for an element x is accomplished by narrowing down an interval over which x can be found, finger search from y is typically accomplished by reversing the search process from y until the search interval is large enough to contain x, at which point search proceeds as normal.

Sorted linked lists

In a linked list, one normally searches linearly for an element by walking from one end to the other. If the linked list is sorted, and we have a reference to some node containing y, then we may find x in O(d) time by starting our search from y.

Sorted arrays

In a sorted array A, one normally searches for an element x in A with a binary search. Finger search is accomplished by performing a one-sided search from A[j] = y. While binary search halves the search space after each comparison, one-sided search doubles the search space after each comparison. Specifically, on the kth iteration of one-sided search (assuming x > y), the interval under consideration is A[j, j+2k-1]. Expansion halts as soon as A[j + 2k-1] ≥ x, at which point this interval is binary searched for x.

If one-sided search takes k iterations to find an interval that contains x, then it follows that d > 2k-2. Binary searching this range will also take another k iterations. Therefore, finger search for x from y takes O(k) = O(log d) time.

Skip lists

In a skip list, one can finger search for x from a node containing the element y by simply continuing the search from this point. Note that if x < y, then search proceeds backwards, and if x > y, then search proceeds forwards. The backwards case is symmetric to normal search in a skip list, but the forward case is actually more complex. Normally, search in a skiplist is expected to be fast because the sentinel at the start of the list is as tall as the tallest node. However, our finger could be a node of height 1. Because of this, we may occasionally climb while trying to search; something which never occurs normally. However, even with this complication, we can achieve O(log d) expected search time. [1]

Treaps

A treap is a randomized binary search tree (BST). Searching in a treap is the same as searching for an element in any other BST. Treaps however have the property that the expected path length between two elements of distance d is O(log d). Thus, to finger search from the node containing y for x, one can climb the tree from y until an ancestor of x is found, at which point normal BST search proceeds as usual. While determining if a node is the ancestor of another is non-trivial, one may augment the tree to support queries of this form to give expected O(log d) finger search time. [1]

Ropes and trees

Implementations of the rope (data structure) typically implement a cord position iterator to traverse the string. The iterator can be seen as a finger that points at some specific character of the string. Like most balanced trees, ropes require O(log(n)) time to retrieve data in one leaf of the tree when given only the root of the tree. Reading every leaf of the tree would seem to require O(n*log(n)) time. However, by storing a little extra information, the iterator can be used to read "the next" leaf in only O(1) time, and every leaf of a tree in only O(n) time. Implementions of ropes typically cache "extra information" about the entire the path from the root to the current node position in the iterator. Other implementations of tree data structures sometimes store "extra information" in the tree itself, storing a pointer in each node to its parent or its successor (in addition to the normal pointers in each node to its children), and storing only the current node position in the iterator. [2] [3]

Generalizations

If one can perform finger search in an iterative manner in O(f(d)) time, where each iteration takes O(1) time, then by providing c different fingers, one can perform finger search in O(c min{d(x, y1), ..., d(x, yc)}) time. This is accomplished by starting finger search for all c fingers, and iterating them forward one step each until the first one terminates.

Given any sequence A = [a1, ..., am] of m accesses, a structure is said to have the static finger property for a fixed finger f, if the time to perform A is . Splay trees have this property for any choice of f with no additional processing on sufficiently large sequences of accesses. [4]

Applications

Finger search can be used to re-use work already done in previous searches. For instance, one way to iterate over the elements in a data structure is to simply finger search for them in order, where the finger for one query is the location of the result of the last. One may optimize their data structure by doing this internally if it is known that searches are frequently near the last search.

A finger search tree is a type of data structure that allows fingers to be specified such that all or some of their supported operations are faster when they access or modify a location near a finger. In contrast to the finger searches described in this article, these fingers are not provided at query time, but are separately specified so that all future operations use these fingers. One benefit of this is that the user needs not manipulate or store internal references to the structure, they may simply specify an element in the structure.

Related Research Articles

<span class="mw-page-title-main">Binary search algorithm</span> Search algorithm finding the position of a target value within a sorted array

In computer science, binary search, also known as half-interval search, logarithmic search, or binary chop, is a search algorithm that finds the position of a target value within a sorted array. Binary search compares the target value to the middle element of the array. If they are not equal, the half in which the target cannot lie is eliminated and the search continues on the remaining half, again taking the middle element to compare to the target value, and repeating this until the target value is found. If the search ends with the remaining half being empty, the target is not in the array.

<span class="mw-page-title-main">Binary search tree</span> Rooted binary tree data structure

In computer science, a binary search tree (BST), also called an ordered or sorted binary tree, is a rooted binary tree data structure with the key of each internal node being greater than all the keys in the respective node's left subtree and less than the ones in its right subtree. The time complexity of operations on the binary search tree is directly proportional to the height of the tree.

<span class="mw-page-title-main">Heap (data structure)</span> Computer science data structure

In computer science, a heap is a specialized tree-based data structure which is essentially an almost complete tree that satisfies the heap property: in a max heap, for any given node C, if P is a parent node of C, then the key of P is greater than or equal to the key of C. In a min heap, the key of P is less than or equal to the key of C. The node at the "top" of the heap is called the root node.

A splay tree is a binary search tree with the additional property that recently accessed elements are quick to access again. Like self-balancing binary search trees, a splay tree performs basic operations such as insertion, look-up and removal in O(log n) amortized time. For random access patterns drawn from a non-uniform random distribution, their amortized time can be faster than logarithmic, proportional to the entropy of the access pattern. For many patterns of non-random operations, also, splay trees can take better than logarithmic time, without requiring advance knowledge of the pattern. According to the unproven dynamic optimality conjecture, their performance on all access patterns is within a constant factor of the best possible performance that could be achieved by any other self-adjusting binary search tree, even one selected to fit that pattern. The splay tree was invented by Daniel Sleator and Robert Tarjan in 1985.

<span class="mw-page-title-main">Treap</span>

In computer science, the treap and the randomized binary search tree are two closely related forms of binary search tree data structures that maintain a dynamic set of ordered keys and allow binary searches among the keys. After any sequence of insertions and deletions of keys, the shape of the tree is a random variable with the same probability distribution as a random binary tree; in particular, with high probability its height is proportional to the logarithm of the number of keys, so that each search, insertion, or deletion operation takes logarithmic time to perform.

<span class="mw-page-title-main">Self-balancing binary search tree</span> Any node-based binary search tree that automatically keeps its height the same

In computer science, a self-balancing binary search tree (BST) is any node-based binary search tree that automatically keeps its height small in the face of arbitrary item insertions and deletions. These operations when designed for a self-balancing binary search tree, contain precautionary measures against boundlessly increasing tree height, so that these abstract data structures receive the attribute "self-balancing".

In computer science, a disjoint-set data structure, also called a union–find data structure or merge–find set, is a data structure that stores a collection of disjoint (non-overlapping) sets. Equivalently, it stores a partition of a set into disjoint subsets. It provides operations for adding new sets, merging sets, and finding a representative member of a set. The last operation makes it possible to find out efficiently if any two elements are in the same or different sets.

A B+ tree is an m-ary tree with a variable but often large number of children per node. A B+ tree consists of a root, internal nodes and leaves. The root may be either a leaf or a node with two or more children.

In computer science, an interval tree is a tree data structure to hold intervals. Specifically, it allows one to efficiently find all intervals that overlap with any given interval or point. It is often used for windowing queries, for instance, to find all roads on a computerized map inside a rectangular viewport, or to find all visible elements inside a three-dimensional scene. A similar data structure is the segment tree.

<i>k</i>-d tree Multidimensional search tree for points in k dimensional space

In computer science, a k-d tree is a space-partitioning data structure for organizing points in a k-dimensional space. k-d trees are a useful data structure for several applications, such as searches involving a multidimensional search key and creating point clouds. k-d trees are a special case of binary space partitioning trees.

<span class="mw-page-title-main">Recursion (computer science)</span> Use of functions that call themselves

In computer science, recursion is a method of solving a computational problem where the solution depends on solutions to smaller instances of the same problem. Recursion solves such recursive problems by using functions that call themselves from within their own code. The approach can be applied to many types of problems, and recursion is one of the central ideas of computer science.

The power of recursion evidently lies in the possibility of defining an infinite set of objects by a finite statement. In the same manner, an infinite number of computations can be described by a finite recursive program, even if this program contains no explicit repetitions.

In graph theory and computer science, the lowest common ancestor (LCA) of two nodes v and w in a tree or directed acyclic graph (DAG) T is the lowest node that has both v and w as descendants, where we define each node to be a descendant of itself.

In computer science, a segment tree, also known as a statistic tree, is a tree data structure used for storing information about intervals, or segments. It allows querying which of the stored segments contain a given point. It is, in principle, a static structure; that is, it's a structure that cannot be modified once it's built. A similar data structure is the interval tree.

In computer science, a range tree is an ordered tree data structure to hold a list of points. It allows all points within a given range to be reported efficiently, and is typically used in two or higher dimensions. Range trees were introduced by Jon Louis Bentley in 1979. Similar data structures were discovered independently by Lueker, Lee and Wong, and Willard. The range tree is an alternative to the k-d tree. Compared to k-d trees, range trees offer faster query times of but worse storage of , where n is the number of points stored in the tree, d is the dimension of each point and k is the number of points reported by a given query.

<span class="mw-page-title-main">Cartesian tree</span>

In computer science, a Cartesian tree is a binary tree derived from a sequence of numbers; it can be uniquely defined from the properties that it is heap-ordered and that a symmetric (in-order) traversal of the tree returns the original sequence. Introduced by Vuillemin (1980) in the context of geometric range searching data structures, Cartesian trees have also been used in the definition of the treap and randomized binary search tree data structures for binary search problems. The Cartesian tree for a sequence may be constructed in linear time using a stack-based algorithm for finding all nearest smaller values in a sequence.

In computer science and probability theory, a random binary tree is a binary tree selected at random from some probability distribution on binary trees. Two different distributions are commonly used: binary trees formed by inserting nodes one at a time according to a random permutation, and binary trees chosen from a uniform discrete distribution in which all distinct trees are equally likely. It is also possible to form other distributions, for instance by repeated splitting. Adding and removing nodes directly in a random binary tree will in general disrupt its random structure, but the treap and related randomized binary search tree data structures use the principle of binary trees formed from a random permutation in order to maintain a balanced binary search tree dynamically as nodes are inserted and deleted.

In computer science, a y-fast trie is a data structure for storing integers from a bounded domain. It supports exact and predecessor or successor queries in time O(log log M), using O(n) space, where n is the number of stored values and M is the maximum value in the domain. The structure was proposed by Dan Willard in 1982 to decrease the O(n log M) space used by an x-fast trie.

In data structures, a range query consists of preprocessing some input data into a data structure to efficiently answer any number of queries on any subset of the input. Particularly, there is a group of problems that have been extensively studied where the input is an array of unsorted numbers and a query consists of computing some function, such as the minimum, on a specific range of the array.

<span class="mw-page-title-main">Finger search tree</span>

In computer science, finger search trees are a type of binary search tree that keeps pointers to interior nodes, called fingers. The fingers speed up searches, insertions, and deletions for elements close to the fingers, giving amortized O(log n) lookups, and amortized O(1) insertions and deletions. It should not be confused with a finger tree nor a splay tree, although both can be used to implement finger search trees.

References

  1. 1 2 "Randomized Splay Trees: Theoretical and Experimental Results" (PDF).
  2. "General design concerns for a tree iterator".
  3. Steven J. Zeil. "Traversing Trees with Iterators" Archived 2016-02-16 at the Wayback Machine .
  4. "John Iacono. Key independent optimality. Algorithmica, 42(1):3-10, 2005" (PDF). Archived from the original (PDF) on 2010-06-13.