Cell-probe model

Last updated

In computer science, the cell-probe model is a model of computation similar to the random-access machine, except that all operations are free except memory access. This model is useful for proving lower bounds of algorithms for data structure problems.

Contents

Overview

The cell-probe model is a minor modification of the random-access machine model, itself a minor modification of the counter machine model, in which computational cost is only assigned to accessing units of memory called cells.

In this model, computation is framed as a problem of querying a set of memory cells. The problem has two phases: the preprocessing phase and the query phase. The input to the first phase, the preprocessing phase, is a set of data from which to build some structure from memory cells. The input to the second phase, the query phase, is a query datum. The problem is to determine if the query datum was included in the original input data set. Operations are free except to access memory cells.

This model is useful in the analysis of data structures. In particular, the model clearly shows a minimum number of memory accesses to solve a problem in which there is stored data on which we would like to run some query. An example of such a problem is the dynamic partial sum problem. [1] [2]

History

In Andrew Yao's 1981 paper "Should Tables Be Sorted?", [3] Yao described the cell-probe model and used it to give a minimum number of memory cell "probes" or accesses necessary to determine whether a given query datum exists within a table stored in memory.

Formal definition

Given a set of data construct a structure consisting of memory cells, each able to store bits. Then when given a query element determine whether with correctness by accessing memory cells. Such an algorithm is called an -error -probe algorithm using cells with word size . [4]

Notable results

Dynamic Partial Sums

The dynamic partial sum problem defines two operations Update(k,v) which conceptually operation sets the value in an array A at index k to be v, and Sum(k) which returns the sum of the values in A at indices 0 through k. Such an implementation would take time for Update and time for Sum. [5]

Instead, if the values are stored as leaves in a tree whose inner nodes store the values of the subtree rooted at that node. In this structure Update requires time to update each node in the leaf to root path, and Sum similarly requires time to traverse the tree from leaf to root summing the values of all subtrees left of the query index.

Mihai Pătraşcu used the cell-probe model and an information transfer argument to show that the partial sums problem requires time per operation. [1] [2]

Approximate Nearest Neighbor Searching

The exact nearest neighbor search problem is to determine the closest in a set of input points to a given query point. An approximate version of this problem is often considered since many applications of this problem are in very high dimension spaces and solving the problem in high dimensions requires exponential time or space with respect to the dimension. [4]

Chakrabarti and Regev proved that the approximate nearest neighbor search problem on the Hamming cube using polynomial storage and word size requires a worst-case query time of . This proof used the cell-probe model and information theoretic techniques for communication complexity.

Related Research Articles

In computational complexity theory, the 3SUM problem asks if a given set of real numbers contains three elements that sum to zero. A generalized version, k-SUM, asks the same question on k numbers. 3SUM can be easily solved in time, and matching lower bounds are known in some specialized models of computation.

In computer science, a selection algorithm is an algorithm for finding the th smallest value in a collection of ordered values, such as numbers. The value that it finds is called the th order statistic. Selection includes as special cases the problems of finding the minimum, median, and maximum element in the collection. Selection algorithms include quickselect, and the median of medians algorithm. When applied to a collection of values, these algorithms take linear time, as expressed using big O notation. For data that is already structured, faster algorithms may be possible; as an extreme case, selection in an already-sorted array takes time .

In computer science, a disjoint-set data structure, also called a union–find data structure or merge–find set, is a data structure that stores a collection of disjoint (non-overlapping) sets. Equivalently, it stores a partition of a set into disjoint subsets. It provides operations for adding new sets, merging sets, and finding a representative member of a set. The last operation makes it possible to find out efficiently if any two elements are in the same or different sets.

<i>k</i>-d tree Multidimensional search tree for points in k dimensional space

In computer science, a k-d tree is a space-partitioning data structure for organizing points in a k-dimensional space. k-d trees are a useful data structure for several applications, such as searches involving a multidimensional search key and creating point clouds. k-d trees are a special case of binary space partitioning trees.

<span class="mw-page-title-main">Linear probing</span> Computer programming method for hashing

Linear probing is a scheme in computer programming for resolving collisions in hash tables, data structures for maintaining a collection of key–value pairs and looking up the value associated with a given key. It was invented in 1954 by Gene Amdahl, Elaine M. McGraw, and Arthur Samuel and first analyzed in 1963 by Donald Knuth.

In computer science, the prefix sum, cumulative sum, inclusive scan, or simply scan of a sequence of numbers x0, x1, x2, ... is a second sequence of numbers y0, y1, y2, ..., the sums of prefixes of the input sequence:

<span class="mw-page-title-main">Range searching</span>

In computer science, the range searching problem consists of processing a set S of objects, in order to determine which objects from S intersect with a query object, called the range. For example, if S is a set of points corresponding to the coordinates of several cities, find the subset of cities within a given range of latitudes and longitudes.

A tango tree is a type of binary search tree proposed by Erik D. Demaine, Dion Harmon, John Iacono, and Mihai Pătrașcu in 2004. It is named after Buenos Aires, of which the tango is emblematic.

In computer science, a succinct data structure is a data structure which uses an amount of space that is "close" to the information-theoretic lower bound, but still allows for efficient query operations. The concept was originally introduced by Jacobson to encode bit vectors, (unlabeled) trees, and planar graphs. Unlike general lossless data compression algorithms, succinct data structures retain the ability to use them in-place, without decompressing them first. A related notion is that of a compressed data structure, insofar as the size of the stored or encoded data similarly depends upon the specific content of the data itself.

The dynamic convex hull problem is a class of dynamic problems in computational geometry. The problem consists in the maintenance, i.e., keeping track, of the convex hull for input data undergoing a sequence of discrete changes, i.e., when input data elements may be inserted, deleted, or modified. It should be distinguished from the kinetic convex hull, which studies similar problems for continuously moving points. Dynamic convex hull problems may be distinguished by the types of the input data and the allowed types of modification of the input data.

In computer science, a range tree is an ordered tree data structure to hold a list of points. It allows all points within a given range to be reported efficiently, and is typically used in two or higher dimensions. Range trees were introduced by Jon Louis Bentley in 1979. Similar data structures were discovered independently by Lueker, Lee and Wong, and Willard. The range tree is an alternative to the k-d tree. Compared to k-d trees, range trees offer faster query times of but worse storage of , where n is the number of points stored in the tree, d is the dimension of each point and k is the number of points reported by a given query.

In computational complexity theory, and more specifically in the analysis of algorithms with integer data, the transdichotomous model is a variation of the random access machine in which the machine word size is assumed to match the problem size. The model was proposed by Michael Fredman and Dan Willard, who chose its name "because the dichotomy between the machine model and the problem size is crossed in a reasonable manner."

In computer science, integer sorting is the algorithmic problem of sorting a collection of data values by integer keys. Algorithms designed for integer sorting may also often be applied to sorting problems in which the keys are floating point numbers, rational numbers, or text strings. The ability to perform integer arithmetic on the keys allows integer sorting algorithms to be faster than comparison sorting algorithms in many cases, depending on the details of which operations are allowed in the model of computing and how large the integers to be sorted are.

In data structures, a range query consists of pre-processing some input data into a data structure to efficiently answer any number of queries on any subset of the input. Particularly, there is a group of problems that have been extensively studied where the input is an array of unsorted numbers and a query consists of computing some function, such as the minimum, on a specific range of the array.

In computer science, one approach to the dynamic optimality problem on online algorithms for binary search trees involves reformulating the problem geometrically, in terms of augmenting a set of points in the plane with as few additional points as possible in order to avoid rectangles with only two points on their boundary.

In computer science, an optimal binary search tree (Optimal BST), sometimes called a weight-balanced binary tree, is a binary search tree which provides the smallest possible search time (or expected search time) for a given sequence of accesses (or access probabilities). Optimal BSTs are generally divided into two types: static and dynamic.

An oblivious RAM (ORAM) simulator is a compiler that transforms algorithms in such a way that the resulting algorithms preserve the input-output behavior of the original algorithm but the distribution of the memory access pattern of the transformed algorithm is independent of the memory access pattern of the original algorithm.

In computer science, the predecessor problem involves maintaining a set of items to, given an element, efficiently query which element precedes or succeeds that element in an order. Data structures used to solve the problem include balanced binary search trees, van Emde Boas trees, and fusion trees. In the static predecessor problem, the set of elements does not change, but in the dynamic predecessor problem, insertions into and deletions from the set are allowed.

In computer science, the list-labeling problem involves maintaining a totally ordered set S supporting the following operations:

In computer science, the augmented map is an abstract data type (ADT) based on ordered maps, which associates each ordered map an augmented value. For an ordered map with key type , comparison function on and value type , the augmented value is defined based on two functions: a base function and a combine function , where is the type of the augmented value. The base function converts a single entry in to an augmented value, and the combine function combines multiple augmented values. The combine function is required to be associative and have an identity . We extend the definition of the associative function as follows:

References

  1. 1 2 Pătraşcu, Mihai; Demaine, Erik D. (2006). "Logarithmic lower bounds in the cell-probe model". SIAM Journal on Computing. 35 (4): 932–963. arXiv: cs/0502041 . Bibcode:2005cs........2041P. doi:10.1137/s0097539705447256. S2CID   2202874.
  2. 1 2 Pătraşcu, Mihai. "Lower Bounds for Dynamic Partial Sums" (PDF). Retrieved 9 April 2014.
  3. Yao, Andrew Chi-Chih (July 1981). "Should Tables Be Sorted?". J. ACM. 28 (3): 615–628. doi: 10.1145/322261.322274 .
  4. 1 2 Chakrabarti, Amit; Regev, Oded (2004). "An Optimal Randomised Cell Probe Lower Bound for Approximate Nearest Neighbour Searching". 45th Annual IEEE Symposium on Foundations of Computer Science. pp. 473–482. doi:10.1109/FOCS.2004.12. ISBN   0-7695-2228-9. S2CID   3004976.
  5. Zatloukal, Kevin (November 10, 2010). "Notes on "Logarithmic Lower Bounds in the Cell-Probe Model"" (PDF). Retrieved 9 April 2014.