Cell-probe model

Last updated

In computer science, the cell-probe model is a model of computation similar to the random-access machine, except that all operations are free except memory access. This model is useful for proving lower bounds of algorithms for data structure problems.

Computer science study of the theoretical foundations of information and computation

Computer science is the study of processes that interact with data and that can be represented as data in the form of programs. It enables the use of algorithms to manipulate, store, and communicate digital information. A computer scientist studies the theory of computation and the practice of designing software systems.

In computer science, random-access machine (RAM) is an abstract machine in the general class of register machines. The RAM is very similar to the counter machine but with the added capability of 'indirect addressing' of its registers. Like the counter machine the RAM has its instructions in the finite-state portion of the machine.

Contents

Overview

The cell-probe model is a minor modification of the random-access machine model, itself a minor modification of the counter machine model, in which computational cost is only assigned to accessing units of memory called cells.

A counter machine is an abstract machine used in formal logic and theoretical computer science to model computation. It is the most primitive of the four types of register machines. A counter machine comprises a set of one or more unbounded registers, each of which can hold a single non-negative integer, and a list of arithmetic and control instructions for the machine to follow. The counter machine is typically used in the process of designing parallel algorithms in relation to the mutual exclusion principle. When used in this manner, the counter machine is used to model the discrete time-steps of a computational system in relation to memory accesses. By modeling computations in relation to the memory accesses for each respective computational step, parallel algorithms may be designed in such a matter to avoid interlocking, the simultaneous writing operation by two threads to the same memory address.

This page supplements counter machine.

In this model, computation is framed as a problem of querying a set of memory cells. The problem has two phases: the preprocessing phase and the query phase. The input to the first phase, the preprocessing phase, is a set of data from which to build some structure from memory cells. The input to the second phase, the query phase, is a query datum. The problem is to determine if the query datum was included in the original input data set. Operations are free except to access memory cells.

This model is useful in the analysis of data structures. In particular, the model clearly shows a minimum number of memory accesses to solve a problem in which there is stored data on which we would like to run some query. An example of such a problem is the dynamic partial sum problem. [1] [2]

History

In Andrew Yao's 1981 paper "Should Tables Be Sorted?", [3] Andrew described the cell-probe model and used it to give a minimum number of memory cell "probes" or accesses necessary to determine whether a given query datum exists within a table stored in memory.

Andrew Yao Taiwanese American computer scientist

Andrew Chi-Chih Yao is a Chinese computer scientist and computational theorist. He is currently a Professor and the Dean of Institute for Interdisciplinary Information Sciences (IIIS) at Tsinghua University. Yao used the minimax theorem to prove what is now known as Yao's Principle.

Formal definition

Given a set of data construct a structure consisting of memory cells, each able to store bits. Then when given a query element determine whether with correctness by accessing memory cells. Such an algorithm is called an -error -probe algorithm using cells with word size . [4]

Notable results

Dynamic Partial Sums

The dynamic partial sum problem defines two operations Update which conceptually operation sets the value in an array at index to be , and Sum which returns the sum of the values in at indices through . Such an implementation would take time for Update and time for Sum. [5]

Instead, if the values are stored as leaves in a tree whose inner nodes store the values of the subtree rooted at that node. In this structure Update requires time to update each node in the leaf to root path, and Sum similarly requires time to traverse the tree from leaf to root summing the values of all subtrees left of the query index.

Mihai Pătraşcu used the cell-probe model and an information transfer argument to show that the partial sums problem requires time per operation. [1] [2]

Approximate Nearest Neighbor Searching

The exact nearest neighbor search problem is to determine the closest in a set of input points to a given query point. An approximate version of this problem is often considered since many applications of this problem are in very high dimension spaces and solving the problem in high dimensions requires exponential time or space with respect to the dimension. [4]

Chakrabarti and Regev proved that the approximate nearest neighbor search problem on the Hamming cube using polynomial storage and word size requires a worst-case query time of . This proof used the cell-probe model and information theoretic techniques for communication complexity.

Related Research Articles

In computer science, the computational complexity, or simply complexity of an algorithm is the amount of resources required for running it. The computational complexity of a problem is the minimum of the complexities of all possible algorithms for this problem.

Bucket sort Sorting algorithm

Bucket sort, or bin sort, is a sorting algorithm that works by distributing the elements of an array into a number of buckets. Each bucket is then sorted individually, either using a different sorting algorithm, or by recursively applying the bucket sorting algorithm. It is a distribution sort, a generalization of pigeonhole sort, and is a cousin of radix sort in the most-to-least significant digit flavor. Bucket sort can be implemented with comparisons and therefore can also be considered a comparison sort algorithm. The computational complexity depends on the algorithm used to sort each bucket, the number of buckets to use, and whether the input is uniformly distributed.

Computational geometry is a branch of computer science devoted to the study of algorithms which can be stated in terms of geometry. Some purely geometrical problems arise out of the study of computational geometric algorithms, and such problems are also considered to be part of computational geometry. While modern computational geometry is a recent development, it is one of the oldest fields of computing with history stretching back to antiquity.

In computational complexity theory, the 3SUM problem asks if a given set of real numbers contains three elements that sum to zero. A generalized version, k-SUM, asks the same question on k numbers. 3SUM can be easily solved in time, and matching lower bounds are known in some specialized models of computation.

In computer science, a selection algorithm is an algorithm for finding the kth smallest number in a list or array; such a number is called the kth order statistic. This includes the cases of finding the minimum, maximum, and median elements. There are O(n)-time selection algorithms, and sublinear performance is possible for structured data; in the extreme, O(1) for an array of sorted data. Selection is a subproblem of more complex problems like the nearest neighbor and shortest path problems. Many selection algorithms are derived by generalizing a sorting algorithm, and conversely some sorting algorithms can be derived as repeated application of selection.

Linear probing

Linear probing is a scheme in computer programming for resolving collisions in hash tables, data structures for maintaining a collection of key–value pairs and looking up the value associated with a given key. It was invented in 1954 by Gene Amdahl, Elaine M. McGraw, and Arthur Samuel and first analyzed in 1963 by Donald Knuth.

In computing, external memory algorithms or out-of-core algorithms are algorithms that are designed to process data that is too large to fit into a computer's main memory at one time. Such algorithms must be optimized to efficiently fetch and access data stored in slow bulk memory such as hard drives or tape drives, or when memory is on a computer network. External memory algorithms are analyzed in the external memory model.

In graph theory, reachability refers to the ability to get from one vertex to another within a graph. A vertex can reach a vertex if there exists a sequence of adjacent vertices which starts with and ends with .

In computer science, the prefix sum, cumulative sum, inclusive scan, or simply scan of a sequence of numbers x0, x1, x2, ... is a second sequence of numbers y0, y1, y2, ..., the sums of prefixes of the input sequence:

Range searching

In data structures, the range searching problem most generally consists of preprocessing a set S of objects, in order to determine which objects from S intersect with a query object, called a range. For example, if S is a set of points corresponding to the coordinates of several cities, a geometric variant of the problem is to find cities within a certain latitude and longitude range.

In computer science, a range tree is an ordered tree data structure to hold a list of points. It allows all points within a given range to be reported efficiently, and is typically used in two or higher dimensions. Range trees were introduced by Jon Louis Bentley in 1979. Similar data structures were discovered independently by Lueker, Lee and Wong, and Willard. The range tree is an alternative to the k-d tree. Compared to k-d trees, range trees offer faster query times of but worse storage of , where n is the number of points stored in the tree, d is the dimension of each point and k is the number of points reported by a given query.

In computational complexity theory, and more specifically in the analysis of algorithms with integer data, the transdichotomous model is a variation of the random access machine in which the machine word size is assumed to match the problem size. The model was proposed by Michael Fredman and Dan Willard, who chose its name "because the dichotomy between the machine model and the problem size is crossed in a reasonable manner."

In data structures, a range query consists of preprocessing some input data into a data structure to efficiently answer any number of queries on any subset of the input. Particularly, there is a group of problems that have been extensively studied where the input is an array of unsorted numbers and a query consists of computing some function, such as the minimum, on a specific range of the array.

Mihai Pătrașcu Romanian computer scientist

Mihai Pătraşcu was a Romanian-American computer scientist at AT&T Labs in Florham Park, New Jersey, USA.

In computer science, one approach to the dynamic optimality problem on online algorithms for binary search trees involves reformulating the problem geometrically, in terms of augmenting a set of points in the plane with as few additional points as possible in order to avoid rectangles with only two points on their boundary.

An oblivious data structure is a data structure that gives no information about the sequence or pattern of the operations that have been applied except for the final result of the operations.

An Oblivious RAM (ORAM) simulator is a compiler that transforms algorithms in such a way that the resulting algorithms preserve the input-output behavior of the original algorithm but the distribution of memory access pattern of the transformed algorithm is independent of the memory access pattern of the original algorithm. The definition of ORAMs is motivated by the fact that an adversary can obtain nontrivial information about the execution of a program and the nature of the data that it is dealing with, just by observing the pattern in which various locations of memory are accessed during its execution. An adversary can get this information even if the data values are all encrypted. The definition suits equally well to the settings of protected programs running on unprotected shared memory as well as a client running a program on its system by accessing previously stored data on a remote server. The concept was formulated by Oded Goldreich in 1987.

In computer science, the predecessor problem involves maintaining a set of items to, given an element, efficiently query which element precedes or succeeds that element in an order. Data structures used to solve the problem include balanced binary search trees, Van Emde Boas trees, and fusion trees. In the static predecessor problem, the set of elements does not change, but in the dynamic predecessor problem, insertions into and deletions from the set are allowed.

In computer science, the augmented map is an abstract data type (ADT) based on ordered maps, which associates each ordered map an augmented value. For an ordered map with key type , comparison function on and value type , the augmented value is defined based on two functions: a base function and a combine function , where is the type of the augmented value. The base function converts a single entry in to an augmented value, and the combine function combines multiple augmented values. The combine function is required to be associative and have an identity . We extend the definition of the associative function as follows:

Parallel external memory

In computer science, a parallel external memory (PEM) model is a cache-aware, external-memory abstract machine. It is the parallel-computing analogy to the single-processor external memory (EM) model. In a similar way, it is the cache-aware analogy to the parallel random-access machine (PRAM). The PEM model consists of a number of processors, together with their respective private caches and a shared main memory.

References

  1. 1 2 Pătraşcu, Mihai; Demaine, Erik D. (2006). "Logarithmic lower bounds in the cell-probe model". SIAM Journal on Computing. 35 (4): 932–963. arXiv: cs/0502041 . Bibcode:2005cs........2041P. doi:10.1137/s0097539705447256.
  2. 1 2 Pătraşcu, Mihai. "Lower Bounds for Dynamic Partial Sums" (PDF). Retrieved 9 April 2014.
  3. Yao, Andrew Chi-Chih (July 1981). "Should Tables Be Sorted?". J. ACM. 28 (3): 615–628. doi:10.1145/322261.322274.
  4. 1 2 Chakrabarti, Amit; Regev, Oded (2004). "An optimal randomised cell probe lower bound for approximate nearest neighbour searching". Proceedings of the 45th Annual IEEE Symposium on Foundations of Computer Science: 473–482.
  5. Zatloukal, Kevin (November 10, 2010). "Notes on "Logarithmic Lower Bounds in the Cell-Probe Model"" (PDF). Retrieved 9 April 2014.