# Cell-probe model

In computer science, the cell-probe model is a model of computation similar to the random-access machine, except that all operations are free except memory access. This model is useful for proving lower bounds on algorithms for data structure problems.

## Overview

The cell-probe model is a minor modification of the random-access machine model, itself a minor modification of the counter machine model, in which computational cost is only assigned to accessing units of memory called cells.

A counter machine is an abstract machine used in formal logic and theoretical computer science to model computation. It is the most primitive of the register machines, comprising a set of one or more unbounded registers, each of which can hold a single non-negative integer, and a list of arithmetic and control instructions for the machine to follow.

In this model, computation is framed as a problem of querying a set of memory cells. The problem has two phases: a preprocessing phase and a query phase. The input to the preprocessing phase is a set of data, from which a structure of memory cells is built. The input to the query phase is a query datum, and the problem is to determine whether the query datum was included in the original input set. All operations are free except accesses to memory cells.
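The two phases can be illustrated with a toy membership structure (not from the source; the sorted-table-plus-binary-search design is one possible choice). Only cell reads are counted, mirroring how the model charges for memory accesses and nothing else:

```python
class CellProbeTable:
    def __init__(self, data):
        # Preprocessing phase: build the structure (here, a sorted table).
        # Preprocessing work is not what the model charges for.
        self.cells = sorted(data)
        self.probes = 0  # number of cells read during queries

    def read(self, i):
        # Every memory access costs one probe; all other work is free.
        self.probes += 1
        return self.cells[i]

    def member(self, s):
        # Query phase: binary search reads O(log n) cells.
        lo, hi = 0, len(self.cells) - 1
        while lo <= hi:
            mid = (lo + hi) // 2
            v = self.read(mid)
            if v == s:
                return True
            if v < s:
                lo = mid + 1
            else:
                hi = mid - 1
        return False

t = CellProbeTable([5, 3, 9, 1, 7])
print(t.member(7), t.probes)  # True 2
```

In cell-probe terms this is a zero-error algorithm whose query cost is the probe count alone; Yao's question of whether sorted tables are optimal for this task is exactly about whether fewer probes suffice.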

This model is useful in the analysis of data structures. In particular, it yields lower bounds on the number of memory accesses required to answer queries over stored data. An example of such a problem is the dynamic partial sums problem.

## History

In his 1981 paper "Should Tables Be Sorted?", Andrew Yao described the cell-probe model and used it to give a lower bound on the number of memory cell "probes", or accesses, necessary to determine whether a given query datum exists within a table stored in memory.

## Formal definition

Given a set of data $S$, construct a structure consisting of $c$ memory cells, each able to store $w$ bits. Then, given a query element $s$, determine whether $s \in S$ with correctness probability $1-\varepsilon$ by accessing at most $t$ memory cells. Such an algorithm is called an $\varepsilon$-error $t$-probe algorithm using $c$ cells with word size $w$.

## Notable results

### Dynamic Partial Sums

The dynamic partial sums problem defines two operations: Update$(k,v)$, which sets the value in an array $A$ at index $k$ to be $v$, and Sum$(k)$, which returns the sum of the values in $A$ at indices $0$ through $k$. A naive implementation that stores the values directly in an array takes $O(1)$ time for Update and $O(n)$ time for Sum.
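A minimal sketch of this naive array-based approach (an assumed illustration, not from the source): Update overwrites one entry in $O(1)$, while Sum must scan the whole prefix in $O(n)$.

```python
class NaivePartialSums:
    def __init__(self, n):
        self.a = [0] * n

    def update(self, k, v):
        # O(1): overwrite a single entry.
        self.a[k] = v

    def sum(self, k):
        # O(n): scan indices 0 through k.
        return sum(self.a[:k + 1])

p = NaivePartialSums(8)
p.update(2, 5)
p.update(4, 3)
print(p.sum(4))  # 5 + 3 = 8
```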

Instead, the values may be stored as leaves in a tree whose inner nodes each store the sum of the values in the subtree rooted at that node. In this structure, Update requires $O(\log n)$ time to update each node on the leaf-to-root path, and Sum similarly requires $O(\log n)$ time to traverse the tree from leaf to root, summing the values of all subtrees to the left of the query index.
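The tree just described can be sketched as a flat-array segment tree (one common realization; the details are an assumption, not from the source). Values sit at the leaves, each inner node holds its subtree's sum, and both operations walk a single leaf-to-root path:

```python
class TreePartialSums:
    def __init__(self, n):
        self.n = n
        # t[n:] hold the n leaf values; t[1:n] are inner nodes, each
        # storing the sum of its subtree.
        self.t = [0] * (2 * n)

    def update(self, k, v):
        # O(log n): set the leaf, then refresh each ancestor's sum.
        i = self.n + k
        self.t[i] = v
        i //= 2
        while i >= 1:
            self.t[i] = self.t[2 * i] + self.t[2 * i + 1]
            i //= 2

    def sum(self, k):
        # O(log n): prefix sum over indices 0..k, accumulating every
        # maximal subtree lying inside the half-open range [lo, hi).
        res, lo, hi = 0, self.n, self.n + k + 1
        while lo < hi:
            if lo & 1:
                res += self.t[lo]
                lo += 1
            if hi & 1:
                hi -= 1
                res += self.t[hi]
            lo //= 2
            hi //= 2
        return res
```

For example, after `update(2, 5)` and `update(4, 3)` on an 8-element structure, `sum(4)` returns 8 while touching only $O(\log n)$ cells, which is what Pătrașcu's lower bound shows cannot be asymptotically improved.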

Mihai Pătrașcu used the cell-probe model and an information-transfer argument to show that the partial sums problem requires $\Omega(\log n)$ time per operation.

### Approximate Nearest Neighbor Searching

The exact nearest neighbor search problem is to determine the point in a set of input points that is closest to a given query point. An approximate version of this problem is often considered, since many applications involve very high-dimensional spaces, and solving the exact problem in high dimensions requires time or space exponential in the dimension.
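To ground the setting of the result below, here is a brute-force sketch (an assumed illustration, not from the source) of exact nearest neighbor search on the $d$-dimensional Hamming cube, where points are $d$-bit strings and distance is the number of differing bits:

```python
def hamming(x, y):
    # Hamming distance between two bit strings encoded as integers:
    # the number of bit positions where they differ.
    return bin(x ^ y).count("1")

def nearest(points, q):
    # Brute force: time linear in the number of points. The cell-probe
    # lower bound below concerns structures answering queries far faster.
    return min(points, key=lambda p: hamming(p, q))

pts = [0b0000, 0b1111, 0b1010]
print(bin(nearest(pts, 0b1101)))  # 0b1111
```

The approximate version relaxes this to returning any point whose distance is within a $(1+\varepsilon)$ factor of the true minimum.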

Chakrabarti and Regev proved that the approximate nearest neighbor search problem on the Hamming cube using polynomial storage and $d^{O(1)}$ word size requires a worst-case query time of $\Omega \left({\frac {\log \log d}{\log \log \log d}}\right)$ . This proof used the cell-probe model and information theoretic techniques for communication complexity.