Implicit data structure

Last updated

In computer science, an implicit data structure or space-efficient data structure is a data structure that stores very little information other than the main or required data: a data structure that requires low overhead. They are called "implicit" because the position of the elements carries meaning and relationship between elements; this is contrasted with the use of pointers to give an explicit relationship between elements. Definitions of "low overhead" vary, but generally means constant overhead; in big O notation, O(1) overhead. A less restrictive definition is a succinct data structure, which allows greater overhead.

Contents

Definition

Formally, an implicit data structure is one with constant O(1) space overhead (above the information-theoretic lower bound).

Historically, Munro, & Suwanda (1980) defined an implicit data structure (and algorithms acting on one) as one "in which structural information is implicit in the way data are stored, rather than explicit in pointers." They are somewhat vague in the definition, defining it most strictly as a single array, with only the size retained (a single number of overhead), [1] or more loosely as a data structure with constant overhead (O(1)). [2] This latter definition is today more standard, and the still-looser notion of a data structure with non-constant but small o(n) overhead is today known as a succinct data structure, as defined by Jacobson (1988); it was referred to as semi-implicit by Munro, & Suwanda (1980). [3]

A fundamental distinction is between static data structures (read-only) and dynamic data structures (which can be modified). Simple implicit data structures, such as representing a sorted list as an array, may be very efficient as a static data structure, but inefficient as a dynamic data structure, due to modification operations (such as insertion in the case of a sorted list) being inefficient.

Examples

A trivial example of an implicit data structure is an array data structure , which is an implicit data structure for a list, and requires only the constant overhead of the length; unlike a linked list, which has a pointer associated with each data element, which explicitly gives the relationship from one element to the next. Similarly, a null-terminated string is an implicit data structure for a string (list of characters). These are considered very simple because they are static data structures (read-only), and only admit the simple operation of iteration over the elements.

Similarly simple is representing a multi-dimensional array as a single 1-dimensional array, together with its dimensions. For example, representing an m × n array as a single list of length m·n, together with the numbers m and n (instead of as a 1-dimensional array of pointers to each 1-dimensional subarray). The elements need not be of the same type, and a table of data (a list of records) may similarly be represented implicitly as a flat (1-dimensional) list, together with the length of each field, so long as each field has uniform size (so a single size can be used per field, not per record).

A less trivial example is a representing a sorted list by a sorted array , which allows search in logarithmic time by binary search. Contrast with a search tree, specifically a binary search tree, which also allows logarithmic-time search, but requires pointers. A sorted array is only efficient as a static data structure, as modifying the list is slow – unlike a binary search tree – but does not require the space overhead of a tree.

An important example of an implicit data structure is representing a perfect binary tree as a list, in increasing order of depth, so root, first left child, first right child, first left child of first left child, etc. Such a tree occurs notably for an ancestry chart to a give depth, and the implicit representation is known as an Ahnentafel (ancestor table).

This can be generalized to a complete binary tree (where the last level may be incomplete), which yields the best-known example of an implicit data structure, namely the binary heap , which is an implicit data structure for a priority queue. This is more sophisticated than earlier examples because it allows multiple operations, and is an efficient dynamic data structure (it allows efficient modification of the data): not only top, but also insert and pop.

More sophisticated implicit data structures include the beap (bi-parental heap).

History

The trivial examples of lists or tables of values date to prehistory, while historically non-trivial implicit data structures date at least to the Ahnentafel, which was introduced by Michaël Eytzinger in 1590 for use in genealogy. In formal computer science, the first implicit data structure is generally considered to be the sorted list, used for binary search, which was introduced by John Mauchly in 1946, in the Moore School Lectures, the first ever set of lectures regarding any computer-related topic. [4] [5] The binary heap was introduced in Williams (1964) to implement the heapsort. [5] The notion of an implicit data structure was formalized in Munro & Suwanda (1980), as part of introducing and analyzing the beap. [5]

See also

Related Research Articles

In computer science, an array data structure, or simply an array, is a data structure consisting of a collection of elements, each identified by at least one array index or key. An array is stored such that the position of each element can be computed from its index tuple by a mathematical formula. The simplest type of data structure is a linear array, also called one-dimensional array.

Binary search algorithm Search algorithm finding the position of a target value within a sorted array

In computer science, binary search, also known as half-interval search, logarithmic search, or binary chop, is a search algorithm that finds the position of a target value within a sorted array. Binary search compares the target value to the middle element of the array. If they are not equal, the half in which the target cannot lie is eliminated and the search continues on the remaining half, again taking the middle element to compare to the target value, and repeating this until the target value is found. If the search ends with the remaining half being empty, the target is not in the array.

Binary search tree data structure in tree form with 0, 1, or 2 children per node, sorted for fast lookup

In computer science, binary search trees (BST), sometimes called ordered or sorted binary trees, are a particular type of container: a data structure that stores "items" in memory. They allow fast lookup, addition and removal of items, and can be used to implement either dynamic sets of items, or lookup tables that allow finding an item by its key.

Binary tree tree data structure in which each node has at most two children

In computer science, a binary tree is a tree data structure in which each node has at most two children, which are referred to as the left child and the right child. A recursive definition using just set theory notions is that a (non-empty) binary tree is a tuple, where L and R are binary trees or the empty set and S is a singleton set. Some authors allow the binary tree to be the empty set as well.

Heap (data structure) tree-based data structure in computer science

In computer science, a heap is a specialized tree-based data structure which is essentially an almost complete tree that satisfies the heap property: in a max heap, for any given node C, if P is a parent node of C, then the key of P is greater than or equal to the key of C. In a min heap, the key of P is less than or equal to the key of C. The node at the "top" of the heap is called the root node.

Insertion sort sorting algorithm that, at each iteration, inserts the current input element into the suitable position between the already sorted elements

Insertion sort is a simple sorting algorithm that builds the final sorted array one item at a time. It is much less efficient on large lists than more advanced algorithms such as quicksort, heapsort, or merge sort. However, insertion sort provides several advantages:

In computer science, a linked list is a linear collection of data elements, whose order is not given by their physical placement in memory. Instead, each element points to the next. It is a data structure consisting of a collection of nodes which together represent a sequence. In its most basic form, each node contains: data, and a reference to the next node in the sequence. This structure allows for efficient insertion or removal of elements from any position in the sequence during iteration. More complex variants add additional links, allowing more efficient insertion or removal of nodes at arbitrary positions. A drawback of linked lists is that access time is linear. Faster access, such as random access, is not feasible. Arrays have better cache locality compared to linked lists.

Trie A type of search tree data structure

In computer science, a trie, also called digital tree or prefix tree, is a kind of search tree—an ordered tree data structure used to store a dynamic set or associative array where the keys are usually strings. Unlike a binary search tree, no node in the tree stores the key associated with that node; instead, its position in the tree defines the key with which it is associated. All the descendants of a node have a common prefix of the string associated with that node, and the root is associated with the empty string. Keys tend to be associated with leaves, though some inner nodes may correspond to keys of interest. Hence, keys are not necessarily associated with every node. For the space-optimized presentation of prefix tree, see compact prefix tree.

In computer science, an associative array, map, symbol table, or dictionary is an abstract data type composed of a collection of pairs, such that each possible key appears at most once in the collection.

Self-balancing binary search tree any node-based binary search tree that automatically keeps its height small

In computer science, a self-balancingbinary search tree is any node-based binary search tree that automatically keeps its height small in the face of arbitrary item insertions and deletions.

In computer science, an interval tree is a tree data structure to hold intervals. Specifically, it allows one to efficiently find all intervals that overlap with any given interval or point. It is often used for windowing queries, for instance, to find all roads on a computerized map inside a rectangular viewport, or to find all visible elements inside a three-dimensional scene. A similar data structure is the segment tree.

In computer science, an algorithm is said to be asymptotically optimal if, roughly speaking, for large inputs it performs at worst a constant factor worse than the best possible algorithm. It is a term commonly encountered in computer science research as a result of widespread use of big-O notation.

Beap data structure where a node usually has two parents

A beap, or bi-parental heap, is a data structure where a node usually has two parents and two children. Unlike a heap, a beap allows sublinear search. The beap was introduced by Ian Munro and Hendra Suwanda. A related data structure is the Young tableau.

Recursion (computer science) Use of functions that call themselves (on smaller inputs)

Recursion in computer science is a method of solving a problem where the solution depends on solutions to smaller instances of the same problem. Such problems can generally be solved by iteration, but this needs to identify and index the smaller instances at programming time. At the opposite, recursion solves such recursive problems by using functions that call themselves from within their own code. The approach can be applied to many types of problems, and recursion is one of the central ideas of computer science.

The power of recursion evidently lies in the possibility of defining an infinite set of objects by a finite statement. In the same manner, an infinite number of computations can be described by a finite recursive program, even if this program contains no explicit repetitions.

In computer science, a succinct data structure is a data structure which uses an amount of space that is "close" to the information-theoretic lower bound, but still allows for efficient query operations. The concept was originally introduced by Jacobson to encode bit vectors, (unlabeled) trees, and planar graphs. Unlike general lossless data compression algorithms, succinct data structures retain the ability to use them in-place, without decompressing them first. A related notion is that of a compressed data structure, in which the size of the data structure depends upon the particular data being represented.

A sorted array is an array data structure in which each element is sorted in numerical, alphabetical, or some other order, and placed at equally spaced addresses in computer memory. It is typically used in computer science to implement static lookup tables to hold multiple values which have the same data type. Sorting an array is useful in organising data in ordered form and recovering them rapidly.

Control table data structure

Control tables are tables that control the control flow or play a major part in program control. There are no rigid rules about the structure or content of a control table—its qualifying attribute is its ability to direct control flow in some way through "execution" by a processor or interpreter. The design of such tables is sometimes referred to as table-driven design. In some cases, control tables can be specific implementations of finite-state-machine-based automata-based programming. If there are several hierarchical levels of control table they may behave in a manner equivalent to UML state machines

In computer science, a search data structure is any data structure that allows the efficient retrieval of specific items from a set of items, such as a specific record from a database.

Ian Munro (computer scientist) Canadian computer scientist

James Ian Munro is a Canadian computer scientist. He is known for his fundamental contributions to algorithms and data structures.

References

  1. "Thus, only a simple array is needed for the data.", p. 236; "We will draw no formal distinction between a pointer and an integer (index) in the range . A data structure is then implicit, if the only such integer which need be retained is N itself.", p. 238
  2. "... one might prefer to permit a constant number of pointers to be retained and still designate the structure as implicit.", p. 238
  3. "We will also suggest two structures which might be described as “semi-implicit,” in that a variable, but o(N), number of pointers (indices) is kept.", p. 238
  4. Knuth 1998, §6.2.1 ("Searching an ordered table"), subsection "History and bibliography".
  5. 1 2 3 Franceschini, Gianni; Munro, J. Ian (2006). Implicit dictionaries with O(1) modifications per update and fast search. Seventeenth Annual ACM-SIAM Symposium on Discrete Algorithms. Miami, FL, United States. pp. 404–413. doi:10.1145/1109557.1109603.

Further reading

See publications of Hervé Brönnimann, J. Ian Munro, and Greg Frederickson.