Key-independent optimality

Last updated

Key-independent optimality is a property of some binary search tree data structures in computer science proposed by John Iacono. [1] Suppose that key-value pairs are stored in a data structure, and that the keys have no relation to their paired values. A data structure has key-independent optimality if, when randomly assigning the keys, the expected performance of the data structure is within a constant factor of the optimal data structure. Key-independent optimality is related to dynamic optimality.

Binary search tree data structure in tree form with 0, 1, or 2 children per node, sorted for fast lookup

In computer science, binary search trees (BST), sometimes called ordered or sorted binary trees, are a particular type of container: a data structure that stores "items" in memory. They allow fast lookup, addition and removal of items, and can be used to implement either dynamic sets of items, or lookup tables that allow finding an item by its key.

Computer science Study of the theoretical foundations of information and computation

Computer science is the study of processes that interact with data and that can be represented as data in the form of programs. It enables the use of algorithms to manipulate, store, and communicate digital information. A computer scientist studies the theory of computation and the practice of designing software systems.

John Iacono is an American computer scientist specializing in data structures, algorithms and computational geometry. He is one of the inventors of the tango tree, the first known competitive binary search tree data structure.

Contents

Definitions

There are many binary search tree algorithms that can look up a sequence of keys , where each is a number between and . For each sequence , let be the fastest binary search tree algorithm that looks up the elements in in order. Let be one of the possible permutation of the sequence , chosen at random, where is the th entry of . Let . Iacono defined, for a sequence , that .

A data structure has key-independent optimality if it can lookup the elements in in time .

Relationship with other bounds

Key-independent optimality has been proved to be asymptotically equivalent to the working set theorem. Splay trees are known to have key-independent optimality.

A splay tree is a self-balancing binary search tree with the additional property that recently accessed elements are quick to access again. It performs basic operations such as insertion, look-up and removal in O(log n) amortized time. For many sequences of non-random operations, splay trees perform better than other search trees, even when the specific pattern of the sequence is unknown. The splay tree was invented by Daniel Sleator and Robert Tarjan in 1985.

Related Research Articles

Binary search algorithm Search algorithm finding the position of a target value within a sorted array

In computer science, binary search, also known as half-interval search, logarithmic search, or binary chop, is a search algorithm that finds the position of a target value within a sorted array. Binary search compares the target value to the middle element of the array. If they are not equal, the half in which the target cannot lie is eliminated and the search continues on the remaining half, again taking the middle element to compare to the target value, and repeating this until the target value is found. If the search ends with the remaining half being empty, the target is not in the array.

Heap (data structure) tree-based data structure in computer science

In computer science, a heap is a specialized tree-based data structure which is essentially an almost complete tree that satisfies the heap property: in a max heap, for any given node C, if P is a parent node of C, then the key of P is greater than or equal to the key of C. In a min heap, the key of P is less than or equal to the key of C. The node at the "top" of the heap is called the root node.

Bucket sort Sorting algorithm

Bucket sort, or bin sort, is a sorting algorithm that works by distributing the elements of an array into a number of buckets. Each bucket is then sorted individually, either using a different sorting algorithm, or by recursively applying the bucket sorting algorithm. It is a distribution sort, a generalization of pigeonhole sort, and is a cousin of radix sort in the most-to-least significant digit flavor. Bucket sort can be implemented with comparisons and therefore can also be considered a comparison sort algorithm. The computational complexity depends on the algorithm used to sort each bucket, the number of buckets to use, and whether the input is uniformly distributed.

Perceptron algorithm for supervised learning of binary classifiers

In machine learning, the perceptron is an algorithm for supervised learning of binary classifiers. A binary classifier is a function which can decide whether or not an input, represented by a vector of numbers, belongs to some specific class. It is a type of linear classifier, i.e. a classification algorithm that makes its predictions based on a linear predictor function combining a set of weights with the feature vector.

In computer science, the treap and the randomized binary search tree are two closely related forms of binary search tree data structures that maintain a dynamic set of ordered keys and allow binary searches among the keys. After any sequence of insertions and deletions of keys, the shape of the tree is a random variable with the same probability distribution as a random binary tree; in particular, with high probability its height is proportional to the logarithm of the number of keys, so that each search, insertion, or deletion operation takes logarithmic time to perform.

Self-balancing binary search tree any node-based binary search tree that automatically keeps its height small

In computer science, a self-balancingbinary search tree is any node-based binary search tree that automatically keeps its height small in the face of arbitrary item insertions and deletions.

In information theory, Shannon's source coding theorem establishes the limits to possible data compression, and the operational meaning of the Shannon entropy.

In computer science, an interval tree is a tree data structure to hold intervals. Specifically, it allows one to efficiently find all intervals that overlap with any given interval or point. It is often used for windowing queries, for instance, to find all roads on a computerized map inside a rectangular viewport, or to find all visible elements inside a three-dimensional scene. A similar data structure is the segment tree.

<i>m</i>-ary tree Tree data structure in which each node has at most K children. From a graph theoretical perspective, k-ary trees are actually arborescences.

In graph theory, an m-ary tree is a rooted tree in which each node has no more than m children. A binary tree is the special case where m = 2, and a ternary tree is another case with m = 3 that limits its children to three.

A tango tree is a type of binary search tree proposed by Erik D. Demaine, Dion Harmon, John Iacono, and Mihai Pătrașcu in 2004. It is named after Buenos Aires, of which the tango is emblematic.

In computer science, a succinct data structure is a data structure which uses an amount of space that is "close" to the information-theoretic lower bound, but still allows for efficient query operations. The concept was originally introduced by Jacobson to encode bit vectors, (unlabeled) trees, and planar graphs. Unlike general lossless data compression algorithms, succinct data structures retain the ability to use them in-place, without decompressing them first. A related notion is that of a compressed data structure, in which the size of the data structure depends upon the particular data being represented.

In computer science and probability theory, a random binary tree is a binary tree selected at random from some probability distribution on binary trees. Two different distributions are commonly used: binary trees formed by inserting nodes one at a time according to a random permutation, and binary trees chosen from a uniform discrete distribution in which all distinct trees are equally likely. It is also possible to form other distributions, for instance by repeated splitting. Adding and removing nodes directly in a random binary tree will in general disrupt its random structure, but the treap and related randomized binary search tree data structures use the principle of binary trees formed from a random permutation in order to maintain a balanced binary search tree dynamically as nodes are inserted and deleted.

Samplesort is a sorting algorithm that is a divide and conquer algorithm often used in parallel processing systems. Conventional divide and conquer sorting algorithms partitions the array into sub-intervals or buckets. The buckets are then sorted individually and then concatenated together. However, if the array is non-uniformly distributed, the performance of these sorting algorithms can be significantly throttled. Samplesort addresses this issue by selecting a sample of size s from the n-element sequence, and determining the range of the buckets by sorting the sample and choosing m -1 elements from the result. These elements then divide the sample into m equal-sized buckets. Samplesort is described in the 1970 paper, "Samplesort: A Sampling Approach to Minimal Storage Tree Sorting", by W. D. Frazer and A. C. McKellar.

In computer science, one approach to the dynamic optimality problem on online algorithms for binary search trees involves reformulating the problem geometrically, in terms of augmenting a set of points in the plane with as few additional points as possible in order to avoid rectangles with only two points on their boundary.

In computer science, an optimal binary search tree , sometimes called a weight-balanced binary tree, is a binary search tree which provides the smallest possible search time for a given sequence of accesses. Optimal BSTs are generally divided into two types: static and dynamic.

Finger search

In computer science, a finger search on a data structure is an extension of any search operation that structure supports, where a reference (finger) to an element in the data structure is given along with the query. While the search time for an element is most frequently expressed as a function of the number of elements in a data structure, finger search times are a function of the distance between the element and the finger.

In the theory of optimal binary search trees, the interleave lower bound is a lower bound on the number of operations required by a binary search tree (BST) to execute a given sequence of accesses.

Reduce is a collective communication primitive used in the context of a parallel programming model to combine multiple vectors into one, using an associative binary operator . Every vector is present at a distinct processor in the beginning. The goal of the primitive is to apply the operator in the order given by the processor-indices to the vectors until only one is left. The reduction of sets of elements is an integral part of programming models such as Map Reduce, where a function is applied (mapped) to all elements before they are reduced. Other parallel algorithms use reduce as a primary operation to solve more complex problems. The Message Passing Interface implements it in the operations MPI_Reduce and MPI_Allreduce, with the difference that the result is available at one (root) processing unit or all of them. Closely related to reduce is the broadcast operation, which distributes data to all processors. Many reduce algorithms can be used for broadcasting by reverting them and omitting the operator.

The Garsia–Wachs algorithm is an efficient method for computers to construct optimal binary search trees and alphabetic Huffman codes, in linearithmic time. It is named after Adriano Garsia and Michelle L. Wachs.

References

  1. "John Iacono. Key independent optimality. Algorithmica, 42(1):3-10, 2005" (PDF). Archived from the original (PDF) on 2010-06-13.