Iacono's working set structure

Iacono's working set data structure
Invented	2001
Invented by	John Iacono
Asymptotic complexity; in big O notation
Space	O(n)
Search	O(log w(x))
Insert	O(log n)
Delete	O(log n)

Last updated August 30, 2019

In computer science, Iacono's working set structure^[1] is a comparison based dictionary. It supports insertion, deletion and access operation to maintain a dynamic set of $n$ elements. The working set of an item $x$ is the set of elements that have been accessed in the structure since the last time that $x$ was accessed (or inserted if it was never accessed). Inserting and deleting in the working set structure takes $O(\log n)$ time while accessing an element $x$ takes $O(\log w(x))$ . Here, $w(x)$ represents the size of the working set of $x$ .

In computer science, an associative array, map, symbol table, or dictionary is an abstract data type composed of a collection of pairs, such that each possible key appears at most once in the collection.

Structure

To store a dynamic set of $n$ elements, this structure consists of a series of Red–black trees , or other Self-balancing binary search trees $T_{1},T_{2},\ldots ,T_{k}$ , and a series of deques (Double-ended queues) $Q_{1},Q_{2},\ldots Q_{k}$ , where $k=\lceil \log \log n\rceil$ . For every $1\leq i\leq k$ , tree $T_{i}$ and deque $Q_{i}$ share the same contents and pointers are maintained between their corresponding elements. For every $i<k$ , the size of $T_{i}$ and $Q_{i}$ is $2^{2^{i}}$ . Tree $T_{k}$ and deque $Q_{k}$ consist of the remaining elements, i.e., their size is $n-\sum _{i=1}^{k-1}2^{2^{i}}$ . Therefore, the number of items in all trees and the number of elements in all deques both add up to $n$ . Every element that has been inserted in the data structure is stored in exactly one of the trees and its corresponding deque.

A red–black tree is a kind of self-balancing binary search tree in computer science. Each node of the binary tree has an extra bit, and that bit is often interpreted as the color of the node. These color bits are used to ensure the tree remains approximately balanced during insertions and deletions.

In computer science, a self-balancingbinary search tree is any node-based binary search tree that automatically keeps its height small in the face of arbitrary item insertions and deletions.

In computer science, a double-ended queue is an abstract data type that generalizes a queue, for which elements can be added to or removed from either the front (head) or back (tail). It is also often called a head-tail linked list, though properly this refers to a specific data structure implementation of a deque.

Working set Invariant

In the deques of this structure, elements are kept in sorted order according to their working set size. Formally, element $x$ lies after $y$ in deque $Q_{i}$ if and only if $w(x)<w(y)$ . Moreover, for every $1\leq i<k$ , the elements in deque $Q_{i}$ have a smaller working sets than the elements in deque $Q_{i+1}$ . This property is referred to as the Working set invariant. Every operation in the data structure maintains the Working set invariant.

Operations

The basic operation in this structure is called shift from $h$ to $j$ , where $h$ and $j$ are indices of some trees in the structure. Two cases are considered in a shift from $h$ to $j$ : If $h<j$ , then for every $h\leq i<j$ , taken in increasing order, an item is dequeued from $Q_{i}$ and enqueued into $Q_{i+1}$ . The corresponding item is deleted from $T_{i}$ and inserted into $T_{i+1}$ . The running time of this operation is $O(\sum _{i=h}^{j}\log |T_{i}|)=O(\sum _{i=h}^{j}\log 2^{2^{i}})=O(2^{j})$ . Analogously, if $j<h$ , then for every $j<i\leq h$ , taken in decreasing order, an item is dequeued from $Q_{i}$ and enqueued into $Q_{i-1}$ . The corresponding item is deleted from $T_{i}$ and inserted into $T_{i-1}$ . The running time of this operation is $O(\sum _{i=j}^{h}\log |T_{i}|)=O(\sum _{i=j}^{h}\log 2^{2^{i}})=O(2^{h})$ . Regardless of the case, after a shift operation, the size of $T_{h}$ decreases by one whereas the size of $T_{j}$ increases by one. Since that elements in the deques are sorted with respect to their working sets sizes, a shift operation maintains the Working set invariant.

Search

To search for an element $x$ , search for $x$ in $T_{1},T_{2},\ldots T_{k}$ , in increasing order, until finding a tree $T_{j}$ containing $x$ . If no tree is found, the search is unsuccessful. If $x$ is found, it is deleted from $T_{j}$ and then inserted into $T_{1}$ , i.e., it is moved to the front of the structure. The search finishes by performing a shift from $1$ to $j$ which restores the size of every tree and every deque to their size prior to the search operation. The running time of this search is $O(\sum _{i=1}^{j}\log 2^{2^{i}})=O(2^{j})$ if the search was successful, or $O(\sum _{i=j}^{k}\log 2^{2^{i}})=O(2^{k})=O(\log n)$ otherwise. By the Working set property, every element in trees $T_{1},T_{2},\ldots ,T_{j-1}$ belongs to the working set of $x$ . In particular, every element in $T_{j-1}$ belongs to the working set of $x$ and hence, $w(x)>|T_{j-1}|=2^{2^{j-1}}$ . Thus, the running time of a successful search is $O(2^{j})=O(\log 2^{2^{j-1}})=O(\log w(x))$ .

Insert

Inserting an element $x$ into the structure is performed by inserting $x$ into $T_{1}$ and enqueuing it into $Q_{1}$ . Insertion is completed by performing a shift from $1$ to $k$ . To avoid overflow, if $|T_{k}|=2^{2^{k}}$ before the shift, i.e., if the last tree is full, then $k$ is incremented and a new empty $T_{k}$ and $Q_{k}$ is created. The running time of this operation is dominated by the shift from $1$ to $k$ whose running time is $O(2^{k})=O(2^{\log \log n})=O(\log n)$ . Since element $x$ , whose working set is the smallest, is enqueued in $Q_{1}$ , the Working set invariant is preserved after the shift.

Delete

Deleting an element $x$ is done by searching for $x$ on each tree in the structure, in increasing order, until finding a tree $T_{j}$ that contains it (if non is found the deletion is unsuccessful). Item $x$ is deleted from $T_{j}$ and $Q_{j}$ . Finally, a shift from $k$ to $j$ maintains the size of $T_{j}$ equal to $2^{2^{j}}$ . The running time of this operation is $O(2^{k})=O(\log n)$ . The working set invariant is preserved as deleting an element does not change the order of the working set of the elements.

Discussion

Splay tree s are self adjusting search trees introduced by Sleator and Tarjan^[2] in 1985. Using restructuring heuristic, splay trees are able to achieve insert and delete operations in $O(\log n)$ amortized time, without storing any balance information at the nodes. Moreover, the Working Set Theorem for splay trees states that the cost to access an element in a splay tree is $O(\log w(x))$ amortized. Iacono's workings set structure obtains the same running time for search, insert and delete in the worst-case. Therefore, offering an alternative to splay trees.

A splay tree is a self-balancing binary search tree with the additional property that recently accessed elements are quick to access again. It performs basic operations such as insertion, look-up and removal in O(log n) amortized time. For many sequences of non-random operations, splay trees perform better than other search trees, even when the specific pattern of the sequence is unknown. The splay tree was invented by Daniel Sleator and Robert Tarjan in 1985.

In computer science, amortized analysis is a method for analyzing a given algorithm's complexity, or how much of a resource, especially time or memory, it takes to execute. The motivation for amortized analysis is that looking at the worst-case run time per operation, rather than per algorithm, can be too pessimistic.

Related Research Articles

AVL tree one kind of self-balancing binary search tree

In computer science, an AVL tree is a self-balancing binary search tree. It was the first such data structure to be invented. In an AVL tree, the heights of the two child subtrees of any node differ by at most one; if at any time they differ by more than one, rebalancing is done to restore this property. Lookup, insertion, and deletion all take $O(log n)$ time in both the average and worst cases, where $is the number of nodes in the tree prior to the operation. Insertions and deletions may require the tree to be rebalanced by one or more tree rotations.$

In computer science, binary search, also known as half-interval search, logarithmic search, or binary chop, is a search algorithm that finds the position of a target value within a sorted array. Binary search compares the target value to the middle element of the array. If they are not equal, the half in which the target cannot lie is eliminated and the search continues on the remaining half, again taking the middle element to compare to the target value, and repeating this until the target value is found. If the search ends with the remaining half being empty, the target is not in the array.

In computer science, a priority queue is an abstract data type which is like a regular queue or stack data structure, but where additionally each element has a "priority" associated with it. In a priority queue, an element with high priority is served before an element with low priority. In some implementations, if two elements have the same priority, they are served according to the order in which they were enqueued, while in other implementations, ordering of elements with the same priority is undefined.

In computer science, a queue is a collection in which the entities in the collection are kept in order and the principal operations on the collection are the addition of entities to the rear terminal position, known as enqueue, and removal of entities from the front terminal position, known as dequeue. This makes the queue a First-In-First-Out (FIFO) data structure. In a FIFO data structure, the first element added to the queue will be the first one to be removed. This is equivalent to the requirement that once a new element is added, all elements that were added before have to be removed before the new element can be removed. Often a peek or front operation is also entered, returning the value of the front element without dequeuing it. A queue is an example of a linear data structure, or more abstractly a sequential collection.

Binary heap heap data structure that takes the form of a binary tree

A binary heap is a heap data structure that takes the form of a binary tree. Binary heaps are a common way of implementing priority queues. The binary heap was introduced by J. W. J. Williams in 1964, as a data structure for heapsort.

In computer science, a binomial heap is a heap similar to a binary heap but also supports quick merging of two heaps. This is achieved by using a special tree structure. It is important as an implementation of the mergeable heap abstract data type, which is a priority queue supporting merge operation. Binomial heaps were invented in 1978 by J. Vuillemin.

In computer science, a skip list is a data structure that allows $search complexity as well as insertion complexity within an ordered sequence of elements. Thus it can get the best of array while maintaining a linked list-like structure that allows insertion- which is not possible in an array. Fast search is made possible by maintaining a linked hierarchy of subsequences, with each successive subsequence skipping over fewer elements than the previous one. Searching starts in the sparsest subsequence until two consecutive elements have been found, one smaller and one larger than or equal to the element searched for. Via the linked hierarchy, these two elements link to elements of the next sparsest subsequence, where searching is continued until finally we are searching in the full sequence. The elements that are skipped over may be chosen probabilistically or deterministically, with the former being more common.$

In computational complexity theory, the 3SUM problem asks if a given set of $real numbers contains three elements that sum to zero. A generalized version, k -SUM, asks the same question on k numbers. 3SUM can be easily solved in time, and matching lower bounds are known in some specialized models of computation.$

A van Emde Boas tree, also known as a vEB tree or van Emde Boas priority queue, is a tree data structure which implements an associative array with $m$ -bit integer keys. It performs all operations in $O (log m)$ time, or equivalently in $O (log log M)$ time, where $M = 2 m$ is the maximum number of elements that can be stored in the tree. The $M$ is not to be confused with the actual number of elements stored in the tree, by which the performance of other tree data-structures is often measured. The vEB tree has good space efficiency when it contains a large number of elements, as discussed below. It was invented by a team led by Dutch computer scientist Peter van Emde Boas in 1975.

A pairing heap is a type of heap data structure with relatively simple implementation and excellent practical amortized performance, introduced by Michael Fredman, Robert Sedgewick, Daniel Sleator, and Robert Tarjan in 1986. Pairing heaps are heap-ordered multiway tree structures, and can be considered simplified Fibonacci heaps. They are considered a "robust choice" for implementing such algorithms as Prim's MST algorithm, and support the following operations :

In computer science, a queap is a priority queue data structure. The data structure allows insertions and deletions of arbitrary elements, as well as retrieval of the highest-priority element. Each deletion takes amortized time logarithmic in the number of items that have been in the structure for a longer time than the removed item. Insertions take constant amortized time.

In computer science, a y-fast trie is a data structure for storing integers from a bounded domain. It supports exact and predecessor or successor queries in time O(log log M), using O(n) space, where n is the number of stored values and M is the maximum value in the domain. The structure was proposed by Dan Willard in 1982 to decrease the O(n log M) space used by an x-fast trie.

In computer science, the order-maintenance problem involves maintaining a totally ordered set supporting the following operations:

In computer science, an optimal binary search tree , sometimes called a weight-balanced binary tree, is a binary search tree which provides the smallest possible search time for a given sequence of accesses. Optimal BSTs are generally divided into two types: static and dynamic.

In computer science, a finger search on a data structure is an extension of any search operation that structure supports, where a reference (finger) to an element in the data structure is given along with the query. While the search time for an element is most frequently expressed as a function of the number of elements in a data structure, finger search times are a function of the distance between the element and the finger.

In computing and graph theory, a dynamic connectivity structure is a data structure that dynamically maintains information about the connected components of a graph.

An oblivious data structure is a data structure that gives no information about the sequence or pattern of the operations that have been applied except for the final result of the operations.

References

↑ Iacono, John (2001). "Alternatives to splay trees with O(log n) worst-case access times" (PDF). Proceedings of the Twelfth Annual ACM-SIAM Symposium on Discrete Algorithms: 516–522.
↑ Sleator, Daniel D.; Tarjan, Robert E. (1985), "Self-Adjusting Binary Search Trees" (PDF), Journal of the ACM , 32 (3): 652–686, doi:10.1145/3828.3835

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.

[WorkingSetStructure-1] Iacono, John (2001). "Alternatives to splay trees with O(log n) worst-case access times" (PDF). Proceedings of the Twelfth Annual ACM-SIAM Symposium on Discrete Algorithms: 516–522.

[SleatorTarjan-2] Sleator, Daniel D.; Tarjan, Robert E. (1985), "Self-Adjusting Binary Search Trees" (PDF), Journal of the ACM , 32 (3): 652–686, doi:10.1145/3828.3835