Order-maintenance problem

Last updated

In computer science, the order-maintenance problem involves maintaining a totally ordered set supporting the following operations:

Contents

Paul Dietz first introduced a data structure to solve this problem in 1982. [1] This data structure supports insert(X, Y) in (in Big O notation) amortized time and order(X, Y) in constant time but does not support deletion. Athanasios Tsakalidis used BB[α] trees with the same performance bounds that supports deletion in and improved insertion and deletion performance to amortized time with indirection. [2] Dietz and Daniel Sleator published an improvement to worst-case constant time in 1987. [3] Michael Bender, Richard Cole and Jack Zito published significantly simplified alternatives in 2002. [4] Bender, Fineman, Gilbert, Kopelowitz and Montes also published a deamortized solution in 2017. [5]

Efficient data structures for order-maintenance have applications in many areas, including data structure persistence, [6] graph algorithms [7] [8] and fault-tolerant data structures. [9]

List labeling

A problem related to the order-maintenance problem is the list-labeling problem in which instead of the order(X, Y) operation the solution must maintain an assignment of labels from a universe of integers to the elements of the set such that X precedes Y in the total order if and only if X is assigned a lesser label than Y. It must also support an operation label(X) returning the label of any node X. Note that order(X, Y) can be implemented simply by comparing label(X) and label(Y) so that any solution to the list-labeling problem immediately gives one to the order-maintenance problem. In fact, most solutions to the order-maintenance problem are solutions to the list-labeling problem augmented with a level of data structure indirection to improve performance. We will see an example of this below.

For a list-labeling problem on sets of size up to , the cost of list labeling depends on how large is a function of . The relevant parameter range for order maintenance are for , for which an amortized cost solution is known, [10] and for which a constant time amortized solution is known [11]

O(1) amortized insertion via indirection

Indirection is a technique used in data structures in which a problem is split into multiple levels of a data structure in order to improve efficiency. Typically, a problem of size is split into problems of size . For example, this technique is used in y-fast tries. This strategy also works to improve the insertion and deletion performance of the data structure described above to constant amortized time. In fact, this strategy works for any solution of the list-labeling problem with amortized insertion and deletion time.

The order-maintenance data structure with indirection. The total order elements are stored in
O
(
N
/
log
[?]
N
)
{\displaystyle O(N/\log N)}
contiguous sublists of size
O
(
log
[?]
N
)
{\displaystyle O(\log N)}
, each of which has a representative in the scapegoat tree. Depiction of indirection in a tree based solution to the order-maintenance problem.svg
The order-maintenance data structure with indirection. The total order elements are stored in contiguous sublists of size , each of which has a representative in the scapegoat tree.

The new data structure is completely rebuilt whenever it grows too large or too small. Let be the number of elements of the total order when it was last rebuilt. The data structure is rebuilt whenever the invariant is violated by an insertion or deletion. Since rebuilding can be done in linear time this does not affect the amortized performance of insertions and deletions.

During the rebuilding operation, the elements of the total order are split into contiguous sublists, each of size . The list labeling problem is solved on the set set of nodes representing each of the sublists in their original list order. The labels for this subproblem are taken to be polynomial --- say , so that they can be compared in constant time and updated in amortized time.

For each sublist a doubly-linked list of its elements is built storing with each element a pointer to its representative in the tree as well as a local integer label. The local integer labels are also taken from a range , so that the can be compared in constant time, but because each local problem involves only items, the labels range is exponential in the number of items being labeled. Thus, they can be updated in amortized time.

See the list-labeling problem for details of both solutions.

Order

Given the sublist nodes X and Y, order(X, Y) can be answered by first checking if the two nodes are in the same sublist. If so, their order can be determined by comparing their local labels. Otherwise the labels of their representatives in the first list-labeling problem are compared. These comparisons take constant time.

Insert

Given a new sublist node for X and a pointer to the sublist node Y, insert(X, Y) inserts X immediately after Y in the sublist of Y, if there is room for X in the list, that is if the length of the list is no greater than after the insertion. It's local label is given by the local list labeling algorithm for exponential labels. This case takes amortized time.

If the local list overflows, it is split evenly into two lists of size , and the items in each list are given new labels from their (independent) ranges. This creates a new sublist, which is inserted into the list of sublists, and the new sublist node is given a label in the list of sublists by the list-labeling algorithm. Finally X is inserted into the appropriate list.

This sequence of operations take time, but there have been insertions since the list was created or last split. Thus the amortized time per insertion is .

Delete

Given a sublist node X to be deleted, delete(X) simply removes X from its sublist in constant time. If this leaves the sublist empty, then we need to remove the representative of the list of sublists. Since at least elements were deleted from the sublist since it was first built we can afford to spend the time, the amortized cost of a deletion is .

Related Research Articles

<span class="mw-page-title-main">AVL tree</span> Self-balancing binary search tree

In computer science, an AVL tree is a self-balancing binary search tree. In an AVL tree, the heights of the two child subtrees of any node differ by at most one; if at any time they differ by more than one, rebalancing is done to restore this property. Lookup, insertion, and deletion all take O(log n) time in both the average and worst cases, where is the number of nodes in the tree prior to the operation. Insertions and deletions may require the tree to be rebalanced by one or more tree rotations.

<span class="mw-page-title-main">Binary search tree</span> Rooted binary tree data structure

In computer science, a binary search tree (BST), also called an ordered or sorted binary tree, is a rooted binary tree data structure with the key of each internal node being greater than all the keys in the respective node's left subtree and less than the ones in its right subtree. The time complexity of operations on the binary search tree is linear with respect to the height of the tree.

<span class="mw-page-title-main">Heap (data structure)</span> Computer science data structure

In computer science, a heap is a tree-based data structure that satisfies the heap property: In a max heap, for any given node C, if P is a parent node of C, then the key of P is greater than or equal to the key of C. In a min heap, the key of P is less than or equal to the key of C. The node at the "top" of the heap is called the root node.

In computer science, a priority queue is an abstract data type similar to a regular queue or stack abstract data type. Each element in a priority queue has an associated priority. In a priority queue, elements with high priority are served before elements with low priority. In some implementations, if two elements have the same priority, they are served in the same order in which they were enqueued. In other implementations, the order of elements with the same priority is undefined.

A splay tree is a binary search tree with the additional property that recently accessed elements are quick to access again. Like self-balancing binary search trees, a splay tree performs basic operations such as insertion, look-up and removal in O(log n) amortized time. For random access patterns drawn from a non-uniform random distribution, their amortized time can be faster than logarithmic, proportional to the entropy of the access pattern. For many patterns of non-random operations, also, splay trees can take better than logarithmic time, without requiring advance knowledge of the pattern. According to the unproven dynamic optimality conjecture, their performance on all access patterns is within a constant factor of the best possible performance that could be achieved by any other self-adjusting binary search tree, even one selected to fit that pattern. The splay tree was invented by Daniel Sleator and Robert Tarjan in 1985.

<span class="mw-page-title-main">Treap</span> Random search tree data structure

In computer science, the treap and the randomized binary search tree are two closely related forms of binary search tree data structures that maintain a dynamic set of ordered keys and allow binary searches among the keys. After any sequence of insertions and deletions of keys, the shape of the tree is a random variable with the same probability distribution as a random binary tree; in particular, with high probability its height is proportional to the logarithm of the number of keys, so that each search, insertion, or deletion operation takes logarithmic time to perform.

In computer science, a binomial heap is a data structure that acts as a priority queue. It is an example of a mergeable heap, as it supports merging two heaps in logarithmic time. It is implemented as a heap similar to a binary heap but using a special tree structure that is different from the complete binary trees used by binary heaps. Binomial heaps were invented in 1978 by Jean Vuillemin.

In computer science, a Fibonacci heap is a data structure for priority queue operations, consisting of a collection of heap-ordered trees. It has a better amortized running time than many other priority queue data structures including the binary heap and binomial heap. Michael L. Fredman and Robert E. Tarjan developed Fibonacci heaps in 1984 and published them in a scientific journal in 1987. Fibonacci heaps are named after the Fibonacci numbers, which are used in their running time analysis.

<span class="mw-page-title-main">Self-balancing binary search tree</span> Any node-based binary search tree that automatically keeps its height the same

In computer science, a self-balancing binary search tree (BST) is any node-based binary search tree that automatically keeps its height small in the face of arbitrary item insertions and deletions. These operations when designed for a self-balancing binary search tree, contain precautionary measures against boundlessly increasing tree height, so that these abstract data structures receive the attribute "self-balancing".

In computer science, a soft heap is a variant on the simple heap data structure that has constant amortized time complexity for 5 types of operations. This is achieved by carefully "corrupting" (increasing) the keys of at most a constant number of values in the heap.

In computing, a persistent data structure or not ephemeral data structure is a data structure that always preserves the previous version of itself when it is modified. Such data structures are effectively immutable, as their operations do not (visibly) update the structure in-place, but instead always yield a new updated structure. The term was introduced in Driscoll, Sarnak, Sleator, and Tarjan's 1986 article.

In computer science, a disjoint-set data structure, also called a union–find data structure or merge–find set, is a data structure that stores a collection of disjoint (non-overlapping) sets. Equivalently, it stores a partition of a set into disjoint subsets. It provides operations for adding new sets, merging sets, and finding a representative member of a set. The last operation makes it possible to find out efficiently if any two elements are in the same or different sets.

In computer science, a scapegoat tree is a self-balancing binary search tree, invented by Arne Andersson in 1989 and again by Igal Galperin and Ronald L. Rivest in 1993. It provides worst-case lookup time and amortized insertion and deletion time.

A pairing heap is a type of heap data structure with relatively simple implementation and excellent practical amortized performance, introduced by Michael Fredman, Robert Sedgewick, Daniel Sleator, and Robert Tarjan in 1986. Pairing heaps are heap-ordered multiway tree structures, and can be considered simplified Fibonacci heaps. They are considered a "robust choice" for implementing such algorithms as Prim's MST algorithm, and support the following operations :

In computer science, fractional cascading is a technique to speed up a sequence of binary searches for the same value in a sequence of related data structures. The first binary search in the sequence takes a logarithmic amount of time, as is standard for binary searches, but successive searches in the sequence are faster. The original version of fractional cascading, introduced in two papers by Chazelle and Guibas in 1986, combined the idea of cascading, originating in range searching data structures of Lueker (1978) and Willard (1978), with the idea of fractional sampling, which originated in Chazelle (1983). Later authors introduced more complex forms of fractional cascading that allow the data structure to be maintained as the data changes by a sequence of discrete insertion and deletion events.

<span class="mw-page-title-main">Cartesian tree</span> Binary tree derived from a sequence of numbers

In computer science, a Cartesian tree is a binary tree derived from a sequence of distinct numbers. To construct the Cartesian tree, set its root to be the minimum number in the sequence, and recursively construct its left and right subtrees from the subsequences before and after this number. It is uniquely defined as a min-heap whose symmetric (in-order) traversal returns the original sequence.

<span class="mw-page-title-main">Queap</span>

In computer science, a queap is a priority queue data structure. The data structure allows insertions and deletions of arbitrary elements, as well as retrieval of the highest-priority element. Each deletion takes amortized time logarithmic in the number of items that have been in the structure for a longer time than the removed item. Insertions take constant amortized time.

In computer science, a fractal tree index is a tree data structure that keeps data sorted and allows searches and sequential access in the same time as a B-tree but with insertions and deletions that are asymptotically faster than a B-tree. Like a B-tree, a fractal tree index is a generalization of a binary search tree in that a node can have more than two children. Furthermore, unlike a B-tree, a fractal tree index has buffers at each node, which allow insertions, deletions and other changes to be stored in intermediate locations. The goal of the buffers is to schedule disk writes so that each write performs a large amount of useful work, thereby avoiding the worst-case performance of B-trees, in which each disk write may change a small amount of data on disk. Like a B-tree, fractal tree indexes are optimized for systems that read and write large blocks of data. The fractal tree index has been commercialized in databases by Tokutek. Originally, it was implemented as a cache-oblivious lookahead array, but the current implementation is an extension of the Bε tree. The Bε is related to the Buffered Repository Tree. The Buffered Repository Tree has degree 2, whereas the Bε tree has degree Bε. The fractal tree index has also been used in a prototype filesystem. An open source implementation of the fractal tree index is available, which demonstrates the implementation details outlined below.

In computing and graph theory, a dynamic connectivity structure is a data structure that dynamically maintains information about the connected components of a graph.

In computer science, the list-labeling problem involves maintaining a totally ordered set S supporting the following operations:

References

  1. Dietz, Paul F. (1982), "Maintaining order in a linked list", Proceedings of the 14th Annual ACM Symposium on Theory of Computing (STOC '82), New York, NY, USA: ACM, pp. 122–127, doi: 10.1145/800070.802184 , ISBN   978-0897910705 .
  2. Tsakalidis, Athanasios K. (1984), "Maintaining order in a generalized linked list", Acta Informatica , 21 (1): 101–112, doi:10.1007/BF00289142, MR   0747173 .
  3. Dietz, P.; Sleator, D. (1987), "Two algorithms for maintaining order in a list", Proceedings of the 19th Annual ACM Symposium on Theory of Computing (STOC '87), New York, NY, USA: ACM, pp. 365–372, doi: 10.1145/28395.28434 , hdl: 1802/5693 , ISBN   978-0897912211 . Full version, Tech. Rep. CMU-CS-88-113, Carnegie Mellon University, 1988.
  4. Bender, Michael A.; Cole, Richard; Demaine, Erik D.; Farach-Colton, Martin; Zito, Jack (2002), "Two simplified algorithms for maintaining order in a list", in Möhring, Rolf H.; Raman, Rajeev (eds.), Algorithms – ESA 2002, 10th Annual European Symposium, Rome, Italy, September 17–21, 2002, Proceedings, Lecture Notes in Computer Science, vol. 2461, Springer, pp. 152–164, doi:10.1007/3-540-45749-6_17
  5. Bender, Michael A.; Fineman, Jeremy T.; Gilbert, Seth; Kopelowitz, Tsvi; Montes, Pablo (2017), "File maintenance: When in doubt, change the layout!", in Klein, Philip N. (ed.), Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2017, Barcelona, Spain, Hotel Porta Fira, January 16–19, Society for Industrial and Applied Mathematics, pp. 1503–1522, doi: 10.1137/1.9781611974782.98
  6. Driscoll, James R.; Sarnak, Neil; Sleator, Daniel D.; Tarjan, Robert E. (1989), "Making data structures persistent", Journal of Computer and System Sciences , 38 (1): 86–124, doi: 10.1016/0022-0000(89)90034-2 , MR   0990051 .
  7. Eppstein, David; Galil, Zvi; Italiano, Giuseppe F.; Nissenzweig, Amnon (1997), "Sparsification—a technique for speeding up dynamic graph algorithms", Journal of the ACM , 44 (5): 669–696, doi: 10.1145/265910.265914 , MR   1492341 .
  8. Katriel, Irit; Bodlaender, Hans L. (2006), "Online topological ordering", ACM Transactions on Algorithms, 2 (3): 364–379, CiteSeerX   10.1.1.78.7933 , doi:10.1145/1159892.1159896, MR   2253786 .
  9. Aumann, Yonatan; Bender, Michael A. (1996), "Fault tolerant data structures", Proceedings of the 37th Annual Symposium on Foundations of Computer Science (FOCS 1996), pp. 580–589, doi:10.1109/SFCS.1996.548517, ISBN   978-0-8186-7594-2 .
  10. Itai, Alon; Konheim, Alan G.; Rodeh, Michael (1981), "A Sparse Table Implementation of Priority Queues", ICALP, pp. 417–431
  11. Bulánek, Jan; Koucký, Michal; Saks, Michael E. (2015), "Tight Lower Bounds for the Online Labeling Problem", SIAM Journal on Computing, vol. 44, pp. 1765--1797.