List-labeling problem

Last updated

In computer science, the list-labeling problem involves maintaining a totally ordered set S supporting the following operations:

Contents

The cost of a list labeling algorithm is the number of label (re-)assignments per insertion or deletion. List labeling algorithms have applications in many areas, including the order-maintenance problem, cache-oblivious data structures, [1] data structure persistence, [2] graph algorithms [3] [4] and fault-tolerant data structures. [5]

Sometimes the list labeling problem is presented where S is not a set of values but rather a set of objects subject to a total order. In this setting, when an item is inserted into S, it is specified to be the successor of some other item already in S. For example, this is the way that list labeling is used in the order-maintenance problem. The solutions presented below apply to both formulations.

Upper bounds

The cost of list labeling is related to , the range of the labels assigned. Suppose that no more than items are stored in the list-labeling structure at any time. Four cases have been studied:

Exponential Labels

In the exponential label case, each item that is inserted can be given a label that is the average of its neighboring labels. It takes insertions before two items are at adjacent labels and there are no labels available for items in between them. When this happens, all items are relabelled evenly from the space of all labels. This incurs relabeling cost. Thus, the amortized relabeling cost in this case is . [6]

Polynomial Labels

The other cases of list labeling can be solved via balanced binary search trees. Consider , a binary search tree on S of height . We can label every node in the tree via a path label as follows: Let be the sequence of left and right edges on the root-to- path, encoded as bits. So if is in the left subtree of the root, the high-order bit of is , and if it is in the right subtree of the root, the high-order bit of is . Once we reach , we complete to a length of as follows. If is a leaf, we append s as the low order bits until has bits. If is an internal node, we append one and then s as the low order bits until has bits.

The important properties of are that: these labels are in the range ; and for two nodes with keys and in if then . To see this latter property, notice that the property is true if the least common ancestor of and is neither nor , because and will share bits until their least common ancestor. If , then because is a search tree, will be in the left subtree and will have a next bit of , whereas will be in the right subtree and will have a next bit of .

Suppose instead that, without loss of generality, the least common ancestor of and is , and that has depth . If is in the left subtree of , then and share the first bits. The remaining bits of are all 1s, whereas the remaining bits of must have a , so . If instead is in the right subtree of , then and share the first bits and the st bit of is , whereas the st bit of is . Hence .

We conclude that the function fulfills the monotonicity property of the label() function. Thus if we can balance the binary tree to a depth of , we will have a solution to the list labeling problem for labels in the range .

Weight-balanced trees

In order to use a self-balancing binary search tree to solve the list labeling problem, we need to first define the cost function of a balancing operation on insertion or deletion to equal the number of labels that are changed, since every rebalancing operation of the tree would have to also update all path labels in the subtree rooted at the site of the rebalance. So, for example, rotating a node with a subtree of size , which can be done in constant time under usual circumstances, requires path label updates. In particular, if the node being rotated is the root then the rotation would take time linear in the size of the whole tree. With that much time the entire tree could be rebuilt. We will see below that there are self-balancing binary search tree data structures that cause an appropriate number of label updates during rebalancing.

A weight-balanced tree BB[] is defined as follows. For every in a root tree , define to be the number of nodes in the subtree rooted at . Let the left and right children of be and , respectively. A tree is -weight balanced if for every internal node in , and

The height of a BB[] tree with nodes is at most Therefore, in order to solve the list-labeling problem, we need to achieve a depth of

A scapegoat tree is a weight-balanced tree where whenever a node no longer satisfies the weight-balance condition the entire subtree rooted at that node is rebuilt. This rebalancing scheme is ideal for list labeling, since the cost of rebalancing now equals the cost of relabeling. The amortized cost of an insertion or deletion is For the list labeling problem, the cost becomes:

Lower bounds and open problems

In the case where , a lower bound of [9] has been established for list labeling. This lower bound applies to randomized algorithms, and so the known bounds for this case are tight.

In the case where , there is a lower bound of list labeling cost for deterministic algorithms. [6] Furthermore, the same lower bound holds for smooth algorithms, which are those whose only relabeling operation assigns labels evenly in a range of items [10] This lower bound is surprisingly strong in that it applies in the offline cases where all insertions and deletions are known ahead of time.

However, the best lower bound known for the linear case of algorithms that are allowed to be non-smooth and randomized is . Indeed, it has been an open problem since 1981 to close the gap between the upper bound and the in the linear case. [7] [11] Some progress on this problem has been made by Bender et al. who give a randomized upper bound of . [12]

Applications

The best known applications of list labeling are the order-maintenance problem and packed-memory arrays for cache-oblivious data structures. The order-maintenance problem is that of maintaining a data structure on a linked list to answer order queries: given two items in the linked list, which is closer to the front of the list? This problem can be solved directly by polynomial list labeling in per insertion and deletion and time per query, by assigning labels that are monotone with the rank in the list. The time for insertions and deletions can be improved to constant time by combining exponential polynomial list labeling with exponential list labeling on small lists.

The packed-memory array is an array of size to hold items so that any subarray of size holds items. This can be solved directly by the case of list labeling, by using the labels as addresses in the array, as long as the solution guarantees that the space between items is . Packed-memory arrays are used in cache-oblivious data structures to store data that must be indexed and scanned. The density bounds guarantee that a scan through the data is asymptotically optimal in the external-memory model for any block transfer size.

Related Research Articles

<span class="mw-page-title-main">AVL tree</span> Self-balancing binary search tree

In computer science, an AVL tree is a self-balancing binary search tree. In an AVL tree, the heights of the two child subtrees of any node differ by at most one; if at any time they differ by more than one, rebalancing is done to restore this property. Lookup, insertion, and deletion all take O(log n) time in both the average and worst cases, where is the number of nodes in the tree prior to the operation. Insertions and deletions may require the tree to be rebalanced by one or more tree rotations.

<span class="mw-page-title-main">Binary search tree</span> Rooted binary tree data structure

In computer science, a binary search tree (BST), also called an ordered or sorted binary tree, is a rooted binary tree data structure with the key of each internal node being greater than all the keys in the respective node's left subtree and less than the ones in its right subtree. The time complexity of operations on the binary search tree is linear with respect to the height of the tree.

<span class="mw-page-title-main">Hash table</span> Associative array for storing key-value pairs

In computing, a hash table, also known as a hash map or a hash set, is a data structure that implements an associative array, also called a dictionary, which is an abstract data type that maps keys to values. A hash table uses a hash function to compute an index, also called a hash code, into an array of buckets or slots, from which the desired value can be found. During lookup, the key is hashed and the resulting hash indicates where the corresponding value is stored.

A splay tree is a binary search tree with the additional property that recently accessed elements are quick to access again. Like self-balancing binary search trees, a splay tree performs basic operations such as insertion, look-up and removal in O(log n) amortized time. For random access patterns drawn from a non-uniform random distribution, their amortized time can be faster than logarithmic, proportional to the entropy of the access pattern. For many patterns of non-random operations, also, splay trees can take better than logarithmic time, without requiring advance knowledge of the pattern. According to the unproven dynamic optimality conjecture, their performance on all access patterns is within a constant factor of the best possible performance that could be achieved by any other self-adjusting binary search tree, even one selected to fit that pattern. The splay tree was invented by Daniel Sleator and Robert Tarjan in 1985.

In mathematics, the Lucas–Lehmer test (LLT) is a primality test for Mersenne numbers. The test was originally developed by Édouard Lucas in 1878 and subsequently proved by Derrick Henry Lehmer in 1930.

<span class="mw-page-title-main">Self-balancing binary search tree</span> Any node-based binary search tree that automatically keeps its height the same

In computer science, a self-balancing binary search tree (BST) is any node-based binary search tree that automatically keeps its height small in the face of arbitrary item insertions and deletions. These operations when designed for a self-balancing binary search tree, contain precautionary measures against boundlessly increasing tree height, so that these abstract data structures receive the attribute "self-balancing".

In computer science, a disjoint-set data structure, also called a union–find data structure or merge–find set, is a data structure that stores a collection of disjoint (non-overlapping) sets. Equivalently, it stores a partition of a set into disjoint subsets. It provides operations for adding new sets, merging sets, and finding a representative member of a set. The last operation makes it possible to find out efficiently if any two elements are in the same or different sets.

In computer science, a scapegoat tree is a self-balancing binary search tree, invented by Arne Andersson in 1989 and again by Igal Galperin and Ronald L. Rivest in 1993. It provides worst-case lookup time and amortized insertion and deletion time.

In computer science, weight-balanced binary trees (WBTs) are a type of self-balancing binary search trees that can be used to implement dynamic sets, dictionaries (maps) and sequences. These trees were introduced by Nievergelt and Reingold in the 1970s as trees of bounded balance, or BB[α] trees. Their more common name is due to Knuth.

In mathematical logic, the Borel hierarchy is a stratification of the Borel algebra generated by the open subsets of a Polish space; elements of this algebra are called Borel sets. Each Borel set is assigned a unique countable ordinal number called the rank of the Borel set. The Borel hierarchy is of particular interest in descriptive set theory.

In computer science, the prefix sum, cumulative sum, inclusive scan, or simply scan of a sequence of numbers x0, x1, x2, ... is a second sequence of numbers y0, y1, y2, ..., the sums of prefixes of the input sequence:

In computer science, the range query problem consists of efficiently answering several queries regarding a given interval of elements within an array. For example, a common task, known as range minimum query, is finding the smallest value inside a given range within a list of numbers.

In statistics, the matrix t-distribution is the generalization of the multivariate t-distribution from vectors to matrices. The matrix t-distribution shares the same relationship with the multivariate t-distribution that the matrix normal distribution shares with the multivariate normal distribution. For example, the matrix t-distribution is the compound distribution that results from sampling from a matrix normal distribution having sampled the covariance matrix of the matrix normal from an inverse Wishart distribution.

In computer science, the cell-probe model is a model of computation similar to the random-access machine, except that all operations are free except memory access. This model is useful for proving lower bounds of algorithms for data structure problems.

In computer science, the order-maintenance problem involves maintaining a totally ordered set supporting the following operations:

<span class="mw-page-title-main">Suffix automaton</span> Deterministic finite automaton accepting set of all suffixes of particular string

In computer science, a suffix automaton is an efficient data structure for representing the substring index of a given string which allows the storage, processing, and retrieval of compressed information about all its substrings. The suffix automaton of a string is the smallest directed acyclic graph with a dedicated initial vertex and a set of "final" vertices, such that paths from the initial vertex to final vertices represent the suffixes of the string.

In mathematics the symmetrization methods are algorithms of transforming a set to a ball with equal volume and centered at the origin. B is called the symmetrized version of A, usually denoted . These algorithms show up in solving the classical isoperimetric inequality problem, which asks: Given all two-dimensional shapes of a given area, which of them has the minimal perimeter. The conjectured answer was the disk and Steiner in 1838 showed this to be true using the Steiner symmetrization method. From this many other isoperimetric problems sprung and other symmetrization algorithms. For example, Rayleigh's conjecture is that the first eigenvalue of the Dirichlet problem is minimized for the ball. Another problem is that the Newtonian capacity of a set A is minimized by and this was proved by Polya and G. Szego (1951) using circular symmetrization.

In computer science, join-based tree algorithms are a class of algorithms for self-balancing binary search trees. This framework aims at designing highly-parallelized algorithms for various balanced binary search trees. The algorithmic framework is based on a single operation join. Under this framework, the join operation captures all balancing criteria of different balancing schemes, and all other functions join have generic implementation across different balancing schemes. The join-based algorithms can be applied to at least four balancing schemes: AVL trees, red–black trees, weight-balanced trees and treaps.

In mathematics, especially mathematical logic, graph theory and number theory, the Buchholz hydra game is a type of hydra game, which is a single-player game based on the idea of chopping pieces off a mathematical tree. The hydra game can be used to generate a rapidly growing function, , which eventually dominates all recursive functions that are provably total in "", and the termination of all hydra games is not provably total in .

References

  1. Bender, Michael A.; Demaine, Erik D.; Farach-Colton, Martin (2005), "Cache-oblivious B-trees" (PDF), SIAM Journal on Computing , 35 (2): 341–358, doi:10.1137/S0097539701389956, MR   2191447 .
  2. Driscoll, James R.; Sarnak, Neil; Sleator, Daniel D.; Tarjan, Robert E. (1989), "Making data structures persistent", Journal of Computer and System Sciences , 38 (1): 86–124, doi: 10.1016/0022-0000(89)90034-2 , MR   0990051 .
  3. Eppstein, David; Galil, Zvi; Italiano, Giuseppe F.; Nissenzweig, Amnon (1997), "Sparsification—a technique for speeding up dynamic graph algorithms", Journal of the ACM , 44 (5): 669–696, doi: 10.1145/265910.265914 , MR   1492341, S2CID   263324404 .
  4. Katriel, Irit; Bodlaender, Hans L. (2006), "Online topological ordering", ACM Transactions on Algorithms, 2 (3): 364–379, CiteSeerX   10.1.1.78.7933 , doi:10.1145/1159892.1159896, MR   2253786, S2CID   6552974 .
  5. Aumann, Yonatan; Bender, Michael A. (1996), "Fault tolerant data structures", Proceedings of the 37th Annual Symposium on Foundations of Computer Science (FOCS 1996), pp. 580–589, doi:10.1109/SFCS.1996.548517, ISBN   978-0-8186-7594-2, S2CID   80348 .
  6. 1 2 Bulánek, Jan; Koucký, Michal; Saks, Michael E. (2015), "Tight Lower Bounds for the Online Labeling Problem", SIAM Journal on Computing, vol. 44, pp. 1765–1797.
  7. 1 2 3 Itai, Alon; Konheim, Alan G.; Rodeh, Michael (1981), "A Sparse Table Implementation of Priority Queues", ICALP, pp. 417–431
  8. Willard, Dan E. (1992), "A Density Control Algorithm for Doing Insertions and Deletions in a Sequentially Ordered File in Good Worst-Case Time", Information and Computation, vol. 97, pp. 150–204.
  9. Dietz, Paul F.; Seiferas, Joel I.; Zhang, Ju (1994), "A tight lower bound for on-line monotonic list labeling", Algorithm theory—SWAT '94 (Aarhus, 1994), Lecture Notes in Computer Science, vol. 824, Berlin: Springer, pp. 131–142, doi:10.1007/3-540-58218-5_12, ISBN   978-3-540-58218-2, MR   1315312 .
  10. Dietz, Paul F.; Zhang, Ju (1990), "Lower bounds for monotonic list labeling", Algorithm theory—SWAT '90, pp. 173–180.
  11. Saks, Michael (2018), "Online Labeling: Algorithms, Lower Bounds and Open Questions", International Computer Science Symposium in Russia, pp. 23–28.
  12. Bender, Michael A.; Conway, Alex; Farach-Colton, Martin; Komlos, Hanna; Kuszmaul, William; Wein, Nicole (October 2022). "Online List Labeling: Breaking the log2n Barrier". 2022 IEEE 63rd Annual Symposium on Foundations of Computer Science (FOCS). IEEE. pp. 980–990. arXiv: 2203.02763 . doi:10.1109/focs54457.2022.00096. ISBN   978-1-6654-5519-0. S2CID   247292594.