Bucket queue | ||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Type | priority queue | |||||||||||||||||||||||
Invented | 1969 | |||||||||||||||||||||||
Invented by | Robert Dial | |||||||||||||||||||||||
|
A bucket queue is a data structure that implements the priority queue abstract data type: it maintains a dynamic collection of elements with numerical priorities and allows quick access to the element with minimum (or maximum) priority. In the bucket queue, the priorities must be integers, and it is particularly suited to applications in which the priorities have a small range. [1] A bucket queue has the form of an array of buckets: an array data structure, indexed by the priorities, whose cells contain collections of items with the same priority as each other. With this data structure, insertion of elements and changes of their priority take constant time. Searching for and removing the minimum-priority element takes time proportional to the number of buckets or, by maintaining a pointer to the most recently found bucket, in time proportional to the difference in priorities between successive operations.
The bucket queue is the priority-queue analogue of pigeonhole sort (also called bucket sort), a sorting algorithm that places elements into buckets indexed by their priorities and then concatenates the buckets. Using a bucket queue as the priority queue in a selection sort gives a form of the pigeonhole sort algorithm. [2] Bucket queues are also called bucket priority queues [3] or bounded-height priority queues. [1] When used for quantized approximations to real number priorities, they are also called untidy priority queues [4] or pseudo priority queues. [5] They are closely related to the calendar queue, a structure that uses a similar array of buckets for exact prioritization by real numbers.
Applications of the bucket queue include computation of the degeneracy of a graph, fast algorithms for shortest paths and widest paths for graphs with weights that are small integers or are already sorted, and greedy approximation algorithms for the set cover problem. The quantized version of the structure has also been applied to scheduling [2] and to marching cubes in computer graphics. [4] The first use of the bucket queue [6] was in a shortest path algorithm by Dial (1969). [7]
A bucket queue can handle elements with integer priorities in the range from 0 or 1 up to some known bound C, and operations that insert elements, change the priority of elements, or extract (find and remove) the element that has the minimum (or maximum) priority. It consists of an array A of container data structures; in most sources these containers are doubly linked lists but they could alternatively be dynamic arrays [3] or dynamic sets. The container in the pth array cell A[p] stores the collection of elements whose priority is p.
A bucket queue can handle the following operations:
In this way, insertions and priority changes take constant time, and extracting the minimum or maximum priority element takes time O(C). [1] [6] [8]
As an optimization, the data structure can start each sequential search for a non-empty bucket at the most recently-found non-empty bucket instead of at the start of the array. This can be done in either of two different ways, lazy (delaying these sequential searches until they are necessary) or eager (doing the searches ahead of time). The choice of when to do the search affects which of the data structure operations is slowed by these searches. Dial's original version of the structure used a lazy search. This can be done by maintaining an index L that is a lower bound on the minimum priority of any element currently in the queue. When inserting a new element, L should be updated to the minimum of its old value and the new element's priority. When searching for the minimum priority element, the search can start at L instead of at zero, and after the search L should be left equal to the priority that was found in the search. [7] [9] Alternatively, the eager version of this optimization keeps L updated so that it always points to the first non-empty bucket. When inserting a new element with a priority smaller than L, the data structure sets L to the new priority, and when removing the last element from a bucket with priority L, it performs a sequential search through larger indexes until finding a non-empty bucket and setting L to the priority of the resulting bucket. [1]
In either of these two variations, each sequential search takes time proportional to the difference between the old and new values of L. This could be significantly faster than the O(C) time bound for the searches in the un-optimized version of the data structure. In many applications of priority queues such as Dijkstra's algorithm, the minimum priorities form a monotonic sequence, allowing a monotone priority queue to be used. In these applications, for both the lazy and eager variations of the optimized structure, the sequential searches for non-empty buckets cover disjoint ranges of buckets. Because each bucket is in at most one of these ranges, their numbers of steps add to at most C. Therefore, in these applications, the total time for a sequence of n operations is O(n + C), rather than the slower O(nC) time bound that would result without this optimization. [9] A corresponding optimization can be applied in applications where a bucket queue is used to find elements of maximum priority, but in this case it should maintain an index that upper-bounds the maximum priority, and the sequential search for a non-empty bucket should proceed downwards from this upper bound. [10]
Another optimization (already given by Dial 1969) can be used to save space when the priorities are monotonic and, throughout the course of an algorithm, always fall within a range of r values rather than extending over the whole range from 0 to C. In this case, one can index the array by the priorities modulo r rather than by their actual values. The search for the minimum priority element should always begin at the previous minimum, to avoid priorities that are higher than the minimum but have lower moduli. In particular, this idea can be applied in Dijkstra's algorithm on graphs whose edge lengths are integers in the range from 1 to r. [8]
Because creating a new bucket queue involves initializing an array of empty buckets, this initialization step takes time proportional to the number of priorities. A variation of the bucket queue described by Donald B. Johnson in 1981 instead stores only the non-empty buckets in a linked list, sorted by their priorities, and uses an auxiliary search tree to quickly find the position in this linked list for any new buckets. It takes time O(log log C) to initialize this variant structure, constant time to find an element with minimum or maximum priority, and time O(log log D) to insert or delete an element, where D is the difference between the nearest smaller and larger priorities to the priority of the inserted or deleted element. [11]
For example, consider a bucket queue with four priorities, the numbers 0, 1, 2, and 3. It consists of an array whose four cells each contain a collection of elements, initially empty. For the purposes of this example, can be written as a bracketed sequence of four sets: . Consider a sequence of operations in which we insert two elements and with the same priority 1, insert a third element with priority 3, change the priority of to 3, and then perform two extractions of the minimum-priority element.
A bucket queue can be used to maintain the vertices of an undirected graph, prioritized by their degrees, and repeatedly find and remove the vertex of minimum degree. [1] This greedy algorithm can be used to calculate the degeneracy of a given graph, equal to the largest degree of any vertex at the time of its removal. The algorithm takes linear time, with or without the optimization that maintains a lower bound on the minimum priority, because each vertex is found in time proportional to its degree and the sum of all vertex degrees is linear in the number of edges of the graph. [12]
In Dijkstra's algorithm for shortest paths in directed graphs with edge weights that are positive integers, the priorities are monotone, [13] and a monotone bucket queue can be used to obtain a time bound of O(m + dc), where m is the number of edges, d is the diameter of the network, and c is the maximum (integer) link cost. [9] [14] This variant of Dijkstra's algorithm is also known as Dial's algorithm, [9] after Robert B. Dial, who published it in 1969. [7] The same idea also works, using a quantized bucket queue, for graphs with positive real edge weights when the ratio of the maximum to minimum weight is at most c. [15] In this quantized version of the algorithm, the vertices are processed out of order, compared to the result with a non-quantized priority queue, but the correct shortest paths are still found. [5] In these algorithms, the priorities will only span a range of width c + 1, so the modular optimization can be used to reduce the space to O(n + c). [8] [14]
A variant of the same algorithm can be used for the widest path problem. In combination with methods for quickly partitioning non-integer edge weights into subsets that can be assigned integer priorities, it leads to near-linear-time solutions to the single-source single-destination version of the widest path problem. [16]
The set cover problem has as its input a family of sets. The output should be a subfamily of these sets, with the same union as the original family, including as few sets as possible. It is NP-hard, but has a greedy approximation algorithm that achieves a logarithmic approximation ratio, essentially the best possible unless P = NP. [17] This approximation algorithm selects its subfamily by repeatedly choosing a set that covers the maximum possible number of remaining uncovered elements. [18] A standard exercise in algorithm design asks for an implementation of this algorithm that takes linear time in the input size, which is the sum of sizes of all the input sets. [19]
This may be solved using a bucket queue of sets in the input family, prioritized by the number of remaining elements that they cover. Each time that the greedy algorithm chooses a set as part of its output, the newly covered set elements should be subtracted from the priorities of the other sets that cover them; over the course of the algorithm the number of these changes of priorities is just the sum of sizes of the input sets. The priorities are monotonically decreasing integers, upper-bounded by the number of elements to be covered. Each choice of the greedy algorithm involves finding the set with the maximum priority, which can be done by scanning downwards through the buckets of the bucket queue, starting from the most recent previous maximum value. The total time is linear in the input size. [10]
Bucket queues can be used to schedule tasks with deadlines, for instance in packet forwarding for internet data with quality of service guarantees. For this application, the deadlines should be quantized into discrete intervals, and tasks whose deadlines fall into the same interval are considered to be of equivalent priority. [2]
A variation of the quantized bucket queue data structure, the calendar queue, has been applied to scheduling of discrete-event simulations, where the elements in the queue are future events prioritized by the time within the simulation that the events should happen. In this application, the ordering of events is critical, so the priorities cannot be approximated. Therefore, the calendar queue performs searches for the minimum-priority element in a different way than a bucket queue: in the bucket queue, any element of the first non-empty bucket may be returned, but instead the calendar queue searches all the elements in that bucket to determine which of them has the smallest non-quantized priority. To keep these searches fast, this variation attempts to keep the number of buckets proportional to the number of elements, by adjusting the scale of quantization and rebuilding the data structure when it gets out of balance. Calendar queues may be slower than bucket queues in the worst case (when many elements all land in the same smallest bucket) but are fast when elements are uniformly distributed among buckets causing the average bucket size to be constant. [20] [21]
In applied mathematics and numerical methods for the solution of differential equations, untidy priority queues have been used to prioritize the steps of the fast marching method for solving boundary value problems of the Eikonal equation, used to model wave propagation. This method finds the times at which a moving boundary crosses a set of discrete points (such as the points of an integer grid) using a prioritization method resembling a continuous version of Dijkstra's algorithm, and its running time is dominated by its priority queue of these points. It can be sped up to linear time by rounding the priorities used in this algorithm to integers, and using a bucket queue for these integers. As in Dijkstra's and Dial's algorithms, the priorities are monotone, so fast marching can use the monotone optimization of the bucket queue and its analysis. However, the discretization introduces some error into the resulting calculations. [4]
In computer science, a binary search tree (BST), also called an ordered or sorted binary tree, is a rooted binary tree data structure with the key of each internal node being greater than all the keys in the respective node's left subtree and less than the ones in its right subtree. The time complexity of operations on the binary search tree is linear with respect to the height of the tree.
In computing, a hash table is a data structure that implements an associative array, also called a dictionary or simply map, which is an abstract data type that maps keys to values. A hash table uses a hash function to compute an index, also called a hash code, into an array of buckets or slots, from which the desired value can be found. During lookup, the key is hashed and the resulting hash indicates where the corresponding value is stored.
In computer science, a heap is a tree-based data structure that satisfies the heap property: In a max heap, for any given node C, if P is a parent node of C, then the key of P is greater than or equal to the key of C. In a min heap, the key of P is less than or equal to the key of C. The node at the "top" of the heap is called the root node.
In computer science, a priority queue is an abstract data type similar to a regular queue or stack abstract data type. Each element in a priority queue has an associated priority. In a priority queue, elements with high priority are served before elements with low priority. In some implementations, if two elements have the same priority, they are served in the same order in which they were enqueued. In other implementations, the order of elements with the same priority is undefined.
In computer science, a queue is a collection of entities that are maintained in a sequence and can be modified by the addition of entities at one end of the sequence and the removal of entities from the other end of the sequence. By convention, the end of the sequence at which elements are added is called the back, tail, or rear of the queue, and the end at which elements are removed is called the head or front of the queue, analogously to the words used when people line up to wait for goods or services.
Dijkstra's algorithm is an algorithm for finding the shortest paths between nodes in a weighted graph, which may represent, for example, road networks. It was conceived by computer scientist Edsger W. Dijkstra in 1956 and published three years later.
In computer science, Prim's algorithm is a greedy algorithm that finds a minimum spanning tree for a weighted undirected graph. This means it finds a subset of the edges that forms a tree that includes every vertex, where the total weight of all the edges in the tree is minimized. The algorithm operates by building this tree one vertex at a time, from an arbitrary starting vertex, at each step adding the cheapest possible connection from the tree to another vertex.
Bucket sort, or bin sort, is a sorting algorithm that works by distributing the elements of an array into a number of buckets. Each bucket is then sorted individually, either using a different sorting algorithm, or by recursively applying the bucket sorting algorithm. It is a distribution sort, a generalization of pigeonhole sort that allows multiple keys per bucket, and is a cousin of radix sort in the most-to-least significant digit flavor. Bucket sort can be implemented with comparisons and therefore can also be considered a comparison sort algorithm. The computational complexity depends on the algorithm used to sort each bucket, the number of buckets to use, and whether the input is uniformly distributed.
In computer science, the treap and the randomized binary search tree are two closely related forms of binary search tree data structures that maintain a dynamic set of ordered keys and allow binary searches among the keys. After any sequence of insertions and deletions of keys, the shape of the tree is a random variable with the same probability distribution as a random binary tree; in particular, with high probability its height is proportional to the logarithm of the number of keys, so that each search, insertion, or deletion operation takes logarithmic time to perform.
In computer science, a Fibonacci heap is a data structure for priority queue operations, consisting of a collection of heap-ordered trees. It has a better amortized running time than many other priority queue data structures including the binary heap and binomial heap. Michael L. Fredman and Robert E. Tarjan developed Fibonacci heaps in 1984 and published them in a scientific journal in 1987. Fibonacci heaps are named after the Fibonacci numbers, which are used in their running time analysis.
In computational complexity theory, the 3SUM problem asks if a given set of real numbers contains three elements that sum to zero. A generalized version, k-SUM, asks the same question on k elements, rather than simply 3. 3SUM can be easily solved in time, and matching lower bounds are known in some specialized models of computation.
In mathematics, in the areas of order theory and combinatorics, Dilworth's theorem characterizes the width of any finite partially ordered set in terms of a partition of the order into a minimum number of chains. It is named for the mathematician Robert P. Dilworth.
A van Emde Boas tree, also known as a vEB tree or van Emde Boas priority queue, is a tree data structure which implements an associative array with m-bit integer keys. It was invented by a team led by Dutch computer scientist Peter van Emde Boas in 1975. It performs all operations in O(log m) time, or equivalently in time, where is the largest element that can be stored in the tree. The parameter is not to be confused with the actual number of elements stored in the tree, by which the performance of other tree data-structures is often measured.
In computer science, a Cartesian tree is a binary tree derived from a sequence of distinct numbers. To construct the Cartesian tree, set its root to be the minimum number in the sequence, and recursively construct its left and right subtrees from the subsequences before and after this number. It is uniquely defined as a min-heap whose symmetric (in-order) traversal returns the original sequence.
An oriented matroid is a mathematical structure that abstracts the properties of directed graphs, vector arrangements over ordered fields, and hyperplane arrangements over ordered fields. In comparison, an ordinary matroid abstracts the dependence properties that are common both to graphs, which are not necessarily directed, and to arrangements of vectors over fields, which are not necessarily ordered.
In computer science, a queap is a priority queue data structure. The data structure allows insertions and deletions of arbitrary elements, as well as retrieval of the highest-priority element. Each deletion takes amortized time logarithmic in the number of items that have been in the structure for a longer time than the removed item. Insertions take constant amortized time.
In computer science, a monotone priority queue is a variant of the priority queue abstract data type in which the priorities of extracted items are required to form a monotonic sequence. That is, for a priority queue in which each successively extracted item is the one with the minimum priority, the minimum priority should be monotonically increasing. Conversely for a max-heap the maximum priority should be monotonically decreasing. The assumption of monotonicity arises naturally in several applications of priority queues, and can be used as a simplifying assumption to speed up certain types of priority queues.
A radix heap is a data structure for realizing the operations of a monotone priority queue. A set of elements to which a key is assigned can then be managed. The run time of the operations depends on the difference between the largest and smallest key or constant. The data structure consists mainly of a series of buckets, the size of which increases exponentially.