Soft heap

Last updated October 27, 2023

In computer science, a soft heap is a variant on the simple heap data structure that has constant amortized time complexity for 5 types of operations. This is achieved by carefully "corrupting" (increasing) the keys of at most a constant number of values in the heap.

Definition and performance

The constant time operations are:^[1]^[2]

create(S): Create a new soft heap
insert(S, x): Insert an element into a soft heap
meld(S, S' ): Combine the contents of two soft heaps into one, destroying both
delete(S, x): Delete an element from a soft heap
findmin(S): Get the element with minimum key in the soft heap

Other heaps such as Fibonacci heaps achieve most of these bounds without any corruption, but cannot provide a constant-time bound on the critical delete operation. The amount of corruption can be controlled by the choice of a parameter $\varepsilon$ , but the lower this is set, the more time insertions require: expressed using Big-O notation, the amortized time will be $O(\log 1/\varepsilon )$ for an error rate of $\varepsilon$ .^[1]^[2] Some versions of soft heaps allow the create, insert, and meld operations to take constant time in the worst case, producing amortized rather than worst-case performance only for findmin and delete.^[3] As with comparison sort, these algorithms access the keys only by comparisons; if arithmetic operations on integer keys are allowed, the time dependence on $\varepsilon$ can be reduced to $O(\log \log 1/\varepsilon )$ or (with randomization) ${\textstyle O({\sqrt {\log \log 1/\varepsilon }})}$ .^[4]

More precisely, the error guarantee offered by the soft heap is the following: each soft heap is initialized with a parameter $\varepsilon$ , chosen between 0 and 1/2. Then at any point in time it will contain at most $\varepsilon \cdot n$ corrupted keys, where $n$ is the number of elements inserted so far. Note that this does not guarantee that only a fixed percentage of the keys currently in the heap are corrupted: in an unlucky sequence of insertions and deletions, it can happen that all elements in the heap will have corrupted keys. Similarly, there is no guarantee that in a sequence of elements extracted from the heap with findmin and delete, only a fixed percentage will have corrupted keys: in an unlucky scenario only corrupted elements are extracted from the heap. When a key is corrupted, the value stored for it in the soft key is higher than its initially-given value; corruption can never decrease the value of any key. The findmin operation finds the minimum value among the currently stored keys, including the corrupted ones.^[1]^[2]

The soft heap was designed by Bernard Chazelle in 2000. The term "corruption" in the structure is the result of what Chazelle called "carpooling" in a soft heap. Each node in the soft heap contains a linked list of keys and one common key. The common key is an upper bound on the values of the keys in the linked list. Once a key is added to the linked list, it is considered corrupted because its value is never again relevant in any of the soft heap operations: only the common keys are compared. This is what makes soft heaps "soft"; one cannot be sure whether any particular value put into it will be corrupted. The purpose of these corruptions is effectively to lower the information entropy of the data, enabling the data structure to break through information-theoretic barriers regarding heaps.^[1]

Applications

Despite their limitations and unpredictable nature, soft heaps are useful in the design of deterministic algorithms. For instance, they have been used to achieve the best complexity to date for finding a minimum spanning tree.^[5] Other problems whose efficient solution has been simplified using soft heaps include finding the $k$ th smallest element in several classes of structured sets of values, including heap-ordered trees, sorted matrices, and sumsets.^[6]

Another simple example is a selection algorithm, to find the $k$ th smallest of a group of $n$ numbers:^[1]

Initialize a soft heap with error rate $1/3$ , allowing at most 33% of its inserted keys to be corrupted.
Insert all $n$ elements into the heap.
Repeat $n/3$ times: perform a findmin operation and delete the key that it returns.
Let $L$ be the deleted element whose correct key is largest.
Compare $L$ to all of the given elements, partition them into the subset less than $L$ and the subset greater than $L$ , placing elements that are equal to $L$ into the first subset when they were deleted and into the second subset otherwise.
Recursively call the same selection algorithm in the subset that contains the $k$ th smallest.

After the comparison step of the algorithm, the first of the two subsets contains all of the deleted keys, so it includes at least $n/3$ elements. Among the $2n/3$ elements that were not deleted, at most $n/3$ are corrupted, so at least $n/3$ are uncorrupted. These uncorrupted and undeleted elements must all belong to the second subset, because they are greater than or equal to the soft heap's (possibly corrupted) value of $L$ , which is in turn greater than the true value of $L$ . Thus, both subsets have between 33% and 66% of the elements. Because each level of recursion reduces the problem size by a constant factor, the total time of the algorithm can be bounded by a geometric series, showing that it is $O(n)$ .^[1]

Related Research Articles

<span class="mw-page-title-main">Heap (data structure)</span> Computer science data structure

In computer science, a heap is a specialized tree-based data structure that satisfies the heap property: In a max heap, for any given node C, if P is a parent node of C, then the key of P is greater than or equal to the key of C. In a min heap, the key of P is less than or equal to the key of C. The node at the "top" of the heap is called the root node.

In computer science, a priority queue is an abstract data-type similar to a regular queue or stack data structure. Each element in a priority queue has an associated priority. In a priority queue, elements with high priority are served before elements with low priority. In some implementations, if two elements have the same priority, they are served in the same order in which they were enqueued. In other implementations, the order of elements with the same priority is undefined.

A splay tree is a binary search tree with the additional property that recently accessed elements are quick to access again. Like self-balancing binary search trees, a splay tree performs basic operations such as insertion, look-up and removal in O(log n) amortized time. For random access patterns drawn from a non-uniform random distribution, their amortized time can be faster than logarithmic, proportional to the entropy of the access pattern. For many patterns of non-random operations, also, splay trees can take better than logarithmic time, without requiring advance knowledge of the pattern. According to the unproven dynamic optimality conjecture, their performance on all access patterns is within a constant factor of the best possible performance that could be achieved by any other self-adjusting binary search tree, even one selected to fit that pattern. The splay tree was invented by Daniel Sleator and Robert Tarjan in 1985.

<span class="mw-page-title-main">Binary heap</span> Variant of heap data structure

A binary heap is a heap data structure that takes the form of a binary tree. Binary heaps are a common way of implementing priority queues. The binary heap was introduced by J. W. J. Williams in 1964, as a data structure for heapsort.

<span class="mw-page-title-main">Component (graph theory)</span> Maximal subgraph whose vertices can reach each other

In graph theory, a component of an undirected graph is a connected subgraph that is not part of any larger connected subgraph. The components of any graph partition its vertices into disjoint sets, and are the induced subgraphs of those sets. A graph that is itself connected has exactly one component, consisting of the whole graph. Components are sometimes called connected components.

<span class="mw-page-title-main">Treap</span>

In computer science, the treap and the randomized binary search tree are two closely related forms of binary search tree data structures that maintain a dynamic set of ordered keys and allow binary searches among the keys. After any sequence of insertions and deletions of keys, the shape of the tree is a random variable with the same probability distribution as a random binary tree; in particular, with high probability its height is proportional to the logarithm of the number of keys, so that each search, insertion, or deletion operation takes logarithmic time to perform.

In computer science, a binomial heap is a data structure that acts as a priority queue but also allows pairs of heaps to be merged. It is important as an implementation of the mergeable heap abstract data type, which is a priority queue supporting merge operation. It is implemented as a heap similar to a binary heap but using a special tree structure that is different from the complete binary trees used by binary heaps. Binomial heaps were invented in 1978 by Jean Vuillemin.

In computer science, a Fibonacci heap is a data structure for priority queue operations, consisting of a collection of heap-ordered trees. It has a better amortized running time than many other priority queue data structures including the binary heap and binomial heap. Michael L. Fredman and Robert E. Tarjan developed Fibonacci heaps in 1984 and published them in a scientific journal in 1987. Fibonacci heaps are named after the Fibonacci numbers, which are used in their running time analysis.

In computer science, a perfect hash function $h$ for a set $S$ is a hash function that maps distinct elements in $S$ to a set of $m$ integers, with no collisions. In mathematical terms, it is an injective function.

In theoretical computer science, the time complexity is the computational complexity that describes the amount of computer time it takes to run an algorithm. Time complexity is commonly estimated by counting the number of elementary operations performed by the algorithm, supposing that each elementary operation takes a fixed amount of time to perform. Thus, the amount of time taken and the number of elementary operations performed by the algorithm are taken to be related by a constant factor.

In computer science, a selection algorithm is an algorithm for finding the $th smallest value in a collection of ordered values, such as numbers. The value that it finds is called the th order statistic. Selection includes as special cases the problems of finding the minimum, median, and maximum element in the collection. Selection algorithms include quickselect, and the median of medians algorithm. When applied to a collection of values, these algorithms take linear time, as expressed using big O notation. For data that is already structured, faster algorithms may be possible; as an extreme case, selection in an already-sorted array takes time .$

In mathematics, given a non-empty set of objects of finite extension in $-dimensional space, for example a set of points, a bounding sphere, enclosing sphere or enclosing ball for that set is an -dimensional solid sphere containing all of these objects.$

In computer science, a leftist tree or leftist heap is a priority queue implemented with a variant of a binary heap. Every node x has an s-value which is the distance to the nearest leaf in subtree rooted at x. In contrast to a binary heap, a leftist tree attempts to be very unbalanced. In addition to the heap property, leftist trees are maintained so the right descendant of each node has the lower s-value.

A pairing heap is a type of heap data structure with relatively simple implementation and excellent practical amortized performance, introduced by Michael Fredman, Robert Sedgewick, Daniel Sleator, and Robert Tarjan in 1986. Pairing heaps are heap-ordered multiway tree structures, and can be considered simplified Fibonacci heaps. They are considered a "robust choice" for implementing such algorithms as Prim's MST algorithm, and support the following operations :

<span class="mw-page-title-main">Queap</span>

In computer science, a queap is a priority queue data structure. The data structure allows insertions and deletions of arbitrary elements, as well as retrieval of the highest-priority element. Each deletion takes amortized time logarithmic in the number of items that have been in the structure for a longer time than the removed item. Insertions take constant amortized time.

In computer science, an optimal binary search tree (Optimal BST), sometimes called a weight-balanced binary tree, is a binary search tree which provides the smallest possible search time (or expected search time) for a given sequence of accesses (or access probabilities). Optimal BSTs are generally divided into two types: static and dynamic.

In computer science, a shadow heap is a mergeable heap data structure which supports efficient heap merging in the amortized sense. More specifically, shadow heaps make use of the shadow merge algorithm to achieve insertion in O(f(n)) amortized time and deletion in O((log n log log n)/f(n)) amortized time, for any choice of 1 ≤ f(n) ≤ log log n.

A radix heap is a data structure for realizing the operations of a monotone priority queue. A set of elements to which a key is assigned can then be managed. The run time of the operations depends on the difference between the largest and smallest key or constant. The data structure consists mainly of a series of buckets, the size of which increases exponentially.

In computer science, the list-labeling problem involves maintaining a totally ordered set S supporting the following operations:

This is a comparison of the performance of notable data structures, as measured by the complexity of their logical operations. For a more comprehensive listing of data structures, see List of data structures.

References

1 2 3 4 5 6 Chazelle, Bernard (November 2000). "The soft heap: an approximate priority queue with optimal error rate" (PDF). Journal of the ACM . 47 (6): 1012–1027. CiteSeerX 10.1.1.5.9705 . doi:10.1145/355541.355554. S2CID 12556140.
1 2 3 Kaplan, Haim; Zwick, Uri (2009). "A simpler implementation and analysis of Chazelle's soft heaps". Proceedings of the Nineteenth Annual ACM–SIAM Symposium on Discrete Algorithms. Society for Industrial and Applied Mathematics. pp. 477–485. CiteSeerX 10.1.1.215.6250 . doi:10.1137/1.9781611973068.53. ISBN 978-0-89871-680-1.
↑ Kaplan, Haim; Tarjan, Robert E.; Zwick, Uri (2013). "Soft heaps simplified". SIAM Journal on Computing . 42 (4): 1660–1673. doi:10.1137/120880185. MR 3084181.
↑ Thorup, Mikkel; Zamir, Or; Zwick, Uri (2019). "Dynamic ordered sets with approximate queries, approximate heaps and soft heaps". In Baier, Christel; Chatzigiannakis, Ioannis; Flocchini, Paola; Leonardi, Stefano (eds.). 46th International Colloquium on Automata, Languages, and Programming, ICALP 2019, July 9–12, 2019, Patras, Greece. LIPIcs. Vol. 132. Schloss Dagstuhl – Leibniz-Zentrum für Informatik. pp. 95:1–95:13. doi:10.4230/LIPICS.ICALP.2019.95.
↑ Chazelle, Bernard (2000). "A minimum spanning tree algorithm with inverse-Ackermann type complexity". Journal of the ACM . 47 (6): 1028–1047. doi:10.1145/355541.355562. MR 1866456.
↑ Kaplan, Haim; Kozma, László; Zamir, Or; Zwick, Uri (2019). "Selection from heaps, row-sorted matrices, and X+Y using soft heaps". In Fineman, Jeremy T.; Mitzenmacher, Michael (eds.). 2nd Symposium on Simplicity in Algorithms, SOSA 2019, January 8–9, 2019, San Diego, CA, USA. OASIcs. Vol. 69. Schloss Dagstuhl – Leibniz-Zentrum für Informatik. pp. 5:1–5:21. doi:10.4230/OASICS.SOSA.2019.5.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.

[c-1] 1 2 3 4 5 6 Chazelle, Bernard (November 2000). "The soft heap: an approximate priority queue with optimal error rate" (PDF). Journal of the ACM . 47 (6): 1012–1027. CiteSeerX 10.1.1.5.9705 . doi:10.1145/355541.355554. S2CID 12556140.

[kz-2] 1 2 3 Kaplan, Haim; Zwick, Uri (2009). "A simpler implementation and analysis of Chazelle's soft heaps". Proceedings of the Nineteenth Annual ACM–SIAM Symposium on Discrete Algorithms. Society for Industrial and Applied Mathematics. pp. 477–485. CiteSeerX 10.1.1.215.6250 . doi:10.1137/1.9781611973068.53. ISBN 978-0-89871-680-1.

[ktz-3] Kaplan, Haim; Tarjan, Robert E.; Zwick, Uri (2013). "Soft heaps simplified". SIAM Journal on Computing . 42 (4): 1660–1673. doi:10.1137/120880185. MR 3084181.

[tzz-4] Thorup, Mikkel; Zamir, Or; Zwick, Uri (2019). "Dynamic ordered sets with approximate queries, approximate heaps and soft heaps". In Baier, Christel; Chatzigiannakis, Ioannis; Flocchini, Paola; Leonardi, Stefano (eds.). 46th International Colloquium on Automata, Languages, and Programming, ICALP 2019, July 9–12, 2019, Patras, Greece. LIPIcs. Vol. 132. Schloss Dagstuhl – Leibniz-Zentrum für Informatik. pp. 95:1–95:13. doi:10.4230/LIPICS.ICALP.2019.95.

[mst-5] Chazelle, Bernard (2000). "A minimum spanning tree algorithm with inverse-Ackermann type complexity". Journal of the ACM . 47 (6): 1028–1047. doi:10.1145/355541.355562. MR 1866456.

[k2z2-6] Kaplan, Haim; Kozma, László; Zamir, Or; Zwick, Uri (2019). "Selection from heaps, row-sorted matrices, and X+Y using soft heaps". In Fineman, Jeremy T.; Mitzenmacher, Michael (eds.). 2nd Symposium on Simplicity in Algorithms, SOSA 2019, January 8–9, 2019, San Diego, CA, USA. OASIcs. Vol. 69. Schloss Dagstuhl – Leibniz-Zentrum für Informatik. pp. 5:1–5:21. doi:10.4230/OASICS.SOSA.2019.5.

[1]

[2]

[3]

[4]

[5]

[6]

Soft heap

Contents

Definition and performance

Applications

Related Research Articles

References