B-heap

Last updated

A B-heap is a binary heap implemented to keep subtrees in a single page. This reduces the number of pages accessed by up to a factor of ten for big heaps when using virtual memory, compared with the traditional implementation. [1] The traditional mapping of elements to locations in an array puts almost every level in a different page.

Contents

There are other heap variants which are efficient in computers using virtual memory or caches, such as cache-oblivious algorithms, k-heaps, [2] and van Emde Boas layouts. [3]

Motivation

Traditionally, binary trees are laid out in consecutive memory according to a n -> {2n, 2n+1} rule, meaning that if a node is at position n, its left and right child are taken to be at positions 2n and 2n+1 in the array. The root is at position 1. A common operation on binary trees is the vertical traversal; stepping down through the levels of a tree in order to arrive at a searched node. However, because of the way memory is organized on modern computers into pages in virtual memory, this scheme of laying out the binary tree can be highly ineffective. The reason is that, as when traversing deeper into the tree, the distance to the next node grows exponentially, so every next node retrieved will likely be on a separate memory page. This will increase the number of page misses, which are very expensive. The B-heap solves this problem by laying out child nodes in memory in a different way, trying as much as possible to position subtrees within a single page. Therefore, as a vertical traversal proceeds, several of the consecutive retrieved nodes will lay in the same page, leading to a low number of page misses.

Implementation

In detail, a b-heap can be implemented in the following way. Poul-Henning Kamp [4] gives two options for the layout of the nodes: one in which two positions per page are wasted, but the strict binary structure of the tree is preserved, and another which uses the whole available space of the pages, but has the tree fail to expand for one level upon entering a new page (The nodes on that level have only one child). In any case, an important point is that upon leaving a page, both child nodes are always in a common other page, since in a vertical transversal the algorithm will typically compare both children with the parent to know how to proceed. For this reason, each page can be said to contain two parallel subtrees, interspersed with each other. The pages themselves can be seen as a m-ary tree, and since half of the elements in each page will be leaves (within the page), the "tree of pages" has a splitting factor of pagesize/2.

Parent Function

In contrast to the classic array-like layout, the parent function in a B-heap is more complex because the index of a node's parent must be computed differently depending on where in the page it is. Assuming the positions inside a page are labelled from 0 to pagesize, the parent function can be as follows.

For nodes 0 and 1, these are only used in the variant which is exploiting all possible space. In this case, the parent index of both nodes is the same, it is in a different page, and its specific offset within that page only depends on the current page number. Specifically, to compute the parent's page number, simply divide the current node's page number by the "page tree's" splitting factor, which is pagesize/2. To get the right offset within the page, consider that it must be one of the leaf nodes within the parent page, so start at offset pagesize/2. Then add the difference between the current page number, and the parent's page number, minus one since the first page after the parent page has its parent node in index (pagesize/2).

For nodes 2 and 3, the parent is different depending on the mode. In space-saving mode, the parents are simply the nodes 0 and 1, respectively, so one needs only to subtract with 2. On the other hand, in strict-binary-tree-mode, these nodes are the roots of the page instead of 0 and 1, and so their common parent is computed the same way as described above.

For all other nodes, their parent will be within the same page, and it is enough to divide their offset within their page by 2, not changing the page number.

See also

Related Research Articles

<span class="mw-page-title-main">Binary search tree</span> Rooted binary tree data structure

In computer science, a binary search tree (BST), also called an ordered or sorted binary tree, is a rooted binary tree data structure with the key of each internal node being greater than all the keys in the respective node's left subtree and less than the ones in its right subtree. The time complexity of operations on the binary search tree is directly proportional to the height of the tree.

In computer science, a B-tree is a self-balancing tree data structure that maintains sorted data and allows searches, sequential access, insertions, and deletions in logarithmic time. The B-tree generalizes the binary search tree, allowing for nodes with more than two children. Unlike other self-balancing binary search trees, the B-tree is well suited for storage systems that read and write relatively large blocks of data, such as databases and file systems.

<span class="mw-page-title-main">Heapsort</span> A sorting algorithm which uses the heap data structure

In computer science, heapsort is a comparison-based sorting algorithm. Heapsort can be thought of as an improved selection sort: like selection sort, heapsort divides its input into a sorted and an unsorted region, and it iteratively shrinks the unsorted region by extracting the largest element from it and inserting it into the sorted region. Unlike selection sort, heapsort does not waste time with a linear-time scan of the unsorted region; rather, heap sort maintains the unsorted region in a heap data structure to more quickly find the largest element in each step.

<span class="mw-page-title-main">Heap (data structure)</span> Computer science data structure

In computer science, a heap is a specialized tree-based data structure that satisfies the heap property: In a max heap, for any given node C, if P is a parent node of C, then the key of P is greater than or equal to the key of C. In a min heap, the key of P is less than or equal to the key of C. The node at the "top" of the heap is called the root node.

In computer science, a priority queue is an abstract data-type similar to a regular queue or stack data structure. Each element in a priority queue has an associated priority. In a priority queue, elements with high priority are served before elements with low priority. In some implementations, if two elements have the same priority, they are served in the same order in which they were enqueued. In other implementations, the order of elements with the same priority is undefined.

<span class="mw-page-title-main">Tree (data structure)</span> Abstract data type simulating a hierarchical tree structure and represented as a set of linked nodes

In computer science, a tree is a widely used abstract data type that represents a hierarchical tree structure with a set of connected nodes. Each node in the tree can be connected to many children, but must be connected to exactly one parent, except for the root node, which has no parent. These constraints mean there are no cycles or "loops", and also that each child can be treated like the root node of its own subtree, making recursion a useful technique for tree traversal. In contrast to linear data structures, many trees cannot be represented by relationships between neighboring nodes in a single straight line.

<span class="mw-page-title-main">Binary heap</span> Variant of heap data structure

A binary heap is a heap data structure that takes the form of a binary tree. Binary heaps are a common way of implementing priority queues. The binary heap was introduced by J. W. J. Williams in 1964, as a data structure for heapsort.

<span class="mw-page-title-main">Smoothsort</span> Comparison-based sorting algorithm

In computer science, smoothsort is a comparison-based sorting algorithm. A variant of heapsort, it was invented and published by Edsger Dijkstra in 1981. Like heapsort, smoothsort is an in-place algorithm with an upper bound of O(n log n), but it is not a stable sort. The advantage of smoothsort is that it comes closer to O(n) time if the input is already sorted to some degree, whereas heapsort averages O(n log n) regardless of the initial sorted state.

<span class="mw-page-title-main">Treap</span>

In computer science, the treap and the randomized binary search tree are two closely related forms of binary search tree data structures that maintain a dynamic set of ordered keys and allow binary searches among the keys. After any sequence of insertions and deletions of keys, the shape of the tree is a random variable with the same probability distribution as a random binary tree; in particular, with high probability its height is proportional to the logarithm of the number of keys, so that each search, insertion, or deletion operation takes logarithmic time to perform.

In computer science, a binomial heap is a data structure that acts as a priority queue but also allows pairs of heaps to be merged. It is important as an implementation of the mergeable heap abstract data type, which is a priority queue supporting merge operation. It is implemented as a heap similar to a binary heap but using a special tree structure that is different from the complete binary trees used by binary heaps. Binomial heaps were invented in 1978 by Jean Vuillemin.

In computer science, tree traversal is a form of graph traversal and refers to the process of visiting each node in a tree data structure, exactly once. Such traversals are classified by the order in which the nodes are visited. The following algorithms are described for a binary tree, but they may be generalized to other trees as well.

A van Emde Boas tree, also known as a vEB tree or van Emde Boas priority queue, is a tree data structure which implements an associative array with m-bit integer keys. It was invented by a team led by Dutch computer scientist Peter van Emde Boas in 1975. It performs all operations in O(log m) time, or equivalently in O(log log M) time, where M = 2m is the largest element that can be stored in the tree. The parameter M is not to be confused with the actual number of elements stored in the tree, by which the performance of other tree data-structures is often measured.

<span class="mw-page-title-main">Poul-Henning Kamp</span> Danish software developer

Poul-Henning Kamp is a Danish computer software developer known for work on various projects including FreeBSD and Varnish. He currently resides in Slagelse, Denmark.

In computer science, a leftist tree or leftist heap is a priority queue implemented with a variant of a binary heap. Every node x has an s-value which is the distance to the nearest leaf in subtree rooted at x. In contrast to a binary heap, a leftist tree attempts to be very unbalanced. In addition to the heap property, leftist trees are maintained so the right descendant of each node has the lower s-value.

The d-ary heap or d-heap is a priority queue data structure, a generalization of the binary heap in which the nodes have d children instead of 2. Thus, a binary heap is a 2-heap, and a ternary heap is a 3-heap. According to Tarjan and Jensen et al., d-ary heaps were invented by Donald B. Johnson in 1975.

In computer science, integer sorting is the algorithmic problem of sorting a collection of data values by integer keys. Algorithms designed for integer sorting may also often be applied to sorting problems in which the keys are floating point numbers, rational numbers, or text strings. The ability to perform integer arithmetic on the keys allows integer sorting algorithms to be faster than comparison sorting algorithms in many cases, depending on the details of which operations are allowed in the model of computing and how large the integers to be sorted are.

In computer science, a randomized meldable heap is a priority queue based data structure in which the underlying structure is also a heap-ordered binary tree. However, there are no restrictions on the shape of the underlying binary tree.

In computer science, a weak heap is a data structure for priority queues, combining features of the binary heap and binomial heap. It can be stored in an array as an implicit binary tree like a binary heap, and has the efficiency guarantees of binomial heaps.

<span class="mw-page-title-main">K-D heap</span>

A K-D heap is a data structure in computer science which implements a multidimensional priority queue without requiring additional space. It is a generalization of the Heap. It allows for efficient insertion, query of the minimum element, and deletion of the minimum element in any of the k dimensions, and therefore includes the double-ended heap as a special case.

References

  1. Kamp, Poul-Henning (2020-07-26). "You're Doing It Wrong". ACM Queue .
  2. Naor, Dalit; Martel, Charles U.; Matloff, Norman S. (1991). "Performance of Priority Queue Structures in a Virtual Memory Environment". Comput. J. 34 (5): 428–437. doi:10.1093/comjnl/34.5.428.
  3. van Emde Boas, P.; Kaas, R.; Zijlstra, E. (1976). "Design and implementation of an efficient priority queue". Mathematical Systems Theory . 10: 99–127. doi:10.1007/BF01683268. S2CID   8105468.
  4. Kamp, Poul-Henning. "You're Doing It Wrong". phk.freebsd.dk. Retrieved 2019-06-08.