B-heap

Last updated January 28, 2025

A B-heap is a binary heap implemented to keep subtrees in a single page. This reduces the number of pages accessed by up to a factor of ten for big heaps when using virtual memory, compared with the traditional implementation.^[1] The traditional mapping of elements to locations in an array puts almost every level in a different page.

Motivation

Traditionally, binary trees are laid out in consecutive memory according to a n -> {2n, 2n+1} rule, meaning that if a node is at position n, its left and right child are taken to be at positions 2n and 2n+1 in the array. The root is at position 1. A common operation on binary trees is the vertical traversal; stepping down through the levels of a tree in order to arrive at a searched node. However, because of the way memory is organized on modern computers into pages in virtual memory, this scheme of laying out the binary tree can be highly ineffective. The reason is that, as when traversing deeper into the tree, the distance to the next node grows exponentially, so every next node retrieved will likely be on a separate memory page. This will increase the number of page misses, which are very expensive. The B-heap solves this problem by laying out child nodes in memory in a different way, trying as much as possible to position subtrees within a single page. Therefore, as a vertical traversal proceeds, several of the consecutive retrieved nodes will lay in the same page, leading to a low number of page misses.

Implementation

In detail, a b-heap can be implemented in the following way. Poul-Henning Kamp^[4] gives two options for the layout of the nodes: one in which two positions per page are wasted, but the strict binary structure of the tree is preserved, and another which uses the whole available space of the pages, but has the tree fail to expand for one level upon entering a new page (The nodes on that level have only one child). In any case, an important point is that upon leaving a page, both child nodes are always in a common other page, since in a vertical traversal the algorithm will typically compare both children with the parent to know how to proceed. For this reason, each page can be said to contain two parallel subtrees, interspersed with each other. The pages themselves can be seen as a m-ary tree, and since half of the elements in each page will be leaves (within the page), the "tree of pages" has a splitting factor of pagesize/2.

Parent Function

In contrast to the classic array-like layout, the parent function in a B-heap is more complex because the index of a node's parent must be computed differently depending on where in the page it is. Assuming the positions inside a page are labelled from 0 to pagesize, the parent function can be as follows.

For nodes 0 and 1, these are only used in the variant which is exploiting all possible space. In this case, the parent index of both nodes is the same, it is in a different page, and its specific offset within that page only depends on the current page number. Specifically, to compute the parent's page number, simply divide the current node's page number by the "page tree's" splitting factor, which is pagesize/2. To get the right offset within the page, consider that it must be one of the leaf nodes within the parent page, so start at offset pagesize/2. Then add the difference between the current page number, and the parent's page number, minus one since the first page after the parent page has its parent node in index (pagesize/2).

For nodes 2 and 3, the parent is different depending on the mode. In space-saving mode, the parents are simply the nodes 0 and 1, respectively, so one needs only to subtract with 2. On the other hand, in strict-binary-tree-mode, these nodes are the roots of the page instead of 0 and 1, and so their common parent is computed the same way as described above.

For all other nodes, their parent will be within the same page, and it is enough to divide their offset within their page by 2, not changing the page number.

Related Research Articles

In computer science, a binary search tree (BST), also called an ordered or sorted binary tree, is a rooted binary tree data structure with the key of each internal node being greater than all the keys in the respective node's left subtree and less than the ones in its right subtree. The time complexity of operations on the binary search tree is linear with respect to the height of the tree.

In computer science, a binary tree is a tree data structure in which each node has at most two children, referred to as the left child and the right child. That is, it is a k-ary tree with $k = 2$ . A recursive definition using set theory is that a binary tree is a tuple (L, S, R), where L and R are binary trees or the empty set and S is a singleton set containing the root.

In computer science, a B-tree is a self-balancing tree data structure that maintains sorted data and allows searches, sequential access, insertions, and deletions in logarithmic time. The B-tree generalizes the binary search tree, allowing for nodes with more than two children. Unlike other self-balancing binary search trees, the B-tree is well suited for storage systems that read and write relatively large blocks of data, such as databases and file systems.

<span class="mw-page-title-main">Heapsort</span> A sorting algorithm which uses the heap data structure

In computer science, heapsort is an efficient, comparison-based sorting algorithm that reorganizes an input array into a heap and then repeatedly removes the largest node from that heap, placing it at the end of the array.

<span class="mw-page-title-main">Heap (data structure)</span> Computer science data structure

In computer science, a heap is a tree-based data structure that satisfies the heap property: In a max heap, for any given node C, if P is the parent node of C, then the key of P is greater than or equal to the key of C. In a min heap, the key of P is less than or equal to the key of C. The node at the "top" of the heap is called the root node.

In computer science, a priority queue is an abstract data type similar to a regular queue or stack abstract data type. Each element in a priority queue has an associated priority. In a priority queue, elements with high priority are served before elements with low priority. In some implementations, if two elements have the same priority, they are served in the same order in which they were enqueued. In other implementations, the order of elements with the same priority is undefined.

In computer science, a tree is a widely used abstract data type that represents a hierarchical tree structure with a set of connected nodes. Each node in the tree can be connected to many children, but must be connected to exactly one parent, except for the root node, which has no parent. These constraints mean there are no cycles or "loops", and also that each child can be treated like the root node of its own subtree, making recursion a useful technique for tree traversal. In contrast to linear data structures, many trees cannot be represented by relationships between neighboring nodes in a single straight line.

<span class="mw-page-title-main">Binary heap</span> Variant of heap data structure

A binary heap is a heap data structure that takes the form of a binary tree. Binary heaps are a common way of implementing priority queues. The binary heap was introduced by J. W. J. Williams in 1964 as a data structure for implementing heapsort.

<span class="mw-page-title-main">Smoothsort</span> Comparison-based sorting algorithm

In computer science, smoothsort is a comparison-based sorting algorithm. A variant of heapsort, it was invented and published by Edsger Dijkstra in 1981. Like heapsort, smoothsort is an in-place algorithm with an upper bound of $O (n log n)$ operations (see big O notation), but it is not a stable sort. The advantage of smoothsort is that it comes closer to $O (n)$ time if the input is already sorted to some degree, whereas heapsort averages $O (n log n)$ regardless of the initial sorted state.

<span class="mw-page-title-main">Treap</span> Random search tree data structure

In computer science, the treap and the randomized binary search tree are two closely related forms of binary search tree data structures that maintain a dynamic set of ordered keys and allow binary searches among the keys. After any sequence of insertions and deletions of keys, the shape of the tree is a random variable with the same probability distribution as a random binary tree; in particular, with high probability its height is proportional to the logarithm of the number of keys, so that each search, insertion, or deletion operation takes logarithmic time to perform.

In computer science, a binomial heap is a data structure that acts as a priority queue. It is an example of a mergeable heap, as it supports merging two heaps in logarithmic time. It is implemented as a heap similar to a binary heap but using a special tree structure that is different from the complete binary trees used by binary heaps. Binomial heaps were invented in 1978 by Jean Vuillemin.

In computer science, tree traversal is a form of graph traversal and refers to the process of visiting each node in a tree data structure, exactly once. Such traversals are classified by the order in which the nodes are visited. The following algorithms are described for a binary tree, but they may be generalized to other trees as well.

A van Emde Boas tree, also known as a vEB tree or van Emde Boas priority queue, is a tree data structure which implements an associative array with $m$ -bit integer keys. It was invented by a team led by Dutch computer scientist Peter van Emde Boas in 1975. It performs all operations in $O (log m)$ time, or equivalently in $time, where is the largest element that can be stored in the tree. The parameter is not to be confused with the actual number of elements stored in the tree, by which the performance of other tree data-structures is often measured.$

Poul-Henning Kamp is a Danish computer software developer known for work on various projects including FreeBSD and Varnish. He currently resides in Slagelse, Denmark.

In computer science, a leftist tree or leftist heap is a priority queue implemented with a variant of a binary heap. Every node x has an s-value which is the distance to the nearest leaf in subtree rooted at x. In contrast to a binary heap, a leftist tree attempts to be very unbalanced. In addition to the heap property, leftist trees are maintained so the right descendant of each node has the lower s-value.

The $d$ -ary heap or $d$ -heap is a priority queue data structure, a generalization of the binary heap in which the nodes have $d$ children instead of 2. Thus, a binary heap is a 2-heap, and a ternary heap is a 3-heap. According to Tarjan and Jensen et al., $d$ -ary heaps were invented by Donald B. Johnson in 1975.

In computer science, a Cartesian tree is a binary tree derived from a sequence of distinct numbers. To construct the Cartesian tree, set its root to be the minimum number in the sequence, and recursively construct its left and right subtrees from the subsequences before and after this number. It is uniquely defined as a min-heap whose symmetric (in-order) traversal returns the original sequence.

In computer science, a weak heap is a data structure for priority queues, combining features of the binary heap and binomial heap. It can be stored in an array as an implicit binary tree like a binary heap, and has the efficiency guarantees of binomial heaps.

A K-D heap is a data structure in computer science which implements a multidimensional priority queue without requiring additional space. It is a generalization of the Heap. It allows for efficient insertion, query of the minimum element, and deletion of the minimum element in any of the k dimensions, and therefore includes the double-ended heap as a special case.

References

↑ Kamp, Poul-Henning (2020-07-26). "You're Doing It Wrong". ACM Queue .
↑ Naor, Dalit; Martel, Charles U.; Matloff, Norman S. (1991). "Performance of Priority Queue Structures in a Virtual Memory Environment". Comput. J. 34 (5): 428–437. doi:10.1093/comjnl/34.5.428.
↑ van Emde Boas, P.; Kaas, R.; Zijlstra, E. (1976). "Design and implementation of an efficient priority queue". Mathematical Systems Theory . 10: 99–127. doi:10.1007/BF01683268. S2CID 8105468.
↑ Kamp, Poul-Henning. "You're Doing It Wrong". phk.freebsd.dk. Retrieved 2019-06-08.

External links

Implementations at https://github.com/varnish/Varnish-Cache/blob/master/lib/libvarnish/binary_heap.c and http://phk.freebsd.dk/B-Heap/binheap.c
Generic heap implementation with B-heap support.
For more on van Emde Boas layouts see Benjamin Sach Descent into Cache-Oblivion or Cache-oblivious data structures.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.

[:0-1] Kamp, Poul-Henning (2020-07-26). "You're Doing It Wrong". ACM Queue .

[2] Naor, Dalit; Martel, Charles U.; Matloff, Norman S. (1991). "Performance of Priority Queue Structures in a Virtual Memory Environment". Comput. J. 34 (5): 428–437. doi:10.1093/comjnl/34.5.428.

[3] van Emde Boas, P.; Kaas, R.; Zijlstra, E. (1976). "Design and implementation of an efficient priority queue". Mathematical Systems Theory . 10: 99–127. doi:10.1007/BF01683268. S2CID 8105468.

[4] Kamp, Poul-Henning. "You're Doing It Wrong". phk.freebsd.dk. Retrieved 2019-06-08.

[1]

[2]

[3]

[4]