Fenwick tree

Last updated
Fenwick tree
Binary indexed tree
16-node Fenwick tree.svg
Type Binomial tree
Invented1989
Invented byBoris Ryabko
Time complexity in big O notation
OperationAverageWorst case
SearchO(logn)O(logn)
InsertO(logn)O(logn)
Space complexity
SpaceO(n)O(n)

A Fenwick tree or binary indexed tree(BIT) is a data structure that can efficiently update values and calculate prefix sums in an array of values.

Contents

This structure was proposed by Boris Ryabko in 1989 [1] with a further modification published in 1992. [2] It has subsequently become known under the name Fenwick tree after Peter Fenwick, who described this structure in his 1994 article. [3]

When compared with a flat array of values, the Fenwick tree achieves a much better balance between two operations: value update and prefix sum calculation. A flat array of values can either store the values or the prefix sums. In the first case, computing prefix sums requires linear time; in the second case, updating the array values requires linear time (in both cases, the other operation can be performed in constant time). Fenwick trees allow both operations to be performed in time. This is achieved by representing the values as a tree with nodes where the value of each node in the tree is the prefix sum of the array from the index of the parent (inclusive) up to the index of the node (exclusive). The tree itself is implicit and can be stored as an array of values, with the implicit root node omitted from the array. The tree structure allows the operations of value retrieval, value update, prefix sum, and range sum to be performed using only node accesses.

Motivation

Given an array of values, it is sometimes desirable to calculate the running total of values up to each index according to some associative binary operation (addition on integers being by far the most common). Fenwick trees provide a method to query the running total at any index, or prefix sum, in addition to allowing changes to the underlying value array and having all further queries reflect those changes.

Fenwick trees are particularly designed to implement the arithmetic coding algorithm, which maintains counts of each symbol produced and needs to convert those to the cumulative probability of a symbol less than a given symbol. Development of operations it supports were primarily motivated by use in that case.[ citation needed ]

Description

A Fenwick tree is an implicit tree where nodes are consecutively numbered, and parent-child relationships are determined by arithmetic on the node indexes.

An important function in this index arithmetic is the least significant set bit, also called the find first set operation. This is the greatest power of two which divides an index . This is the power of two (1, 2, 4, 8, ...) and not the exponent (0, 1, 2, 3, ...). It can be efficiently computed in two's complement arithmetic as (where & denotes bitwise AND).

A Fenwick tree is most easily understood using a one-based array with values. Using half-open interval syntax, let the range from (exclusive) to (inclusive). The corresponding Fenwick array stores the range sums . That is, the sum of values ending with and including .

A fictitious node 0 is used in some descriptions, but is never actually accessed and need not be explicitly stored. but the value is never actually needed. may be considered to contain the sum of the empty range with value 0.

A "Fenwick tree" is actually three implicit trees over the same array: the interrogation tree used for translating indexes to prefix sums, the update tree used for updating elements, and the search tree for translating prefix sums to indexes (rank queries). [4] The first two are normally walked upwards, while the third is usually walked downwards.

The interrogation tree

The interrogation tree is defined so that the parent of node is . For example, the parent of 6 = 1102 is 4 = 1002. Implicit node 0 is the root.

Each level of the tree contains nodes with indices corresponding to sums of distinct powers of 2 (with representing an empty sum 0). For example, level contains nodes and level contains nodes

Node has children (), and total descendants. (These numbers include nodes greater than , which are omitted and never accessed.)

The below diagram shows the structure of a 16-node Fenwick tree's interrogation tree, including the root, so it corresponds to a 15-element array A:

Depiction of a 16-node Fenwick interrogation tree containing range sums of a 15-node array A 16-node Fenwick tree.svg
Depiction of a 16-node Fenwick interrogation tree containing range sums of a 15-node array A

To find the prefix sum , sum the values in , its parent, its parent's parent, and so on up to (but not including) the root. To compute a range sum , subtract the prefix sums for and . This can be optimized by stopping at their first common ancestor.

The update tree

The update tree is the mirror image of the interrogation tree. The parent of node is (where | denotes bitwise OR). For example, the parent of 6 = 1102 is 8 = 10002.

This conceptual tree is infinite, but only the part with indexes up to is stored or used. Excluding the fictitious nodes with indexes greater than it will be a forest of disjoint trees, one for each bit set in the binary representation of .

Here, a node's ancestors are all nodes whose range sums include its own. For example, holds the sum of , holds the sum of , and so on.

To modify one of the values , add the change to , then 's parent, then its grandparent, and so on, until the index exceeds .

The search tree

Unlike the other two trees, the search tree is a binary tree, arranged in an order Knuth calls a "sideways heap". [5] Each node is assigned a height equal to the number of trailing zeros in the binary representation of its index, with the parent and children being the numerically closest index(es) of the adjacent height. Nodes with odd indexes () are leaves. Nodes with even indexes have the closest two nodes of the next-lowest index as children, . Node 's parent in the search tree is .

For example, the children of 6 = 1102 is are 5 = 1012 and 7 = 1112, and its parent is 4 = 1002.

Although this tree is potentially infinite, we may define its root to be the highest existing node, whose index is the greatest power of 2 less than or equal to .

It is possible for a node to have a fictitious parent with an index greater than yet still have an existing grandparent. If the example above applied to a 5-node tree, then node 5 would have a fictitious parent 6, but an existing grandparent 4.

The search tree may be considered a combination of the previous two trees. A node's left subtree contains all of its descendants in the update tree, while its right subtree contains all of its descendants in the interrogation tree. A node's parent in the search tree is either its interrogation or update parent (depending on whether the node is a right or left child, respectively), and the other type of parent may be found by multiple upward steps in the search tree.

However, upward traversals in the search tree are uncommon; its primary use is to perform rank queries: given a prefix sum, at what index does it appear? This is done by a downward traversal through the search tree. During the traversal, three variables are maintained: The current node's index, the rank being sought in the subtree rooted and the current node, and a "fallback index" to be returned if the rank sought is greater than can be found in the subtree.

Initially, the current node is the root, the rank sought is the original query, and the fallback index is a special "overflow" value indicating that the rank is not in the tree. (Depending on the application, or might be used for this purpose.)

Each step, either the current node is a fictitious node (index greater than ), or we must decide if the position sought is to the left or right of the end of the current node. If the rank sought is less than the Fenwick array value for the current node, we must search its left subtree. If it is greater, search its right subtree. If it is equal, the direction chosen depends on how you wish to handle searches for sums lying exactly between two nodes.

These three possibilities are then further divided based on whether the current node is a leaf or not:

Pseudocode

A simple pseudocode implementation of the two main operations on a Fenwick tree—query and update—is as following:

function query(tree, index) is     sum := 0     while index > 0 do         sum += tree[index]         index -= lsb(index)     return sum  function update(tree, index, value) iswhile index < size(tree) do         tree[index] += value         index += lsb(index)

The function computes the least significant 1-bit or last set bit of the given or, equivalently, the largest power of two that is also a divisor of . For example, , as shown in its binary representation: . This function can be simply implemented in code through a bitwise AND operation: lsb(n) = n & (-n), assuming a signed integer data type. [3]

Construction

One naive algorithm to construct a Fenwick tree consists of initializing the tree with null values and updating each index individually. This solution works in time, but an construction is possible: [6]

function construct(values) is     tree := values     for every index, value in tree do         parentIndex := index + lsb(index)         if parentIndex < size(tree) then             tree[parentIndex] += value     return tree

See also

Related Research Articles

<span class="mw-page-title-main">Binary search algorithm</span> Search algorithm finding the position of a target value within a sorted array

In computer science, binary search, also known as half-interval search, logarithmic search, or binary chop, is a search algorithm that finds the position of a target value within a sorted array. Binary search compares the target value to the middle element of the array. If they are not equal, the half in which the target cannot lie is eliminated and the search continues on the remaining half, again taking the middle element to compare to the target value, and repeating this until the target value is found. If the search ends with the remaining half being empty, the target is not in the array.

<span class="mw-page-title-main">Binary heap</span> Variant of heap data structure

A binary heap is a heap data structure that takes the form of a binary tree. Binary heaps are a common way of implementing priority queues. The binary heap was introduced by J. W. J. Williams in 1964, as a data structure for heapsort.

<span class="mw-page-title-main">Treap</span>

In computer science, the treap and the randomized binary search tree are two closely related forms of binary search tree data structures that maintain a dynamic set of ordered keys and allow binary searches among the keys. After any sequence of insertions and deletions of keys, the shape of the tree is a random variable with the same probability distribution as a random binary tree; in particular, with high probability its height is proportional to the logarithm of the number of keys, so that each search, insertion, or deletion operation takes logarithmic time to perform.

In computing, a persistent data structure or not ephemeral data structure is a data structure that always preserves the previous version of itself when it is modified. Such data structures are effectively immutable, as their operations do not (visibly) update the structure in-place, but instead always yield a new updated structure. The term was introduced in Driscoll, Sarnak, Sleator, and Tarjan's 1986 article.

In computer science, a disjoint-set data structure, also called a union–find data structure or merge–find set, is a data structure that stores a collection of disjoint (non-overlapping) sets. Equivalently, it stores a partition of a set into disjoint subsets. It provides operations for adding new sets, merging sets, and finding a representative member of a set. The last operation makes it possible to find out efficiently if any two elements are in the same or different sets.

In computer science, a fusion tree is a type of tree data structure that implements an associative array on w-bit integers on a finite universe, where each of the input integers has size less than 2w and is non-negative. When operating on a collection of n key–value pairs, it uses O(n) space and performs searches in O(logwn) time, which is asymptotically faster than a traditional self-balancing binary search tree, and also better than the van Emde Boas tree for large values of w. It achieves this speed by using certain constant-time operations that can be done on a machine word. Fusion trees were invented in 1990 by Michael Fredman and Dan Willard.

<span class="mw-page-title-main">Rope (data structure)</span> Data structure for storing strings

In computer programming, a rope, or cord, is a data structure composed of smaller strings that is used to efficiently store and manipulate a very long string. For example, a text editing program may use a rope to represent the text being edited, so that operations such as insertion, deletion, and random access can be done efficiently.

A B+ tree is an m-ary tree with a variable but often large number of children per node. A B+ tree consists of a root, internal nodes and leaves. The root may be either a leaf or a node with two or more children.

In computer science, an interval tree is a tree data structure to hold intervals. Specifically, it allows one to efficiently find all intervals that overlap with any given interval or point. It is often used for windowing queries, for instance, to find all roads on a computerized map inside a rectangular viewport, or to find all visible elements inside a three-dimensional scene. A similar data structure is the segment tree.

In computer science, the prefix sum, cumulative sum, inclusive scan, or simply scan of a sequence of numbers x0, x1, x2, ... is a second sequence of numbers y0, y1, y2, ..., the sums of prefixes of the input sequence:

In graph theory and computer science, the lowest common ancestor (LCA) of two nodes v and w in a tree or directed acyclic graph (DAG) T is the lowest node that has both v and w as descendants, where we define each node to be a descendant of itself.

<span class="mw-page-title-main">Range minimum query</span> Minimizing problem in computer programming

In computer science, a range minimum query (RMQ) solves the problem of finding the minimal value in a sub-array of an array of comparable objects. Range minimum queries have several use cases in computer science, such as the lowest common ancestor problem and the longest common prefix problem (LCP).

<span class="mw-page-title-main">Euler tour technique</span>

The Euler tour technique (ETT), named after Leonhard Euler, is a method in graph theory for representing trees. The tree is viewed as a directed graph that contains two directed edges for each edge in the tree. The tree can then be represented as a Eulerian circuit of the directed graph, known as the Euler tour representation (ETR) of the tree. The ETT allows for efficient, parallel computation of solutions to common problems in algorithmic graph theory. It was introduced by Tarjan and Vishkin in 1984.

In computer science, an x-fast trie is a data structure for storing integers from a bounded domain. It supports exact and predecessor or successor queries in time O(log log M), using O(n log M) space, where n is the number of stored values and M is the maximum value in the domain. The structure was proposed by Dan Willard in 1982, along with the more complicated y-fast trie, as a way to improve the space usage of van Emde Boas trees, while retaining the O(log log M) query time.

In computer science, the range query problem consists of efficiently answering several queries regarding a given interval of elements within an array. For example, a common task, known as range minimum query, is finding the smallest value inside a given range within a list of numbers.

In computer science, the longest common prefix array is an auxiliary data structure to the suffix array. It stores the lengths of the longest common prefixes (LCPs) between all pairs of consecutive suffixes in a sorted suffix array.

In computer science, the cell-probe model is a model of computation similar to the random-access machine, except that all operations are free except memory access. This model is useful for proving lower bounds of algorithms for data structure problems.

In computer science, a fractal tree index is a tree data structure that keeps data sorted and allows searches and sequential access in the same time as a B-tree but with insertions and deletions that are asymptotically faster than a B-tree. Like a B-tree, a fractal tree index is a generalization of a binary search tree in that a node can have more than two children. Furthermore, unlike a B-tree, a fractal tree index has buffers at each node, which allow insertions, deletions and other changes to be stored in intermediate locations. The goal of the buffers is to schedule disk writes so that each write performs a large amount of useful work, thereby avoiding the worst-case performance of B-trees, in which each disk write may change a small amount of data on disk. Like a B-tree, fractal tree indexes are optimized for systems that read and write large blocks of data. The fractal tree index has been commercialized in databases by Tokutek. Originally, it was implemented as a cache-oblivious lookahead array, but the current implementation is an extension of the Bε tree. The Bε is related to the Buffered Repository Tree. The Buffered Repository Tree has degree 2, whereas the Bε tree has degree Bε. The fractal tree index has also been used in a prototype filesystem. An open source implementation of the fractal tree index is available, which demonstrates the implementation details outlined below.

In computer science, and more precisely regarding data structures, a persistent array is a persistent data structure with properties similar to a (non-persistent) array. That is, after a value's update in a persistent array, there exist two persistent arrays: one persistent array in which the update is taken into account, and one which is equal to the array before the update.

The PH-tree is a tree data structure used for spatial indexing of multi-dimensional data (keys) such as geographical coordinates, points, feature vectors, rectangles or bounding boxes. The PH-tree is space partitioning index with a structure similar to that of a quadtree or octree. However, unlike quadtrees, it uses a splitting policy based on tries and similar to Crit bit trees that is based on the bit-representation of the keys. The bit-based splitting policy, when combined with the use of different internal representations for nodes, provides scalability with high-dimensional data. The bit-representation splitting policy also imposes a maximum depth, thus avoiding degenerated trees and the need for rebalancing.

References

  1. Boris Ryabko (1989). "A fast on-line code" (PDF). Soviet Math. Dokl. 39 (3): 533–537. Archived (PDF) from the original on 2019-07-17. Retrieved 2019-07-17.
  2. Boris Ryabko (1992). "A fast on-line adaptive code" (PDF). IEEE Transactions on Information Theory. 28 (1): 1400–1404. Archived (PDF) from the original on 2019-07-14. Retrieved 2019-07-14.
  3. 1 2 Peter M. Fenwick (1994). "A new data structure for cumulative frequency tables". Software: Practice and Experience. 24 (3): 327–336. CiteSeerX   10.1.1.14.8917 . doi:10.1002/spe.4380240306. S2CID   7519761.
  4. Marchini, Stefano; Vigna, Sebastiano (14 October 2019). "Compact Fenwick trees for dynamic ranking and selection". arXiv: 1904.12370 [cs.DS]. Extensive discussion of practical implementation details.
  5. Knuth, Donald (2011). Combinatorial Algorithms, Part 1. The Art of Computer Programming. Vol. 4A. Upper Saddle River, NJ: Addison-Wesley Professional. pp. 164–165.
  6. Halim, Steven; Halim, Felix; Effendy, Suhendry (3 December 2018). Competitive Programming. Vol. 1 (4th ed.). Lulu Press, Incorporated. ISBN   9781716745522.