Range tree

Type	tree
Invented	1979
Invented by	Jon Louis Bentley
Time complexity in big O notation
Operation	Average
Search	$O(\log ^{d}n+k)$
Space complexity
Space	$O(n\log ^{d-1}n)$

Last updated August 10, 2024

In computer science, a range tree is an ordered tree data structure to hold a list of points. It allows all points within a given range to be reported efficiently, and is typically used in two or higher dimensions. Range trees were introduced by Jon Louis Bentley in 1979.^[1] Similar data structures were discovered independently by Lueker,^[2] Lee and Wong,^[3] and Willard.^[4] The range tree is an alternative to the k-d tree. Compared to k-d trees, range trees offer faster query times of $O(\log ^{d}n+k)$ (in big O notation) but worse storage of $O(n\log ^{d-1}n)$, where n is the number of points stored in the tree, d is the dimension of each point and k is the number of points reported by a given query.

Data structure

Figure (1-dimensional-range-tree.svg): An example of a 1-dimensional range tree. Each node which is not a leaf stores the largest value of its left subtree.

A range tree on a set of 1-dimensional points is a balanced binary search tree on those points. The points stored in the tree are stored in the leaves of the tree; each internal node stores the largest value of its left subtree. A range tree on a set of points in d-dimensions is a recursively defined multi-level binary search tree. Each level of the data structure is a binary search tree on one of the d-dimensions. The first level is a binary search tree on the first of the d-coordinates. Each vertex v of this tree contains an associated structure that is a (d−1)-dimensional range tree on the last (d−1)-coordinates of the points stored in the subtree of v.
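The 1-dimensional case described above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the names `Node`, `build_1d` and `leaf_values` are hypothetical, the input is assumed to be already sorted, and list slicing is used for brevity rather than speed.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Node:
    # Leaf: the stored point. Internal node: largest value of the left subtree.
    value: float
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    @property
    def is_leaf(self) -> bool:
        return self.left is None and self.right is None


def build_1d(sorted_points: Sequence[float]) -> Optional[Node]:
    """Build a balanced 1-D range tree from points sorted in ascending order."""
    n = len(sorted_points)
    if n == 0:
        return None
    if n == 1:
        return Node(sorted_points[0])
    mid = (n + 1) // 2  # the left half receives the extra point when n is odd
    left = build_1d(sorted_points[:mid])
    right = build_1d(sorted_points[mid:])
    # The largest value of the left subtree is the last point of the left half.
    return Node(sorted_points[mid - 1], left, right)


def leaf_values(node: Optional[Node]) -> List[float]:
    """Collect the points stored in the leaves, from left to right."""
    if node is None:
        return []
    if node.is_leaf:
        return [node.value]
    return leaf_values(node.left) + leaf_values(node.right)


root = build_1d([1, 3, 5, 7, 9])
```

In this sketch every point ends up in exactly one leaf, and each internal node only guides the search: a query value goes left when it is at most the stored value and right otherwise.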

Operations

Construction

A 1-dimensional range tree on a set of n points is a balanced binary search tree, which can be constructed in $O(n\log n)$ time. Range trees in higher dimensions are constructed recursively by constructing a balanced binary search tree on the first coordinate of the points, and then, for each vertex v in this tree, constructing a (d−1)-dimensional range tree on the points contained in the subtree of v. Constructing a range tree this way would require $O(n\log ^{d}n)$ time.
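As a sketch of where the $O(n\log ^{d}n)$ bound comes from, let $T_d(n)$ denote the time of this recursive construction on n points in d dimensions (the symbol $T_d$ is introduced here for illustration only). Building the associated structure at the root costs $T_{d-1}(n)$, and the two children are built recursively on about n/2 points each:

$$
T_d(n) = 2\,T_d(n/2) + T_{d-1}(n) + O(n), \qquad T_1(n) = O(n\log n).
$$

Assuming inductively that $T_{d-1}(n) = O(n\log ^{d-1}n)$, the associated structures on level i of the recursion cost

$$
2^{i}\,T_{d-1}\!\left(n/2^{i}\right) = O\!\left(n\log ^{d-1}n\right),
$$

and summing over the $O(\log n)$ levels gives $T_d(n) = O(n\log ^{d}n)$.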

This construction time can be improved for 2-dimensional range trees to $O(n\log n)$.^[7] Let S be a set of n 2-dimensional points. If S contains only one point, return a leaf containing that point. Otherwise, construct the associated structure of S, a 1-dimensional range tree on the y-coordinates of the points in S. Let x_m be the median x-coordinate of the points. Let S_L be the set of points with x-coordinate less than or equal to x_m and let S_R be the set of points with x-coordinate greater than x_m. Recursively construct v_L, a 2-dimensional range tree on S_L, and v_R, a 2-dimensional range tree on S_R. Create a vertex v with left-child v_L and right-child v_R. If we sort the points by their y-coordinates at the start of the algorithm, and maintain this ordering when splitting the points by their x-coordinate, we can construct the associated structures of each subtree in linear time. This reduces the time to construct a 2-dimensional range tree to $O(n\log n)$, and also reduces the time to construct a d-dimensional range tree to $O(n\log ^{d-1}n)$.
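The presorting idea can be sketched as follows. This is an illustration under stated assumptions, not the textbook's code: the names `Node2D`, `build_2d` and `_build` are hypothetical, x-coordinates are assumed distinct, and the associated structure is kept as a y-sorted list standing in for the 1-dimensional range tree (a balanced tree can be built from a sorted list in linear time).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]


@dataclass
class Node2D:
    x_split: float        # largest x in the left subtree (the point's x at a leaf)
    assoc: List[Point]    # points of this subtree sorted by y (stand-in for a 1-D tree)
    left: Optional["Node2D"] = None
    right: Optional["Node2D"] = None


def build_2d(points: List[Point]) -> Optional[Node2D]:
    """Sort once by x and once by y, then build recursively."""
    if not points:
        return None
    by_x = sorted(points, key=lambda p: p[0])
    by_y = sorted(points, key=lambda p: p[1])
    return _build(by_x, by_y)


def _build(by_x: List[Point], by_y: List[Point]) -> Node2D:
    if len(by_x) == 1:
        return Node2D(by_x[0][0], by_y)
    mid = (len(by_x) + 1) // 2
    x_m = by_x[mid - 1][0]  # median x-coordinate
    # Splitting the y-sorted list by x keeps each half sorted by y,
    # so no re-sorting is needed: linear work per node.
    left_y = [p for p in by_y if p[0] <= x_m]
    right_y = [p for p in by_y if p[0] > x_m]
    return Node2D(x_m, by_y,
                  _build(by_x[:mid], left_y),
                  _build(by_x[mid:], right_y))


root = build_2d([(2, 3), (5, 1), (1, 7), (4, 4), (3, 6)])
```

Each level of the recursion does work linear in the number of points it handles, so after the two initial sorts the total is proportional to n times the tree height.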

Range queries

Figure (1-dimensional-range-query.svg): A 1-dimensional range query [x₁, x₂]. Points stored in the subtrees shaded in gray will be reported. find(x₁) and find(x₂) will be reported if they are inside the query interval.

A range query on a range tree reports the set of points that lie inside a given interval. To report the points that lie in the interval [x₁, x₂], we start by searching for x₁ and x₂. At some vertex in the tree, the search paths to x₁ and x₂ will diverge. Let v_split be the last vertex that these two search paths have in common. For every vertex v on the search path from v_split to x₁ (excluding v_split itself), if the value stored at v is greater than or equal to x₁, report every point in the right subtree of v. If v is a leaf, report the value stored at v if it is inside the query interval. Similarly, report all of the points stored in the left subtrees of the vertices with values less than x₂ along the search path from v_split to x₂ (again excluding v_split), and report the leaf of this path if it lies within the query interval.

Since the range tree is a balanced binary tree, the search paths to x₁ and x₂ have length $O(\log n)$. Reporting all of the points stored in the subtree of a vertex can be done in time linear in the size of that subtree using any tree traversal algorithm. It follows that the time to perform a range query is $O(\log n+k)$, where k is the number of points in the query interval.
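The 1-dimensional query can be sketched as below. The names (`Node`, `build`, `report_subtree`, `range_query`) are hypothetical, the stored values are assumed to be distinct, and a search is assumed to go left when the query value is at most the value stored at a vertex.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    value: float  # leaf: the point; internal: largest value of the left subtree
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(pts: List[float]) -> Node:
    """Balanced tree over a non-empty, sorted list of distinct values."""
    if len(pts) == 1:
        return Node(pts[0])
    mid = (len(pts) + 1) // 2
    return Node(pts[mid - 1], build(pts[:mid]), build(pts[mid:]))


def report_subtree(v: Node, out: List[float]) -> None:
    """Append every point stored in the leaves below v."""
    if v.left is None:
        out.append(v.value)
    else:
        report_subtree(v.left, out)
        report_subtree(v.right, out)


def range_query(root: Node, x1: float, x2: float) -> List[float]:
    out: List[float] = []
    # Descend while the search paths to x1 and x2 agree; stop at v_split.
    v = root
    while v.left is not None and (x2 <= v.value or x1 > v.value):
        v = v.left if x2 <= v.value else v.right
    if v.left is None:  # the paths never diverged: check the single leaf
        if x1 <= v.value <= x2:
            out.append(v.value)
        return out
    # Path towards x1: whenever it turns left, the right subtree is in range.
    u = v.left
    while u.left is not None:
        if x1 <= u.value:
            report_subtree(u.right, out)
            u = u.left
        else:
            u = u.right
    if x1 <= u.value <= x2:
        out.append(u.value)
    # Path towards x2: whenever it turns right, the left subtree is in range.
    u = v.right
    while u.left is not None:
        if u.value < x2:
            report_subtree(u.left, out)
            u = u.right
        else:
            u = u.left
    if x1 <= u.value <= x2:
        out.append(u.value)
    return out


tree = build([1, 3, 5, 7, 9, 11, 13])
result = sorted(range_query(tree, 4, 10))
```

The points are reported in the order the subtrees are visited, not in sorted order, which is why the usage line sorts the result.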

Range queries in d-dimensions are similar. Instead of reporting all of the points stored in the subtrees of the search paths, perform a (d−1)-dimensional range query on the associated structure of each subtree. Eventually, a 1-dimensional range query will be performed and the correct points will be reported. Since a d-dimensional query consists of $O(\log n)$ (d−1)-dimensional range queries, it follows that the time required to perform a d-dimensional range query is $O(\log ^{d}n+k)$, where k is the number of points in the query interval. This can be reduced to $O(\log ^{d-1}n+k)$ using a variant of fractional cascading.^[2]^[4]^[7]
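For d = 2 the query can be sketched as follows. The assumptions are loud ones: all names are hypothetical, x-coordinates are distinct, the tree is built naively by re-sorting at every vertex, and each associated structure is a y-sorted list searched with `bisect` instead of a 1-dimensional range tree. Fractional cascading is not used, so each query performs $O(\log n)$ binary searches of $O(\log n)$ each.

```python
import bisect
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]


@dataclass
class Node:
    x: float          # largest x in the left subtree (the point's x at a leaf)
    pts: List[Point]  # points of this subtree sorted by y
    ys: List[float]   # y-coordinates parallel to pts, used for binary search
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(points: List[Point]) -> Optional[Node]:
    return _build(sorted(points)) if points else None


def _build(by_x: List[Point]) -> Node:
    by_y = sorted(by_x, key=lambda p: p[1])  # naive: re-sort at every vertex
    ys = [p[1] for p in by_y]
    if len(by_x) == 1:
        return Node(by_x[0][0], by_y, ys)
    mid = (len(by_x) + 1) // 2
    return Node(by_x[mid - 1][0], by_y, ys, _build(by_x[:mid]), _build(by_x[mid:]))


def query_assoc(v: Node, y1: float, y2: float, out: List[Point]) -> None:
    """1-D range query on the associated structure of v."""
    lo = bisect.bisect_left(v.ys, y1)
    hi = bisect.bisect_right(v.ys, y2)
    out.extend(v.pts[lo:hi])


def range_query(root: Optional[Node], x1, x2, y1, y2) -> List[Point]:
    out: List[Point] = []
    if root is None:
        return out

    def check_leaf(u: Node) -> None:
        px, py = u.pts[0]
        if x1 <= px <= x2 and y1 <= py <= y2:
            out.append(u.pts[0])

    v = root  # find v_split on the x-coordinate
    while v.left is not None and (x2 <= v.x or x1 > v.x):
        v = v.left if x2 <= v.x else v.right
    if v.left is None:
        check_leaf(v)
        return out
    u = v.left  # path towards x1: query right subtrees by y
    while u.left is not None:
        if x1 <= u.x:
            query_assoc(u.right, y1, y2, out)
            u = u.left
        else:
            u = u.right
    check_leaf(u)
    u = v.right  # path towards x2: query left subtrees by y
    while u.left is not None:
        if u.x < x2:
            query_assoc(u.left, y1, y2, out)
            u = u.right
        else:
            u = u.left
    check_leaf(u)
    return out


pts = [(2, 3), (5, 1), (1, 7), (4, 4), (3, 6), (6, 5), (7, 2)]
tree = build(pts)
hits = sorted(range_query(tree, 2, 6, 2, 5))
```

The only change from the 1-dimensional query is that a subtree whose x-range lies inside [x₁, x₂] is no longer reported wholesale; its associated structure is queried on the y-range instead.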

References

[1] Bentley, J. L. (1979). "Decomposable searching problems" (PDF). Information Processing Letters. 8 (5): 244–251. doi:10.1016/0020-0190(79)90117-0. Archived from the original on September 24, 2017.
[2] Lueker, G. S. (1978). "A data structure for orthogonal range queries". 19th Annual Symposium on Foundations of Computer Science (sfcs 1978). pp. 28–34. doi:10.1109/SFCS.1978.1. S2CID 14970942.
[3] Lee, D. T.; Wong, C. K. (1980). "Quintary trees: A file structure for multidimensional database systems". ACM Transactions on Database Systems. 5 (3): 339. doi:10.1145/320613.320618. S2CID 2547376.
[4] Willard, Dan E. The super-b-tree algorithm (Technical report). Cambridge, MA: Aiken Computer Lab, Harvard University. TR-03-79.
[5] Chazelle, Bernard (1990). "Lower Bounds for Orthogonal Range Searching: I. The Reporting Case" (PDF). Journal of the ACM. 37 (2): 200–212. doi:10.1145/77600.77614. S2CID 8895683.
[6] Chazelle, Bernard (1990). "Lower Bounds for Orthogonal Range Searching: II. The Arithmetic Model" (PDF). Journal of the ACM. 37: 439–463. doi:10.1145/79147.79149. S2CID 15935619.
[7] de Berg, Mark; Cheong, Otfried; van Kreveld, Marc; Overmars, Mark (2008). Computational Geometry. doi:10.1007/978-3-540-77974-2. ISBN 978-3-540-77973-5.

External links

Range and Segment Trees in CGAL, the Computational Geometry Algorithms Library.
Lecture 8: Range Trees, Marc van Kreveld. Archived here.
Range Trees using PAM, the parallel augmented map library.
2D Range Tree Visualization, Zhou Kaixuan.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.
