Range tree

Last updated
Range tree
Type tree
Invented1979
Invented by Jon Louis Bentley
Time complexity in big O notation
OperationAverageWorst case
Search
Space complexity
Space

In computer science, a range tree is an ordered tree data structure to hold a list of points. It allows all points within a given range to be reported efficiently, and is typically used in two or higher dimensions. Range trees were introduced by Jon Louis Bentley in 1979. [1] Similar data structures were discovered independently by Lueker, [2] Lee and Wong, [3] and Willard. [4] The range tree is an alternative to the k-d tree. Compared to k-d trees, range trees offer faster query times of (in Big O notation) but worse storage of , where n is the number of points stored in the tree, d is the dimension of each point and k is the number of points reported by a given query.

Contents

Bernard Chazelle improved this to query time and space complexity . [5] [6]

Data structure

An example of a 1-dimensional range tree. Each node which is not a leaf stores the largest value of its left subtree. 1-dimensional-range-tree.svg
An example of a 1-dimensional range tree. Each node which is not a leaf stores the largest value of its left subtree.

A range tree on a set of 1-dimensional points is a balanced binary search tree on those points. The points stored in the tree are stored in the leaves of the tree; each internal node stores the largest value of its left subtree. A range tree on a set of points in d-dimensions is a recursively defined multi-level binary search tree. Each level of the data structure is a binary search tree on one of the d-dimensions. The first level is a binary search tree on the first of the d-coordinates. Each vertex v of this tree contains an associated structure that is a (d−1)-dimensional range tree on the last (d−1)-coordinates of the points stored in the subtree of v.

Operations

Construction

A 1-dimensional range tree on a set of n points is a binary search tree, which can be constructed in time. Range trees in higher dimensions are constructed recursively by constructing a balanced binary search tree on the first coordinate of the points, and then, for each vertex v in this tree, constructing a (d−1)-dimensional range tree on the points contained in the subtree of v. Constructing a range tree this way would require time.

This construction time can be improved for 2-dimensional range trees to . [7] Let S be a set of n 2-dimensional points. If S contains only one point, return a leaf containing that point. Otherwise, construct the associated structure of S, a 1-dimensional range tree on the y-coordinates of the points in S. Let xm be the median x-coordinate of the points. Let SL be the set of points with x-coordinate less than or equal to xm and let SR be the set of points with x-coordinate greater than xm. Recursively construct vL, a 2-dimensional range tree on SL, and vR, a 2-dimensional range tree on SR. Create a vertex v with left-child vL and right-child vR. If we sort the points by their y-coordinates at the start of the algorithm, and maintain this ordering when splitting the points by their x-coordinate, we can construct the associated structures of each subtree in linear time. This reduces the time to construct a 2-dimensional range tree to , and also reduces the time to construct a d-dimensional range tree to .

Range queries

A 1-dimensional range query [x1, x2]. Points stored in the subtrees shaded in gray will be reported. find(x1) and find(x2) will be reported if they are inside the query interval. 1-dimensional-range-query.svg
A 1-dimensional range query [x1, x2]. Points stored in the subtrees shaded in gray will be reported. find(x1) and find(x2) will be reported if they are inside the query interval.

A range query on a range tree reports the set of points that lie inside a given interval. To report the points that lie in the interval [x1, x2], we start by searching for x1 and x2. At some vertex in the tree, the search paths to x1 and x2 will diverge. Let vsplit be the last vertex that these two search paths have in common. For every vertex v in the search path from vsplit to x1, if the value stored at v is greater than x1, report every point in the right-subtree of v. If v is a leaf, report the value stored at v if it is inside the query interval. Similarly, reporting all of the points stored in the left-subtrees of the vertices with values less than x2 along the search path from vsplit to x2, and report the leaf of this path if it lies within the query interval.

Since the range tree is a balanced binary tree, the search paths to x1 and x2 have length . Reporting all of the points stored in the subtree of a vertex can be done in linear time using any tree traversal algorithm. It follows that the time to perform a range query is , where k is the number of points in the query interval.

Range queries in d-dimensions are similar. Instead of reporting all of the points stored in the subtrees of the search paths, perform a (d−1)-dimensional range query on the associated structure of each subtree. Eventually, a 1-dimensional range query will be performed and the correct points will be reported. Since a d-dimensional query consists of (d−1)-dimensional range queries, it follows that the time required to perform a d-dimensional range query is , where k is the number of points in the query interval. This can be reduced to using a variant of fractional cascading. [2] [4] [7]

See also

Related Research Articles

<span class="mw-page-title-main">Binary search algorithm</span> Search algorithm finding the position of a target value within a sorted array

In computer science, binary search, also known as half-interval search, logarithmic search, or binary chop, is a search algorithm that finds the position of a target value within a sorted array. Binary search compares the target value to the middle element of the array. If they are not equal, the half in which the target cannot lie is eliminated and the search continues on the remaining half, again taking the middle element to compare to the target value, and repeating this until the target value is found. If the search ends with the remaining half being empty, the target is not in the array.

<span class="mw-page-title-main">Quadtree</span> Tree data structure in which each internal node has exactly four children, to partition a 2D area

A quadtree is a tree data structure in which each internal node has exactly four children. Quadtrees are the two-dimensional analog of octrees and are most often used to partition a two-dimensional space by recursively subdividing it into four quadrants or regions. The data associated with a leaf cell varies by application, but the leaf cell represents a "unit of interesting spatial information".

In computing, a persistent data structure or not ephemeral data structure is a data structure that always preserves the previous version of itself when it is modified. Such data structures are effectively immutable, as their operations do not (visibly) update the structure in-place, but instead always yield a new updated structure. The term was introduced in Driscoll, Sarnak, Sleator, and Tarjan's 1986 article.

In computer science, a fusion tree is a type of tree data structure that implements an associative array on w-bit integers on a finite universe, where each of the input integers has size less than 2w and is non-negative. When operating on a collection of n key–value pairs, it uses O(n) space and performs searches in O(logwn) time, which is asymptotically faster than a traditional self-balancing binary search tree, and also better than the van Emde Boas tree for large values of w. It achieves this speed by using certain constant-time operations that can be done on a machine word. Fusion trees were invented in 1990 by Michael Fredman and Dan Willard.

In computer science, an interval tree is a tree data structure to hold intervals. Specifically, it allows one to efficiently find all intervals that overlap with any given interval or point. It is often used for windowing queries, for instance, to find all roads on a computerized map inside a rectangular viewport, or to find all visible elements inside a three-dimensional scene. A similar data structure is the segment tree.

<i>k</i>-d tree Multidimensional search tree for points in k dimensional space

In computer science, a k-d tree is a space-partitioning data structure for organizing points in a k-dimensional space. K-dimensional is that which concerns exactly k orthogonal axes or a space of any number of dimensions. k-d trees are a useful data structure for several applications, such as:

In computer science, the prefix sum, cumulative sum, inclusive scan, or simply scan of a sequence of numbers x0, x1, x2, ... is a second sequence of numbers y0, y1, y2, ..., the sums of prefixes of the input sequence:

In computer science, fractional cascading is a technique to speed up a sequence of binary searches for the same value in a sequence of related data structures. The first binary search in the sequence takes a logarithmic amount of time, as is standard for binary searches, but successive searches in the sequence are faster. The original version of fractional cascading, introduced in two papers by Chazelle and Guibas in 1986, combined the idea of cascading, originating in range searching data structures of Lueker (1978) and Willard (1978), with the idea of fractional sampling, which originated in Chazelle (1983). Later authors introduced more complex forms of fractional cascading that allow the data structure to be maintained as the data changes by a sequence of discrete insertion and deletion events.

<span class="mw-page-title-main">Range searching</span>

In computer science, the range searching problem consists of processing a set S of objects, in order to determine which objects from S intersect with a query object, called the range. For example, if S is a set of points corresponding to the coordinates of several cities, find the subset of cities within a given range of latitudes and longitudes.

A tango tree is a type of binary search tree proposed by Erik D. Demaine, Dion Harmon, John Iacono, and Mihai Pătrașcu in 2004. It is named after Buenos Aires, of which the tango is emblematic.

<span class="mw-page-title-main">Segment tree</span> Computer science data structure

In computer science, the segment tree is a data structure used for storing information about intervals or segments. It allows querying which of the stored segments contain a given point. A similar data structure is the interval tree.

<span class="mw-page-title-main">Cartesian tree</span> Binary tree derived from a sequence of numbers

In computer science, a Cartesian tree is a binary tree derived from a sequence of distinct numbers. To construct the Cartesian tree, set its root to be the minimum number in the sequence, and recursively construct its left and right subtrees from the subsequences before and after this number. It is uniquely defined as a min-heap whose symmetric (in-order) traversal returns the original sequence.

<span class="mw-page-title-main">Fenwick tree</span> Data structure

A Fenwick tree or binary indexed tree(BIT) is a data structure that can efficiently update values and calculate prefix sums in an array of values.

<span class="mw-page-title-main">Top tree</span>

A top tree is a data structure based on a binary tree for unrooted dynamic trees that is used mainly for various path-related operations. It allows simple divide-and-conquer algorithms. It has since been augmented to maintain dynamically various properties of a tree such as diameter, center and median.

In computer science, a ball tree, balltree or metric tree, is a space partitioning data structure for organizing points in a multi-dimensional space. A ball tree partitions data points into a nested set of balls. The resulting data structure has characteristics that make it useful for a number of applications, most notably nearest neighbor search.

In computer science, the range query problem consists of efficiently answering several queries regarding a given interval of elements within an array. For example, a common task, known as range minimum query, is finding the smallest value inside a given range within a list of numbers.

In computer science, the cell-probe model is a model of computation similar to the random-access machine, except that all operations are free except memory access. This model is useful for proving lower bounds of algorithms for data structure problems.

In computer science, an optimal binary search tree (Optimal BST), sometimes called a weight-balanced binary tree, is a binary search tree which provides the smallest possible search time (or expected search time) for a given sequence of accesses (or access probabilities). Optimal BSTs are generally divided into two types: static and dynamic.

In computational geometry and database theory, a range reporting query asks for a list of the points that match the query. The query is often specified by a geometric shape, containing all the points that should match, and is called a range. Range reporting is a special case of range searching, in which queries may return other kinds of aggregate information about points in a range.

In computer science, a priority search tree is a tree data structure for storing points in two dimensions. It was originally introduced by Edward M. McCreight. It is effectively an extension of the priority queue with the purpose of improving the search time from O(n) to O(s + log n) time, where n is the number of points in the tree and s is the number of points returned by the search.

References

  1. Bentley, J. L. (1979). "Decomposable searching problems" (PDF). Information Processing Letters. 8 (5): 244–251. doi:10.1016/0020-0190(79)90117-0. Archived from the original on September 24, 2017.
  2. 1 2 Lueker, G. S. (1978). "A data structure for orthogonal range queries". 19th Annual Symposium on Foundations of Computer Science (sfcs 1978). pp. 28–21. doi:10.1109/SFCS.1978.1. S2CID   14970942.
  3. Lee, D. T.; Wong, C. K. (1980). "Quintary trees: A file structure for multidimensional database systems". ACM Transactions on Database Systems. 5 (3): 339. doi:10.1145/320613.320618. S2CID   2547376.
  4. 1 2 Willard, Dan E. The super-b-tree algorithm (Technical report). Cambridge, MA: Aiken Computer Lab, Harvard University. TR-03-79.
  5. Chazelle, Bernard (1990). "Lower Bounds for Orthogonal Range Searching: I. The Reporting Case" (PDF). Journal of the ACM. 37 (2): 200–212. doi:10.1145/77600.77614. S2CID   8895683.
  6. Chazelle, Bernard (1990). "Lower Bounds for Orthogonal Range Searching: II. The Arithmetic Model" (PDF). Journal of the ACM. 37: 439–463. doi:10.1145/79147.79149. S2CID   15935619.
  7. 1 2 de Berg, Mark; Cheong, Otfried; van Kreveld, Marc; Overmars, Mark (2008). Computational Geometry. doi:10.1007/978-3-540-77974-2. ISBN   978-3-540-77973-5.