Range searching

Last updated December 31, 2023

In computer science, the range searching problem consists of processing a set S of objects, in order to determine which objects from S intersect with a query object, called the range. For example, if S is a set of points corresponding to the coordinates of several cities, find the subset of cities within a given range of latitudes and longitudes.

Variations

There are several variations of the problem, and different data structures may be necessary for different variations.^[1] In order to obtain an efficient solution, several aspects of the problem need to be specified:

Object types: Algorithms depend on whether S consists of points, lines, line segments, boxes, polygons.... The simplest and most studied objects to search are points.
Range types: The query ranges also need to be drawn from a predetermined set. Some well-studied sets of ranges, and the names of the respective problems are axis-aligned rectangles (orthogonal range searching), simplices, halfspaces, and spheres/circles.
Query types: If the list of all objects that intersect the query range must be reported, the problem is called range reporting, and the query is called a reporting query. Sometimes, only the number of objects that intersect the range is required. In this case, the problem is called range counting, and the query is called a counting query. The emptiness query reports whether there is at least one object that intersects the range. In the semigroup version, a commutative semigroup (S,+) is specified, each point is assigned a weight from S, and it is required to report the semigroup sum of the weights of the points that intersect the range.
Dynamic range searching vs. static range searching: In the static setting the set S is known in advance. In dynamic setting objects may be inserted or deleted between queries.
Offline range searching: Both the set of objects and the whole set of queries are known in advance.

Data structures

Orthogonal range searching

In orthogonal range searching, the set S consists of $n$ points in $d$ dimensions, and the query consists of intervals in each of those dimensions. Thus, the query consists of a multi-dimensional axis-aligned rectangle. With an output size of $k$ , Jon Bentley used a k-d tree to achieve (in Big O notation) $O(n)$ space and $O{\big (}n^{1-{\frac {1}{d}}}+k{\big )}$ query time.^[2] Bentley also proposed using range trees, which improved query time to $O(\log ^{d}n+k)$ but increased space to $O(n\log ^{d-1}n)$ .^[3] Dan Willard used downpointers, a special case of fractional cascading to reduce the query time further to $O(\log ^{d-1}n+k)$ . ^[4]

While the above results were achieved in the pointer machine model, further improvements have been made in the word RAM model of computation in low dimensions (2D, 3D, 4D). Bernard Chazelle used compress range trees to achieve $O(\log n)$ query time and $O(n)$ space for range counting.^[5] Joseph JaJa and others later improved this query time to $O\left({\dfrac {\log n}{\log \log n}}\right)$ for range counting, which matches a lower bound and is thus asymptotically optimal.^[6]

As of 2015, the best results (in low dimensions (2D, 3D, 4D)) for range reporting found by Timothy M. Chan, Kasper Larsen, and Mihai Pătrașcu, also using compressed range trees in the word RAM model of computation, are one of the following:^[7]

$O(n)$ space, $O(\log ^{\epsilon }n+k\log ^{\epsilon }n)$ query time
$O(n\log \log n)$ space, $O(\log \log n+k\log \log n)$ query time
$O(n\log ^{\epsilon }n)$ space, $O(\log \log n+k)$ query time

In the orthogonal case, if one of the bounds is infinity, the query is called three-sided. If two of the bounds are infinity, the query is two-sided, and if none of the bounds are infinity, then the query is four-sided.

Dynamic range searching

While in static range searching the set S is known in advance, dynamic range searching, insertions and deletions of points are allowed. In the incremental version of the problem, only insertions are allowed, whereas the decremental version only allows deletions. For the orthogonal case, Kurt Mehlhorn and Stefan Näher created a data structure for dynamic range searching which uses dynamic fractional cascading to achieve $O(n\log n)$ space and $O(\log n\log \log n+k)$ query time.^[8] Both incremental and decremental versions of the problem can be solved with $O(\log n+k)$ query time, but it is unknown whether general dynamic range searching can be done with that query time.

Colored range searching

The problem of colored range counting considers the case where points have categorical attributes. If the categories are considered as colors of points in geometric space, then a query is for how many colors appear in a particular range. Prosenjit Gupta and others described a data structure in 1995 which solved 2D orthogonal colored range counting in $O(n^{2}\log ^{2}n)$ space and $O(\log ^{2}n)$ query time.^[9]

Applications

In addition to being considered in computational geometry, range searching, and orthogonal range searching in particular, has applications for range queries in databases. Colored range searching is also used for and motivated by searching through categorical data. For example, determining the rows in a database of bank accounts which represent people whose age is between 25 and 40 and who have between $10000 and $20000 might be an orthogonal range reporting problem where age and money are two dimensions.

Related Research Articles

In geometry, the convex hull or convex envelope or convex closure of a shape is the smallest convex set that contains it. The convex hull may be defined either as the intersection of all convex sets containing a given subset of a Euclidean space, or equivalently as the set of all convex combinations of points in the subset. For a bounded subset of the plane, the convex hull may be visualized as the shape enclosed by a rubber band stretched around the subset.

Computational geometry is a branch of computer science devoted to the study of algorithms which can be stated in terms of geometry. Some purely geometrical problems arise out of the study of computational geometric algorithms, and such problems are also considered to be part of computational geometry. While modern computational geometry is a recent development, it is one of the oldest fields of computing with a history stretching back to antiquity.

The point location problem is a fundamental topic of computational geometry. It finds applications in areas that deal with processing geometrical data: computer graphics, geographic information systems (GIS), motion planning, and computer aided design (CAD).

<span class="mw-page-title-main">Bounding volume</span> Closed volume that completely contains the union of a set of objects

In computer graphics and computational geometry, a bounding volume for a set of objects is a closed region that completely contains the union of the objects in the set. Bounding volumes are used to improve the efficiency of geometrical operations, such as by using simple regions, having simpler ways to test for overlap.

<span class="mw-page-title-main">Polygon triangulation</span> Partition of a simple polygon into triangles

In computational geometry, polygon triangulation is the partition of a polygonal area $P$ into a set of triangles, i.e., finding a set of triangles with pairwise non-intersecting interiors whose union is $P$ .

In computer science, an interval tree is a tree data structure to hold intervals. Specifically, it allows one to efficiently find all intervals that overlap with any given interval or point. It is often used for windowing queries, for instance, to find all roads on a computerized map inside a rectangular viewport, or to find all visible elements inside a three-dimensional scene. A similar data structure is the segment tree.

<i>k</i>-d tree Multidimensional search tree for points in k dimensional space

In computer science, a k-d tree is a space-partitioning data structure for organizing points in a k-dimensional space. K-dimensional is that which concerns exactly k orthogonal axes or a space of any number of dimensions. k-d trees are a useful data structure for several applications, such as:

Nearest neighbor search (NNS), as a form of proximity search, is the optimization problem of finding the point in a given set that is closest to a given point. Closeness is typically expressed in terms of a dissimilarity function: the less similar the objects, the larger the function values.

In computer science, fractional cascading is a technique to speed up a sequence of binary searches for the same value in a sequence of related data structures. The first binary search in the sequence takes a logarithmic amount of time, as is standard for binary searches, but successive searches in the sequence are faster. The original version of fractional cascading, introduced in two papers by Chazelle and Guibas in 1986, combined the idea of cascading, originating in range searching data structures of Lueker (1978) and Willard (1978), with the idea of fractional sampling, which originated in Chazelle (1983). Later authors introduced more complex forms of fractional cascading that allow the data structure to be maintained as the data changes by a sequence of discrete insertion and deletion events.

<span class="mw-page-title-main">Orthogonal convex hull</span> Minimal superset that intersects each axis-parallel line in an interval

In geometry, a set $K \subset R d$ is defined to be orthogonally convex if, for every line $L$ that is parallel to one of standard basis vectors, the intersection of $K$ with $L$ is empty, a point, or a single segment. The term "orthogonal" refers to corresponding Cartesian basis and coordinates in Euclidean space, where different basis vectors are perpendicular, as well as corresponding lines. Unlike ordinary convex sets, an orthogonally convex set is not necessarily connected.

<span class="mw-page-title-main">K-set (geometry)</span> Points separated from others by a line

In discrete geometry, a $-set of a finite point set in the Euclidean plane is a subset of elements of that can be strictly separated from the remaining points by a line. More generally, in Euclidean space of higher dimensions, a -set of a finite point set is a subset of elements that can be separated from the remaining points by a hyperplane. In particular, when, the line or hyperplane that separates a -set from the rest of is a halving line or halving plane .$

The closest pair of points problem or closest pair problem is a problem of computational geometry: given $points in metric space, find a pair of points with the smallest distance between them. The closest pair problem for points in the Euclidean plane was among the first geometric problems that were treated at the origins of the systematic study of the computational complexity of geometric algorithms.$

<span class="mw-page-title-main">Monotone polygon</span>

In geometry, a polygon P in the plane is called monotone with respect to a straight line L, if every line orthogonal to L intersects the boundary of P at most twice.

In computer science, a range tree is an ordered tree data structure to hold a list of points. It allows all points within a given range to be reported efficiently, and is typically used in two or higher dimensions. Range trees were introduced by Jon Louis Bentley in 1979. Similar data structures were discovered independently by Lueker, Lee and Wong, and Willard. The range tree is an alternative to the k-d tree. Compared to k-d trees, range trees offer faster query times of $but worse storage of, where n is the number of points stored in the tree, d is the dimension of each point and k is the number of points reported by a given query.$

In computational geometry, the Bentley–Ottmann algorithm is a sweep line algorithm for listing all crossings in a set of line segments, i.e. it finds the intersection points of line segments. It extends the Shamos–Hoey algorithm, a similar previous algorithm for testing whether or not a set of line segments has any crossings. For an input consisting of $line segments with crossings, the Bentley-Ottmann algorithm takes time . In cases where, this is an improvement on a naïve algorithm that tests every pair of segments, which takes .$

Kenneth Lee Clarkson is an American computer scientist known for his research in computational geometry. He is a researcher at the IBM Almaden Research Center, and co-editor-in-chief of the Journal of Computational Geometry.

<span class="mw-page-title-main">David Mount</span> American computer scientist

David Mount is a professor at the University of Maryland, College Park department of computer science whose research is in computational geometry.

In computer science, the range query problem consists of efficiently answering several queries regarding a given interval of elements within an array. For example, a common task, known as range minimum query, is finding the smallest value inside a given range within a list of numbers.

In computer science, the cell-probe model is a model of computation similar to the random-access machine, except that all operations are free except memory access. This model is useful for proving lower bounds of algorithms for data structure problems.

In computational geometry, the convex layers of a set of points in the Euclidean plane are a sequence of nested convex polygons having the points as their vertices. The outermost one is the convex hull of the points and the rest are formed in the same way recursively. The innermost layer may be degenerate, consisting only of one or two points. The problem of constructing convex layers has also been called onion peeling or onion decomposition.

References

↑ Agarwal, P. K.; Erickson, J. (1999), "Geometric Range Searching and Its Relatives", in Chazelle, Bernard; Goodman, Jacob; Pollack, Richard (eds.), Advances in Discrete and Computational Geometry: proceedings of the 1996 AMS-IMS-SIAM joint summer research conference, Discrete and Computational Geometry--Ten Years Later, July 14-18, 1996, Mount Holyoke College, Contemporary Mathematics, vol. 223, American Mathematical Society Press, pp. 1–56
↑ Bentley, Jon (1975). "Multidimensional binary search trees used for associative searching". Communications of the ACM . 18 (9): 509–517. doi: 10.1145/361002.361007 . S2CID 13091446.
↑ Bentley, Jon (1980). "Multidimensional divide-and-conquer". Communications of the ACM . 23 (4): 214–229. doi: 10.1145/358841.358850 . S2CID 3997186.
↑ Willard, Dan (1985). "New data structures for orthogonal range queries". SIAM Journal on Computing . 14 (1): 232–253. doi:10.1137/0214019.
↑ Chazelle, Bernard (1988). "A functional approach to data structures and its use in multidimensional searching". SIAM Journal on Computing . 17 (3): 427–462. CiteSeerX 10.1.1.133.9153 . doi:10.1137/0217026.
↑ JaJa, Joseph; Mortensen, Christian; Shi, Qingmin (2005). "Space-efficient and fast algorithms for multidimensional dominance reporting and counting". International Symposium on Algorithms and Computation : 558–568.
↑ Chan, Timothy; Larsen, Kasper; Pătrașcu, Mihai (2011). "Orthogonal range searching on the RAM, revisited". Symposium on Computational Geometry : 1–10. arXiv: 1103.5510 .
↑ Mehlhorn, Kurt; Näher, Stefan (1990). "Dynamic fractional cascading" (PDF). Algorithmica . 5 (2): 215–241. doi:10.1007/BF01840386. S2CID 7721690.
↑ Gupta, Prosenjit; Janardan, Ravi; Smid, Michiel (1995). "Further results on generalized intersection searching problems: Counting, reporting, and dynamization". Journal of Algorithms . 19 (2): 282–317. doi:10.1006/jagm.1995.1038. hdl: 11858/00-001M-0000-0014-B721-F .