Range reporting

Last updated

In computational geometry and database theory, a range reporting query asks for a list of the points that match the query. The query is often specified by a geometric shape, containing all the points that should match, and is called a range. Range reporting is a special case of range searching, in which queries may return other kinds of aggregate information about points in a range.

Computational geometry is a branch of computer science devoted to the study of algorithms which can be stated in terms of geometry. Some purely geometrical problems arise out of the study of computational geometric algorithms, and such problems are also considered to be part of computational geometry. While modern computational geometry is a recent development, it is one of the oldest fields of computing with history stretching back to antiquity.

Database organized collection of data

A database is an organized collection of data, generally stored and accessed electronically from a computer system. Where databases are more complex they are often developed using formal design and modeling techniques.

Range reporting queries are often handled by building a data structure from a collection of points that can answer queries efficiently. Because the worst case output size for a range reporting query, measured as a function of the data set size n, can be n itself, much of the research on range reporting data structures has investigated output-sensitive algorithms, where the query time is analyzed in terms of both n and the number of reported points (often denoted k).

Data structure Particular way of storing and organizing data in a computer

In computer science, a data structure is a data organization, management and storage format that enables efficient access and modification. More precisely, a data structure is a collection of data values, the relationships among them, and the functions or operations that can be applied to the data.

In computer science, an output-sensitive algorithm is an algorithm whose running time depends on the size of the output, instead of, or in addition to, the size of the input. For certain problems where the output size varies widely, for example from linear in the size of the input to quadratic in the size of the input, analyses that take the output size explicitly into account can produce better runtime bounds that differentiate algorithms that would otherwise have identical asymptotic complexity.

For example, for one-dimensional (numeric) data with query ranges that are intervals, range reporting queries can be handled by storing the data in a sorted array. With this structure, one can use binary search to find the point closest to the start of a query interval, and then scan the array from that point forwards to list all of the points in the interval. Storing this data structure uses O(n) (linear) space, and it handles queries in time O(log n + k) per query.

In mathematics, a (real) interval is a set of real numbers with the property that any number that lies between two numbers in the set is also included in the set. For example, the set of all numbers x satisfying 0 ≤ x ≤ 1 is an interval which contains 0 and 1, as well as all numbers between them. Other examples of intervals are the set of all real numbers , the set of all negative real numbers, and the empty set.

Related Research Articles

Discrete mathematics study of discrete mathematical structures

Discrete mathematics is the study of mathematical structures that are fundamentally discrete rather than continuous. In contrast to real numbers that have the property of varying "smoothly", the objects studied in discrete mathematics – such as integers, graphs, and statements in logic – do not vary smoothly in this way, but have distinct, separated values. Discrete mathematics therefore excludes topics in "continuous mathematics" such as calculus or Euclidean geometry. Discrete objects can often be enumerated by integers. More formally, discrete mathematics has been characterized as the branch of mathematics dealing with countable sets. However, there is no exact definition of the term "discrete mathematics." Indeed, discrete mathematics is described less by what is included than by what is excluded: continuously varying quantities and related notions.

Hash function type of function that can be used to map data of arbitrary size to data of fixed size

A hash function is any function that can be used to map data of arbitrary size onto data of a fixed size. The values returned by a hash function are called hash values, hash codes, digests, or simply hashes. Hash functions are often used in combination with a hash table, a common data structure used in computer software for rapid data lookup. Hash functions accelerate table or database lookup by detecting duplicated records in a large file. One such application is finding similar stretches in DNA sequences. They are also useful in cryptography. A cryptographic hash function allows one to easily verify whether some input data map onto a given hash value, but if the input data is unknown it is deliberately difficult to reconstruct it by knowing the stored hash value. This is used for assuring integrity of transmitted data, and is the building block for HMACs, which provide message authentication.

Quadtree geometric data structure based on the subdivision of squares onto four smaller squares

A quadtree is a tree data structure in which each internal node has exactly four children. Quadtrees are the two-dimensional analog of octrees and are most often used to partition a two-dimensional space by recursively subdividing it into four quadrants or regions. The data associated with a leaf cell varies by application, but the leaf cell represents a "unit of interesting spatial information".

In computer science, an interval tree is a tree data structure to hold intervals. Specifically, it allows one to efficiently find all intervals that overlap with any given interval or point. It is often used for windowing queries, for instance, to find all roads on a computerized map inside a rectangular viewport, or to find all visible elements inside a three-dimensional scene. A similar data structure is the segment tree.

<i>k</i>-d tree multidimensional search tree for points in k dimensional space

In computer science, a k-d tree is a space-partitioning data structure for organizing points in a k-dimensional space. k-d trees are a useful data structure for several applications, such as searches involving a multidimensional search key. k-d trees are a special case of binary space partitioning trees.

Z-order curve function which maps multidimensional data to one dimension while preserving locality of the data points

In mathematical analysis and computer science, functions which are Z-order, Lebesgue curve, Morton space filling curve, Morton order or Morton code map multidimensional data to one dimension while preserving locality of the data points. It is named after Guy Macdonald Morton, who first applied the order to file sequencing in 1966. The z-value of a point in multidimensions is simply calculated by interleaving the binary representations of its coordinate values. Once the data are sorted into this ordering, any one-dimensional data structure can be used such as binary search trees, B-trees, skip lists or hash tables. The resulting ordering can equivalently be described as the order one would get from a depth-first traversal of a quadtree.

Nearest neighbor search (NNS), as a form of proximity search, is the optimization problem of finding the point in a given set that is closest to a given point. Closeness is typically expressed in terms of a dissimilarity function: the less similar the objects, the larger the function values. Formally, the nearest-neighbor (NN) search problem is defined as follows: given a set S of points in a space M and a query point q ∈ M, find the closest point in S to q. Donald Knuth in vol. 3 of The Art of Computer Programming (1973) called it the post-office problem, referring to an application of assigning to a residence the nearest post office. A direct generalization of this problem is a k-NN search, where we need to find the k closest points.

In computer science, fractional cascading is a technique to speed up a sequence of binary searches for the same value in a sequence of related data structures. The first binary search in the sequence takes a logarithmic amount of time, as is standard for binary searches, but successive searches in the sequence are faster. The original version of fractional cascading, introduced in two papers by Chazelle and Guibas in 1986, combined the idea of cascading, originating in range searching data structures of Lueker (1978) and Willard (1978), with the idea of fractional sampling, which originated in Chazelle (1983). Later authors introduced more complex forms of fractional cascading that allow the data structure to be maintained as the data changes by a sequence of discrete insertion and deletion events.

Range searching

In data structures, the range searching problem most generally consists of preprocessing a set S of objects, in order to determine which objects from S intersect with a query object, called a range. For example, if S is a set of points corresponding to the coordinates of several cities, a geometric variant of the problem is to find cities within a certain latitude and longitude range.

The dynamic convex hull problem is a class of dynamic problems in computational geometry. The problem consists in the maintenance, i.e., keeping track, of the convex hull for the dynamically changing input data, i.e., when input data elements may be inserted, deleted, or modified. Problems of this class may be distinguished by the types of the input data and the allowed types of modification of the input data.

In computer science, a segment tree also known as a statistic tree is a tree data structure used for storing information about intervals, or segments. It allows querying which of the stored segments contain a given point. It is, in principle, a static structure; that is, it's a structure that cannot be modified once it's built. A similar data structure is the interval tree.

Bin (computational geometry) data structure in computational geometry

In computational geometry, the bin is a data structure which allows efficient region queries. Each time a data point falls into a bin, the frequency of that bin is increased by one.

In computer science, a range tree is an ordered tree data structure to hold a list of points. It allows all points within a given range to be reported efficiently, and is typically used in two or higher dimensions. Range trees were introduced by Jon Louis Bentley in 1979. Similar data structures were discovered independently by Lueker, Lee and Wong, and Willard. The range tree is an alternative to the k-d tree. Compared to k-d trees, range trees offer faster query times of but worse storage of , where n is the number of points stored in the tree, d is the dimension of each point and k is the number of points reported by a given query.

In computational geometry, the fixed-radius near neighbor problem is a variant of the nearest neighbor search problem. In the fixed-radius near neighbor problem, one is given as input a set of points in d-dimensional Euclidean space and a fixed distance Δ. One must design a data structure that, given a query point q, efficiently reports the points of the data structure that are within distance Δ of q. The problem has long been studied; Bentley (1975) cites a 1966 paper by Levinthal that uses this technique as part of a system for visualizing molecular structures, and it has many other applications.

David Mount is a professor at University of Maryland Department of Computer Science whose research is in computational geometry.

Emmerich (Emo) Welzl is a computer scientist known for his research in computational geometry. He is a professor in the Institute for Theoretical Computer Science at ETH Zurich in Switzerland.

Convex layers

In computational geometry, the convex layers of a set of points in the Euclidean plane are a sequence of nested convex polygons having the points as their vertices. The outermost one is the convex hull of the points and the rest are formed in the same way recursively. The innermost layer may be degenerate, consisting only of one or two points. The problem of constructing convex layers has also been called onion peeling or onion decomposition.

References

Pankaj Kumar Agarwal is an Indian computer scientist and mathematician researching algorithms in computational geometry and related areas. He is the RJR Nabisco Professor of Computer Science and Mathematics at Duke University, where he has been chair of the computer science department since 2004. He obtained his Ph.D. in Computer Science in 1989 from the Courant Institute, New York City, under the supervision of Micha Sharir.

Bernard Chazelle French computer scientist

Bernard Chazelle is a French-American computer scientist. He is currently the Eugene Higgins Professor of Computer Science at Princeton University. Much of his work is in computational geometry, where he is known for his study of algorithms, such as linear-time triangulation of a simple polygon, as well as major complexity results, such as lower bound techniques based on discrepancy theory. He is also known for his invention of the soft heap data structure and the most asymptotically efficient known algorithm for finding minimum spanning trees.

Jacob Eli Goodman is an American geometer who has spent most of his career at the City College of New York, where he is now professor emeritus. In 1986 he and Richard Pollack were the founding co-editors-in-chief of the journal Discrete and Computational Geometry.