Closest pair of points problem

Last updated
Closest pair of points shown in red Closest pair of points.svg
Closest pair of points shown in red

The closest pair of points problem or closest pair problem is a problem of computational geometry: given points in metric space, find a pair of points with the smallest distance between them. The closest pair problem for points in the Euclidean plane [1] was among the first geometric problems that were treated at the origins of the systematic study of the computational complexity of geometric algorithms.

Contents

Time bounds

Randomized algorithms that solve the problem in linear time are known, in Euclidean spaces whose dimension is treated as a constant for the purposes of asymptotic analysis. [2] [3] [4] This is significantly faster than the time (expressed here in big O notation) that would be obtained by a naive algorithm of finding distances between all pairs of points and selecting the smallest.

It is also possible to solve the problem without randomization, in random-access machine models of computation with unlimited memory that allow the use of the floor function, in near-linear time. [5] In even more restricted models of computation, such as the algebraic decision tree, the problem can be solved in the somewhat slower time bound, [6] and this is optimal for this model, by a reduction from the element uniqueness problem. Both sweep line algorithms and divide-and-conquer algorithms with this slower time bound are commonly taught as examples of these algorithm design techniques. [7] [8]

Linear-time randomized algorithms

A linear expected time randomized algorithm of Rabin (1976), modified slightly by Richard Lipton to make its analysis easier, proceeds as follows, on an input set consisting of points in a -dimensional Euclidean space:

  1. Select pairs of points uniformly at random, with replacement, and let be the minimum distance of the selected pairs.
  2. Round the input points to a square grid of points whose size (the separation between adjacent grid points) is , and use a hash table to collect together pairs of input points that round to the same grid point.
  3. For each input point, compute the distance to all other inputs that either round to the same grid point or to another grid point within the Moore neighborhood of surrounding grid points.
  4. Return the smallest of the distances computed throughout this process.

The algorithm will always correctly determine the closest pair, because it maps any pair closer than distance to the same grid point or to adjacent grid points. The uniform sampling of pairs in the first step of the algorithm (compared to a different method of Rabin for sampling a similar number of pairs) simplifies the proof that the expected number of distances computed by the algorithm is linear. [4]

Instead, a different algorithm Khuller & Matias (1995) goes through two phases: a random iterated filtering process that approximates the closest distance to within an approximation ratio of , together with a finishing step that turns this approximate distance into the exact closest distance. The filtering process repeat the following steps, until becomes empty:

  1. Choose a point uniformly at random from .
  2. Compute the distances from to all the other points of and let be the minimum such distance.
  3. Round the input points to a square grid of size , and delete from all points whose Moore neighborhood has no other points.

The approximate distance found by this filtering process is the final value of , computed in the step before becomes empty. Each step removes all points whose closest neighbor is at distance or greater, at least half of the points in expectation, from which it follows that the total expected time for filtering is linear. Once an approximate value of is known, it can be used for the final steps of Rabin's algorithm; in these steps each grid point has a constant number of inputs rounded to it, so again the time is linear. [3]

Dynamic closest-pair problem

The dynamic version for the closest-pair problem is stated as follows:

If the bounding box for all points is known in advance and the constant-time floor function is available, then the expected -space data structure was suggested that supports expected-time insertions and deletions and constant query time. When modified for the algebraic decision tree model, insertions and deletions would require expected time. [9] The complexity of the dynamic closest pair algorithm cited above is exponential in the dimension , and therefore such an algorithm becomes less suitable for high-dimensional problems.

An algorithm for the dynamic closest-pair problem in dimensional space was developed by Sergey Bespamyatnikh in 1998. [10] Points can be inserted and deleted in time per point (in the worst case).

See also

Notes

  1. Shamos, Michael Ian; Hoey, Dan (1975). "Closest-point problems". 16th Annual Symposium on Foundations of Computer Science, Berkeley, California, USA, October 13-15, 1975. IEEE Computer Society. pp. 151–162. doi:10.1109/SFCS.1975.8.
  2. Rabin, M. (1976). "Probabilistic algorithms". Algorithms and Complexity: Recent Results and New Directions. Academic Press. pp. 21–39. As cited by Khuller & Matias (1995).
  3. 1 2 Khuller, Samir; Matias, Yossi (1995). "A simple randomized sieve algorithm for the closest-pair problem". Information and Computation . 118 (1): 34–37. doi: 10.1006/inco.1995.1049 . MR   1329236. S2CID   206566076.
  4. 1 2 Lipton, Richard (24 September 2011). "Rabin Flips a Coin". Gödel's Lost Letter and P=NP.
  5. Fortune, Steve; Hopcroft, John (1979). "A note on Rabin's nearest-neighbor algorithm". Information Processing Letters . 8 (1): 20–23. doi:10.1016/0020-0190(79)90085-1. hdl: 1813/7460 . MR   0515507.
  6. Clarkson, Kenneth L. (1983). "Fast algorithms for the all nearest neighbors problem". 24th Annual Symposium on Foundations of Computer Science, Tucson, Arizona, USA, 7-9 November 1983. IEEE Computer Society. pp. 226–232. doi:10.1109/SFCS.1983.16.
  7. Cormen, Thomas H.; Leiserson, Charles E.; Rivest, Ronald L.; Stein, Clifford (2001) [1990]. "33.4: Finding the closest pair of points". Introduction to Algorithms (2nd ed.). MIT Press and McGraw-Hill. pp. 957–961. ISBN   0-262-03293-7.
  8. Kleinberg, Jon M.; Tardos, Éva (2006). "5.4 Finding the closest pair of points". Algorithm Design. Addison-Wesley. pp. 225–231. ISBN   978-0-321-37291-8.
  9. Golin, Mordecai; Raman, Rajeev; Schwarz, Christian; Smid, Michiel (1998). "Randomized data structures for the dynamic closest-pair problem" (PDF). SIAM Journal on Computing . 27 (4): 1036–1072. doi:10.1137/S0097539794277718. MR   1622005. S2CID   1242364.
  10. Bespamyatnikh, S. N. (1998). "An optimal algorithm for closest-pair maintenance". Discrete & Computational Geometry . 19 (2): 175–195. doi: 10.1007/PL00009340 . MR   1600047.

Related Research Articles

Computational geometry is a branch of computer science devoted to the study of algorithms which can be stated in terms of geometry. Some purely geometrical problems arise out of the study of computational geometric algorithms, and such problems are also considered to be part of computational geometry. While modern computational geometry is a recent development, it is one of the oldest fields of computing with a history stretching back to antiquity.

<span class="mw-page-title-main">Nonlinear dimensionality reduction</span> Summary of algorithms for nonlinear dimensionality reduction

Nonlinear dimensionality reduction, also known as manifold learning, refers to various related techniques that aim to project high-dimensional data onto lower-dimensional latent manifolds, with the goal of either visualizing the data in the low-dimensional space, or learning the mapping itself. The techniques described below can be understood as generalizations of linear decomposition methods used for dimensionality reduction, such as singular value decomposition and principal component analysis.

<span class="mw-page-title-main">Time complexity</span> Estimate of time taken for running an algorithm

In computer science, the time complexity is the computational complexity that describes the amount of computer time it takes to run an algorithm. Time complexity is commonly estimated by counting the number of elementary operations performed by the algorithm, supposing that each elementary operation takes a fixed amount of time to perform. Thus, the amount of time taken and the number of elementary operations performed by the algorithm are taken to be related by a constant factor.

In computer science, a selection algorithm is an algorithm for finding the th smallest value in a collection of ordered values, such as numbers. The value that it finds is called the th order statistic. Selection includes as special cases the problems of finding the minimum, median, and maximum element in the collection. Selection algorithms include quickselect, and the median of medians algorithm. When applied to a collection of values, these algorithms take linear time, as expressed using big O notation. For data that is already structured, faster algorithms may be possible; as an extreme case, selection in an already-sorted array takes time .

<span class="mw-page-title-main">Bounding sphere</span> Sphere that contains a set of objects

In mathematics, given a non-empty set of objects of finite extension in -dimensional space, for example a set of points, a bounding sphere, enclosing sphere or enclosing ball for that set is an -dimensional solid sphere containing all of these objects.

<span class="mw-page-title-main">Euclidean minimum spanning tree</span> Shortest network connecting points

A Euclidean minimum spanning tree of a finite set of points in the Euclidean plane or higher-dimensional Euclidean space connects the points by a system of line segments with the points as endpoints, minimizing the total length of the segments. In it, any two points can reach each other along a path through the line segments. It can be found as the minimum spanning tree of a complete graph with the points as vertices and the Euclidean distances between points as edge weights.

<i>k</i>-d tree Multidimensional search tree for points in k dimensional space

In computer science, a k-d tree is a space-partitioning data structure for organizing points in a k-dimensional space. k-d trees are a useful data structure for several applications, such as searches involving a multidimensional search key and creating point clouds. k-d trees are a special case of binary space partitioning trees.

In statistics, the k-nearest neighbors algorithm (k-NN) is a non-parametric supervised learning method first developed by Evelyn Fix and Joseph Hodges in 1951, and later expanded by Thomas Cover. It is used for classification and regression. In both cases, the input consists of the k closest training examples in a data set. The output depends on whether k-NN is used for classification or regression:

<span class="mw-page-title-main">Klee's measure problem</span>

In computational geometry, Klee's measure problem is the problem of determining how efficiently the measure of a union of (multidimensional) rectangular ranges can be computed. Here, a d-dimensional rectangular range is defined to be a Cartesian product of d intervals of real numbers, which is a subset of Rd.

Nearest neighbor search (NNS), as a form of proximity search, is the optimization problem of finding the point in a given set that is closest to a given point. Closeness is typically expressed in terms of a dissimilarity function: the less similar the objects, the larger the function values.

Algorithms that construct convex hulls of various objects have a broad range of applications in mathematics and computer science.

<span class="mw-page-title-main">Smallest-circle problem</span> Finding the smallest circle that contains all given points

The smallest-circle problem is a mathematical problem of computing the smallest circle that contains all of a given set of points in the Euclidean plane. The corresponding problem in n-dimensional space, the smallest bounding sphere problem, is to compute the smallest n-sphere that contains all of a given set of points. The smallest-circle problem was initially proposed by the English mathematician James Joseph Sylvester in 1857.

In computational geometry, the Bentley–Ottmann algorithm is a sweep line algorithm for listing all crossings in a set of line segments, i.e. it finds the intersection points of line segments. It extends the Shamos–Hoey algorithm, a similar previous algorithm for testing whether or not a set of line segments has any crossings. For an input consisting of line segments with crossings, the Bentley–Ottmann algorithm takes time . In cases where , this is an improvement on a naïve algorithm that tests every pair of segments, which takes .

<span class="mw-page-title-main">Kenneth L. Clarkson</span> American computer scientist

Kenneth Lee Clarkson is an American computer scientist known for his research in computational geometry. He is a researcher at the IBM Almaden Research Center, and co-editor-in-chief of the Journal of Computational Geometry.

<span class="mw-page-title-main">David Mount</span> American computer scientist

David Mount is a professor at the University of Maryland, College Park department of computer science whose research is in computational geometry.

In the study of algorithms, an LP-type problem is an optimization problem that shares certain properties with low-dimensional linear programs and that may be solved by similar algorithms. LP-type problems include many important optimization problems that are not themselves linear programs, such as the problem of finding the smallest circle containing a given set of planar points. They may be solved by a combination of randomized algorithms in an amount of time that is linear in the number of elements defining the problem, and subexponential in the dimension of the problem.

In computer science, the cell-probe model is a model of computation similar to the random-access machine, except that all operations are free except memory access. This model is useful for proving lower bounds of algorithms for data structure problems.

<span class="mw-page-title-main">Delone set</span> Well-spaced set of points in a metric space

In the mathematical theory of metric spaces, ε-nets, ε-packings, ε-coverings, uniformly discrete sets, relatively dense sets, and Delone sets are several closely related definitions of well-spaced sets of points, and the packing radius and covering radius of these sets measure how well-spaced they are. These sets have applications in coding theory, approximation algorithms, and the theory of quasicrystals.

<span class="mw-page-title-main">Maxima of a point set</span>

In computational geometry, a point p in a finite set of points S is said to be maximal or non-dominated if there is no other point q in S whose coordinates are all greater than or equal to the corresponding coordinates of p. The maxima of a point setS are all the maximal points of S. The problem of finding all maximal points, sometimes called the problem of the maxima or maxima set problem, has been studied as a variant of the convex hull and orthogonal convex hull problems. It is equivalent to finding the Pareto frontier of a collection of points, and was called the floating-currency problem by Herbert Freeman based on an application involving comparing the relative wealth of individuals with different holdings of multiple currencies.

<span class="mw-page-title-main">Greedy geometric spanner</span>

In computational geometry, a greedy geometric spanner is an undirected graph whose distances approximate the Euclidean distances among a finite set of points in a Euclidean space. The vertices of the graph represent these points. The edges of the spanner are selected by a greedy algorithm that includes an edge whenever its two endpoints are not connected by a short path of shorter edges. The greedy spanner was first described in the PhD thesis of Gautam Das and conference paper and subsequent journal paper by Ingo Althöfer et al. These sources also credited Marshall Bern (unpublished) with the independent discovery of the same construction.