Closest pair of points problem

Last updated October 15, 2024

The closest pair of points problem or closest pair problem is a problem of computational geometry: given $n$ points in metric space, find a pair of points with the smallest distance between them. The closest pair problem for points in the Euclidean plane^[1] was among the first geometric problems that were treated at the origins of the systematic study of the computational complexity of geometric algorithms.

Time bounds

Randomized algorithms that solve the problem in linear time are known, in Euclidean spaces whose dimension is treated as a constant for the purposes of asymptotic analysis.^[2]^[3]^[4] This is significantly faster than the $O(n^{2})$ time (expressed here in big O notation) that would be obtained by a naive algorithm of finding distances between all pairs of points and selecting the smallest.

It is also possible to solve the problem without randomization, in random-access machine models of computation with unlimited memory that allow the use of the floor function, in near-linear $O(n\log \log n)$ time.^[5] In even more restricted models of computation, such as the algebraic decision tree, the problem can be solved in the somewhat slower $O(n\log n)$ time bound,^[6] and this is optimal for this model, by a reduction from the element uniqueness problem. Both sweep line algorithms and divide-and-conquer algorithms with this slower time bound are commonly taught as examples of these algorithm design techniques.^[7]^[8]

Linear-time randomized algorithms

A linear expected time randomized algorithm of Rabin (1976), modified slightly by Richard Lipton to make its analysis easier, proceeds as follows, on an input set $S$ consisting of $n$ points in a $k$ -dimensional Euclidean space:

Select $n$ pairs of points uniformly at random, with replacement, and let $d$ be the minimum distance of the selected pairs.
Round the input points to a square grid of points whose size (the separation between adjacent grid points) is $d$ , and use a hash table to collect together pairs of input points that round to the same grid point.
For each input point, compute the distance to all other inputs that either round to the same grid point or to another grid point within the Moore neighborhood of $3^{k}-1$ surrounding grid points.
Return the smallest of the distances computed throughout this process.

The algorithm will always correctly determine the closest pair, because it maps any pair closer than distance $d$ to the same grid point or to adjacent grid points. The uniform sampling of pairs in the first step of the algorithm (compared to a different method of Rabin for sampling a similar number of pairs) simplifies the proof that the expected number of distances computed by the algorithm is linear.^[4]

Instead, a different algorithm Khuller & Matias (1995) goes through two phases: a random iterated filtering process that approximates the closest distance to within an approximation ratio of $2{\sqrt {k}}$ , together with a finishing step that turns this approximate distance into the exact closest distance. The filtering process repeat the following steps, until $S$ becomes empty:

Choose a point $p$ uniformly at random from $S$ .
Compute the distances from $p$ to all the other points of $S$ and let $d$ be the minimum such distance.
Round the input points to a square grid of size $d/(2{\sqrt {k}})$ , and delete from $S$ all points whose Moore neighborhood has no other points.

The approximate distance found by this filtering process is the final value of $d$ , computed in the step before $S$ becomes empty. Each step removes all points whose closest neighbor is at distance $d$ or greater, at least half of the points in expectation, from which it follows that the total expected time for filtering is linear. Once an approximate value of $d$ is known, it can be used for the final steps of Rabin's algorithm; in these steps each grid point has a constant number of inputs rounded to it, so again the time is linear.^[3]

Dynamic closest-pair problem

The dynamic version for the closest-pair problem is stated as follows:

Given a dynamic set of objects, find algorithms and data structures for efficient recalculation of the closest pair of objects each time the objects are inserted or deleted.

If the bounding box for all points is known in advance and the constant-time floor function is available, then the expected $O(n)$ -space data structure was suggested that supports expected-time $O(\log n)$ insertions and deletions and constant query time. When modified for the algebraic decision tree model, insertions and deletions would require $O(\log ^{2}n)$ expected time.^[9] The complexity of the dynamic closest pair algorithm cited above is exponential in the dimension $d$ , and therefore such an algorithm becomes less suitable for high-dimensional problems.

An algorithm for the dynamic closest-pair problem in $d$ dimensional space was developed by Sergey Bespamyatnikh in 1998.^[10] Points can be inserted and deleted in $O(\log n)$ time per point (in the worst case).

Notes

↑ Shamos, Michael Ian; Hoey, Dan (1975). "Closest-point problems". 16th Annual Symposium on Foundations of Computer Science, Berkeley, California, USA, October 13-15, 1975. IEEE Computer Society. pp. 151–162. doi:10.1109/SFCS.1975.8.
↑ Rabin, M. (1976). "Probabilistic algorithms". Algorithms and Complexity: Recent Results and New Directions. Academic Press. pp. 21–39. As cited by Khuller & Matias (1995).
1 2 Khuller, Samir; Matias, Yossi (1995). "A simple randomized sieve algorithm for the closest-pair problem". Information and Computation . 118 (1): 34–37. doi: 10.1006/inco.1995.1049 . MR 1329236. S2CID 206566076.
1 2 Lipton, Richard (24 September 2011). "Rabin Flips a Coin". Gödel's Lost Letter and P=NP.
↑ Fortune, Steve; Hopcroft, John (1979). "A note on Rabin's nearest-neighbor algorithm". Information Processing Letters . 8 (1): 20–23. doi:10.1016/0020-0190(79)90085-1. hdl: 1813/7460 . MR 0515507.
↑ Clarkson, Kenneth L. (1983). "Fast algorithms for the all nearest neighbors problem". 24th Annual Symposium on Foundations of Computer Science, Tucson, Arizona, USA, 7-9 November 1983. IEEE Computer Society. pp. 226–232. doi:10.1109/SFCS.1983.16.
↑ Cormen, Thomas H.; Leiserson, Charles E.; Rivest, Ronald L.; Stein, Clifford (2001) [1990]. "33.4: Finding the closest pair of points". Introduction to Algorithms (2nd ed.). MIT Press and McGraw-Hill. pp. 957–961. ISBN 0-262-03293-7.
↑ Kleinberg, Jon M.; Tardos, Éva (2006). "5.4 Finding the closest pair of points". Algorithm Design. Addison-Wesley. pp. 225–231. ISBN 978-0-321-37291-8.
↑ Golin, Mordecai; Raman, Rajeev; Schwarz, Christian; Smid, Michiel (1998). "Randomized data structures for the dynamic closest-pair problem" (PDF). SIAM Journal on Computing . 27 (4): 1036–1072. doi:10.1137/S0097539794277718. MR 1622005. S2CID 1242364.
↑ Bespamyatnikh, S. N. (1998). "An optimal algorithm for closest-pair maintenance". Discrete & Computational Geometry . 19 (2): 175–195. doi: 10.1007/PL00009340 . MR 1600047.

Related Research Articles

In computational geometry, a Delaunay triangulation or Delone triangulation of a set of points in the plane subdivides their convex hull into triangles whose circumcircles do not contain any of the points. This maximizes the size of the smallest angle in any of the triangles, and tends to avoid sliver triangles.

Computational geometry is a branch of computer science devoted to the study of algorithms which can be stated in terms of geometry. Some purely geometrical problems arise out of the study of computational geometric algorithms, and such problems are also considered to be part of computational geometry. While modern computational geometry is a recent development, it is one of the oldest fields of computing with a history stretching back to antiquity.

Nonlinear dimensionality reduction, also known as manifold learning, is any of various related techniques that aim to project high-dimensional data, potentially existing across non-linear manifolds which cannot be adequately captured by linear decomposition methods, onto lower-dimensional latent manifolds, with the goal of either visualizing the data in the low-dimensional space, or learning the mapping itself. The techniques described below can be understood as generalizations of linear decomposition methods used for dimensionality reduction, such as singular value decomposition and principal component analysis.

In theoretical computer science, the time complexity is the computational complexity that describes the amount of computer time it takes to run an algorithm. Time complexity is commonly estimated by counting the number of elementary operations performed by the algorithm, supposing that each elementary operation takes a fixed amount of time to perform. Thus, the amount of time taken and the number of elementary operations performed by the algorithm are taken to be related by a constant factor.

In mathematics, given a non-empty set of objects of finite extension in $-dimensional space, for example a set of points, a bounding sphere, enclosing sphere or enclosing ball for that set is a -dimensional solid sphere containing all of these objects.$

A Euclidean minimum spanning tree of a finite set of points in the Euclidean plane or higher-dimensional Euclidean space connects the points by a system of line segments with the points as endpoints, minimizing the total length of the segments. In it, any two points can reach each other along a path through the line segments. It can be found as the minimum spanning tree of a complete graph with the points as vertices and the Euclidean distances between points as edge weights.

<i>k</i>-d tree Multidimensional search tree for points in k dimensional space

In computer science, a k-d tree is a space-partitioning data structure for organizing points in a k-dimensional space. K-dimensional is that which concerns exactly k orthogonal axes or a space of any number of dimensions. k-d trees are a useful data structure for several applications, such as:

In computational geometry, Klee's measure problem is the problem of determining how efficiently the measure of a union of (multidimensional) rectangular ranges can be computed. Here, a d-dimensional rectangular range is defined to be a Cartesian product of d intervals of real numbers, which is a subset of R^d.

Nearest neighbor search (NNS), as a form of proximity search, is the optimization problem of finding the point in a given set that is closest to a given point. Closeness is typically expressed in terms of a dissimilarity function: the less similar the objects, the larger the function values.

Algorithms that construct convex hulls of various objects have a broad range of applications in mathematics and computer science.

The smallest-circle problem is a computational geometry problem of computing the smallest circle that contains all of a given set of points in the Euclidean plane. The corresponding problem in n-dimensional space, the smallest bounding sphere problem, is to compute the smallest n-sphere that contains all of a given set of points. The smallest-circle problem was initially proposed by the English mathematician James Joseph Sylvester in 1857.

In computational geometry, the Bentley–Ottmann algorithm is a sweep line algorithm for listing all crossings in a set of line segments, i.e. it finds the intersection points of line segments. It extends the Shamos–Hoey algorithm, a similar previous algorithm for testing whether or not a set of line segments has any crossings. For an input consisting of $line segments with crossings, the Bentley-Ottmann algorithm takes time . In cases where, this is an improvement on a naïve algorithm that tests every pair of segments, which takes .$

Kenneth Lee Clarkson is an American computer scientist known for his research in computational geometry. He is a researcher at the IBM Almaden Research Center, and co-editor-in-chief of Discrete and Computational Geometry and of the Journal of Computational Geometry.

<span class="mw-page-title-main">David Mount</span> American computer scientist

David Mount is a professor at the University of Maryland, College Park department of computer science whose research is in computational geometry.

In the theory of cluster analysis, the nearest-neighbor chain algorithm is an algorithm that can speed up several methods for agglomerative hierarchical clustering. These are methods that take a collection of points as input, and create a hierarchy of clusters of points by repeatedly merging pairs of smaller clusters to form larger clusters. The clustering methods that the nearest-neighbor chain algorithm can be used for include Ward's method, complete-linkage clustering, and single-linkage clustering; these all work by repeatedly merging the closest two clusters but use different definitions of the distance between clusters. The cluster distances for which the nearest-neighbor chain algorithm works are called reducible and are characterized by a simple inequality among certain cluster distances.

In the study of algorithms, an LP-type problem is an optimization problem that shares certain properties with low-dimensional linear programs and that may be solved by similar algorithms. LP-type problems include many important optimization problems that are not themselves linear programs, such as the problem of finding the smallest circle containing a given set of planar points. They may be solved by a combination of randomized algorithms in an amount of time that is linear in the number of elements defining the problem, and subexponential in the dimension of the problem.

In the mathematical theory of metric spaces, $ε$ -nets, $ε$ -packings, $ε$ -coverings, uniformly discrete sets, relatively dense sets, and Delone sets are several closely related definitions of well-spaced sets of points, and the packing radius and covering radius of these sets measure how well-spaced they are. These sets have applications in coding theory, approximation algorithms, and the theory of quasicrystals.

In computational geometry, a point $p$ in a finite set of points $S$ is said to be maximal or non-dominated if there is no other point $q$ in $S$ whose coordinates are all greater than or equal to the corresponding coordinates of $p$ . The maxima of a point set $S$ are all the maximal points of $S$ . The problem of finding all maximal points, sometimes called the problem of the maxima or maxima set problem, has been studied as a variant of the convex hull and orthogonal convex hull problems. It is equivalent to finding the Pareto frontier of a collection of points, and was called the floating-currency problem by Herbert Freeman based on an application involving comparing the relative wealth of individuals with different holdings of multiple currencies.

In the design and analysis of algorithms for combinatorial optimization, parametric search is a technique invented by Nimrod Megiddo for transforming a decision algorithm into an optimization algorithm. It is frequently used for solving optimization problems in computational geometry.

In computational geometry, a greedy geometric spanner is an undirected graph whose distances approximate the Euclidean distances among a finite set of points in a Euclidean space. The vertices of the graph represent these points. The edges of the spanner are selected by a greedy algorithm that includes an edge whenever its two endpoints are not connected by a short path of shorter edges. The greedy spanner was first described in the PhD thesis of Gautam Das and conference paper and subsequent journal paper by Ingo Althöfer et al. These sources also credited Marshall Bern (unpublished) with the independent discovery of the same construction.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.

[sh-1] Shamos, Michael Ian; Hoey, Dan (1975). "Closest-point problems". 16th Annual Symposium on Foundations of Computer Science, Berkeley, California, USA, October 13-15, 1975. IEEE Computer Society. pp. 151–162. doi:10.1109/SFCS.1975.8.

[2] Rabin, M. (1976). "Probabilistic algorithms". Algorithms and Complexity: Recent Results and New Directions. Academic Press. pp. 21–39. As cited by Khuller & Matias (1995).

[km-3] 1 2 Khuller, Samir; Matias, Yossi (1995). "A simple randomized sieve algorithm for the closest-pair problem". Information and Computation . 118 (1): 34–37. doi: 10.1006/inco.1995.1049 . MR 1329236. S2CID 206566076.

[rl-4] 1 2 Lipton, Richard (24 September 2011). "Rabin Flips a Coin". Gödel's Lost Letter and P=NP.

[fh-5] Fortune, Steve; Hopcroft, John (1979). "A note on Rabin's nearest-neighbor algorithm". Information Processing Letters . 8 (1): 20–23. doi:10.1016/0020-0190(79)90085-1. hdl: 1813/7460 . MR 0515507.

[6] Clarkson, Kenneth L. (1983). "Fast algorithms for the all nearest neighbors problem". 24th Annual Symposium on Foundations of Computer Science, Tucson, Arizona, USA, 7-9 November 1983. IEEE Computer Society. pp. 226–232. doi:10.1109/SFCS.1983.16.

[7] Cormen, Thomas H.; Leiserson, Charles E.; Rivest, Ronald L.; Stein, Clifford (2001) [1990]. "33.4: Finding the closest pair of points". Introduction to Algorithms (2nd ed.). MIT Press and McGraw-Hill. pp. 957–961. ISBN 0-262-03293-7.

[8] Kleinberg, Jon M.; Tardos, Éva (2006). "5.4 Finding the closest pair of points". Algorithm Design. Addison-Wesley. pp. 225–231. ISBN 978-0-321-37291-8.

[9] Golin, Mordecai; Raman, Rajeev; Schwarz, Christian; Smid, Michiel (1998). "Randomized data structures for the dynamic closest-pair problem" (PDF). SIAM Journal on Computing . 27 (4): 1036–1072. doi:10.1137/S0097539794277718. MR 1622005. S2CID 1242364.

[10] Bespamyatnikh, S. N. (1998). "An optimal algorithm for closest-pair maintenance". Discrete & Computational Geometry . 19 (2): 175–195. doi: 10.1007/PL00009340 . MR 1600047.

[1]

[2]

[3]

[4]

[5]

[6]

[7]

[8]

[9]

[10]