A Euclidean minimum spanning tree of a finite set of points in the Euclidean plane or higher-dimensional Euclidean space connects the points by a system of line segments with the points as endpoints, minimizing the total length of the segments. In it, any two points can reach each other along a path through the line segments. It can be found as the minimum spanning tree of a complete graph with the points as vertices and the Euclidean distances between points as edge weights.
The edges of the minimum spanning tree meet at angles of at least 60°, at most six to a vertex. In higher dimensions, the number of edges per vertex is bounded by the kissing number of tangent unit spheres. The total length of the edges, for points in a unit square, is at most proportional to the square root of the number of points. Each edge lies in an empty region of the plane, and these regions can be used to prove that the Euclidean minimum spanning tree is a subgraph of other geometric graphs including the relative neighborhood graph and Delaunay triangulation. By constructing the Delaunay triangulation and then applying a graph minimum spanning tree algorithm, the minimum spanning tree of given planar points may be found in time , as expressed in big O notation. This is optimal in some models of computation, although faster randomized algorithms exist for points with integer coordinates. For points in higher dimensions, finding an optimal algorithm remains an open problem.
A Euclidean minimum spanning tree, for a set of points in the Euclidean plane or Euclidean space, is a system of line segments, having only the given points as their endpoints, whose union includes all of the points in a connected set, and which has the minimum possible total length of any such system. Such a network cannot contain a polygonal ring of segments; if one existed, the network could be shortened by removing an edge of the polygon. Therefore, the minimum-length network forms a tree. This observation leads to the equivalent definition that a Euclidean minimum spanning tree is a tree of line segments between pairs of the given points, of minimum total length. [1] The same tree may also be described as a minimum spanning tree of a weighted complete graph, having the given points as its vertices and the distances between points as edge weights. [2] The same points may have more than one minimum spanning tree. For instance, for the vertices of a regular polygon, removing any edge of the polygon produces a minimum spanning tree. [3]
Publications on the Euclidean minimum spanning tree commonly abbreviate it as "EMST". [4] [5] They may also be called "geometric minimum spanning trees", [6] [7] but that term may be used more generally for geometric spaces with non-Euclidean distances, such as Lp spaces. [8] When the context of Euclidean point sets is clear, they may be called simply "minimum spanning trees". [9] [10] [11]
Several other standard geometric networks are closely related to the Euclidean minimum spanning tree:
Whenever two edges of a Euclidean minimum spanning tree meet at a vertex, they must form an angle of 60° or more, with equality only when they form two sides of an equilateral triangle. This is because, for two edges forming any sharper angle, one of the two edges could be replaced by the third, shorter edge of the triangle they form, forming a tree with smaller total length. [14] In comparison, the Steiner tree problem has a stronger angle bound: an optimal Steiner tree has all angles at least 120°. [12]
The same 60° angle bound also occurs in the kissing number problem, of finding the maximum number of unit spheres in Euclidean space that can be tangent to a central unit sphere without any two spheres intersecting (beyond a point of tangency). The center points of these spheres have a minimum spanning tree in the form of a star, with the central point adjacent to all other points. Conversely, for any vertex of any minimum spanning tree, one can construct non-overlapping unit spheres centered at and at points two units along each of its edges, with a tangency for each neighbor of . Therefore, in -dimensional space the maximum possible degree of a vertex (the number of spanning tree edges connected to it) equals the kissing number of spheres in dimensions. [15] Planar minimum spanning trees have degree at most six, and when a tree has degree six there is always another minimum spanning tree with maximum degree five. [7] Three-dimensional minimum spanning trees have degree at most twelve. [15] The only higher dimensions in which the exact value of the kissing number is known are four, eight, and 24 dimensions. [16]
For points generated at random from a given continuous distribution, the minimum spanning tree is almost surely unique. The numbers of vertices of any given degree converge, for large number of vertices, to a constant times that number of vertices. The values of these constants depend on the degree and the distribution. However, even for simple cases—such as the number of leaves for points uniformly distributed in a unit square—their precise values are not known. [17]
For any edge of any Euclidean minimum spanning tree, the lens (or vesica piscis) formed by intersecting the two circles with as their radii cannot have any other given vertex in its interior. Put another way, if any tree has an edge whose lens contains a third point , then it is not of minimum length. For, by the geometry of the two circles, would be closer to both and than they are to each other. If edge were removed from the tree, would remain connected to one of and , but not the other. Replacing the removed edge by or (whichever of these two edges reconnects to the vertex from which it was disconnected) would produce a shorter tree. [12]
For any edge of any Euclidean minimum spanning tree, the rhombus with angles of 60° and 120°, having as its long diagonal, is disjoint from the rhombi formed analogously by all other edges. Two edges sharing an endpoint cannot have overlapping rhombi, because that would imply an edge angle sharper than 60°, and two disjoint edges cannot have overlapping rhombi; if they did, the longer of the two edges could be replaced by a shorter edge among the same four vertices. [12]
Certain geometric graphs have definitions involving empty regions in point sets, from which it follows that they contain every edge that can be part of a Euclidean minimum spanning tree. These include:
Because the empty-region criteria for these graphs are progressively weaker, these graphs form an ordered sequence of subgraphs. That is, using "⊆" to denote the subset relationship among their edges, these graphs have the relations:
Another graph guaranteed to contain the minimum spanning tree is the Yao graph, determined for points in the plane by dividing the plane around each point into six 60° wedges and connecting each point to the nearest neighbor in each wedge. The resulting graph contains the relative neighborhood graph, because two vertices with an empty lens must be the nearest neighbors to each other in their wedges. As with many of the other geometric graphs above, this definition can be generalized to higher dimensions, and (unlike the Delaunay triangulation) its generalizations always include a linear number of edges. [20] [21]
For points in the unit square (or any other fixed shape), the total length of the minimum spanning tree edges is . Some sets of points, such as points evenly spaced in a grid, attain this bound. [12] For points in a unit hypercube in -dimensional space, the corresponding bound is . [22] The same bound applies to the expected total length of the minimum spanning tree for points chosen uniformly and independently from a unit square or unit hypercube. [23] Returning to the unit square, the sum of squared edge lengths of the minimum spanning tree is . This bound follows from the observation that the edges have disjoint rhombi, with area proportional to the edge lengths squared. The bound on total length follows by application of the Cauchy–Schwarz inequality. [12]
Another interpretation of these results is that the average edge length for any set of points in a unit square is , at most proportional to the spacing of points in a regular grid; and that for random points in a unit square the average length is proportional to . However, in the random case, with high probability the longest edge has length approximately longer than the average by a non-constant factor. With high probability, the longest edge forms a leaf of the spanning tree, and connects a point far from all the other points to its nearest neighbor. For large numbers of points, the distribution of the longest edge length around its expected value converges to a Gumbel distribution. [24]
Any geometric spanner, a subgraph of a complete geometric graph whose shortest paths approximate the Euclidean distance, must have total edge length at least as large as the minimum spanning tree, and one of the standard quality measures for a geometric spanner is the ratio between its total length and of the minimum spanning tree for the same points. Several methods for constructing spanners, such as the greedy geometric spanner, achieve a constant bound for this ratio. [13] It has been conjectured that the Steiner ratio, the largest possible ratio between the total length of a minimum spanning tree and Steiner tree for the same set of points in the plane, is , the ratio for three points in an equilateral triangle. [12]
If every edge of a Euclidean minimum spanning tree is subdivided, by adding a new point at its midpoint, then the resulting tree is still a minimum spanning tree of the augmented point set. Repeating this subdivision process allows a Euclidean minimum spanning tree to be subdivided arbitrarily finely. However, subdividing only some of the edges, or subdividing the edges at points other than the midpoint, may produce a point set for which the subdivided tree is not the minimum spanning tree. [25]
For points in any dimension, the minimum spanning tree can be constructed in time by constructing a complete graph with an edge between every pair of points, weighted by Euclidean distance, and then applying a graph minimum spanning tree algorithm such as the Prim–Dijkstra–Jarník algorithm or Borůvka's algorithm on it. These algorithms can be made to take time on complete graphs, unlike another common choice, Kruskal's algorithm, which is slower because it involves sorting all distances. [13] For points in low-dimensional spaces, the problem may be solved more quickly, as detailed below.
Computing Euclidean distances involves a square root calculation. In any comparison of edge weights, comparing the squares of the Euclidean distances, instead of the distances themselves, yields the same ordering, and so does not change the rest of the tree's computation. This shortcut speeds up calculation and allows a minimum spanning tree for points with integer coordinates to be constructed using only integer arithmetic. [20]
A faster approach to finding the minimum spanning tree of planar points uses the property that it is a subgraph of the Delaunay triangulation:
The result is an algorithm taking time, [2] optimal in certain models of computation (see below).
If the input coordinates are integers and can be used as array indices, faster algorithms are possible: the Delaunay triangulation can be constructed by a randomized algorithm in expected time. [26] Additionally, since the Delaunay triangulation is a planar graph, its minimum spanning tree can be found in linear time by a variant of Borůvka's algorithm that removes all but the cheapest edge between each pair of components after each stage of the algorithm. [13] [27] Therefore, the total expected time for this algorithm is . [26] In the other direction, the Delaunay triangulation can be constructed from the minimum spanning tree in the near-linear time bound , where denotes the iterated logarithm. [28]
The problem can also be generalized to points in the -dimensional space . In higher dimensions, the connectivity determined by the Delaunay triangulation (which, likewise, partitions the convex hull into -dimensional simplices) contains the minimum spanning tree; however, the triangulation might contain the complete graph. [4] Therefore, finding the Euclidean minimum spanning tree as a spanning tree of the complete graph or as a spanning tree of the Delaunay triangulation both take time. For three dimensions the minimum spanning tree can be found in time , and in any greater dimension, in time for any —faster than the quadratic time bound for the complete graph and Delaunay triangulation algorithms. [4]
The optimal time complexity for higher-dimensional minimum spanning trees remains unknown, [29] but is closely related to the complexity of computing bichromatic closest pairs. In the bichromatic closest pair problem, the input is a set of points, given two different colors (say, red and blue). The output is a pair of a red point and a blue point with the minimum possible distance. This pair always forms one of the edges in the minimum spanning tree. Therefore, the bichromatic closest pair problem can be solved in the amount of time that it takes to construct a minimum spanning tree and scan its edges for the shortest red–blue edge. Conversely, for any red–blue coloring of any subset of a given set of points, the bichromatic closest pair produces one edge of the minimum spanning tree of the subset. By carefully choosing a sequence of colorings of subsets, and finding the bichromatic closest pair of each subproblem, the minimum spanning tree may be found in time proportional to the optimal time for finding bichromatic closest pairs for the same number of points, whatever that optimal time turns out to be. [4] [11]
For uniformly random point sets in any bounded dimension, the Yao graph [20] or Delaunay triangulation have linear expected numbers of edges, are guaranteed to contain the minimum spanning tree, and can be constructed in linear expected time. [21] [6] [30] From these graphs, the minimum spanning tree itself may be constructed in linear time, by using a randomized linear time algorithm for graph minimum spanning trees. [31] However, the poor performance of these methods on inputs coming from clustered data has led algorithm engineering researchers to develop methods with a somewhat slower time bound, for random inputs or inputs whose distances and clustering resemble those of random data, while exhibiting better performance on real-world data. [8] [32] [5]
A well-separated pair decomposition is a family of pairs of subsets of the given points, so that every pair of points belong to one of these pairs of subsets, and so that all pairs of points coming from the same pair of subsets have approximately the same length. It is possible to find a well-separated pair decomposition with a linear number of subsets, and a representative pair of points for each subset, in time . The minimum spanning tree of the graph formed by these representative pairs is then an approximation to the minimum spanning tree. Using these ideas, a -approximation to the minimum spanning tree may be found in time, for constant . More precisely, by choosing each representative pair to approximate the closest pair in its equivalence class, and carefully varying the quality of this approximation for different pairs, the dependence on in the time bound can be given as for any fixed dimension. [33]
The Euclidean minimum spanning tree has been generalized in many different ways to systems of moving or changing points:
An asymptotic lower bound of of the Euclidean minimum spanning tree problem can be established in restricted models of computation. These include the algebraic decision tree and algebraic computation tree models, in which the algorithm has access to the input points only through certain restricted primitives that perform simple algebraic computations on their coordinates. In these models, the closest pair of points problem requires time, but the closest pair is necessarily an edge of the minimum spanning tree, so the minimum spanning tree also requires this much time. Therefore, algorithms for constructing the planar minimum spanning tree in time within this model, for instance by using the Delaunay triangulation, are optimal. [46] However, these lower bounds do not apply to models of computation with integer point coordinates, in which bitwise operations and table indexing operations on those coordinates are permitted. In these models, faster algorithms are possible, as described above. [26]
An obvious application of Euclidean minimum spanning trees is to find the cheapest network of wires or pipes to connect a set of places, assuming the links cost a fixed amount per unit length. The first publications on minimum spanning trees more generally concerned a geographic version of the problem, involving the design of an electrical grid for southern Moravia, [47] and an application to minimizing wire lengths in circuits was described in 1957 by Loberman and Weinberger. [48]
Minimum spanning trees are closely related to single-linkage clustering, one of several methods for hierarchical clustering. The edges of a minimum spanning tree, sorted by their length, give the order in which to merge clusters into larger clusters in this clustering method. Once these edges have been found, by any algorithm, they may be used to construct the single-linkage clustering in time . [1] Although the long thin cluster shapes produced by single-linkage clustering can be a bad fit for certain types of data, such as mixtures of Gaussian distributions, it can be a good choice in applications where the clusters themselves are expected to have long thin shapes, such as in modeling the dark matter halos of galaxies. [5] In geographic information science, several researcher groups have used minimum spanning trees of the centroids of buildings to identify meaningful clusters of buildings, for instance by removing edges identified in some other way as inconsistent. [49]
Minimum spanning trees have also been used to infer the shape of curves in the plane, given points sampled along the curve. For a smooth curve, sampled more finely than its local feature size, the minimum spanning tree will form a path connecting consecutive points along the curve. More generally, similar methods can recognize curves drawn in a dotted or dashed style rather than as a single connected set. Applications of this curve-finding technique include particle physics, in automatically identifying the tracks left by particles in a bubble chamber. [50] More sophisticated versions of this idea can find curves from a cloud of noisy sample points that roughly follows the curve outline, by using the topology of the spanning tree to guide a moving least squares method. [51]
Another application of minimum spanning trees is a constant-factor approximation algorithm for the Euclidean traveling salesman problem, the problem of finding the shortest polygonalization of a point set. Walking around the boundary of the minimum spanning tree can approximate the optimal traveling salesman tour within a factor of two of the optimal length. [2] However, more accurate polynomial-time approximation schemes are known for this problem. [52] In wireless ad hoc networks, broadcasting messages along paths in a minimum spanning tree can be an accurate approximation to the minimum-energy broadcast routing, which is, again, hard to compute exactly. [53] [54] [55] [56]
The realization problem for Euclidean minimum spanning trees takes an abstract tree as input and seeks a geometric location for each vertex of the tree (in a space of some fixed dimension), such that the given tree equals the minimum spanning tree of those points. Not every abstract tree has such a realization; for instance, the tree must obey the kissing number bound on the degree of each vertex. Additional restrictions exist; for instance, it is not possible for a planar minimum spanning tree to have a degree-six vertex adjacent to a vertex of degree five or six. [7] Determining whether a two-dimensional realization exists is NP-hard. However, the proof of hardness depends on the fact that degree-six vertices in a tree have a very restricted set of realizations: the neighbors of such a vertex must be placed on the vertices of a regular hexagon centered at that vertex. [57] Indeed, for trees of maximum degree five, a planar realization always exists. [7] Similarly, for trees of maximum degree ten, a three-dimensional realization always exists. [10] For these realizations, some trees may require edges of exponential length and bounding boxes of exponential area relative to the length of their shortest edge. [58] Trees of maximum degree four have smaller planar realizations, with polynomially bounded edge lengths and bounding boxes. [9]
In computational geometry, a Delaunay triangulation or Delone triangulation of a set of points in the plane subdivides their convex hull into triangles whose circumcircles do not contain any of the points. This maximizes the size of the smallest angle in any of the triangles, and tends to avoid sliver triangles.
In the mathematical field of graph theory, a spanning treeT of an undirected graph G is a subgraph that is a tree which includes all of the vertices of G. In general, a graph may have several spanning trees, but a graph that is not connected will not contain a spanning tree. If all of the edges of G are also edges of a spanning tree T of G, then G is a tree and is identical to T.
In combinatorial mathematics, the Steiner tree problem, or minimum Steiner tree problem, named after Jakob Steiner, is an umbrella term for a class of problems in combinatorial optimization. While Steiner tree problems may be formulated in a number of settings, they all require an optimal interconnect for a given set of objects and a predefined objective function. One well-known variant, which is often used synonymously with the term Steiner tree problem, is the Steiner tree problem in graphs. Given an undirected graph with non-negative edge weights and a subset of vertices, usually referred to as terminals, the Steiner tree problem in graphs requires a tree of minimum weight that contains all terminals and minimizes the total weight of its edges. Further well-known variants are the Euclidean Steiner tree problem and the rectilinear minimum Steiner tree problem.
In computational geometry, polygon triangulation is the partition of a polygonal area P into a set of triangles, i.e., finding a set of triangles with pairwise non-intersecting interiors whose union is P.
A triangulation of a set of points in the Euclidean space is a simplicial complex that covers the convex hull of , and whose vertices belong to . In the plane, triangulations are made up of triangles, together with their edges and vertices. Some authors require that all the points of are vertices of its triangulations. In this case, a triangulation of a set of points in the plane can alternatively be defined as a maximal set of non-crossing edges between points of . In the plane, triangulations are special cases of planar straight-line graphs.
In discrete geometry, a -set of a finite point set in the Euclidean plane is a subset of elements of that can be strictly separated from the remaining points by a line. More generally, in Euclidean space of higher dimensions, a -set of a finite point set is a subset of elements that can be separated from the remaining points by a hyperplane. In particular, when , the line or hyperplane that separates a -set from the rest of is a halving line or halving plane.
Proximity problems is a class of problems in computational geometry which involve estimation of distances between geometric objects.
A geometric spanner or a t-spanner graph or a t-spanner was initially introduced as a weighted graph over a set of points as its vertices for which there is a t-path between any pair of vertices for a fixed parameter t. A t-path is defined as a path through the graph with weight at most t times the spatial distance between its endpoints. The parameter t is called the stretch factor or dilation factor of the spanner.
The nearest neighbor graph (NNG) is a directed graph defined for a set of points in a metric space, such as the Euclidean distance in the plane. The NNG has a vertex for each point, and a directed edge from p to q whenever q is a nearest neighbor of p, a point whose distance from p is minimum among all the given points other than p itself.
In computational geometry, the relative neighborhood graph (RNG) is an undirected graph defined on a set of points in the Euclidean plane by connecting two points and by an edge whenever there does not exist a third point that is closer to both and than they are to each other. This graph was proposed by Godfried Toussaint in 1980 as a way of defining a structure from a set of points that would match human perceptions of the shape of the set.
In computational geometry and computer science, the minimum-weight triangulation problem is the problem of finding a triangulation of minimal total edge length. That is, an input polygon or the convex hull of an input point set must be subdivided into triangles that meet edge-to-edge and vertex-to-vertex, in such a way as to minimize the sum of the perimeters of the triangles. The problem is NP-hard for point set inputs, but may be approximated to any desired degree of accuracy. For polygon inputs, it may be solved exactly in polynomial time. The minimum weight triangulation has also sometimes been called the optimal triangulation.
In computational geometry, the Urquhart graph of a set of points in the plane, named after Roderick B. Urquhart, is obtained by removing the longest edge from each triangle in the Delaunay triangulation.
In computational geometry and geometric graph theory, a β-skeleton or beta skeleton is an undirected graph defined from a set of points in the Euclidean plane. Two points p and q are connected by an edge whenever all the angles prq are sharper than a threshold determined from the numerical parameter β.
In computational geometry, a constrained Delaunay triangulation is a generalization of the Delaunay triangulation that forces certain required segments into the triangulation as edges, unlike the Delaunay triangulation itself which is based purely on the position of a given set of vertices without regard to how they should be connected by edges. It can be computed efficiently and has applications in geographic information systems and in mesh generation.
In graph algorithms, the widest path problem is the problem of finding a path between two designated vertices in a weighted graph, maximizing the weight of the minimum-weight edge in the path. The widest path problem is also known as the maximum capacity path problem. It is possible to adapt most shortest path algorithms to compute widest paths, by modifying them to use the bottleneck distance instead of path length. However, in many cases even faster algorithms are possible.
In mathematics, a topological graph is a representation of a graph in the plane, where the vertices of the graph are represented by distinct points and the edges by Jordan arcs joining the corresponding pairs of points. The points representing the vertices of a graph and the arcs representing its edges are called the vertices and the edges of the topological graph. It is usually assumed that any two edges of a topological graph cross a finite number of times, no edge passes through a vertex different from its endpoints, and no two edges touch each other. A topological graph is also called a drawing of a graph.
In geometry, a partition of a polygon is a set of primitive units, which do not overlap and whose union equals the polygon. A polygon partition problem is a problem of finding a partition which is minimal in some sense, for example a partition with a smallest number of units or with units of smallest total side-length.
In geometric graph theory, a penny graph is a contact graph of unit circles. It is formed from a collection of unit circles that do not cross each other, by creating a vertex for each circle and an edge for every pair of tangent circles. The circles can be represented physically by pennies, arranged without overlapping on a flat surface, with a vertex for each penny and an edge for each two pennies that touch.
In computational geometry, a greedy geometric spanner is an undirected graph whose distances approximate the Euclidean distances among a finite set of points in a Euclidean space. The vertices of the graph represent these points. The edges of the spanner are selected by a greedy algorithm that includes an edge whenever its two endpoints are not connected by a short path of shorter edges. The greedy spanner was first described in the PhD thesis of Gautam Das and conference paper and subsequent journal paper by Ingo Althöfer et al. These sources also credited Marshall Bern (unpublished) with the independent discovery of the same construction.
In computational geometry, a polygonalization of a finite set of points in the Euclidean plane is a simple polygon with the given points as its vertices. A polygonalization may also be called a polygonization, simple polygonalization, Hamiltonian polygon, non-crossing Hamiltonian cycle, or crossing-free straight-edge spanning cycle.