Prim's algorithm

Last updated
A demo for Prim's algorithm based on Euclidean distance PrimAlgDemo.gif
A demo for Prim's algorithm based on Euclidean distance

In computer science, Prim's algorithm is a greedy algorithm that finds a minimum spanning tree for a weighted undirected graph. This means it finds a subset of the edges that forms a tree that includes every vertex, where the total weight of all the edges in the tree is minimized. The algorithm operates by building this tree one vertex at a time, from an arbitrary starting vertex, at each step adding the cheapest possible connection from the tree to another vertex.

Contents

The algorithm was developed in 1930 by Czech mathematician Vojtěch Jarník [1] and later rediscovered and republished by computer scientists Robert C. Prim in 1957 [2] and Edsger W. Dijkstra in 1959. [3] Therefore, it is also sometimes called the Jarník's algorithm, [4] Prim–Jarník algorithm, [5] Prim–Dijkstra algorithm [6] or the DJP algorithm. [7]

Other well-known algorithms for this problem include Kruskal's algorithm and Borůvka's algorithm. [8] These algorithms find the minimum spanning forest in a possibly disconnected graph; in contrast, the most basic form of Prim's algorithm only finds minimum spanning trees in connected graphs. However, running Prim's algorithm separately for each connected component of the graph, it can also be used to find the minimum spanning forest. [9] In terms of their asymptotic time complexity, these three algorithms are equally fast for sparse graphs, but slower than other more sophisticated algorithms. [7] [6] However, for graphs that are sufficiently dense, Prim's algorithm can be made to run in linear time, meeting or improving the time bounds for other algorithms. [10]

Prim's algorithm starting at vertex A. In the third step, edges BD and AB both have weight 2, so BD is chosen arbitrarily. After that step, AB is no longer a candidate for addition to the tree because it links two nodes that are already in the tree. Prim's algorithm.svg
Prim's algorithm starting at vertex A. In the third step, edges BD and AB both have weight 2, so BD is chosen arbitrarily. After that step, AB is no longer a candidate for addition to the tree because it links two nodes that are already in the tree.

Description

The algorithm may informally be described as performing the following steps:

  1. Initialize a tree with a single vertex, chosen arbitrarily from the graph.
  2. Grow the tree by one edge: Of the edges that connect the tree to vertices not yet in the tree, find the minimum-weight edge, and transfer it to the tree.
  3. Repeat step 2 (until all vertices are in the tree).

In more detail, it may be implemented following the pseudocode below.

  1. Associate with each vertex v of the graph a number C[v] (the cheapest cost of a connection to v) and an edge E[v] (the edge providing that cheapest connection). To initialize these values, set all values of C[v] to +∞ (or to any number larger than the maximum edge weight) and set each E[v] to a special flag value indicating that there is no edge connecting v to earlier vertices.
  2. Initialize an empty forest F and a set Q of vertices that have not yet been included in F (initially, all vertices).
  3. Repeat the following steps until Q is empty:
    1. Find and remove a vertex v from Q having the minimum possible value of C[v]
    2. Add v to F
    3. Loop over the edges vw connecting v to other vertices w. For each such edge, if w still belongs to Q and vw has smaller weight than C[w], perform the following steps:
      1. Set C[w] to the cost of edge vw
      2. Set E[w] to point to edge vw.
  4. Return F, which specifically includes the corresponding edges in E

As described above, the starting vertex for the algorithm will be chosen arbitrarily, because the first iteration of the main loop of the algorithm will have a set of vertices in Q that all have equal weights, and the algorithm will automatically start a new tree in F when it completes a spanning tree of each connected component of the input graph. The algorithm may be modified to start with any particular vertex s by setting C[s] to be a number smaller than the other values of C (for instance, zero), and it may be modified to only find a single spanning tree rather than an entire spanning forest (matching more closely the informal description) by stopping whenever it encounters another vertex flagged as having no associated edge.

Different variations of the algorithm differ from each other in how the set Q is implemented: as a simple linked list or array of vertices, or as a more complicated priority queue data structure. This choice leads to differences in the time complexity of the algorithm. In general, a priority queue will be quicker at finding the vertex v with minimum cost, but will entail more expensive updates when the value of C[w] changes.

Time complexity

Prim's algorithm has many applications, such as in the generation of this maze, which applies Prim's algorithm to a randomly weighted grid graph.

The time complexity of Prim's algorithm depends on the data structures used for the graph and for ordering the edges by weight, which can be done using a priority queue. The following table shows the typical choices:

Minimum edge weight data structureTime complexity (total)
adjacency matrix, searching
binary heap and adjacency list
Fibonacci heap and adjacency list

A simple implementation of Prim's, using an adjacency matrix or an adjacency list graph representation and linearly searching an array of weights to find the minimum weight edge to add, requires O(|V|2) running time. However, this running time can be greatly improved by using heaps to implement finding minimum weight edges in the algorithm's inner loop.

A first improved version uses a heap to store all edges of the input graph, ordered by their weight. This leads to an O(|E| log |E|) worst-case running time. But storing vertices instead of edges can improve it still further. The heap should order the vertices by the smallest edge-weight that connects them to any vertex in the partially constructed minimum spanning tree (MST) (or infinity if no such edge exists). Every time a vertex v is chosen and added to the MST, a decrease-key operation is performed on all vertices w outside the partial MST such that v is connected to w, setting the key to the minimum of its previous value and the edge cost of (v,w).

Using a simple binary heap data structure, Prim's algorithm can now be shown to run in time O(|E| log |V|) where |E| is the number of edges and |V| is the number of vertices. Using a more sophisticated Fibonacci heap, this can be brought down to O(|E| + |V| log |V|), which is asymptotically faster when the graph is dense enough that |E| is ω(|V|), and linear time when |E| is at least |V| log |V|. For graphs of even greater density (having at least |V|c edges for some c > 1), Prim's algorithm can be made to run in linear time even more simply, by using a d-ary heap in place of a Fibonacci heap. [10] [11]

Demonstration of proof. In this case, the graph Y1 = Y - f + e is already equal to Y. In general, the process may need to be repeated. Prim's algorithm proof.svg
Demonstration of proof. In this case, the graph Y1 = Yf + e is already equal to Y. In general, the process may need to be repeated.

Proof of correctness

Let P be a connected, weighted graph. At every iteration of Prim's algorithm, an edge must be found that connects a vertex in a subgraph to a vertex outside the subgraph. Since P is connected, there will always be a path to every vertex. The output Y of Prim's algorithm is a tree, because the edge and vertex added to tree Y are connected. Let Y1 be a minimum spanning tree of graph P. If Y1=Y then Y is a minimum spanning tree. Otherwise, let e be the first edge added during the construction of tree Y that is not in tree Y1, and V be the set of vertices connected by the edges added before edge e. Then one endpoint of edge e is in set V and the other is not. Since tree Y1 is a spanning tree of graph P, there is a path in tree Y1 joining the two endpoints. As one travels along the path, one must encounter an edge f joining a vertex in set V to one that is not in set V. Now, at the iteration when edge e was added to tree Y, edge f could also have been added and it would be added instead of edge e if its weight was less than e, and since edge f was not added, we conclude that

Let tree Y2 be the graph obtained by removing edge f from and adding edge e to tree Y1. It is easy to show that tree Y2 is connected, has the same number of edges as tree Y1, and the total weights of its edges is not larger than that of tree Y1, therefore it is also a minimum spanning tree of graph P and it contains edge e and all the edges added before it during the construction of set V. Repeat the steps above and we will eventually obtain a minimum spanning tree of graph P that is identical to tree Y. This shows Y is a minimum spanning tree. The minimum spanning tree allows for the first subset of the sub-region to be expanded into a larger subset X, which we assume to be the minimum.

Parallel algorithm

The adjacency matrix distributed between multiple processors for parallel Prim's algorithm. In each iteration of the algorithm, every processor updates its part of C by inspecting the row of the newly inserted vertex in its set of columns in the adjacency matrix. The results are then collected and the next vertex to include in the MST is selected globally. Distributed adjacency matrix for parallel prim.png
The adjacency matrix distributed between multiple processors for parallel Prim's algorithm. In each iteration of the algorithm, every processor updates its part of C by inspecting the row of the newly inserted vertex in its set of columns in the adjacency matrix. The results are then collected and the next vertex to include in the MST is selected globally.

The main loop of Prim's algorithm is inherently sequential and thus not parallelizable. However, the inner loop, which determines the next edge of minimum weight that does not form a cycle, can be parallelized by dividing the vertices and edges between the available processors. [12] The following pseudocode demonstrates this.

  1. Assign each processor a set of consecutive vertices of length .
  2. Create C, E, F, and Q as in the sequential algorithm and divide C, E, as well as the graph between all processors such that each processor holds the incoming edges to its set of vertices. Let , denote the parts of C, E stored on processor .
  3. Repeat the following steps until Q is empty:
    1. On every processor: find the vertex having the minimum value in [] (local solution).
    2. Min-reduce the local solutions to find the vertex v having the minimum possible value of C[v] (global solution).
    3. Broadcast the selected node to every processor.
    4. Add v to F and, if E[v] is not the special flag value, also add E[v] to F.
    5. On every processor: update and as in the sequential algorithm.
  4. Return F

This algorithm can generally be implemented on distributed machines [12] as well as on shared memory machines. [13] The running time is , assuming that the reduce and broadcast operations can be performed in . [12] A variant of Prim's algorithm for shared memory machines, in which Prim's sequential algorithm is being run in parallel, starting from different vertices, has also been explored. [14] It should, however, be noted that more sophisticated algorithms exist to solve the distributed minimum spanning tree problem in a more efficient manner.

See also

Related Research Articles

<span class="mw-page-title-main">Minimum spanning tree</span> Least-weight tree connecting graph vertices

A minimum spanning tree (MST) or minimum weight spanning tree is a subset of the edges of a connected, edge-weighted undirected graph that connects all the vertices together, without any cycles and with the minimum possible total edge weight. That is, it is a spanning tree whose sum of edge weights is as small as possible. More generally, any edge-weighted undirected graph has a minimum spanning forest, which is a union of the minimum spanning trees for its connected components.

<span class="mw-page-title-main">Dijkstra's algorithm</span> Algorithm for finding shortest paths

Dijkstra's algorithm is an algorithm for finding the shortest paths between nodes in a weighted graph, which may represent, for example, road networks. It was conceived by computer scientist Edsger W. Dijkstra in 1956 and published three years later.

<span class="mw-page-title-main">Kruskal's algorithm</span> Minimum spanning forest algorithm that greedily adds edges

Kruskal's algorithm finds a minimum spanning forest of an undirected edge-weighted graph. If the graph is connected, it finds a minimum spanning tree. It is a greedy algorithm that in each step adds to the forest the lowest-weight edge that will not form a cycle. The key steps of the algorithm are sorting and the use of a disjoint-set data structure to detect cycles. Its running time is dominated by the time to sort all of the graph edges by their weight.

<span class="mw-page-title-main">Assignment problem</span> Combinatorial optimization problem

The assignment problem is a fundamental combinatorial optimization problem. In its most general form, the problem is as follows:

<span class="mw-page-title-main">Borůvka's algorithm</span> Method for finding minimum spanning trees

Borůvka's algorithm is a greedy algorithm for finding a minimum spanning tree in a graph, or a minimum spanning forest in the case of a graph that is not connected.

<span class="mw-page-title-main">Spanning tree</span> Tree which includes all vertices of a graph

In the mathematical field of graph theory, a spanning treeT of an undirected graph G is a subgraph that is a tree which includes all of the vertices of G. In general, a graph may have several spanning trees, but a graph that is not connected will not contain a spanning tree. If all of the edges of G are also edges of a spanning tree T of G, then G is a tree and is identical to T.

<span class="mw-page-title-main">Steiner tree problem</span> On short connecting networks with added vertices

In combinatorial mathematics, the Steiner tree problem, or minimum Steiner tree problem, named after Jakob Steiner, is an umbrella term for a class of problems in combinatorial optimization. While Steiner tree problems may be formulated in a number of settings, they all require an optimal interconnect for a given set of objects and a predefined objective function. One well-known variant, which is often used synonymously with the term Steiner tree problem, is the Steiner tree problem in graphs. Given an undirected graph with non-negative edge weights and a subset of vertices, usually referred to as terminals, the Steiner tree problem in graphs requires a tree of minimum weight that contains all terminals and minimizes the total weight of its edges. Further well-known variants are the Euclidean Steiner tree problem and the rectilinear minimum Steiner tree problem.

<span class="mw-page-title-main">Euclidean minimum spanning tree</span> Shortest network connecting points

A Euclidean minimum spanning tree of a finite set of points in the Euclidean plane or higher-dimensional Euclidean space connects the points by a system of line segments with the points as endpoints, minimizing the total length of the segments. In it, any two points can reach each other along a path through the line segments. It can be found as the minimum spanning tree of a complete graph with the points as vertices and the Euclidean distances between points as edge weights.

Johnson's algorithm is a way to find the shortest paths between all pairs of vertices in an edge-weighted directed graph. It allows some of the edge weights to be negative numbers, but no negative-weight cycles may exist. It works by using the Bellman–Ford algorithm to compute a transformation of the input graph that removes all negative weights, allowing Dijkstra's algorithm to be used on the transformed graph. It is named after Donald B. Johnson, who first published the technique in 1977.

<span class="mw-page-title-main">Dominating set</span> Subset of a graphs nodes such that all other nodes link to at least one

In graph theory, a dominating set for a graph G is a subset D of its vertices, such that any vertex of G is in D, or has a neighbor in D. The domination numberγ(G) is the number of vertices in a smallest dominating set for G.

The Christofides algorithm or Christofides–Serdyukov algorithm is an algorithm for finding approximate solutions to the travelling salesman problem, on instances where the distances form a metric space . It is an approximation algorithm that guarantees that its solutions will be within a factor of 3/2 of the optimal solution length, and is named after Nicos Christofides and Anatoliy I. Serdyukov ; the latter discovered it independently in 1976.

<span class="mw-page-title-main">Distributed minimum spanning tree</span>

The distributed minimum spanning tree (MST) problem involves the construction of a minimum spanning tree by a distributed algorithm, in a network where nodes communicate by message passing. It is radically different from the classical sequential problem, although the most basic approach resembles Borůvka's algorithm. One important application of this problem is to find a tree that can be used for broadcasting. In particular, if the cost for a message to pass through an edge in a graph is significant, an MST can minimize the total cost for a source process to communicate with all the other processes in the network.

The reverse-delete algorithm is an algorithm in graph theory used to obtain a minimum spanning tree from a given connected, edge-weighted graph. It first appeared in Kruskal (1956), but it should not be confused with Kruskal's algorithm which appears in the same paper. If the graph is disconnected, this algorithm will find a minimum spanning tree for each disconnected part of the graph. The set of these minimum spanning trees is called a minimum spanning forest, which contains every vertex in the graph.

<span class="mw-page-title-main">Karger's algorithm</span> Randomized algorithm for minimum cuts

In computer science and graph theory, Karger's algorithm is a randomized algorithm to compute a minimum cut of a connected graph. It was invented by David Karger and first published in 1993.

In graph theory, Edmonds' algorithm or Chu–Liu/Edmonds' algorithm is an algorithm for finding a spanning arborescence of minimum weight . It is the directed analog of the minimum spanning tree problem. The algorithm was proposed independently first by Yoeng-Jin Chu and Tseng-Hong Liu (1965) and then by Jack Edmonds (1967).

In theoretical computer science and network routing, Suurballe's algorithm is an algorithm for finding two disjoint paths in a nonnegatively-weighted directed graph, so that both paths connect the same pair of vertices and have minimum total length. The algorithm was conceived by John W. Suurballe and published in 1974. The main idea of Suurballe's algorithm is to use Dijkstra's algorithm to find one path, to modify the weights of the graph edges, and then to run Dijkstra's algorithm a second time. The output of the algorithm is formed by combining these two paths, discarding edges that are traversed in opposite directions by the paths, and using the remaining edges to form the two paths to return as the output. The modification to the weights is similar to the weight modification in Johnson's algorithm, and preserves the non-negativity of the weights while allowing the second instance of Dijkstra's algorithm to find the correct second path.

<span class="mw-page-title-main">Top tree</span> Data structure

A top tree is a data structure based on a binary tree for unrooted dynamic trees that is used mainly for various path-related operations. It allows simple divide-and-conquer algorithms. It has since been augmented to maintain dynamically various properties of a tree such as diameter, center and median.

In mathematics, a minimum bottleneck spanning tree (MBST) in an undirected graph is a spanning tree in which the most expensive edge is as cheap as possible. A bottleneck edge is the highest weighted edge in a spanning tree. A spanning tree is a minimum bottleneck spanning tree if the graph does not contain a spanning tree with a smaller bottleneck edge weight. For a directed graph, a similar problem is known as Minimum Bottleneck Spanning Arborescence (MBSA).

<span class="mw-page-title-main">Stoer–Wagner algorithm</span> Recursive algorithm in graph theory

In graph theory, the Stoer–Wagner algorithm is a recursive algorithm to solve the minimum cut problem in undirected weighted graphs with non-negative weights. It was proposed by Mechthild Stoer and Frank Wagner in 1995. The essential idea of this algorithm is to shrink the graph by merging the most intensive vertices, until the graph only contains two combined vertex sets. At each phase, the algorithm finds the minimum - cut for two vertices and chosen at its will. Then the algorithm shrinks the edge between and to search for non - cuts. The minimum cut found in all phases will be the minimum weighted cut of the graph.

In graph theory a minimum spanning tree (MST) of a graph with and is a tree subgraph of that contains all of its vertices and is of minimum weight.

References

  1. Jarník, V. (1930), "O jistém problému minimálním" [About a certain minimal problem], Práce Moravské Přírodovědecké Společnosti (in Czech), 6 (4): 57–63, hdl:10338.dmlcz/500726 .
  2. Prim, R. C. (November 1957), "Shortest connection networks And some generalizations", Bell System Technical Journal , 36 (6): 1389–1401, Bibcode:1957BSTJ...36.1389P, doi:10.1002/j.1538-7305.1957.tb01515.x .
  3. Dijkstra, E. W. (December 1959), "A note on two problems in connexion with graphs" (PDF), Numerische Mathematik, 1 (1): 269–271, CiteSeerX   10.1.1.165.7577 , doi:10.1007/BF01386390, S2CID   123284777 .
  4. Sedgewick, Robert; Wayne, Kevin Daniel (2011), Algorithms (4th ed.), Addison-Wesley, p. 628, ISBN   978-0-321-57351-3 .
  5. Rosen, Kenneth (2011), Discrete Mathematics and Its Applications (7th ed.), McGraw-Hill Science, p. 798.
  6. 1 2 Cheriton, David; Tarjan, Robert Endre (1976), "Finding minimum spanning trees", SIAM Journal on Computing , 5 (4): 724–742, doi:10.1137/0205051, MR   0446458 .
  7. 1 2 Pettie, Seth; Ramachandran, Vijaya (January 2002), "An optimal minimum spanning tree algorithm" (PDF), Journal of the ACM, 49 (1): 16–34, CiteSeerX   10.1.1.110.7670 , doi:10.1145/505241.505243, MR   2148431, S2CID   5362916 .
  8. Tarjan, Robert Endre (1983), "Chapter 6. Minimum spanning trees. 6.2. Three classical algorithms", Data Structures and Network Algorithms, CBMS-NSF Regional Conference Series in Applied Mathematics, vol. 44, Society for Industrial and Applied Mathematics, pp. 72–77.
  9. Kepner, Jeremy; Gilbert, John (2011), Graph Algorithms in the Language of Linear Algebra, Software, Environments, and Tools, vol. 22, Society for Industrial and Applied Mathematics, p. 55, ISBN   9780898719901 .
  10. 1 2 Tarjan (1983), p. 77.
  11. Johnson, Donald B. (December 1975), "Priority queues with update and finding minimum spanning trees", Information Processing Letters, 4 (3): 53–57, doi:10.1016/0020-0190(75)90001-0 .
  12. 1 2 3 Grama, Ananth; Gupta, Anshul; Karypis, George; Kumar, Vipin (2003), Introduction to Parallel Computing, Addison-Wesley, pp. 444–446, ISBN   978-0201648652
  13. Quinn, Michael J.; Deo, Narsingh (1984), "Parallel graph algorithms", ACM Computing Surveys, 16 (3): 319–348, doi: 10.1145/2514.2515 , S2CID   6833839
  14. Setia, Rohit (2009), "A new parallel algorithm for minimum spanning tree problem" (PDF), Proc. International Conference on High Performance Computing (HiPC)