Kruskal's algorithm

[Figure KruskalDemo.gif: animation of Kruskal's algorithm on a complete graph with weights based on Euclidean distance]
Class: Minimum spanning tree algorithm
Data structure: Graph
Worst-case performance: O(E log V)

Kruskal's algorithm [1] finds a minimum spanning forest of an undirected edge-weighted graph. If the graph is connected, it finds a minimum spanning tree. It is a greedy algorithm that in each step adds to the forest the lowest-weight edge that will not form a cycle. [2] The key steps of the algorithm are sorting and the use of a disjoint-set data structure to detect cycles. Its running time is dominated by the time to sort all of the graph edges by their weight.

A minimum spanning tree of a connected weighted graph is a connected subgraph, without cycles, for which the sum of the weights of all the edges in the subgraph is minimal. For a disconnected graph, a minimum spanning forest is composed of a minimum spanning tree for each connected component.

This algorithm was first published by Joseph Kruskal in 1956, [3] and was rediscovered soon afterward by Loberman & Weinberger (1957). [4] Other algorithms for this problem include Prim's algorithm, Borůvka's algorithm, and the reverse-delete algorithm.

Algorithm

The algorithm performs the following steps:

  1. Create a forest (a set of trees) initially consisting of a separate single-vertex tree for each vertex in the input graph.
  2. Sort the graph edges by weight.
  3. Loop through the edges in order of increasing weight. For each edge, test whether adding it to the current forest would create a cycle; if not, add it to the forest, combining two trees into one.

At the termination of the algorithm, the forest forms a minimum spanning forest of the graph. If the graph is connected, the forest has a single component and forms a minimum spanning tree.

Pseudocode

The following pseudocode uses a disjoint-set data structure. It represents the forest F as a set of undirected edges, and uses the disjoint-set structure to determine efficiently whether two vertices belong to the same tree.

algorithm Kruskal(G) is
    F := ∅
    for each v in G.V do
        MAKE-SET(v)
    for each {u, v} in G.E ordered by weight({u, v}), increasing do
        if FIND-SET(u) ≠ FIND-SET(v) then
            F := F ∪ { {u, v} }
            UNION(FIND-SET(u), FIND-SET(v))
    return F
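As a concrete illustration, the pseudocode above might be realized in Python roughly as follows. This is a minimal sketch: the names DisjointSet and kruskal, the (u, v, weight) edge tuples, and the use of union by rank with path halving are choices made here for the example, not anything prescribed by the article. MAKE-SET, FIND-SET, and UNION correspond to the constructor, find, and union below.

    class DisjointSet:
        """Union-find with union by rank and path halving."""

        def __init__(self, elements):
            self.parent = {v: v for v in elements}   # MAKE-SET for every vertex
            self.rank = {v: 0 for v in elements}

        def find(self, x):
            # Path halving: point every other node on the search path at its grandparent.
            while self.parent[x] != x:
                self.parent[x] = self.parent[self.parent[x]]
                x = self.parent[x]
            return x

        def union(self, x, y):
            rx, ry = self.find(x), self.find(y)
            if rx == ry:
                return
            if self.rank[rx] < self.rank[ry]:        # union by rank: attach the shallower
                rx, ry = ry, rx                      # tree under the deeper one
            self.parent[ry] = rx
            if self.rank[rx] == self.rank[ry]:
                self.rank[rx] += 1

    def kruskal(vertices, edges):
        """Return a minimum spanning forest as a list of (u, v, weight) tuples."""
        forest = []
        sets = DisjointSet(vertices)
        for u, v, w in sorted(edges, key=lambda e: e[2]):   # edges in increasing weight order
            if sets.find(u) != sets.find(v):                # {u, v} joins two different trees
                forest.append((u, v, w))
                sets.union(u, v)
        return forest

Sorting plays the role of the ordered edge loop in the pseudocode, and each accepted edge merges two trees of the forest.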

Complexity

For a graph with E edges and V vertices, Kruskal's algorithm can be shown to run in O(E log E) time with simple data structures. Here, O expresses the time in big O notation, and log is a logarithm to any base (inside O-notation, logarithms to all bases are equivalent, because they differ only by a constant factor). This time bound is often written instead as O(E log V), which is equivalent for graphs with no isolated vertices, because for these graphs V/2 ≤ E < V², so the logarithms of V and E are within a constant factor of each other.
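The two bounds can be related by a short derivation (added here for clarity, using base-2 logarithms and the inequalities just stated):

    \log E < \log V^2 = 2\log V
    \qquad\text{and}\qquad
    \log V \le \log(2E) = \log E + 1,

so log E and log V are within a constant factor of each other, i.e. log E = Θ(log V).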

To achieve this bound, first sort the edges by weight using a comparison sort in O(E log E) time. Once sorted, it is possible to loop through the edges in sorted order in constant time per edge. Next, use a disjoint-set data structure, with a set of vertices for each component, to keep track of which vertices are in which components. Creating this structure, with a separate set for each vertex, takes V operations and O(V) time. The final iteration through all edges performs two find operations and possibly one union operation per edge. These operations take O(α(V)) amortized time per operation, giving worst-case total time O(E α(V)) for this loop, where α is the extremely slowly growing inverse Ackermann function. This part of the time bound is much smaller than the time for the sorting step, so the total time for the algorithm can be simplified to the time for the sorting step.

In cases where the edges are already sorted, or where they have small enough integer weights to allow integer sorting algorithms such as counting sort or radix sort to sort them in linear time, the disjoint-set operations are the slowest remaining part of the algorithm and the total time is O(E α(V)).
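For illustration, when the weights are small non-negative integers the comparison sort in the earlier sketch could be replaced by a counting sort along the following lines; the helper name and the max_weight parameter are assumptions of this sketch, not part of the article.

    def sort_edges_by_integer_weight(edges, max_weight):
        """Counting sort of (u, v, w) edge tuples by integer weight w, in O(E + max_weight) time."""
        buckets = [[] for _ in range(max_weight + 1)]   # one bucket per possible weight 0..max_weight
        for u, v, w in edges:
            buckets[w].append((u, v, w))
        return [edge for bucket in buckets for edge in bucket]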

Example

Image | Description
Kruskal Algorithm 1.svg | AD and CE are the shortest edges, with length 5, and AD has been arbitrarily chosen, so it is highlighted.
Kruskal Algorithm 2.svg | CE is now the shortest edge that does not form a cycle, with length 5, so it is highlighted as the second edge.
Kruskal Algorithm 3.svg | The next edge, DF with length 6, is highlighted using much the same method.
Kruskal Algorithm 4.svg | The next-shortest edges are AB and BE, both with length 7. AB is chosen arbitrarily, and is highlighted. The edge BD has been highlighted in red, because there already exists a path (in green) between B and D, so it would form a cycle (ABD) if it were chosen.
Kruskal Algorithm 5.svg | The process continues to highlight the next-smallest edge, BE with length 7. Many more edges are highlighted in red at this stage: BC because it would form the loop BCE, DE because it would form the loop DEBA, and FE because it would form FEBAD.
Kruskal Algorithm 6.svg | Finally, the process finishes with the edge EG of length 9, and the minimum spanning tree is found.
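The same cycle-rejection behaviour can be reproduced by running the Python sketch from the Pseudocode section on a small toy graph (the graph below is an assumption for illustration, not the graph from the figures):

    vertices = ["A", "B", "C", "D"]
    edges = [("A", "B", 1), ("B", "C", 2), ("A", "C", 3), ("C", "D", 4)]
    print(kruskal(vertices, edges))
    # [('A', 'B', 1), ('B', 'C', 2), ('C', 'D', 4)]
    # ("A", "C", 3) is skipped because A and C are already connected through B,
    # just as BD is rejected in the walkthrough above.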

Proof of correctness

The proof consists of two parts. First, it is proved that the algorithm produces a spanning tree. Second, it is proved that the constructed spanning tree is of minimal weight.

Spanning tree

Let G be a connected, weighted graph and let Y be the subgraph of G produced by the algorithm. Y cannot have a cycle, as by definition an edge is not added if it results in a cycle. Y cannot be disconnected, since the first encountered edge that joins two components of Y would have been added by the algorithm. Thus, Y is a spanning tree of G.

Minimality

We show that the following proposition P is true by induction: If F is the set of edges chosen at any stage of the algorithm, then there is some minimum spanning tree that contains F and none of the edges rejected by the algorithm.

Clearly P is true at the beginning, when F is empty: any minimum spanning tree will do, and one exists because a weighted connected graph always has a minimum spanning tree. Now assume P is true for some non-final edge set F, and let T be a minimum spanning tree that contains F and none of the rejected edges. If the next chosen edge e is also in T, then P is true for F + e. Otherwise, T + e has a cycle C, and this cycle contains an edge f that is in T but not in F (if every edge of C other than e were in F, then e would have been rejected). Since T contains no rejected edges, f has not yet been considered by the algorithm, so its weight is at least that of e. Therefore T − f + e is again a spanning tree, of weight no greater than that of T, so it is a minimum spanning tree containing F + e and none of the rejected edges, and P holds. When the algorithm terminates, F is a spanning tree (as shown above), and by P it is contained in some minimum spanning tree; since both are spanning trees, F is itself a minimum spanning tree.

Parallel algorithm

Kruskal's algorithm is inherently sequential and hard to parallelize. It is, however, possible to perform the initial sorting of the edges in parallel or, alternatively, to use a parallel implementation of a binary heap to extract the minimum-weight edge in every iteration. [5] As parallel sorting is possible in O(n) time on O(log n) processors, [6] the runtime of Kruskal's algorithm can be reduced to O(E α(V)), where α again is the inverse of the single-valued Ackermann function.

A variant of Kruskal's algorithm, named Filter-Kruskal, has been described by Osipov et al. [7] and is better suited for parallelization. The basic idea behind Filter-Kruskal is to partition the edges in a similar way to quicksort and filter out edges that connect vertices of the same tree to reduce the cost of sorting. The following pseudocode demonstrates this.

function filter_kruskal(G) is
    if |G.E| < kruskal_threshold:
        return kruskal(G)
    pivot = choose_random(G.E)
    E≤, E> = partition(G.E, pivot)
    A = filter_kruskal(E≤)
    E> = filter(E>)
    A = A ∪ filter_kruskal(E>)
    return A

function partition(E, pivot) is
    E≤ = ∅, E> = ∅
    foreach (u, v) in E do
        if weight(u, v) ≤ pivot then
            E≤ = E≤ ∪ {(u, v)}
        else
            E> = E> ∪ {(u, v)}
    return E≤, E>

function filter(E) is
    Ef = ∅
    foreach (u, v) in E do
        if find_set(u) ≠ find_set(v) then
            Ef = Ef ∪ {(u, v)}
    return Ef
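A rough Python rendering of this pseudocode is sketched below. It is an illustration only: it reuses the DisjointSet class from the earlier Kruskal sketch, the threshold value is an arbitrary assumption, and the single union-find structure shared across the recursion plays the role of the global find_set state. The degenerate case where the pivot is the largest remaining weight falls back to plain Kruskal so that the recursion always terminates.

    import random

    KRUSKAL_THRESHOLD = 32   # assumed cutoff below which plain Kruskal is used; tune experimentally

    def _kruskal_base(edges, sets, forest):
        # Plain Kruskal on the given edges, reusing the shared union-find structure.
        for u, v, w in sorted(edges, key=lambda e: e[2]):
            if sets.find(u) != sets.find(v):
                forest.append((u, v, w))
                sets.union(u, v)

    def _filter_kruskal(edges, sets, forest):
        if len(edges) < KRUSKAL_THRESHOLD:
            _kruskal_base(edges, sets, forest)
            return
        pivot = random.choice(edges)[2]                 # random pivot weight
        lighter = [e for e in edges if e[2] <= pivot]   # partition around the pivot,
        heavier = [e for e in edges if e[2] > pivot]    # as in quicksort
        if not heavier:
            # The pivot is the largest remaining weight, so partitioning cannot shrink
            # the input; fall back to sorting these edges directly.
            _kruskal_base(edges, sets, forest)
            return
        _filter_kruskal(lighter, sets, forest)
        # Filter step: drop heavier edges whose endpoints already lie in the same tree.
        heavier = [(u, v, w) for u, v, w in heavier if sets.find(u) != sets.find(v)]
        _filter_kruskal(heavier, sets, forest)

    def filter_kruskal(vertices, edges):
        """Return a minimum spanning forest of (vertices, edges) using Filter-Kruskal."""
        sets = DisjointSet(vertices)   # DisjointSet from the earlier Kruskal sketch
        forest = []
        _filter_kruskal(list(edges), sets, forest)
        return forest

Processing the lighter half before filtering and recursing on the heavier half preserves Kruskal's greedy order, while the filter step discards many heavy edges before they are ever sorted.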

Filter-Kruskal lends itself better to parallelization as sorting, filtering, and partitioning can easily be performed in parallel by distributing the edges between the processors. [7]

Finally, other variants of a parallel implementation of Kruskal's algorithm have been explored. Examples include a scheme that uses helper threads to remove edges that are definitely not part of the MST in the background, [8] and a variant which runs the sequential algorithm on p subgraphs, then merges those subgraphs until only one, the final MST, remains. [9]

See also

Related Research Articles

<span class="mw-page-title-main">Minimum spanning tree</span> Least-weight tree connecting graph vertices

A minimum spanning tree (MST) or minimum weight spanning tree is a subset of the edges of a connected, edge-weighted undirected graph that connects all the vertices together, without any cycles and with the minimum possible total edge weight. That is, it is a spanning tree whose sum of edge weights is as small as possible. More generally, any edge-weighted undirected graph has a minimum spanning forest, which is a union of the minimum spanning trees for its connected components.

<span class="mw-page-title-main">Prim's algorithm</span> Method for finding minimum spanning trees

In computer science, Prim's algorithm is a greedy algorithm that finds a minimum spanning tree for a weighted undirected graph. This means it finds a subset of the edges that forms a tree that includes every vertex, where the total weight of all the edges in the tree is minimized. The algorithm operates by building this tree one vertex at a time, from an arbitrary starting vertex, at each step adding the cheapest possible connection from the tree to another vertex.

<span class="mw-page-title-main">Borůvka's algorithm</span> Method for finding minimum spanning trees

Borůvka's algorithm is a greedy algorithm for finding a minimum spanning tree in a graph, or a minimum spanning forest in the case of a graph that is not connected.

<span class="mw-page-title-main">Component (graph theory)</span> Maximal subgraph whose vertices can reach each other

In graph theory, a component of an undirected graph is a connected subgraph that is not part of any larger connected subgraph. The components of any graph partition its vertices into disjoint sets, and are the induced subgraphs of those sets. A graph that is itself connected has exactly one component, consisting of the whole graph. Components are sometimes called connected components.

This is a glossary of graph theory. Graph theory is the study of graphs, systems of nodes or vertices connected in pairs by lines or edges.

<span class="mw-page-title-main">Spanning tree</span> Tree which includes all vertices of a graph

In the mathematical field of graph theory, a spanning tree T of an undirected graph G is a subgraph that is a tree which includes all of the vertices of G. In general, a graph may have several spanning trees, but a graph that is not connected will not contain a spanning tree. If all of the edges of G are also edges of a spanning tree T of G, then G is a tree and is identical to T.

In the mathematical theory of matroids, a graphic matroid is a matroid whose independent sets are the forests in a given finite undirected graph. The dual matroids of graphic matroids are called co-graphic matroids or bond matroids. A matroid that is both graphic and co-graphic is sometimes called a planar matroid; these are exactly the graphic matroids formed from planar graphs.

In computer science, a topological sort or topological ordering of a directed graph is a linear ordering of its vertices such that for every directed edge (u,v) from vertex u to vertex v, u comes before v in the ordering. For instance, the vertices of the graph may represent tasks to be performed, and the edges may represent constraints that one task must be performed before another; in this application, a topological ordering is just a valid sequence for the tasks. Precisely, a topological sort is a graph traversal in which each node v is visited only after all its dependencies are visited. A topological ordering is possible if and only if the graph has no directed cycles, that is, if it is a directed acyclic graph (DAG). Any DAG has at least one topological ordering, and algorithms are known for constructing a topological ordering of any DAG in linear time. Topological sorting has many applications, especially in ranking problems such as feedback arc set. Topological sorting is possible even when the DAG has disconnected components.

In computer science, a disjoint-set data structure, also called a union–find data structure or merge–find set, is a data structure that stores a collection of disjoint (non-overlapping) sets. Equivalently, it stores a partition of a set into disjoint subsets. It provides operations for adding new sets, merging sets, and finding a representative member of a set. The last operation makes it possible to find out efficiently if any two elements are in the same or different sets.

<span class="mw-page-title-main">Euclidean minimum spanning tree</span> Shortest network connecting points

A Euclidean minimum spanning tree of a finite set of points in the Euclidean plane or higher-dimensional Euclidean space connects the points by a system of line segments with the points as endpoints, minimizing the total length of the segments. In it, any two points can reach each other along a path through the line segments. It can be found as the minimum spanning tree of a complete graph with the points as vertices and the Euclidean distances between points as edge weights.

<span class="mw-page-title-main">Dominating set</span> Subset of a graphs nodes such that all other nodes link to at least one

In graph theory, a dominating set for a graph G is a subset D of its vertices, such that any vertex of G is in D, or has a neighbor in D. The domination number γ(G) is the number of vertices in a smallest dominating set for G.

The arboricity of an undirected graph is the minimum number of forests into which its edges can be partitioned. Equivalently it is the minimum number of spanning forests needed to cover all the edges of the graph. The Nash-Williams theorem provides necessary and sufficient conditions for when a graph is k-arboric.

The reverse-delete algorithm is an algorithm in graph theory used to obtain a minimum spanning tree from a given connected, edge-weighted graph. It first appeared in Kruskal (1956), but it should not be confused with Kruskal's algorithm which appears in the same paper. If the graph is disconnected, this algorithm will find a minimum spanning tree for each disconnected part of the graph. The set of these minimum spanning trees is called a minimum spanning forest, which contains every vertex in the graph.

In graph theory, Edmonds' algorithm or Chu–Liu/Edmonds' algorithm is an algorithm for finding a spanning arborescence of minimum weight. It is the directed analog of the minimum spanning tree problem. The algorithm was proposed independently first by Yoeng-Jin Chu and Tseng-Hong Liu (1965) and then by Jack Edmonds (1967).

<span class="mw-page-title-main">Pseudoforest</span> Graph with at most one cycle per component

In graph theory, a pseudoforest is an undirected graph in which every connected component has at most one cycle. That is, it is a system of vertices and edges connecting pairs of vertices, such that no two cycles of consecutive edges share any vertex with each other, nor can any two cycles be connected to each other by a path of consecutive edges. A pseudotree is a connected pseudoforest.

<span class="mw-page-title-main">Cycle basis</span> Cycles in a graph that generate all cycles

In graph theory, a branch of mathematics, a cycle basis of an undirected graph is a set of simple cycles that forms a basis of the cycle space of the graph. That is, it is a minimal set of cycles that allows every even-degree subgraph to be expressed as a symmetric difference of basis cycles.

In theoretical computer science and network routing, Suurballe's algorithm is an algorithm for finding two disjoint paths in a nonnegatively-weighted directed graph, so that both paths connect the same pair of vertices and have minimum total length. The algorithm was conceived by John W. Suurballe and published in 1974. The main idea of Suurballe's algorithm is to use Dijkstra's algorithm to find one path, to modify the weights of the graph edges, and then to run Dijkstra's algorithm a second time. The output of the algorithm is formed by combining these two paths, discarding edges that are traversed in opposite directions by the paths, and using the remaining edges to form the two paths to return as the output. The modification to the weights is similar to the weight modification in Johnson's algorithm, and preserves the non-negativity of the weights while allowing the second instance of Dijkstra's algorithm to find the correct second path.

The expected linear time MST algorithm is a randomized algorithm for computing the minimum spanning forest of a weighted graph with no isolated vertices. It was developed by David Karger, Philip Klein, and Robert Tarjan. The algorithm relies on techniques from Borůvka's algorithm along with an algorithm for verifying a minimum spanning tree in linear time. It combines the design paradigms of divide and conquer algorithms, greedy algorithms, and randomized algorithms to achieve expected linear performance.

In mathematics, a minimum bottleneck spanning tree (MBST) in an undirected graph is a spanning tree in which the most expensive edge is as cheap as possible. A bottleneck edge is the highest weighted edge in a spanning tree. A spanning tree is a minimum bottleneck spanning tree if the graph does not contain a spanning tree with a smaller bottleneck edge weight. For a directed graph, a similar problem is known as Minimum Bottleneck Spanning Arborescence (MBSA).

In graph theory, a minimum spanning tree (MST) of a graph G = (V, E) with |V| = n vertices and |E| = m edges is a tree subgraph of G that contains all of its vertices and is of minimum weight.

References

  1. Kleinberg, Jon; Tardos, Éva (2006). Algorithm Design. Boston: Pearson/Addison-Wesley. pp. 142–151. ISBN 0-321-29535-8. OCLC 57422612.
  2. Cormen, Thomas H.; Leiserson, Charles E.; Rivest, Ronald L.; Stein, Clifford (2009). Introduction to Algorithms (Third ed.). MIT Press. p. 631. ISBN 978-0262258104.
  3. Kruskal, J. B. (1956). "On the shortest spanning subtree of a graph and the traveling salesman problem". Proceedings of the American Mathematical Society. 7 (1): 48–50. doi:10.1090/S0002-9939-1956-0078686-7. JSTOR 2033241.
  4. Loberman, H.; Weinberger, A. (October 1957). "Formal Procedures for connecting terminals with a minimum total wire length". Journal of the ACM. 4 (4): 428–437. doi:10.1145/320893.320896. S2CID 7320964.
  5. Quinn, Michael J.; Deo, Narsingh (1984). "Parallel graph algorithms". ACM Computing Surveys. 16 (3): 319–348. doi:10.1145/2514.2515. S2CID 6833839.
  6. Grama, Ananth; Gupta, Anshul; Karypis, George; Kumar, Vipin (2003). Introduction to Parallel Computing. Addison-Wesley. pp. 412–413. ISBN 978-0201648652.
  7. Osipov, Vitaly; Sanders, Peter; Singler, Johannes (2009). "The Filter-Kruskal minimum spanning tree algorithm". Proceedings of the Eleventh Workshop on Algorithm Engineering and Experiments (ALENEX). Society for Industrial and Applied Mathematics: 52–61. doi:10.1137/1.9781611972894.5. ISBN 978-0-89871-930-7.
  8. Katsigiannis, Anastasios; Anastopoulos, Nikos; Nikas, Konstantinos; Koziris, Nectarios (2012). "An Approach to Parallelize Kruskal's Algorithm Using Helper Threads". 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum. pp. 1601–1610. doi:10.1109/IPDPSW.2012.201. ISBN 978-1-4673-0974-5. S2CID 14430930.
  9. Lončar, Vladimir; Škrbić, Srdjan; Balaž, Antun (2014). "Parallelization of Minimum Spanning Tree Algorithms Using Distributed Memory Architectures". Transactions on Engineering Technologies. pp. 543–554. doi:10.1007/978-94-017-8832-8_39. ISBN 978-94-017-8831-1.