Parallel algorithms for minimum spanning trees

Last updated July 31, 2023

In graph theory a minimum spanning tree (MST) $T$ of a graph $G=(V,E)$ with $|V|=n$ and $|E|=m$ is a tree subgraph of $G$ that contains all of its vertices and is of minimum weight.

MSTs are useful and versatile tools utilised in a wide variety of practical and theoretical fields. For example, a company looking to supply multiple stores with a certain product from a single warehouse might use an MST originating at the warehouse to calculate the shortest paths to each company store. In this case the stores and the warehouse are represented as vertices and the road connections between them - as edges. Each edge is labelled with the length of the corresponding road connection.

If $G$ is edge-unweighted every spanning tree possesses the same number of edges and thus the same weight. In the edge-weighted case, the spanning tree, the sum of the weights of the edges of which is lowest among all spanning trees of $G$ , is called a minimum spanning tree (MST). It is not necessarily unique. More generally, graphs that are not necessarily connected have minimum spanning forests, which consist of a union of MSTs for each connected component.

As finding MSTs is a widespread problem in graph theory, there exist many sequential algorithms for solving it. Among them are Prim's, Kruskal's and Borůvka's algorithms, each utilising different properties of MSTs. They all operate in a similar fashion - a subset of $E$ is iteratively grown until a valid MST has been discovered. However, as practical problems are often quite large (road networks sometimes have billions of edges), performance is a key factor. One option of improving it is by parallelising known MST algorithms.^[1]

Prim's algorithm

This algorithm utilises the cut-property of MSTs. A simple high-level pseudocode implementation is provided below:

 $T\gets \emptyset$  $S\gets \{s\}$  where  $s$  is a random vertex in  $V$ repeat $|V|-1$  times     find lightest edge  $(u,v)$  s.t.  $u\in S$  but  $v\in (V\setminus S)$  $S\gets S\cup \{v\}$  $T\gets T\cup \{(u,v)\}$ return T

Each edge is observed exactly twice - namely when examining each of its endpoints. Each vertex is examined exactly once for a total of $O(n+m)$ operations aside from the selection of the lightest edge at each loop iteration. This selection is often performed using a priority queue (PQ). For each edge at most one decreaseKey operation (amortised in $O(1)$ ) is performed and each loop iteration performs one deleteMin operation ( $O(\log n)$ ). Thus using Fibonacci heaps the total runtime of Prim's algorithm is asymptotically in $O(m+n\log n)$ .

It is important to note that the loop is inherently sequential and can not be properly parallelised. This is the case, since the lightest edge with one endpoint in $S$ and on in $V\setminus S$ might change with the addition of edges to $T$ . Thus no two selections of a lightest edge can be performed at the same time. However, there do exist some attempts at parallelisation.

One possible idea is to use $O(n)$ processors to support PQ access in $O(1)$ on an EREW-PRAM machine,^[2] thus lowering the total runtime to $O(n+m)$ .

Kruskal's algorithm

Kruskal's MST algorithm utilises the cycle property of MSTs. A high-level pseudocode representation is provided below.

 $T\gets$  forest with every vertex in its own subtree foreach $(u,v)\in E$  in ascending order of weight     if $u$  and  $v$  in different subtrees of  $T$  $T\gets T\cup \{(u,v)\}$ return T

The subtrees of $T$ are stored in union-find data structures, which is why checking whether or not two vertices are in the same subtree is possible in amortised $O(\alpha (m,n))$ where $\alpha (m,n)$ is the inverse Ackermann function. Thus the total runtime of the algorithm is in $O(sort(n)+\alpha (n))$ . Here $\alpha (n)$ denotes the single-valued inverse Ackermann function, for which any realistic input yields an integer less than five.

Approach 1: Parallelising the sorting step

Similarly to Prim's algorithm there are components in Kruskal's approach that can not be parallelised in its classical variant. For example, determining whether or not two vertices are in the same subtree is difficult to parallelise, as two union operations might attempt to join the same subtrees at the same time. Really the only opportunity for parallelisation lies in the sorting step. As sorting is linear in the optimal case on $O(\log n)$ processors, the total runtime can be reduced to $O(m\alpha (n))$ .

Approach 2: Filter-Kruskal

Another approach would be to modify the original algorithm by growing $T$ more aggressively. This idea was presented by Osipov et al.^[3]^[4] The basic idea behind Filter-Kruskal is to partition the edges in a similar way to quicksort and filter out edges that connect vertices that belong to the same tree in order to reduce the cost of sorting. A high-level pseudocode representation is provided below.

filterKruskal( $G$ ): if $m<$  KruskalThreshold:     return kruskal( $G$ ) pivot = chooseRandom( $E$ )  $(E_{\leq }$ ,  $E_{>})\gets$ partition( $E$ , pivot)  $A\gets$  filterKruskal( $E_{\leq }$ )  $E_{>}\gets$  filter( $E_{>}$ )  $A\gets A$  $\cup$  filterKruskal( $E_{>}$ ) return $A$   partition( $E$ , pivot):  $E_{\leq }\gets \emptyset$  $E_{>}\gets \emptyset$ foreach $(u,v)\in E$ :     if weight( $u,v$ )  $\leq$  pivot:          $E_{\leq }\gets E_{\leq }\cup {(u,v)}$ else $E_{>}\gets E_{>}\cup {(u,v)}$ return ( $E_{\leq }$ ,  $E_{>}$ )  filter( $E$ ):  $E_{filtered}\gets \emptyset$ foreach $(u,v)\in E$ :     if find-set(u)  $\neq$  find-set(v):          $E_{filtered}\gets E_{filtered}\cup {(u,v)}$ return $E_{filtered}$

Filter-Kruskal is better suited for parallelisation, since sorting, partitioning and filtering have intuitively easy parallelisations where the edges are simply divided between the cores.

Borůvka's algorithm

The main idea behind Borůvka's algorithm is edge contraction. An edge $\{u,v\}$ is contracted by first removing $v$ from the graph and then redirecting every edge $\{w,v\}\in E$ to $\{w,u\}$ . These new edges retain their old edge weights. If the goal is not just to determine the weight of an MST but also which edges it comprises, it must be noted between which pairs of vertices an edge was contracted. A high-level pseudocode representation is presented below.

 $T\gets \emptyset$ while $|V|>1$  $S\gets \emptyset$ for $v\in V$  $S\gets S$  $\cup$  lightest  $\{u,v\}\in E$ for $\{u,v\}\in S$          contract  $\{u,v\}$  $T\gets T\cup S$ return T

It is possible that contractions lead to multiple edges between a pair of vertices. The intuitive way of choosing the lightest of them is not possible in $O(m)$ . However, if all contractions that share a vertex are performed in parallel this is doable. The recursion stops when there is only a single vertex remaining, which means the algorithm needs at most $\log n$ iterations, leading to a total runtime in $O(m\log n)$ .

Parallelisation

One possible parallelisation of this algorithm^[5]^[6]^[7] yields a polylogarithmic time complexity, i.e. $T(m,n,p)\cdot p\in O(m\log n)$ and there exists a constant $c$ so that $T(m,n,p)\in O(\log ^{c}m)$ . Here $T(m,n,p)$ denotes the runtime for a graph with $m$ edges, $n$ vertices on a machine with $p$ processors. The basic idea is the following:

while $|V|>1$      find lightest incident edges //  $O({\frac {m}{p}}+\log n+\log p)$      assign the corresponding subgraph to each vertex //  $O({\frac {n}{p}}+\log n)$      contract each subgraph //  $O({\frac {m}{p}}+\log n)$

The MST then consists of all the found lightest edges.

This parallelisation utilises the adjacency array graph representation for $G=(V,E)$ . This consists of three arrays - $\Gamma$ of length $n+1$ for the vertices, $\gamma$ of length $m$ for the endpoints of each of the $m$ edges and $c$ of length $m$ for the edges' weights. Now for vertex $i$ the other end of each edge incident to $i$ can be found in the entries between $\gamma [\Gamma [i-1]]$ and $\gamma [\Gamma [i]]$ . The weight of the $i$ -th edge in $\Gamma$ can be found in $c[i]$ . Then the $i$ -th edge in $\gamma$ is between vertices $u$ and $v$ if and only if $\Gamma [u]\leq i<\Gamma [u+1]$ and $\gamma [i]=v$ .

Finding the lightest incident edge

First the edges are distributed between each of the $p$ processors. The $i$ -th processor receives the edges stored between $\gamma [{\frac {im}{p}}]$ and $\gamma [{\frac {(i+1)m}{p}}-1]$ . Furthermore, each processor needs to know to which vertex these edges belong (since $\gamma$ only stores one of the edge's endpoints) and stores this in the array $pred$ . Obtaining this information is possible in $O(\log n)$ using $p$ binary searches or in $O({\frac {n}{p}}+p)$ using a linear search. In practice the latter approach is sometimes quicker, even though it is asymptotically worse.

Now each processor determines the lightest edge incident to each of its vertices.

 $v\gets$  find( ${\frac {im}{p}}$ ,  $\Gamma$ ) for $e\gets {\frac {im}{p}};e<{\frac {(i+1)m}{p}}-1;e++$ if $\Gamma [v+1]=e$  $v++$ if $c[e]<c[pred[v]]$  $pred[v]\gets e$

Here the issue arises some vertices are handled by more than one processor. A possible solution to this is that every processor has its own $prev$ array which is later combined with those of the others using a reduction. Each processor has at most two vertices that are also handled by other processors and each reduction is in $O(\log p)$ . Thus the total runtime of this step is in $O({\frac {m}{p}}+\log n+\log p)$ .

Assigning subgraphs to vertices

Observe the graph that consists solely of edges collected in the previous step. These edges are directed away from the vertex to which they are the lightest incident edge. The resulting graph decomposes into multiple weakly connected components. The goal of this step is to assign to each vertex the component of which it is a part. Note that every vertex has exactly one outgoing edge and therefore each component is a pseudotree - a tree with a single extra edge that runs in parallel to the lightest edge in the component but in the opposite direction. The following code mutates this extra edge into a loop:

parallel forAll $v\in V$  $w\gets pred[v]$ if $pred[w]=v\land v<w$  $pred[v]\gets v$

Now every weakly connected component is a directed tree where the root has a loop. This root is chosen as the representative of each component. The following code uses doubling to assign each vertex its representative:

while $\exists v\in V:pred[v]\neq pred[pred[v]]$ forAll $v\in V$  $pred[v]\gets pred[pred[v]]$

Now every subgraph is a star. With some advanced techniques this step needs $O({\frac {n}{p}}+\log n)$ time.

Contracting the subgraphs

In this step each subgraph is contracted to a single vertex.

 $k\gets$  number of subgraphs  $V'\gets \{0,\dots ,k-1\}$  find a bijective function  $f:$  star root  $\rightarrow \{0,\dots ,k-1\}$  $E'\gets \{(f(pred[v]),f(pred[w]),c,e_{old}):(v,w)\in E\land pred[v]\neq pred[w]\}$

Finding the bijective function is possible in $O({\frac {n}{p}}+\log p)$ using a prefix sum. As we now have a new set of vertices and edges the adjacency array must be rebuilt, which can be done using Integersort on $E'$ in $O({\frac {m}{p}}+\log p)$ time.

Complexity

Each iteration now needs $O({\frac {m}{p}}+\log n)$ time and just like in the sequential case there are $\log n$ iterations, resulting in a total runtime of $O(\log n({\frac {m}{p}}+\log n))$ . If $m\in \Omega (p\log ^{2}p)$ the efficiency of the algorithm is in $\Theta (1)$ and it is relatively efficient. If $m\in O(n)$ then it is absolutely efficient.

Further algorithms

There are multiple other parallel algorithms that deal the issue of finding an MST. With a linear number of processors it is possible to achieve this in $O(\log n)$ .^[8]^[9] Bader and Cong presented an MST-algorithm, that was five times quicker on eight cores than an optimal sequential algorithm.^[10]

Another challenge is the External Memory model - there is a proposed algorithm due to Dementiev et al. that is claimed to be only two to five times slower than an algorithm that only makes use of internal memory^[11]

Related Research Articles

A minimum spanning tree (MST) or minimum weight spanning tree is a subset of the edges of a connected, edge-weighted undirected graph that connects all the vertices together, without any cycles and with the minimum possible total edge weight. That is, it is a spanning tree whose sum of edge weights is as small as possible. More generally, any edge-weighted undirected graph has a minimum spanning forest, which is a union of the minimum spanning trees for its connected components.

<span class="mw-page-title-main">Kruskal's algorithm</span> Minimum spanning forest algorithm that greedily adds edges

Kruskal's algorithm finds a minimum spanning forest of an undirected edge-weighted graph. If the graph is connected, it finds a minimum spanning tree. It is a greedy algorithm in graph theory as in each step it adds the next lowest-weight edge that will not form a cycle to the minimum spanning forest.

<span class="mw-page-title-main">Prim's algorithm</span> Method for finding minimum spanning trees

In computer science, Prim's algorithm is a greedy algorithm that finds a minimum spanning tree for a weighted undirected graph. This means it finds a subset of the edges that forms a tree that includes every vertex, where the total weight of all the edges in the tree is minimized. The algorithm operates by building this tree one vertex at a time, from an arbitrary starting vertex, at each step adding the cheapest possible connection from the tree to another vertex.

<span class="mw-page-title-main">Component (graph theory)</span> Maximal subgraph whose vertices can reach each other

In graph theory, a component of an undirected graph is a connected subgraph that is not part of any larger connected subgraph. The components of any graph partition its vertices into disjoint sets, and are the induced subgraphs of those sets. A graph that is itself connected has exactly one component, consisting of the whole graph. Components are sometimes called connected components.

In graph theory, an Eulerian trail is a trail in a finite graph that visits every edge exactly once. Similarly, an Eulerian circuit or Eulerian cycle is an Eulerian trail that starts and ends on the same vertex. They were first discussed by Leonhard Euler while solving the famous Seven Bridges of Königsberg problem in 1736. The problem can be stated mathematically like this:

In mathematics, the discrete Laplace operator is an analog of the continuous Laplace operator, defined so that it has meaning on a graph or a discrete grid. For the case of a finite-dimensional graph, the discrete Laplace operator is more commonly called the Laplacian matrix.

<span class="mw-page-title-main">Dominating set</span> Subset of a graphs nodes such that all other nodes link to at least one

In graph theory, a dominating set for a graph $G$ is a subset $D$ of its vertices, such that any vertex of $G$ is either in $D$ , or has a neighbor in $D$ . The domination number $γ(G)$ is the number of vertices in a smallest dominating set for $G$ .

In graph theory, reachability refers to the ability to get from one vertex to another within a graph. A vertex $can reach a vertex if there exists a sequence of adjacent vertices which starts with and ends with .$

The distributed minimum spanning tree (MST) problem involves the construction of a minimum spanning tree by a distributed algorithm, in a network where nodes communicate by message passing. It is radically different from the classical sequential problem, although the most basic approach resembles Borůvka's algorithm. One important application of this problem is to find a tree that can be used for broadcasting. In particular, if the cost for a message to pass through an edge in a graph is significant, an MST can minimize the total cost for a source process to communicate with all the other processes in the network.

In the mathematical discipline of graph theory, the expander walk sampling theorem intuitively states that sampling vertices in an expander graph by doing relatively short random walk can simulate sampling the vertices independently from a uniform distribution. The earliest version of this theorem is due to Ajtai, Komlós & Szemerédi (1987), and the more general version is typically attributed to Gillman (1998).

In computer science and graph theory, Karger's algorithm is a randomized algorithm to compute a minimum cut of a connected graph. It was invented by David Karger and first published in 1993.

In graph theory, a random geometric graph (RGG) is the mathematically simplest spatial network, namely an undirected graph constructed by randomly placing N nodes in some metric space and connecting two nodes by a link if and only if their distance is in a given range, e.g. smaller than a certain neighborhood radius, r.

In mathematics, a packing in a hypergraph is a partition of the set of the hypergraph's edges into a number of disjoint subsets such that no pair of edges in each subset share any vertex. There are two famous algorithms to achieve asymptotically optimal packing in k-uniform hypergraphs. One of them is a random greedy algorithm which was proposed by Joel Spencer. He used a branching process to formally prove the optimal achievable bound under some side conditions. The other algorithm is called the Rödl nibble and was proposed by Vojtěch Rödl et al. They showed that the achievable packing by the Rödl nibble is in some sense close to that of the random greedy algorithm.

In mathematics, the Kontsevich quantization formula describes how to construct a generalized ★-product operator algebra from a given arbitrary finite-dimensional Poisson manifold. This operator algebra amounts to the deformation quantization of the corresponding Poisson algebra. It is due to Maxim Kontsevich.

In coding theory, expander codes form a class of error-correcting codes that are constructed from bipartite expander graphs. Along with Justesen codes, expander codes are of particular interest since they have a constant positive rate, a constant positive relative distance, and a constant alphabet size. In fact, the alphabet contains only two elements, so expander codes belong to the class of binary codes. Furthermore, expander codes can be both encoded and decoded in time proportional to the block length of the code.

The expected linear time MST algorithm is a randomized algorithm for computing the minimum spanning forest of a weighted graph with no isolated vertices. It was developed by David Karger, Philip Klein, and Robert Tarjan. The algorithm relies on techniques from Borůvka's algorithm along with an algorithm for verifying a minimum spanning tree in linear time. It combines the design paradigms of divide and conquer algorithms, greedy algorithms, and randomized algorithms to achieve expected linear performance.

In coding theory, Zemor's algorithm, designed and developed by Gilles Zemor, is a recursive low-complexity approach to code construction. It is an improvement over the algorithm of Sipser and Spielman.

A hyperbolic geometric graph (HGG) or hyperbolic geometric network (HGN) is a special type of spatial network where (1) latent coordinates of nodes are sprinkled according to a probability density function into a hyperbolic space of constant negative curvature and (2) an edge between two nodes is present if they are close according to a function of the metric (typically either a Heaviside step function resulting in deterministic connections between vertices closer than a certain threshold distance, or a decaying function of hyperbolic distance yielding the connection probability). A HGG generalizes a random geometric graph (RGG) whose embedding space is Euclidean.

Maximal entropy random walk (MERW) is a popular type of biased random walk on a graph, in which transition probabilities are chosen accordingly to the principle of maximum entropy, which says that the probability distribution which best represents the current state of knowledge is the one with largest entropy. While standard random walk chooses for every vertex uniform probability distribution among its outgoing edges, locally maximizing entropy rate, MERW maximizes it globally by assuming uniform probability distribution among all paths in a given graph.

A central problem in algorithmic graph theory is the shortest path problem. One of the generalizations of the shortest path problem is known as the single-source-shortest-paths (SSSP) problem, which consists of finding the shortest paths from a source vertex $to all other vertices in the graph. There are classical sequential algorithms which solve this problem, such as Dijkstra's algorithm. In this article, however, we present two parallel algorithms solving this problem.$

References

↑ Sanders; Dietzfelbinger; Martin; Mehlhorn; Kurt; Peter (2014-06-10). Algorithmen und Datenstrukturen Die Grundwerkzeuge. Springer Vieweg. ISBN 978-3-642-05472-3.
↑ Brodal, Gerth Stølting; Träff, Jesper Larsson; Zaroliagis, Christos D. (1998), "A Parallel Priority Queue with Constant Time Operations", Journal of Parallel and Distributed Computing, 49 (1): 4–21, CiteSeerX 10.1.1.48.3272 , doi:10.1006/jpdc.1998.1425
↑ Osipov, Vitaly; Sanders, Peter; Singler, Johannes (2009), "The filter-kruskal minimum spanning tree algorithm", Proceedings of the Eleventh Workshop on Algorithm Engineering and Experiments (ALENEX). Society for Industrial and Applied Mathematics: 52–61, CiteSeerX 10.1.1.218.2574
↑ Sanders, Peter. "Algorithm Engineering script" (PDF). Algorithm Engineering KIT Homepage. Retrieved 25 February 2019.
↑ Sanders, Peter. "Parallel Algorithms script" (PDF). Parallel Algorithms KIT Homepage. Retrieved 25 February 2019.
↑ Zadeh, Reza. "Distributed Algorithms and Optimization" (PDF). Distributed Algorithms and Optimization Stanford University Homepage. Retrieved 25 February 2019.
↑ Chun, Sun; Condon, Anne (1996). "Parallel implementation of Bouvka's minimum spanning tree algorithm". Proceedings of International Conference on Parallel Processing. pp. 302–308. doi:10.1109/IPPS.1996.508073. ISBN 0-8186-7255-2. S2CID 12710022.
↑ Chong, Ka Wong; Han, Yijie; Lam, Tak Wah (2001), "Concurrent threads and optimal parallel minimum spanning trees algorithm", Journal of the Association for Computing Machinery , 48 (2): 297–323, CiteSeerX 10.1.1.32.1554 , doi:10.1145/375827.375847, MR 1868718, S2CID 1778676
↑ Pettie, Seth; Ramachandran, Vijaya (2002), "A randomized time-work optimal parallel algorithm for finding a minimum spanning forest" (PDF), SIAM Journal on Computing , 31 (6): 1879–1895, doi:10.1137/S0097539700371065, MR 1954882
↑ Bader, David A.; Cong, Guojing (2006), "Fast shared-memory algorithms for computing the minimum spanning forest of sparse graphs", Journal of Parallel and Distributed Computing, 66 (11): 1366–1378, doi:10.1016/j.jpdc.2006.06.001
↑ Dementiev, Roman; Sanders, Peter; Schultes, Dominik; Sibeyn, Jop F. (2004), "Engineering an external memory minimum spanning tree algorithm", Proc. IFIP 18th World Computer Congress, TC1 3rd International Conference on Theoretical Computer Science (TCS2004) (PDF), pp. 195–208.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.

[1] Sanders; Dietzfelbinger; Martin; Mehlhorn; Kurt; Peter (2014-06-10). Algorithmen und Datenstrukturen Die Grundwerkzeuge. Springer Vieweg. ISBN 978-3-642-05472-3.

[2] Brodal, Gerth Stølting; Träff, Jesper Larsson; Zaroliagis, Christos D. (1998), "A Parallel Priority Queue with Constant Time Operations", Journal of Parallel and Distributed Computing, 49 (1): 4–21, CiteSeerX 10.1.1.48.3272 , doi:10.1006/jpdc.1998.1425

[3] Osipov, Vitaly; Sanders, Peter; Singler, Johannes (2009), "The filter-kruskal minimum spanning tree algorithm", Proceedings of the Eleventh Workshop on Algorithm Engineering and Experiments (ALENEX). Society for Industrial and Applied Mathematics: 52–61, CiteSeerX 10.1.1.218.2574

[4] Sanders, Peter. "Algorithm Engineering script" (PDF). Algorithm Engineering KIT Homepage. Retrieved 25 February 2019.

[5] Sanders, Peter. "Parallel Algorithms script" (PDF). Parallel Algorithms KIT Homepage. Retrieved 25 February 2019.

[6] Zadeh, Reza. "Distributed Algorithms and Optimization" (PDF). Distributed Algorithms and Optimization Stanford University Homepage. Retrieved 25 February 2019.

[7] Chun, Sun; Condon, Anne (1996). "Parallel implementation of Bouvka's minimum spanning tree algorithm". Proceedings of International Conference on Parallel Processing. pp. 302–308. doi:10.1109/IPPS.1996.508073. ISBN 0-8186-7255-2. S2CID 12710022.

[8] Chong, Ka Wong; Han, Yijie; Lam, Tak Wah (2001), "Concurrent threads and optimal parallel minimum spanning trees algorithm", Journal of the Association for Computing Machinery , 48 (2): 297–323, CiteSeerX 10.1.1.32.1554 , doi:10.1145/375827.375847, MR 1868718, S2CID 1778676

[9] Pettie, Seth; Ramachandran, Vijaya (2002), "A randomized time-work optimal parallel algorithm for finding a minimum spanning forest" (PDF), SIAM Journal on Computing , 31 (6): 1879–1895, doi:10.1137/S0097539700371065, MR 1954882

[10] Bader, David A.; Cong, Guojing (2006), "Fast shared-memory algorithms for computing the minimum spanning forest of sparse graphs", Journal of Parallel and Distributed Computing, 66 (11): 1366–1378, doi:10.1016/j.jpdc.2006.06.001

[11] Dementiev, Roman; Sanders, Peter; Schultes, Dominik; Sibeyn, Jop F. (2004), "Engineering an external memory minimum spanning tree algorithm", Proc. IFIP 18th World Computer Congress, TC1 3rd International Conference on Theoretical Computer Science (TCS2004) (PDF), pp. 195–208.

[1]

[2]

[3]

[4]

[5]

[6]

[7]

[8]

[9]

[10]

[11]