Tarjan's strongly connected components algorithm

[Animated illustration of Tarjan's algorithm]
Data structure: Graph
Worst-case performance: O(|V| + |E|)
Tarjan's strongly connected components algorithm is an algorithm in graph theory for finding the strongly connected components (SCCs) of a directed graph. It runs in linear time, matching the time bound for alternative methods including Kosaraju's algorithm and the path-based strong component algorithm. The algorithm is named for its inventor, Robert Tarjan. [1]


Overview

The algorithm takes a directed graph as input, and produces a partition of the graph's vertices into the graph's strongly connected components. Each vertex of the graph appears in exactly one of the strongly connected components. Any vertex that is not on a directed cycle forms a strongly connected component all by itself: for example, a vertex whose in-degree or out-degree is 0, or any vertex of an acyclic graph.
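
For illustration (this small graph is not from the original article), consider a four-vertex graph in which a, b and c form a directed cycle and d lies on no cycle. A minimal Python-style adjacency list for it, with its expected partition, might look like this:

    # Hypothetical example graph, represented as an adjacency list.
    graph = {
        "a": ["b"],
        "b": ["c"],
        "c": ["a", "d"],   # closes the cycle a -> b -> c -> a and also points to d
        "d": [],           # d lies on no directed cycle
    }
    # Strongly connected components (a partition of the vertex set):
    #   {"a", "b", "c"}   -- the 3-cycle
    #   {"d"}             -- a singleton component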

The basic idea of the algorithm is this: a depth-first search (DFS) begins from an arbitrary start node (and subsequent depth-first searches are conducted on any nodes that have not yet been found). As usual with depth-first search, the search visits every node of the graph exactly once, refusing to revisit any node that has already been visited. Thus, the collection of search trees is a spanning forest of the graph. The strongly connected components will be recovered as certain subtrees of this forest. The roots of these subtrees are called the "roots" of the strongly connected components. Any node of a strongly connected component might serve as a root, if it happens to be the first node of a component that is discovered by search.

Stack invariant

Nodes are placed on a stack in the order in which they are visited. When the depth-first search recursively visits a node v and its descendants, those nodes are not necessarily all popped from the stack when this recursive call returns. The crucial invariant property is that a node remains on the stack after it has been visited if and only if there exists a path in the input graph from it to some node earlier on the stack. In other words, a node is removed from the stack only after every path leading out of it has been fully explored and none of those paths reaches a node earlier on the stack; such nodes are popped as a group when the search backtracks past the root of their strongly connected component.

At the end of the call that visits v and its descendants, we know whether v itself has a path to any node earlier on the stack. If so, the call returns, leaving v on the stack to preserve the invariant. If not, then v must be the root of its strongly connected component, which consists of v together with any nodes later on the stack than v (such nodes all have paths back to v but not to any earlier node, because if they had paths to earlier nodes, then v would also have paths to earlier nodes, which is false). The strongly connected component rooted at v is then popped from the stack and returned, again preserving the invariant.

Bookkeeping

Each node v is assigned a unique integer v.index, which numbers the nodes consecutively in the order in which they are discovered. Each node also maintains a value v.lowlink that represents the smallest index of any node on the stack known to be reachable from v through v's DFS subtree, including v itself. Therefore v must be left on the stack if v.lowlink < v.index, whereas v must be removed as the root of a strongly connected component if v.lowlink == v.index. The value v.lowlink is computed during the depth-first search from v, as this finds the nodes that are reachable from v.
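
To make the bookkeeping concrete, here is a brief hand trace on the hypothetical example graph above (edges a → b, b → c, c → a, c → d), assuming the search starts at a and indices are assigned from 0:

- a is discovered: a.index = a.lowlink = 0; a is pushed on the stack.
- b is discovered: b.index = b.lowlink = 1; b is pushed.
- c is discovered: c.index = c.lowlink = 2; c is pushed. The edge c → a reaches a, which is on the stack, so c.lowlink becomes min(2, a.index) = 0.
- d is discovered: d.index = d.lowlink = 3; d is pushed. It has no successors, so d.lowlink = d.index and {d} is popped as a strongly connected component.
- The search backtracks: c has lowlink 0 < index 2 and stays on the stack; b.lowlink becomes min(1, c.lowlink) = 0 and b stays; finally a.lowlink = 0 = a.index, so c, b and a are popped together as the component {a, b, c}.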

The lowlink is different from the lowpoint, which is the smallest index reachable from v through any part of the graph. [1] (p. 156) [2]

The algorithm in pseudocode

algorithm tarjan is
    input: graph G = (V, E)
    output: set of strongly connected components (sets of vertices)

    index := 0
    S := empty stack
    for each v in V do
        if v.index is undefined then
            strongconnect(v)

    function strongconnect(v)
        // Set the depth index for v to the smallest unused index
        v.index := index
        v.lowlink := index
        index := index + 1
        S.push(v)
        v.onStack := true

        // Consider successors of v
        for each (v, w) in E do
            if w.index is undefined then
                // Successor w has not yet been visited; recurse on it
                strongconnect(w)
                v.lowlink := min(v.lowlink, w.lowlink)
            else if w.onStack then
                // Successor w is in stack S and hence in the current SCC
                // If w is not on stack, then (v, w) is an edge pointing to an SCC already found and must be ignored
                // The next line may look odd - but is correct.
                // It says w.index, not w.lowlink; that is deliberate and from the original paper
                v.lowlink := min(v.lowlink, w.index)

        // If v is a root node, pop the stack and generate an SCC
        if v.lowlink = v.index then
            start a new strongly connected component
            repeat
                w := S.pop()
                w.onStack := false
                add w to current strongly connected component
            while w ≠ v
            output the current strongly connected component

The index variable is the depth-first search node number counter. S is the node stack, which starts out empty and stores the history of nodes explored but not yet committed to a strongly connected component. This is not the normal depth-first search stack, as nodes are not popped as the search returns up the tree; they are only popped when an entire strongly connected component has been found.

The outermost loop searches each node that has not yet been visited, ensuring that nodes which are not reachable from the first node are still eventually traversed. The function strongconnect performs a single depth-first search of the graph, finding all successors from the node v, and reporting all strongly connected components of that subgraph.

When each node finishes recursing, if its lowlink is still set to its index, then it is the root node of a strongly connected component, formed by all of the nodes above it on the stack. The algorithm pops the stack up to and including the current node, and presents all of these nodes as a strongly connected component.
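
The pseudocode above translates almost directly into a high-level language. The following is a minimal Python sketch, not the implementation from Tarjan's paper; names such as tarjan_scc are invented for illustration, and the directly recursive form assumes the input graph is small enough not to exhaust Python's recursion limit:

    def tarjan_scc(graph):
        """Return the strongly connected components of a directed graph.

        graph: dict mapping each vertex to a list of its successors.
        The components are returned as lists of vertices, in the order
        in which the algorithm emits them.
        """
        counter = [0]      # next unused DFS index
        index = {}         # v -> v.index
        lowlink = {}       # v -> v.lowlink
        on_stack = set()   # constant-time test for membership in S
        stack = []         # the stack S
        components = []

        def strongconnect(v):
            # Set the depth index for v to the smallest unused index.
            index[v] = lowlink[v] = counter[0]
            counter[0] += 1
            stack.append(v)
            on_stack.add(v)

            # Consider successors of v.
            for w in graph.get(v, ()):
                if w not in index:
                    # Successor w has not yet been visited; recurse on it.
                    strongconnect(w)
                    lowlink[v] = min(lowlink[v], lowlink[w])
                elif w in on_stack:
                    # w is on the stack S and hence in the current SCC.
                    lowlink[v] = min(lowlink[v], index[w])

            if lowlink[v] == index[v]:
                # v is a root: pop the stack down to and including v.
                component = []
                while True:
                    w = stack.pop()
                    on_stack.remove(w)
                    component.append(w)
                    if w == v:
                        break
                components.append(component)

        for v in graph:
            if v not in index:
                strongconnect(v)
        return components

On the hypothetical example graph from the overview, tarjan_scc(graph) returns [["d"], ["c", "b", "a"]]: the singleton component {d} is emitted before the cycle component {a, b, c}, matching the hand trace in the bookkeeping section.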

Complexity

Time complexity: The strongconnect procedure is called exactly once for each node, and the for each loop considers each edge at most once. The algorithm's running time is therefore linear in the number of edges and nodes in G, i.e. O(|V| + |E|).

In order to achieve this complexity, the test for whether w is on the stack should be done in constant time. This may be done, for example, by storing a flag on each node that indicates whether it is on the stack, and performing this test by examining the flag.

Space complexity: The Tarjan procedure requires two words of supplementary data per vertex for the index and lowlink fields, along with one bit for onStack and another for determining whether index is undefined. In addition, one word is required on each stack frame to hold v and another for the current position in the edge list. Finally, the worst-case size of the stack S must be |V| (i.e. when the graph is one giant component). This gives a final analysis of O(|V| · (2 + 5w)) bits, where w is the machine word size. The variation of Nuutila and Soisalon-Soininen reduced this to O(|V| · (1 + 4w)) and, subsequently, that of Pearce requires only O(|V| · (1 + 3w)). [3] [4]

Additional remarks

While there is nothing special about the order of the nodes within each strongly connected component, one useful property of the algorithm is that no strongly connected component will be identified before any of its successors. Therefore, the order in which the strongly connected components are identified constitutes a reverse topological sort of the DAG formed by the strongly connected components. [5]
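
In terms of the hypothetical Python sketch above, this means that simply reversing the returned list of components gives a topological order of the condensation (the DAG whose nodes are the strongly connected components):

    components = tarjan_scc(graph)            # e.g. [["d"], ["c", "b", "a"]]
    condensation_order = list(reversed(components))
    # condensation_order == [["c", "b", "a"], ["d"]]
    # The component containing the cycle precedes the component it points to.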

Donald Knuth described Tarjan's SCC algorithm as one of his favorite implementations in the book The Stanford GraphBase. [6]

He also wrote: [7]

The data structures that he devised for this problem fit together in an amazingly beautiful way, so that the quantities you need to look at while exploring a directed graph are always magically at your fingertips. And his algorithm also does topological sorting as a byproduct.


References

  1. Tarjan, R. E. (1972), "Depth-first search and linear graph algorithms", SIAM Journal on Computing, 1 (2): 146–160, CiteSeerX 10.1.1.327.8418, doi:10.1137/0201010
  2. "Lecture #19: Depth First Search and Strong Components" (PDF). 15-451/651: Algorithms Fall 2018. Carnegie Mellon University. pp. 7–8. Retrieved 9 August 2021.
  3. Nuutila, Esko (1994). "On Finding the Strongly Connected Components in a Directed Graph". Information Processing Letters. 49 (1): 9–14. doi:10.1016/0020-0190(94)90047-7.
  4. Pearce, David. "A Space Efficient Algorithm for Detecting Strongly Connected Components". Information Processing Letters. 116 (1): 47–52. doi:10.1016/j.ipl.2015.08.010.
  5. Harrison, Paul. "Robust topological sorting and Tarjan's algorithm in Python" . Retrieved 9 February 2011.
  6. Knuth, The Stanford GraphBase, pages 512–519.
  7. Knuth, Donald (2014-05-20). Twenty Questions for Donald Knuth.