Brandes' algorithm

Last updated

Brandes' algorithm
Graph betweenness.svg
An undirected graph colored based on the betweenness centrality of each vertex from least (red) to greatest (blue).
Class Centrality
Network theory
Data structure Connected graph
Worst-case performance (unweighted)
(weighted)
Worst-case space complexity

In network theory, Brandes' algorithm is an algorithm for calculating the betweenness centrality of vertices in a graph. The algorithm was first published in 2001 by Ulrik Brandes. [1] Betweenness centrality, along with other measures of centrality, is an important measure in many real-world networks, such as social networks and computer networks. [2] [3] [4]

Contents

Definitions

There are several metrics for the centrality of a node, one such metric being the betweenness centrality. [5] For a node in a connected graph, the betweenness centrality is defined as: [6] [7]

where is the total number of shortest paths from node to node , and is the number of these paths which pass through . For an unweighted graph, the length of a path is considered to be the number of edges it contains.

By convention, whenever , since the only path is the empty path. Also, if is either or , since shortest paths do not pass through their endpoints.

The quantity

is known as the pair dependency of on , and represents the proportion of the shortest paths which travel via . The betweenness centrality is simply the sum of the pair dependencies over all pairs. As well as the pair dependency, it is also useful to define the (single) dependency on , with respect to a particular vertex :

,

with which, we can obtain the concise formulation

.

Algorithm

Brandes' algorithm calculates the betweenness centrality of all nodes in a graph. For every vertex , there are two stages.

Single-source shortest path

The number of shortest paths between and every vertex is calculated using breadth-first search. The breadth-first search starts at , and the shortest distance of each vertex from is recorded, dividing the graph into discrete layers. Additionally, each vertex keeps track of the set of vertices which in the preceding layer which point to it, . Described in set-builder notation, it can be written as:

.

This lends itself to a simple iterative formula for :

,

which essentially states that, if is at depth , then any shortest path at depth extended by a single edge to becomes a shortest path to .

Backpropagation

Brandes proved the following recursive formula for vertex dependencies: [1]

,

where the sum is taken over all vertices that are one edge further away from than . This lemma eliminates the need to explicitly sum all of the pair dependencies. Using this formula, the single dependency of on a vertex at depth is determined by the layer at depth . Furthermore, the order of summation is irrelevant, which allows for a bottom up approach starting at the deepest layer.

It turns out that the dependencies of on all other vertices can be computed in time. During the breadth-first search, the order in which vertices are visited is logged in a stack data structure. The backpropagation step then repeatedly pops off vertices, which are naturally sorted by their distance from , descending.

For each popped node , we iterate over its predecessors : the contribution of towards is added, that is,

.

Crucially, every layer propagates its dependencies completely, before moving to the layer with lower depth, due to the nature of breadth-first search. Once the propagation reaches back to , every vertex now contains . These can simply be added to , since

.

After iterations of single-source shortest path and backpropagation, each contains the betweenness centrality for .

Pseudocode

The following pseudocode illustrates Brandes' algorithm on an unweighted directed graph. [8]

algorithm Brandes(Graph) isfor eachuinGraph.Verticesdo         CB[u] ← 0      for eachsinGraph.Verticesdofor eachvinGraph.Verticesdo             δ[v] ← 0                   // Single dependency of s on v             prev[v] ← empty list       // Immediate predecessors of v during BFS             σ[v] ← 0                   // Number of shortest paths from s to v (s implied)             dist[v] ← null             // No paths are known initially,         σ[s] ← 1                       // except the start vertex         dist[s] ← 0          Q ← queue containing only s    // Breadth-first search         S ← empty stack                // Record the order in which vertices are visited          // Single-source shortest pathswhileQ is not empty douQ.dequeue()             S.push(u)              for eachvinGraph.Neighbours[u] doif dist[v] = nullthen                     dist[v] ← dist[u] + 1                     Q.enqueue(v)                 if dist[v] = dist[u] + 1 then                     σ[v] ← σ[v] + σ[u]                     prev[v].append(u)          // Backpropagation of dependencieswhileS is not empty dovS.pop()              for eachuin prev[v] do                 δ[u] ← δ[u] + σ[u] / σ[v] * (1 + δ[v])                  ifusthen                     CB[v] ← CB[v] + δ[v]    // Halved for undirected graphs      return CB

Running time

The running time of the algorithm is expressed in terms of the number of vertices and the number of edges .

For each vertex , we run breadth-first search, which takes time. Since the graph is connected, the component subsumes the term, since the number of edges is at least .

In the backpropagation stage, every vertex is popped off the stack, and its predecessors are iterated over. However, since each predecessor entry corresponds to an edge in the graph, this stage is also bounded by .

The overall running time of the algorithm is therefore , an improvement on the time bounds achieved by prior algorithms. [1] In addition, Brandes' algorithm improves on the space complexity of naive algorithms, which typically require space. Brandes' algorithm only stores at most predecessors, along with data for each vertex, making its extra space complexity

Variants

The algorithm can be generalised to weighted graphs by using Dijkstra's algorithm instead of breadth-first search. When operating on undirected graphs, the betweenness centrality may be divided by 2, to avoid double counting each path's reversed counterpart. Variants also exist to calculate different measures of centrality, including betweenness with paths at most length , edge betweenness, load betweenness, and stress betweenness. [8]

Related Research Articles

<span class="mw-page-title-main">Simplex</span> Multi-dimensional generalization of triangle

In geometry, a simplex is a generalization of the notion of a triangle or tetrahedron to arbitrary dimensions. The simplex is so-named because it represents the simplest possible polytope in any given dimension. For example,

<span class="mw-page-title-main">Dijkstra's algorithm</span> Algorithm for finding shortest paths

Dijkstra's algorithm is an algorithm for finding the shortest paths between nodes in a weighted graph, which may represent, for example, road networks. It was conceived by computer scientist Edsger W. Dijkstra in 1956 and published three years later.

<span class="mw-page-title-main">Breadth-first search</span> Algorithm to search the nodes of a graph

Breadth-first search (BFS) is an algorithm for searching a tree data structure for a node that satisfies a given property. It starts at the tree root and explores all nodes at the present depth prior to moving on to the nodes at the next depth level. Extra memory, usually a queue, is needed to keep track of the child nodes that were encountered but not yet explored.

In computer science, the Floyd–Warshall algorithm is an algorithm for finding shortest paths in a directed weighted graph with positive or negative edge weights. A single execution of the algorithm will find the lengths of shortest paths between all pairs of vertices. Although it does not return details of the paths themselves, it is possible to reconstruct the paths with simple modifications to the algorithm. Versions of the algorithm can also be used for finding the transitive closure of a relation , or widest paths between all pairs of vertices in a weighted graph.

<span class="mw-page-title-main">Maximum flow problem</span> Computational problem in graph theory

In optimization theory, maximum flow problems involve finding a feasible flow through a flow network that obtains the maximum possible flow rate.

<span class="mw-page-title-main">Deterministic finite automaton</span> Finite-state machine

In the theory of computation, a branch of theoretical computer science, a deterministic finite automaton (DFA)—also known as deterministic finite acceptor (DFA), deterministic finite-state machine (DFSM), or deterministic finite-state automaton (DFSA)—is a finite-state machine that accepts or rejects a given string of symbols, by running through a state sequence uniquely determined by the string. Deterministic refers to the uniqueness of the computation run. In search of the simplest models to capture finite-state machines, Warren McCulloch and Walter Pitts were among the first researchers to introduce a concept similar to finite automata in 1943.

In computer science, a topological sort or topological ordering of a directed graph is a linear ordering of its vertices such that for every directed edge (u,v) from vertex u to vertex v, u comes before v in the ordering. For instance, the vertices of the graph may represent tasks to be performed, and the edges may represent constraints that one task must be performed before another; in this application, a topological ordering is just a valid sequence for the tasks. Precisely, a topological sort is a graph traversal in which each node v is visited only after all its dependencies are visited. A topological ordering is possible if and only if the graph has no directed cycles, that is, if it is a directed acyclic graph (DAG). Any DAG has at least one topological ordering, and algorithms are known for constructing a topological ordering of any DAG in linear time. Topological sorting has many applications, especially in ranking problems such as feedback arc set. Topological sorting is possible even when the DAG has disconnected components.

In the mathematical field of graph theory, the distance between two vertices in a graph is the number of edges in a shortest path connecting them. This is also known as the geodesic distance or shortest-path distance. Notice that there may be more than one shortest path between two vertices. If there is no path connecting the two vertices, i.e., if they belong to different connected components, then conventionally the distance is defined as infinite.

<span class="mw-page-title-main">Centrality</span> Degree of connectedness within a graph

In graph theory and network analysis, indicators of centrality assign numbers or rankings to nodes within a graph corresponding to their network position. Applications include identifying the most influential person(s) in a social network, key infrastructure nodes in the Internet or urban networks, super-spreaders of disease, and brain networks. Centrality concepts were first developed in social network analysis, and many of the terms used to measure centrality reflect their sociological origin.

The Hungarian method is a combinatorial optimization algorithm that solves the assignment problem in polynomial time and which anticipated later primal–dual methods. It was developed and published in 1955 by Harold Kuhn, who gave it the name "Hungarian method" because the algorithm was largely based on the earlier works of two Hungarian mathematicians, Dénes Kőnig and Jenő Egerváry. However, in 2006 it was discovered that Carl Gustav Jacobi had solved the assignment problem in the 19th century, and the solution had been published posthumously in 1890 in Latin.

In computer science, the Hopcroft–Karp algorithm is an algorithm that takes a bipartite graph as input and produces a maximum-cardinality matching as output — a set of as many edges as possible with the property that no two edges share an endpoint. It runs in time in the worst case, where is set of edges in the graph, is set of vertices of the graph, and it is assumed that . In the case of dense graphs the time bound becomes , and for sparse random graphs it runs in time with high probability.

<span class="mw-page-title-main">Shortest-path tree</span>

In mathematics and computer science, a shortest-path tree rooted at a vertex v of a connected, undirected graph G is a spanning tree T of G, such that the path distance from root v to any other vertex u in T is the shortest path distance from v to u in G.

In mathematics and physics, a quantum graph is a linear, network-shaped structure of vertices connected on edges in which each edge is given a length and where a differential equation is posed on each edge. An example would be a power network consisting of power lines (edges) connected at transformer stations (vertices); the differential equations would then describe the voltage along each of the lines, with boundary conditions for each edge provided at the adjacent vertices ensuring that the current added over all edges adds to zero at each vertex.

In graph theory, the planar separator theorem is a form of isoperimetric inequality for planar graphs, that states that any planar graph can be split into smaller pieces by removing a small number of vertices. Specifically, the removal of vertices from an n-vertex graph can partition the graph into disjoint subgraphs each of which has at most vertices.

In computer science, lexicographic breadth-first search or Lex-BFS is a linear time algorithm for ordering the vertices of a graph. The algorithm is different from a breadth-first search, but it produces an ordering that is consistent with breadth-first search.

In computer science, the method of contraction hierarchies is a speed-up technique for finding the shortest-path in a graph. The most intuitive applications are car-navigation systems: a user wants to drive from to using the quickest possible route. The metric optimized here is the travel time. Intersections are represented by vertices, the road sections connecting them by edges. The edge weights represent the time it takes to drive along this segment of the road. A path from to is a sequence of edges ; the shortest path is the one with the minimal sum of edge weights among all possible paths. The shortest path in a graph can be computed using Dijkstra's algorithm but, given that road networks consist of tens of millions of vertices, this is impractical. Contraction hierarchies is a speed-up method optimized to exploit properties of graphs representing road networks. The speed-up is achieved by creating shortcuts in a preprocessing phase which are then used during a shortest-path query to skip over "unimportant" vertices. This is based on the observation that road networks are highly hierarchical. Some intersections, for example highway junctions, are "more important" and higher up in the hierarchy than for example a junction leading into a dead end. Shortcuts can be used to save the precomputed distance between two important junctions such that the algorithm doesn't have to consider the full path between these junctions at query time. Contraction hierarchies do not know about which roads humans consider "important", but they are provided with the graph as input and are able to assign importance to vertices using heuristics.

<span class="mw-page-title-main">Betweenness centrality</span> Measure of a graphs centrality, based on shortest paths

In graph theory, betweenness centrality is a measure of centrality in a graph based on shortest paths. For every pair of vertices in a connected graph, there exists at least one shortest path between the vertices such that either the number of edges that the path passes through or the sum of the weights of the edges is minimized. The betweenness centrality for each vertex is the number of these shortest paths that pass through the vertex.

<span class="mw-page-title-main">Suffix automaton</span> Deterministic finite automaton accepting set of all suffixes of particular string

In computer science, a suffix automaton is an efficient data structure for representing the substring index of a given string which allows the storage, processing, and retrieval of compressed information about all its substrings. The suffix automaton of a string is the smallest directed acyclic graph with a dedicated initial vertex and a set of "final" vertices, such that paths from the initial vertex to final vertices represent the suffixes of the string.

In network theory, the Wiener connector is a means of maximizing efficiency in connecting specified "query vertices" in a network. Given a connected, undirected graph and a set of query vertices in a graph, the minimum Wiener connector is an induced subgraph that connects the query vertices and minimizes the sum of shortest path distances among all pairs of vertices in the subgraph. In combinatorial optimization, the minimum Wiener connector problem is the problem of finding the minimum Wiener connector. It can be thought of as a version of the classic Steiner tree problem, where instead of minimizing the size of the tree, the objective is to minimize the distances in the subgraph.

A central problem in algorithmic graph theory is the shortest path problem. One of the generalizations of the shortest path problem is known as the single-source-shortest-paths (SSSP) problem, which consists of finding the shortest paths from a source vertex to all other vertices in the graph. There are classical sequential algorithms which solve this problem, such as Dijkstra's algorithm. In this article, however, we present two parallel algorithms solving this problem.

References

  1. 1 2 3 Brandes, Ulrik (June 2001). "A faster algorithm for betweenness centrality". The Journal of Mathematical Sociology. 25 (2): 163–177. doi:10.1080/0022250X.2001.9990249. ISSN   0022-250X . Retrieved 10 May 2024.
  2. Wasserman, Stanley; Faust, Katherine (1994). Social Network Analysis: Methods and Applications. Structural Analysis in the Social Sciences. Cambridge: Cambridge University Press. doi:10.1017/cbo9780511815478. ISBN   978-0-521-38707-1.
  3. Borgatti, Stephen P.; Everett, Martin G. (1 October 2006). "A Graph-theoretic perspective on centrality". Social Networks. 28 (4): 466–484. doi:10.1016/j.socnet.2005.11.005. ISSN   0378-8733 . Retrieved 10 May 2024.
  4. Kleinberg, Jon M. (1 September 1999). "Authoritative sources in a hyperlinked environment". Journal of the ACM. 46 (5): 604–632. doi:10.1145/324133.324140. ISSN   0004-5411 . Retrieved 10 May 2024.
  5. Sabidussi, Gert (1 December 1966). "The centrality index of a graph". Psychometrika. 31 (4): 581–603. doi:10.1007/BF02289527. ISSN   1860-0980. PMID   5232444 . Retrieved 10 May 2024.
  6. Freeman, Linton C. (1977). "A Set of Measures of Centrality Based on Betweenness". Sociometry. 40 (1): 35–41. doi:10.2307/3033543. ISSN   0038-0431. JSTOR   3033543.
  7. Anthonisse, J. M. (1 January 1971). "The rush in a directed graph". Stichting Mathematisch Centrum .
  8. 1 2 Brandes, Ulrik (May 2008). "On variants of shortest-path betweenness centrality and their generic computation". Social Networks. 30 (2): 136–145. doi:10.1016/j.socnet.2007.11.001. ISSN   0378-8733 . Retrieved 10 May 2024.