Random walk closeness centrality

Last updated

Random walk closeness centrality is a measure of centrality in a network, which describes the average speed with which randomly walking processes reach a node from other nodes of the network. It is similar to the closeness centrality except that the farness is measured by the expected length of a random walk rather than by the shortest path.

Contents

The concept was first proposed by White and Smyth (2003) under the name Markov centrality. [1]

Intuition

Consider a network with a finite number of nodes and a random walk process that starts in a certain node and proceeds from node to node along the edges. From each node, it chooses randomly the edge to be followed. In an unweighted network, the probability of choosing a certain edge is equal across all available edges, while in a weighted network it is proportional to the edge weights. A node is considered to be close to other nodes, if the random walk process initiated from any node of the network arrives to this particular node in relatively few steps on average.

Definition

Consider a weighted network – either directed or undirected – with n nodes denoted by j=1, …, n; and a random walk process on this network with a transition matrix M. The element of M describes the probability of the random walker that has reached node i, proceeds directly to node j. These probabilities are defined in the following way.

where is the (i,j)th element of the weighting matrix A of the network. When there is no edge between two nodes, the corresponding element of the A matrix is zero.

The random walk closeness centrality of a node i is the inverse of the average mean first passage time to that node:

where is the mean first passage time from node j to node i.

Mean first passage time

The mean first passage time from node i to node j is the expected number of steps it takes for the process to reach node j from node i for the first time:

where P(i,j,r) denotes the probability that it takes exactly r steps to reach j from i for the first time. To calculate these probabilities of reaching a node for the first time in r steps, it is useful to regard the target node as an absorbing one, and introduce a transformation of M by deleting its j-th row and column and denoting it by . As the probability of a process starting at i and being in k after r-1 steps is simply given by the (i,k)th element of , P(i,j,r) can be expressed as

Substituting this into the expression for mean first passage time yields

Using the formula for the summation of geometric series for matrices yields

where I is the n-1 dimensional identity matrix.

For computational convenience, this expression can be vectorized as

where is the vector for first passage times for a walk ending at node j, and e is an n-1 dimensional vector of ones.

Mean first passage time is not symmetric, even for undirected graphs.

In model networks

According to simulations performed by Noh and Rieger (2004), the distribution of random walk closeness centrality in a Barabási-Albert model is mainly determined by the degree distribution. In such a network, the random walk closeness centrality of a node is roughly proportional to, but does not increase monotonically with its degree.

Applications for real networks

Random walk closeness centrality is more relevant measure than the simple closeness centrality in case of applications where the concept of shortest paths is not meaningful or is very restrictive for a reasonable assessment of the nature of the system. This is the case for example when the analyzed process evolves in the network without any specific intention to reach a certain point, or without the ability of finding the shortest path to reach its target. One example for a random walk in a network is the way a certain coin circulates in an economy: it is passed from one person to another through transactions, without any intention of reaching a specific individual. Another example where the concept of shortest paths is not very useful is a densely connected network. Furthermore, as shortest paths are not influenced by self-loops, random walk closeness centrality is a more adequate measure than closeness centrality when analyzing networks where self-loops are important.

An important application on the field of economics is the analysis of the input-output model of an economy, which is represented by a densely connected weighted network with important self-loops. [2]

The concept is widely used in natural sciences as well. One biological application is the analysis of protein-protein interactions. [3]

Random walk betweenness centrality

A related concept, proposed by Newman, [4] is random walk betweenness centrality. Just as random walk closeness centrality is a random walk counterpart of closeness centrality, random walk betweenness centrality is, similarly, the random walk counterpart of betweenness centrality. Unlike the usual betweenness centrality measure, it does not only count shortest paths passing through the given node, but all possible paths crossing it.

Formally, the random walk betweenness centrality of a node is

where the element of matrix R contains the probability of a random walk starting at node j with absorbing node k, passing through node i.

Calculating random walk betweenness in large networks is computationally very intensive. [5]

Second order centrality

Another random walk based centrality is the second order centrality. [6] Instead of counting the shortest paths passing through a given node (as for random walk betweenness centrality), it focuses on another characteristic of random walks on graphs. The expectation of the standard deviation of the return times of a random walk to a node constitutes its centrality. The lower that deviation, the more central that node is.

Calculating the second order betweenness on large arbitrary graphs is also intensive, as its complexity is (worst case achieved on the Lollipop graph).

See also

Related Research Articles

Travelling salesman problem NP-hard problem in combinatorial optimization

The travelling salesman problem asks the following question: "Given a list of cities and the distances between each pair of cities, what is the shortest possible route that visits each city exactly once and returns to the origin city?" It is an NP-hard problem in combinatorial optimization, important in theoretical computer science and operations research.

Shortest path problem Computational problem of graph theory

In graph theory, the shortest path problem is the problem of finding a path between two vertices in a graph such that the sum of the weights of its constituent edges is minimized.

In graph theory, a clustering coefficient is a measure of the degree to which nodes in a graph tend to cluster together. Evidence suggests that in most real-world networks, and in particular social networks, nodes tend to create tightly knit groups characterised by a relatively high density of ties; this likelihood tends to be greater than the average probability of a tie randomly established between two nodes.

Centrality Degree of connectedness within a graph

In graph theory and network analysis, indicators of centrality assign numbers or rankings to nodes within a graph corresponding to their network position. Applications include identifying the most influential person(s) in a social network, key infrastructure nodes in the Internet or urban networks, super-spreaders of disease, and brain networks. Centrality concepts were first developed in social network analysis, and many of the terms used to measure centrality reflect their sociological origin.

In queueing theory, a discipline within the mathematical theory of probability, a Jackson network is a class of queueing network where the equilibrium distribution is particularly simple to compute as the network has a product-form solution. It was the first significant development in the theory of networks of queues, and generalising and applying the ideas of the theorem to search for similar product-form solutions in other networks has been the subject of much research, including ideas used in the development of the Internet. The networks were first identified by James R. Jackson and his paper was re-printed in the journal Management Science’s ‘Ten Most Influential Titles of Management Sciences First Fifty Years.’

In power engineering, nodal admittance matrix or Y Matrix or Ybus is an N x N matrix describing a linear power system with N buses. It represents the nodal admittance of the buses in a power system. In realistic systems which contain thousands of buses, the Y matrix is quite sparse. Each bus in a real power system is usually connected to only a few other buses through the transmission lines. The Y Matrix is also one of the data requirements needed to formulate a power flow study.

Mixing patterns refer to systematic tendencies of one type of nodes in a network to connect to another type. For instance, nodes might tend to link to others that are very similar or very different. This feature is common in many social networks, although it also appears sometimes in non-social networks. Mixing patterns are closely related to assortativity; however, for the purposes of this article, the term is used to refer to assortative or disassortative mixing based on real-world factors, either topological or sociological.

Different definitions have been given for the dimension of a complex network or graph. For example, metric dimension is defined in terms of the resolving set for a graph. Dimension has also been defined based on the box covering method applied to graphs. Here we describe the definition based on the complex network zeta function. This generalises the definition based on the scaling property of the volume with distance. The best definition depends on the application.

Network science Academic field

Network science is an academic field which studies complex networks such as telecommunication networks, computer networks, biological networks, cognitive and semantic networks, and social networks, considering distinct elements or actors represented by nodes and the connections between the elements or actors as links. The field draws on theories and methods including graph theory from mathematics, statistical mechanics from physics, data mining and information visualization from computer science, inferential modeling from statistics, and social structure from sociology. The United States National Research Council defines network science as "the study of network representations of physical, biological, and social phenomena leading to predictive models of these phenomena."

Closeness centrality

In a connected graph, closeness centrality of a node is a measure of centrality in a network, calculated as the reciprocal of the sum of the length of the shortest paths between the node and all other nodes in the graph. Thus, the more central a node is, the closer it is to all other nodes.

The random walker algorithm is an algorithm for image segmentation. In the first description of the algorithm, a user interactively labels a small number of pixels with known labels, e.g., "object" and "background". The unlabeled pixels are each imagined to release a random walker, and the probability is computed that each pixel's random walker first arrives at a seed bearing each label, i.e., if a user places K seeds, each with a different label, then it is necessary to compute, for each pixel, the probability that a random walker leaving the pixel will first arrive at each seed. These probabilities may be determined analytically by solving a system of linear equations. After computing these probabilities for each pixel, the pixel is assigned to the label for which it is most likely to send a random walker. The image is modeled as a graph, in which each pixel corresponds to a node which is connected to neighboring pixels by edges, and the edges are weighted to reflect the similarity between the pixels. Therefore, the random walk occurs on the weighted graph.

Katz centrality

In graph theory, the Katz centrality of a node is a measure of centrality in a network. It was introduced by Leo Katz in 1953 and is used to measure the relative degree of influence of an actor within a social network. Unlike typical centrality measures which consider only the shortest path between a pair of actors, Katz centrality measures influence by taking into account the total number of walks between a pair of actors.

Betweenness centrality Measure of a graphs centrality, based on shortest paths

In graph theory, betweenness centrality is a measure of centrality in a graph based on shortest paths. For every pair of vertices in a connected graph, there exists at least one shortest path between the vertices such that either the number of edges that the path passes through or the sum of the weights of the edges is minimized. The betweenness centrality for each vertex is the number of these shortest paths that pass through the vertex.

Multidimensional network Networks with multiple kinds of relations

In network theory, multidimensional networks, a special type of multilayer network, are networks with multiple kinds of relations. Increasingly sophisticated attempts to model real-world systems as multidimensional networks have yielded valuable insight in the fields of social network analysis, economics, urban and international transport, ecology, psychology, medicine, biology, commerce, climatology, physics, computational neuroscience, operations management, and finance.

Efficiency (network science)

In network science, the efficiency of a network is a measure of how efficiently it exchanges information and it is also called communication efficiency. The underlying idea is that the more distant two nodes are in the network, the less efficient their communication will be. The concept of efficiency can be applied to both local and global scales in a network. On a global scale, efficiency quantifies the exchange of information across the whole network where information is concurrently exchanged. The local efficiency quantifies a network's resistance to failure on a small scale. That is the local efficiency of a node characterizes how well information is exchanged by its neighbors when it is removed.

Biased random walk on a graph Structural analysis of a network

In network science, a biased random walk on a graph is a time path process in which an evolving variable jumps from its current state to one of various potential new states; unlike in a pure random walk, the probabilities of the potential new states are unequal.

Event detection for WSN

This article is about event detection for WSN.

Maximal entropy random walk (MERW) is a popular type of biased random walk on a graph, in which transition probabilities are chosen accordingly to the principle of maximum entropy, which says that the probability distribution which best represents the current state of knowledge is the one with largest entropy. While standard random walk chooses for every vertex uniform probability distribution among its outgoing edges, locally maximizing entropy rate, MERW maximizes it globally by assuming uniform probability distribution among all paths in a given graph.

Configuration model

In network science, the configuration model is a method for generating random networks from a given degree sequence. It is widely used as a reference model for real-life social networks, because it allows the modeler to incorporate arbitrary degree distributions.

In network science, the network entropy is a disorder measure derived from information theory to describe the level of randomness and the amount of information encoded in a graph. It is a relevant metric to quantitatively characterize real complex networks and can also be used to quantify network complexity

References

  1. White, Scott; Smyth, Padhraic (2003). Algorithms for Estimating Relative Importance in Networks (PDF). ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. doi:10.1145/956750.956782. ISBN   1581137370.
  2. Blöchl F, Theis FJ, Vega-Redondo F, and Fisher E: Vertex Centralities in Input-Output Networks Reveal the Structure of Modern Economies, Physical Review E, 83(4):046127, 2011.
  3. Aidong Zhang: Protein Interaction Networks: Computational Analysis (Cambridge University Press) 2007
  4. Newman, M.E. J.: A measure of betweenness centrality based on random walks. Social Networks, Volume 27, Issue 1, January 2005, Pages 39–54
  5. Kang, U., Papadimitriou, S., Sun, J., and Tong, H.: Centralities in Large Networks: Algorithms and Observations. SIAM International Conference on Data Mining 2011, Mesa, Arizona, USA.
  6. A.-M. Kermarrec, E. Le Merrer, B. Sericola, G. Trédan: Second order centrality: Distributed assessment of nodes criticity in complex networks. Elsevier Computer Communications 34(5):619-628, 2011.