In graph theory and network analysis, node influence metrics are measures that rank or quantify the influence of every node (also called vertex) within a graph. They are related to centrality indices. Applications include measuring the influence of each person in a social network, understanding the role of infrastructure nodes in transportation networks, the Internet, or urban networks, and the participation of a given node in disease dynamics.
The traditional approach to understanding node importance is via centrality indicators. Centrality indices are designed to produce a ranking which accurately identifies the most influential nodes. Since the mid 2000s, however, social scientists and network physicists have begun to question the suitability of centrality indices for understanding node influence. Centralities may indicate the most influential nodes, but they are rather less informative for the vast majority of nodes which are not highly influential.
Borgatti and Everett's 2006 review article [1] showed that the accuracy of centrality indices is highly dependent on network topology. This finding has been repeatedly observed since then. (e.g. [2] [3] ). In 2012, Bauer and colleagues reminded us that centrality indices only rank nodes but do not quantify the difference between them. [4] In 2013, Sikic and colleagues presented strong evidence that centrality indices considerably underestimate the power of non-hub nodes. [5] The reason is quite clear. The accuracy of a centrality measure depends on network topology, but complex networks have heterogeneous topology. Hence a centrality measure which is appropriate for identifying highly influential nodes will most likely be inappropriate for the remainder of the network. [3]
This has inspired the development of novel methods designed to measure the influence of all network nodes. The most general of these are the accessibility, which uses the diversity of random walks to measure how accessible the rest of the network is from a given start node, [6] and the expected force, derived from the expected value of the force of infection generated by a node. [3] Both of these measures can be meaningfully computed from the structure of the network alone.
The Accessibility is derived from the theory of random walks. It measures the diversity of self-avoiding walks which start from a given node. A walk on a network is a sequence of adjacent vertices; a self-avoiding walk visits (lists) each vertex at most once. The original work used simulated walks of length 60 to characterize the network of urban streets in a Brazilian city. [6] It was later formalized as a modified form of hierarchical degree which controls for both transmission probabilities and the diversity of walks of a given fixed length. [7]
The hierarchical degree measures the number of nodes reachable from a start node by performing walks of length . For a fixed and walk type, each of these neighbors is reached with a (potentially different) probability . Given a vector of such probabilities, the accessibility of node at scale is defined
The probabilities can be based on uniform-probability random walks, or additionally modulated by edge weights and/or explicit (per edge) transmission probabilities. [7]
The accessibility has been shown to reveal community structure in urban networks, [6] corresponds to the number of nodes which can be visited in a defined time period, [7] and is predictive of the outcome of epidemiological SIR model spreading processes on networks with large diameter and low density. [2]
The expected force measures node influence from an epidemiological perspective. It is the expected value of the force of infection generated by the node after two transmissions.
The expected force of a node is given by
where the sum is taken over the set of all possible transmission clusters resulting from two transmissions starting from . That is, node and two of its neighbors or , one of its neighbors (called infected) and a neighbor of the infected neighbor. contains all possible orderings of the transmission events, so two clusters may contain the same nodes if they got infected in a different order. is the normalized cluster degree of cluster , that is, the number of edges with exactly one endpoint in cluster .
The definition naturally extends to directed networks by limiting the enumeration by edge direction. Likewise, extension to weighted networks, or networks with heterogeneous transmission probabilities, is a matter of adjusting the normalization of to include the probability that that cluster forms. It is also possible to use more than two transmissions to define the set . [3]
The expected force has been shown to strongly correlate with SI, SIS, and SIR epidemic outcomes over a broad range of network topologies, both simulated and empirical. [3] [8] It has also been used to measure the pandemic potential of world airports, [9] and mentioned in the context of digital payments, [10] ecology, [11] fitness, [12] and project management. [13]
Others suggest metrics which explicitly encode the dynamics of a specified process unfolding on the network. The dynamic influence is the proportion of infinite walks starting from each node, where walk steps are scaled such that the linear dynamics of the system are expected to converge to a non-null steady state. [14] The Impact sums, over increasing walk lengths, the probability of transmission to the end node of the walk and that the end node has not been previously visited by a shorter walk. [4] While both measures well predict the outcome of the dynamical systems they encode, in each case the authors admit that results from one dynamic do not translate to other dynamics.
A scale-free network is a network whose degree distribution follows a power law, at least asymptotically. That is, the fraction P(k) of nodes in the network having k connections to other nodes goes for large values of k as
A small-world network is a graph characterized by a high clustering coefficient and low distances. On an example of social network, high clustering implies the high probability that two friends of one person are friends themselves. The low distances, on the other hand, mean that there is a short chain of social connections between any two people. Specifically, a small-world network is defined to be a network where the typical distance L between two randomly chosen nodes grows proportionally to the logarithm of the number of nodes N in the network, that is:
In graph theory and network analysis, indicators of centrality assign numbers or rankings to nodes within a graph corresponding to their network position. Applications include identifying the most influential person(s) in a social network, key infrastructure nodes in the Internet or urban networks, super-spreaders of disease, and brain networks. Centrality concepts were first developed in social network analysis, and many of the terms used to measure centrality reflect their sociological origin.
The Barabási–Albert (BA) model is an algorithm for generating random scale-free networks using a preferential attachment mechanism. Several natural and human-made systems, including the Internet, the World Wide Web, citation networks, and some social networks are thought to be approximately scale-free and certainly contain few nodes with unusually high degree as compared to the other nodes of the network. The BA model tries to explain the existence of such nodes in real networks. The algorithm is named for its inventors Albert-László Barabási and Réka Albert.
In the study of complex networks, a network is said to have community structure if the nodes of the network can be easily grouped into sets of nodes such that each set of nodes is densely connected internally. In the particular case of non-overlapping community finding, this implies that the network divides naturally into groups of nodes with dense connections internally and sparser connections between groups. But overlapping communities are also allowed. The more general definition is based on the principle that pairs of nodes are more likely to be connected if they are both members of the same community(ies), and less likely to be connected if they do not share communities. A related but different problem is community search, where the goal is to find a community that a certain vertex belongs to.
The Watts–Strogatz model is a random graph generation model that produces graphs with small-world properties, including short average path lengths and high clustering. It was proposed by Duncan J. Watts and Steven Strogatz in their article published in 1998 in the Nature scientific journal. The model also became known as the (Watts) beta model after Watts used to formulate it in his popular science book Six Degrees.
In applied probability theory, the Simon model is a class of stochastic models that results in a power-law distribution function. It was proposed by Herbert A. Simon to account for the wide range of empirical distributions following a power-law. It models the dynamics of a system of elements with associated counters. In this model the dynamics of the system is based on constant growth via addition of new elements as well as incrementing the counters at a rate proportional to their current values.
Assortativity, or assortative mixing, is a preference for a network's nodes to attach to others that are similar in some way. Though the specific measure of similarity may vary, network theorists often examine assortativity in terms of a node's degree. The addition of this characteristic to network models more closely approximates the behaviors of many real world networks.
A Boolean network consists of a discrete set of Boolean variables each of which has a Boolean function assigned to it which takes inputs from a subset of those variables and output that determines the state of the variable it is assigned to. This set of functions in effect determines a topology (connectivity) on the set of variables, which then become nodes in a network. Usually, the dynamics of the system is taken as a discrete time series where the state of the entire network at time t+1 is determined by evaluating each variable's function on the state of the network at time t. This may be done synchronously or asynchronously.
Network science is an academic field which studies complex networks such as telecommunication networks, computer networks, biological networks, cognitive and semantic networks, and social networks, considering distinct elements or actors represented by nodes and the connections between the elements or actors as links. The field draws on theories and methods including graph theory from mathematics, statistical mechanics from physics, data mining and information visualization from computer science, inferential modeling from statistics, and social structure from sociology. The United States National Research Council defines network science as "the study of network representations of physical, biological, and social phenomena leading to predictive models of these phenomena."
Modularity is a measure of the structure of networks or graphs which measures the strength of division of a network into modules. Networks with high modularity have dense connections between the nodes within modules but sparse connections between nodes in different modules. Modularity is often used in optimization methods for detecting community structure in networks. Biological networks, including animal brains, exhibit a high degree of modularity. However, modularity maximization is not statistically consistent, and finds communities in its own null model, i.e. fully random graphs, and therefore it cannot be used to find statistically significant community structures in empirical networks. Furthermore, it has been shown that modularity suffers a resolution limit and, therefore, it is unable to detect small communities.
Evolving networks are networks that change as a function of time. They are a natural extension of network science since almost all real world networks evolve over time, either by adding or removing nodes or links over time. Often all of these processes occur simultaneously, such as in social networks where people make and lose friends over time, thereby creating and destroying edges, and some people become part of new social networks or leave their networks, changing the nodes in the network. Evolving network concepts build on established network theory and are now being introduced into studying networks in many diverse fields.
Hierarchical network models are iterative algorithms for creating networks which are able to reproduce the unique properties of the scale-free topology and the high clustering of the nodes at the same time. These characteristics are widely observed in nature, from biology to language to some social networks.
In network theory, multidimensional networks, a special type of multilayer network, are networks with multiple kinds of relations. Increasingly sophisticated attempts to model real-world systems as multidimensional networks have yielded valuable insight in the fields of social network analysis, economics, urban and international transport, ecology, psychology, medicine, biology, commerce, climatology, physics, computational neuroscience, operations management, and finance.
A temporal network, also known as a time-varying network, is a network whose links are active only at certain points in time. Each link carries information on when it is active, along with other possible characteristics such as a weight. Time-varying networks are of particular relevance to spreading processes, like the spread of information and disease, since each link is a contact opportunity and the time ordering of contacts is included.
In network science, a biased random walk on a graph is a time path process in which an evolving variable jumps from its current state to one of various potential new states; unlike in a pure random walk, the probabilities of the potential new states are unequal.
In network science, a hub is a node with a number of links that greatly exceeds the average. Emergence of hubs is a consequence of a scale-free property of networks. While hubs cannot be observed in a random network, they are expected to emerge in scale-free networks. The uprise of hubs in scale-free networks is associated with power-law distribution. Hubs have a significant impact on the network topology. Hubs can be found in many real networks, such as the brain or the Internet.
In the scale-free network theory, a mediation-driven attachment (MDA) model appears to embody a preferential attachment rule tacitly rather than explicitly. According to MDA rule, a new node first picks a node from the existing network at random and connect itself not with that but with one of the neighbors also picked at random.
In network science, the configuration model is a method for generating random networks from a given degree sequence. It is widely used as a reference model for real-life social networks, because it allows the modeler to incorporate arbitrary degree distributions.
In network science, the network entropy is a disorder measure derived from information theory to describe the level of randomness and the amount of information encoded in a graph. It is a relevant metric to quantitatively characterize real complex networks and can also be used to quantify network complexity