Configuration model

Last updated
Figure 1. Degree sequence and different network realizations in the configuration model Degree sequence and different realizations in the configuration model.jpg
Figure 1. Degree sequence and different network realizations in the configuration model

In network science, the configuration model is a method for generating random networks from a given degree sequence. It is widely used as a reference model for real-life social networks, because it allows the modeler to incorporate arbitrary degree distributions.

Contents

Rationale for the model

In the configuration model, the degree of each vertex is pre-defined, rather than having a probability distribution from which the given degree is chosen. [2] As opposed to the Erdős–Rényi model, the degree sequence of the configuration model is not restricted to have a Poisson distribution, the model allows the user to give the network any desired degree distribution.

Algorithm

The following algorithm describes the generation of the model:

  1. Take a degree sequence, i. e. assign a degree to each vertex. The degrees of the vertices are represented as half-links or stubs. The sum of stubs must be even in order to be able to construct a graph (). The degree sequence can be drawn from a theoretical distribution or it can represent a real network (determined from the adjacency matrix of the network).
  2. Choose two stubs uniformly at random and connect them to form an edge. Choose another pair from the remaining stubs and connect them. Continue until you run out of stubs. The result is a network with the pre-defined degree sequence. The realization of the network changes with the order in which the stubs are chosen, they might include cycles (b), self-loops (c) or multi-links (d) (Figure 1).

Self-loops, multi-edges and implications

The algorithm described above matches any stubs with the same probability. The uniform distribution of the matching is an important property in terms of calculating other features of the generated networks. The network generation process does not exclude the event of generating a self-loop or a multi-link. If we designed the process where self-loops and multi-edges are not allowed, the matching of the stubs would not follow a uniform distribution.

The expected total number of multi-links in a configuration model network would be:

where is the n-th moment of the degree distribution. Therefore, the average number of self-loops and multi-links is a constant for some large networks, and the density of self-loops and multi-links, meaning the number per node, goes to zero as as long as is constant and finite. For some power-law degree distributions where the second moment diverges, density of multi-links may not vanish or may do so more slowly than . [2]

A further consequence of self-loops and multi-edges is that not all possible networks are generated with the same probability. In general, all possible realizations can be generated by permuting the stubs of all vertices in every possible way. The number of permutation of the stubs of node is , so the number of realizations of a degree sequence is . This would mean that each realization occurs with the same probability. However, self-loops and multi-edges can change the number of realizations, since permuting self-edges can result an unchanged realization. Given that the number of self-loops and multi-links vanishes as , the variation in probabilities of different realization will be small but present. [2]

Properties

Edge probability

A stub of node can be connected to other stubs (there are stubs altogether, and we have to exclude the one we are currently observing). The vertex has stubs to which node can be connected with the same probability (because of the uniform distribution). The probability of a stub of node being connected to one of these stubs is . Since node has stubs, the probability of being connected to is ( for sufficiently large ). Note that this formula can only be viewed as a probability if , and more precisely it describes the expected number of edges between nodes and . Note that this formula does not apply to the case of self-edges. [2]

Given a configuration model with a degree distribution , the probability of a randomly chosen node having degree is . But if we took one of the vertices to which we can arrive following one of edges of i, the probability of having degree k is . (The probability of reaching a node with degree k is , and there are such nodes.) This fraction depends on as opposed to the degree of the typical node with . Thus, a neighbor of a typical node is expected to have higher degree than the typical node itself. This feature of the configuration model describes well the phenomenon of "my friends having more friends than I do".

Clustering coefficient

The clustering coefficient (the average probability that the neighbors of a node are connected) is computed approximately as follows:

where denotes the probability that a random edge reaches a degree- vertex, and the factors of the form "" rather than "" appear because one stub has been accounted for by the fact that these are neighbors of a common vertex. Evaluating the above results in

Using and , with denoting the degree distribution, denoting the average degree, and denoting the number of vertices, the above becomes

with denoting the second moment of the degree distribution. Assuming that and are constant, the above behaves as

where the constant depends on . [2] Thus, the clustering coefficient becomes small in the limit.

Giant component

In the configuration model, a giant component (GC) exists if

where and are the first and second moments of the degree distribution. That means that, the critical threshold solely depends on quantities which are uniquely determined by the degree distribution .

Configuration model generates locally tree-like networks, meaning that any local neighborhood in such a network takes the form of a tree. More precisely, if you start at any node in the network and form the set of all nodes at distance or less from that starting node, the set will, with probability tending to 1 as n → ∞, take the form of a tree. [3] In tree-like structures, the number of second neighbors averaged over the whole network, , is:

Then, in general, the average number at distance can be written as:

Which implies that if the ratio of is larger than one, then the network can have a giant component. This is famous as the Molloy-Reed criterion. [4] The intuition behind this criterion is that if the giant component (GC) exists, then the average degree of a randomly chosen vertex in a connected component should be at least 2. Molloy-Reed criterion can also be expressed as: which implies that, although the size of the GC may depend on and , the number of nodes of degree 0 and 2 have no contribution in the existence of the giant component. [3]

Diameter

Configuration model can assume any degree distribution and shows the small-world effect, since to leading order the diameter of the configuration model is just . [5]

Components of finite size

As total number of vertices tends to infinity, the probability to find two giant components is vanishing. This means that in the sparse regime, the model consist of one giant component (if any) and multiple connected components of finite size. The sizes of the connected components are characterized by their size distribution - the probability that a randomly sampled vertex belongs to a connected component of size There is a correspondence between the degree distribution and the size distribution When total number of vertices tends to infinity, , the following relation takes place: [6]

where and denotes the -fold convolution power. Moreover, explicit asymptotes for are known when and is close to zero. [6] The analytical expressions for these asymptotes depend on finiteness of the moments of the degree distribution tail exponent (when features a heavy tail), and the sign of Molloy–Reed criterion. The following table summarises these relationships (the constants are provided in [6] ).

Moments of Tail of
light tail-1 or 1
0
heavy tail, -1
0
1

heavy tail, -1
0
1
heavy tail, -1
0
1

heavy tail, 1
heavy tail, 1

Modelling

Comparison to real-world networks

Three general properties of complex networks are heterogeneous degree distribution, short average path length and high clustering. [1] [7] [8] Having the opportunity to define any arbitrary degree sequence, the first condition can be satisfied by design, but as shown above, the global clustering coefficient is an inverse function of the network size, so for large configuration networks, clustering tends to be small. This feature of the baseline model contradicts the known properties of empirical networks, but extensions of the model can solve this issue (see [9] ). All the networks generated by this model are locally tree-like provided the average of the excess degree distribution is either constant or grows slower than the square root of number of links, . In other words, this model prevents forming substructures such as loops in the large size limit. Vanishing of clustering coefficient, is a special case of this more general result. While the tree-like property makes the model not very realistic, so many calculations, such as the generating function methods, are possible for the configuration model thanks to this feature. [3]

Application: modularity calculation

The configuration model is applied as benchmark in the calculation of network modularity. Modularity measures the degree of division of the network into modules. It is computed as follows:

[10]

in which the adjacency matrix of the network is compared to the probability of having an edge between node and (depending on their degrees) in the configuration model (see the page modularity for details).

Directed configuration model

In the DCM (directed configuration model), [11] each node is given a number of half-edges called tails and heads. Then tails and heads are matched uniformly at random to form directed edges. The size of the giant component, [11] [12] the typical distance, [13] and the diameter [14] of DCM have been studied mathematically. There also has been extensive research on random walks on DCM. [15] [16] [17] Some real-world complex networks have been modelled by DCM, such as neural networks, [18] finance [19] and social networks. [20]

Directed configuration model Dcm.svg
Directed configuration model

Related Research Articles

<span class="mw-page-title-main">Negative binomial distribution</span> Probability distribution

In probability theory and statistics, the negative binomial distribution is a discrete probability distribution that models the number of failures in a sequence of independent and identically distributed Bernoulli trials before a specified (non-random) number of successes occurs. For example, we can define rolling a 6 on some dice as a success, and rolling any other number as a failure, and ask how many failure rolls will occur before we see the third success. In such a case, the probability distribution of the number of failures that appear will be a negative binomial distribution.

<span class="mw-page-title-main">Random graph</span> Graph generated by a random process

In mathematics, random graph is the general term to refer to probability distributions over graphs. Random graphs may be described simply by a probability distribution, or by a random process which generates them. The theory of random graphs lies at the intersection between graph theory and probability theory. From a mathematical perspective, random graphs are used to answer questions about the properties of typical graphs. Its practical applications are found in all areas in which complex networks need to be modeled – many random graph models are thus known, mirroring the diverse types of complex networks encountered in different areas. In a mathematical context, random graph refers almost exclusively to the Erdős–Rényi random graph model. In other contexts, any graph model may be referred to as a random graph.

In statistical mechanics, a canonical ensemble is the statistical ensemble that represents the possible states of a mechanical system in thermal equilibrium with a heat bath at a fixed temperature. The system can exchange energy with the heat bath, so that the states of the system will differ in total energy.

<span class="mw-page-title-main">Degree distribution</span>

In the study of graphs and networks, the degree of a node in a network is the number of connections it has to other nodes and the degree distribution is the probability distribution of these degrees over the whole network.

<span class="mw-page-title-main">Giant component</span> Large connected component of a random graph

In network theory, a giant component is a connected component of a given random graph that contains a significant fraction of the entire graph's vertices.

<span class="mw-page-title-main">Barabási–Albert model</span> Scale-free network generation algorithm

The Barabási–Albert (BA) model is an algorithm for generating random scale-free networks using a preferential attachment mechanism. Several natural and human-made systems, including the Internet, the World Wide Web, citation networks, and some social networks are thought to be approximately scale-free and certainly contain few nodes with unusually high degree as compared to the other nodes of the network. The BA model tries to explain the existence of such nodes in real networks. The algorithm is named for its inventors Albert-László Barabási and Réka Albert.

<span class="mw-page-title-main">Assortativity</span> Tendency for similar nodes to be connected

Assortativity, or assortative mixing, is a preference for a network's nodes to attach to others that are similar in some way. Though the specific measure of similarity may vary, network theorists often examine assortativity in terms of a node's degree. The addition of this characteristic to network models more closely approximates the behaviors of many real world networks.

<span class="mw-page-title-main">Erdős–Rényi model</span> Two closely related models for generating random graphs

In the mathematical field of graph theory, the Erdős–Rényi model refers to one of two closely related models for generating random graphs or the evolution of a random network. These models are named after Hungarian mathematicians Paul Erdős and Alfréd Rényi, who introduced one of the models in 1959. Edgar Gilbert introduced the other model contemporaneously with and independently of Erdős and Rényi. In the model of Erdős and Rényi, all graphs on a fixed vertex set with a fixed number of edges are equally likely. In the model introduced by Gilbert, also called the Erdős–Rényi–Gilbert model, each edge has a fixed probability of being present or absent, independently of the other edges. These models can be used in the probabilistic method to prove the existence of graphs satisfying various properties, or to provide a rigorous definition of what it means for a property to hold for almost all graphs.

The partition function or configuration integral, as used in probability theory, information theory and dynamical systems, is a generalization of the definition of a partition function in statistical mechanics. It is a special case of a normalizing constant in probability theory, for the Boltzmann distribution. The partition function occurs in many problems of probability theory because, in situations where there is a natural symmetry, its associated probability measure, the Gibbs measure, has the Markov property. This means that the partition function occurs not only in physical systems with translation symmetry, but also in such varied settings as neural networks, and applications such as genomics, corpus linguistics and artificial intelligence, which employ Markov networks, and Markov logic networks. The Gibbs measure is also the unique measure that has the property of maximizing the entropy for a fixed expectation value of the energy; this underlies the appearance of the partition function in maximum entropy methods and the algorithms derived therefrom.

<span class="mw-page-title-main">Network science</span> Academic field

Network science is an academic field which studies complex networks such as telecommunication networks, computer networks, biological networks, cognitive and semantic networks, and social networks, considering distinct elements or actors represented by nodes and the connections between the elements or actors as links. The field draws on theories and methods including graph theory from mathematics, statistical mechanics from physics, data mining and information visualization from computer science, inferential modeling from statistics, and social structure from sociology. The United States National Research Council defines network science as "the study of network representations of physical, biological, and social phenomena leading to predictive models of these phenomena."

<span class="mw-page-title-main">Modularity (networks)</span> Measure of network community structure

Modularity is a measure of the structure of networks or graphs which measures the strength of division of a network into modules. Networks with high modularity have dense connections between the nodes within modules but sparse connections between nodes in different modules. Modularity is often used in optimization methods for detecting community structure in networks. Biological networks, including animal brains, exhibit a high degree of modularity. However, modularity maximization is not statistically consistent, and finds communities in its own null model, i.e. fully random graphs, and therefore it cannot be used to find statistically significant community structures in empirical networks. Furthermore, it has been shown that modularity suffers a resolution limit and, therefore, it is unable to detect small communities.

<span class="mw-page-title-main">Thermal fluctuations</span> Random temperature-influenced deviations of particles from their average state

In statistical mechanics, thermal fluctuations are random deviations of an atomic system from its average state, that occur in a system at equilibrium. All thermal fluctuations become larger and more frequent as the temperature increases, and likewise they decrease as temperature approaches absolute zero.

Evolving networks are dynamic networks that change through time. In each period there are new nodes and edges that join the network while the old ones disappear. Such dynamic behaviour is characteristic for most real-world networks, regardless of their range - global or local. However, networks differ not only in their range but also in their topological structure. It is possible to distinguish:

Robustness, the ability to withstand failures and perturbations, is a critical attribute of many complex systems including complex networks.

Evolution of a random network is a dynamical process in random networks, usually leading to emergence of giant components, accompanied with striking consequences on the network topology.

<span class="mw-page-title-main">Lancichinetti–Fortunato–Radicchi benchmark</span> Algorithm

Lancichinetti–Fortunato–Radicchibenchmark is an algorithm that generates benchmark networks. They have a priori known communities and are used to compare different community detection methods. The advantage of the benchmark over other methods is that it accounts for the heterogeneity in the distributions of node degrees and of community sizes.

Maximal entropy random walk (MERW) is a popular type of biased random walk on a graph, in which transition probabilities are chosen accordingly to the principle of maximum entropy, which says that the probability distribution which best represents the current state of knowledge is the one with largest entropy. While standard random walk chooses for every vertex uniform probability distribution among its outgoing edges, locally maximizing entropy rate, MERW maximizes it globally by assuming uniform probability distribution among all paths in a given graph.

<span class="mw-page-title-main">Maximum-entropy random graph model</span>

Maximum-entropy random graph models are random graph models used to study complex networks subject to the principle of maximum entropy under a set of structural constraints, which may be global, distributional, or local.

<span class="mw-page-title-main">Soft configuration model</span> Random graph model in applied mathematics

In applied mathematics, the soft configuration model (SCM) is a random graph model subject to the principle of maximum entropy under constraints on the expectation of the degree sequence of sampled graphs. Whereas the configuration model (CM) uniformly samples random graphs of a specific degree sequence, the SCM only retains the specified degree sequence on average over all network realizations; in this sense the SCM has very relaxed constraints relative to those of the CM ("soft" rather than "sharp" constraints). The SCM for graphs of size has a nonzero probability of sampling any graph of size , whereas the CM is restricted to only graphs having precisely the prescribed connectivity structure.

<span class="mw-page-title-main">Activity-driven model</span>

In network science, the activity-driven model is a temporal network model in which each node has a randomly-assigned "activity potential", which governs how it links to other nodes over time.

References

  1. 1 2 Network Science by Albert-László Barabási.
  2. 1 2 3 4 5 Newman, Mark (2010-03-25). Networks: An Introduction – Oxford Scholarship. Oxford University Press. doi:10.1093/acprof:oso/9780199206650.001.0001. ISBN   9780191594175.
  3. 1 2 3 Newman, Mark (2018-10-18). Networks. Vol. 1. Oxford University Press. doi:10.1093/oso/9780198805090.001.0001. ISBN   978-0-19-880509-0.
  4. Molloy, Michael; Reed, Bruce (1995-03-01). "A critical point for random graphs with a given degree sequence". Random Structures & Algorithms. 6 (2–3): 161–180. CiteSeerX   10.1.1.24.6195 . doi:10.1002/rsa.3240060204. ISSN   1098-2418.
  5. Chung, Fan; Lu, Linyuan (2002-12-10). "The average distances in random graphs with given expected degrees". Proceedings of the National Academy of Sciences. 99 (25): 15879–15882. Bibcode:2002PNAS...9915879C. doi: 10.1073/pnas.252631999 . ISSN   0027-8424. PMC   138532 . PMID   12466502.
  6. 1 2 3 Kryven, I (2017). "General expression for the component size distribution in infinite configuration networks". Physical Review E. 95 (5): 052303. arXiv: 1703.05413 . Bibcode:2017PhRvE..95e2303K. doi:10.1103/PhysRevE.95.052303. hdl:11245.1/fa1b270b-61a5-4f20-b496-ddf446fdfe80. PMID   28618550. S2CID   8421307.
  7. Barabási, Albert-László; Albert, Réka (1999-10-15). "Emergence of Scaling in Random Networks". Science. 286 (5439): 509–512. arXiv: cond-mat/9910332 . Bibcode:1999Sci...286..509B. doi:10.1126/science.286.5439.509. ISSN   0036-8075. PMID   10521342. S2CID   524106.
  8. Watts, Duncan J.; Strogatz, Steven H. (1998). "Collective dynamics of 'small-world' networks". Nature. 393 (6684): 440–442. Bibcode:1998Natur.393..440W. doi:10.1038/30918. ISSN   1476-4687. PMID   9623998. S2CID   4429113.
  9. Newman, M. E. J. (2009). "Random Graphs with Clustering". Physical Review Letters. 103 (5): 058701. arXiv: 0903.4009 . Bibcode:2009PhRvL.103e8701N. doi:10.1103/physrevlett.103.058701. PMID   19792540. S2CID   28214709.
  10. Newman, M. E. J. (2004). "Finding and evaluating community structure in networks". Physical Review E. 69 (2): 026113. arXiv: cond-mat/0308217 . Bibcode:2004PhRvE..69b6113N. doi:10.1103/physreve.69.026113. PMID   14995526. S2CID   197314.
  11. 1 2 COOPER, COLIN; FRIEZE, ALAN (May 2004). "The Size of the Largest Strongly Connected Component of a Random Digraph with a Given Degree Sequence". Combinatorics, Probability and Computing. 13 (3): 319–337. doi:10.1017/S096354830400611X. ISSN   1469-2163. S2CID   27511938.
  12. Cai, Xing Shi; Perarnau, Guillem (10 April 2020). "The giant component of the directed configuration model revisited". arXiv: 2004.04998 [math.PR].
  13. van der Hoorn, Pim; Olvera-Cravioto, Mariana (June 2018). "Typical distances in the directed configuration model". The Annals of Applied Probability. 28 (3): 1739–1792. arXiv: 1511.04553 . doi:10.1214/17-AAP1342. S2CID   13683470.
  14. Cai, Xing Shi; Perarnau, Guillem (10 March 2020). "The diameter of the directed configuration model". arXiv: 2003.04965 [math.PR].
  15. Bordenave, Charles; Caputo, Pietro; Salez, Justin (1 April 2018). "Random walk on sparse random digraphs". Probability Theory and Related Fields. 170 (3): 933–960. arXiv: 1508.06600 . doi:10.1007/s00440-017-0796-7. ISSN   1432-2064. S2CID   55211047.
  16. Caputo, Pietro; Quattropani, Matteo (1 December 2020). "Stationary distribution and cover time of sparse directed configuration models". Probability Theory and Related Fields. 178 (3): 1011–1066. doi: 10.1007/s00440-020-00995-6 . hdl: 11385/196435 . ISSN   1432-2064. S2CID   202565916.
  17. Cai, Xing Shi; Perarnau, Guillem (14 October 2020). "Minimum stationary values of sparse random directed graphs". arXiv: 2010.07246 [math.PR].
  18. Amini, Hamed (1 November 2010). "Bootstrap Percolation in Living Neural Networks". Journal of Statistical Physics. 141 (3): 459–475. arXiv: 0910.0627 . Bibcode:2010JSP...141..459A. doi:10.1007/s10955-010-0056-z. ISSN   1572-9613. S2CID   7601022.
  19. Amini, Hamed; Minca, Andreea (2013). "Mathematical Modeling of Systemic Risk". Advances in Network Analysis and its Applications. Mathematics in Industry. Vol. 18. Springer. pp. 3–26. doi:10.1007/978-3-642-30904-5_1. ISBN   978-3-642-30903-8. S2CID   166867930.
  20. Li, Hui (July 2018). "Attack Vulnerability of Online Social Networks". 2018 37th Chinese Control Conference (CCC). pp. 1051–1056. doi:10.23919/ChiCC.2018.8482277. ISBN   978-988-15639-5-8. S2CID   52933445.