Degree distribution

Last updated

In the study of graphs and networks, the degree of a node in a network is the number of connections it has to other nodes and the degree distribution is the probability distribution of these degrees over the whole network.

Contents

Definition

The degree of a node in a network (sometimes referred to incorrectly as the connectivity) is the number of connections or edges the node has to other nodes. If a network is directed, meaning that edges point in one direction from one node to another node, then nodes have two different degrees, the in-degree, which is the number of incoming edges, and the out-degree, which is the number of outgoing edges.

The degree distribution P(k) of a network is then defined to be the fraction of nodes in the network with degree k. Thus if there are n nodes in total in a network and nk of them have degree k, we have .

The same information is also sometimes presented in the form of a cumulative degree distribution, the fraction of nodes with degree smaller than k, or even the complementary cumulative degree distribution, the fraction of nodes with degree greater than or equal to k (1 - C) if one considers C as the cumulative degree distribution; i.e. the complement of C.

Observed degree distributions

The degree distribution is very important in studying both real networks, such as the Internet and social networks, and theoretical networks. The simplest network model, for example, the (Erdős–Rényi model) random graph, in which each of n nodes is independently connected (or not) with probability p (or 1 − p), has a binomial distribution of degrees k:

(or Poisson in the limit of large n, if the average degree is held fixed). Most networks in the real world, however, have degree distributions very different from this. Most are highly right-skewed, meaning that a large majority of nodes have low degree but a small number, known as "hubs", have high degree. Some networks, notably the Internet, the World Wide Web, and some social networks were argued to have degree distributions that approximately follow a power law: , where γ is a constant. Such networks are called scale-free networks and have attracted particular attention for their structural and dynamical properties. [1] [2] [3] [4] However, a survey of a wide range of real world networks suggests that scale-free networks are rare when assessed using statistically rigorous measures. [5] Some researchers have disputed these findings arguing that the definitions used in the study are inappropriately strict, [6] while others have argued that the precise functional form of the degree distribution is less important than knowing whether the degree distribution is fat-tailed or not. [7] The over-interpretation of specific forms of the degree distribution has also been criticised for failing to consider how networks may evolve over time. [8]

Excess degree distribution

Excess degree distribution is the probability distribution, for a node reached by following an edge, of the number of other edges attached to that node. [9] In other words, it is the distribution of outgoing links from a node reached by following a link.

Suppose a network has a degree distribution , by selecting one node (randomly or not) and going to one of its neighbors (assuming to have one neighbor at least), then the probability of that node to have neighbors is not given by . The reason is that, whenever some node is selected in a heterogeneous network, it is more probable to reach the hubs by following one of the existing neighbors of that node. The true probability of such nodes to have degree is which is called the excess degree of that node. In the configuration model, which correlations between the nodes have been ignored and every node is assumed to be connected to any other nodes in the network with the same probability, the excess degree distribution can be found as: [9]

where is the mean-degree (average degree) of the model. It follows to that fact that the average degree of the neighbor of any node is greater than the average degree of that node. In social networks, it mean that your friends, on average, have more friends than you. This is famous as the friendship paradox. It can be shown that a network can have a giant component, if its average excess degree is larger than one:

Bear in mind that the last two equations are just for the configuration model and to derive the excess degree distribution of a real-word network, we should also add degree correlations into account. [9]

Generating functions method

Generating functions can be used to calculate different properties of random networks. Given the degree distribution and the excess degree distribution of some network, and respectively, it is possible to write two power series in the following forms:

and

can also be obtained from derivatives of :

If we know the generating function for a probability distribution then we can recover the values of by differentiating:

Some properties, e.g. the moments, can be easily calculated from and its derivatives:

And in general: [9]

For Poisson-distributed random networks, such as the ER graph, , that is the reason why the theory of random networks of this type is especially simple. The probability distributions for the 1st and 2nd-nearest neighbors are generated by the functions and . By extension, the distribution of -th neighbors is generated by:

, with iterations of the function acting on itself. [10]

The average number of 1st neighbors, , is and the average number of 2nd neighbors is:

Degree distribution for directed networks

In/out degree distribution for Wikipedia's hyperlink graph (logarithmic scales) Enwiki-degree-distribution.png
In/out degree distribution for Wikipedia's hyperlink graph (logarithmic scales)

In a directed network, each node has some in-degree and some out-degree which are the number of links which have run into and out of that node respectfully. If is the probability that a randomly chosen node has in-degree and out-degree then the generating function assigned to this joint probability distribution can be written with two valuables and as:

Since every link in a directed network must leave some node and enter another, the net average number of links entering a node is zero. Therefore,

,

which implies that, the generation function must satisfy:

where is the mean degree (both in and out) of the nodes in the network;

Using the function , we can again find the generation function for the in/out-degree distribution and in/out-excess degree distribution, as before. can be defined as generating functions for the number of arriving links at a randomly chosen node, and can be defined as the number of arriving links at a node reached by following a randomly chosen link. We can also define generating functions and for the number leaving such a node: [10]

Here, the average number of 1st neighbors, , or as previously introduced as , is and the average number of 2nd neighbors reachable from a randomly chosen node is given by: . These are also the numbers of 1st and 2nd neighbors from which a random node can be reached, since these equations are manifestly symmetric in and . [10]

Degree distribution for signed networks

In a signed network, each node has a positive-degree and a negative degree which are the positive number of links and negative number of links connected to that node respectfully. So and denote negative degree distribution and positive degree distribution of the signed network. [11] [12]

See also

Related Research Articles

<span class="mw-page-title-main">Negative binomial distribution</span> Probability distribution

In probability theory and statistics, the negative binomial distribution is a discrete probability distribution that models the number of failures in a sequence of independent and identically distributed Bernoulli trials before a specified (non-random) number of successes occurs. For example, we can define rolling a 6 on a die as a success, and rolling any other number as a failure, and ask how many failure rolls will occur before we see the third success. In such a case, the probability distribution of the number of failures that appear will be a negative binomial distribution.

Distributions, also known as Schwartz distributions or generalized functions, are objects that generalize the classical notion of functions in mathematical analysis. Distributions make it possible to differentiate functions whose derivatives do not exist in the classical sense. In particular, any locally integrable function has a distributional derivative.

Spectral methods are a class of techniques used in applied mathematics and scientific computing to numerically solve certain differential equations. The idea is to write the solution of the differential equation as a sum of certain "basis functions" and then to choose the coefficients in the sum in order to satisfy the differential equation as well as possible.

In physics, a partition function describes the statistical properties of a system in thermodynamic equilibrium. Partition functions are functions of the thermodynamic state variables, such as the temperature and volume. Most of the aggregate thermodynamic variables of the system, such as the total energy, free energy, entropy, and pressure, can be expressed in terms of the partition function or its derivatives. The partition function is dimensionless.

<span class="mw-page-title-main">Random graph</span> Graph generated by a random process

In mathematics, random graph is the general term to refer to probability distributions over graphs. Random graphs may be described simply by a probability distribution, or by a random process which generates them. The theory of random graphs lies at the intersection between graph theory and probability theory. From a mathematical perspective, random graphs are used to answer questions about the properties of typical graphs. Its practical applications are found in all areas in which complex networks need to be modeled – many random graph models are thus known, mirroring the diverse types of complex networks encountered in different areas. In a mathematical context, random graph refers almost exclusively to the Erdős–Rényi random graph model. In other contexts, any graph model may be referred to as a random graph.

<span class="mw-page-title-main">Markov random field</span> Set of random variables

In the domain of physics and probability, a Markov random field (MRF), Markov network or undirected graphical model is a set of random variables having a Markov property described by an undirected graph. In other words, a random field is said to be a Markov random field if it satisfies Markov properties. The concept originates from the Sherrington–Kirkpatrick model.

<span class="mw-page-title-main">Giant component</span> Large connected component of a random graph

In network theory, a giant component is a connected component of a given random graph that contains a significant fraction of the entire graph's vertices.

Photon polarization is the quantum mechanical description of the classical polarized sinusoidal plane electromagnetic wave. An individual photon can be described as having right or left circular polarization, or a superposition of the two. Equivalently, a photon can be described as having horizontal or vertical linear polarization, or a superposition of the two.

In quantum computing, quantum finite automata (QFA) or quantum state machines are a quantum analog of probabilistic automata or a Markov decision process. They provide a mathematical abstraction of real-world quantum computers. Several types of automata may be defined, including measure-once and measure-many automata. Quantum finite automata can also be understood as the quantization of subshifts of finite type, or as a quantization of Markov chains. QFAs are, in turn, special cases of geometric finite automata or topological finite automata.

<span class="mw-page-title-main">Boolean network</span> Discrete set of boolean variables

A Boolean network consists of a discrete set of boolean variables each of which has a Boolean function assigned to it which takes inputs from a subset of those variables and output that determines the state of the variable it is assigned to. This set of functions in effect determines a topology (connectivity) on the set of variables, which then become nodes in a network. Usually, the dynamics of the system is taken as a discrete time series where the state of the entire network at time t+1 is determined by evaluating each variable's function on the state of the network at time t. This may be done synchronously or asynchronously.

<span class="mw-page-title-main">Network science</span> Academic field

Network science is an academic field which studies complex networks such as telecommunication networks, computer networks, biological networks, cognitive and semantic networks, and social networks, considering distinct elements or actors represented by nodes and the connections between the elements or actors as links. The field draws on theories and methods including graph theory from mathematics, statistical mechanics from physics, data mining and information visualization from computer science, inferential modeling from statistics, and social structure from sociology. The United States National Research Council defines network science as "the study of network representations of physical, biological, and social phenomena leading to predictive models of these phenomena."

<span class="mw-page-title-main">Hilbert space</span> Type of topological vector space

In mathematics, Hilbert spaces allow the methods of linear algebra and calculus to be generalized from (finite-dimensional) Euclidean vector spaces to spaces that may be infinite-dimensional. Hilbert spaces arise naturally and frequently in mathematics and physics, typically as function spaces. Formally, a Hilbert space is a vector space equipped with an inner product that induces a distance function for which the space is a complete metric space.

<span class="mw-page-title-main">Thermal fluctuations</span> Random temperature-influenced deviations of particles from their average state

In statistical mechanics, thermal fluctuations are random deviations of an atomic system from its average state, that occur in a system at equilibrium. All thermal fluctuations become larger and more frequent as the temperature increases, and likewise they decrease as temperature approaches absolute zero.

Coherent states have been introduced in a physical context, first as quasi-classical states in quantum mechanics, then as the backbone of quantum optics and they are described in that spirit in the article Coherent states. However, they have generated a huge variety of generalizations, which have led to a tremendous amount of literature in mathematical physics. In this article, we sketch the main directions of research on this line. For further details, we refer to several existing surveys.

In machine learning, the kernel embedding of distributions comprises a class of nonparametric methods in which a probability distribution is represented as an element of a reproducing kernel Hilbert space (RKHS). A generalization of the individual data-point feature mapping done in classical kernel methods, the embedding of distributions into infinite-dimensional feature spaces can preserve all of the statistical features of arbitrary distributions, while allowing one to compare and manipulate distributions using Hilbert space operations such as inner products, distances, projections, linear transformations, and spectral analysis. This learning framework is very general and can be applied to distributions over any space on which a sensible kernel function may be defined. For example, various kernels have been proposed for learning from data which are: vectors in , discrete classes/categories, strings, graphs/networks, images, time series, manifolds, dynamical systems, and other structured objects. The theory behind kernel embeddings of distributions has been primarily developed by Alex Smola, Le Song , Arthur Gretton, and Bernhard Schölkopf. A review of recent works on kernel embedding of distributions can be found in.

Robustness, the ability to withstand failures and perturbations, is a critical attribute of many complex systems including complex networks.

<span class="mw-page-title-main">Configuration model</span>

In network science, the configuration model is a method for generating random networks from a given degree sequence. It is widely used as a reference model for real-life social networks, because it allows the modeler to incorporate arbitrary degree distributions.

<span class="mw-page-title-main">Maximum-entropy random graph model</span>

Maximum-entropy random graph models are random graph models used to study complex networks subject to the principle of maximum entropy under a set of structural constraints, which may be global, distributional, or local.

<span class="mw-page-title-main">Soft configuration model</span> Random graph model in applied mathematics

In applied mathematics, the soft configuration model (SCM) is a random graph model subject to the principle of maximum entropy under constraints on the expectation of the degree sequence of sampled graphs. Whereas the configuration model (CM) uniformly samples random graphs of a specific degree sequence, the SCM only retains the specified degree sequence on average over all network realizations; in this sense the SCM has very relaxed constraints relative to those of the CM ("soft" rather than "sharp" constraints). The SCM for graphs of size has a nonzero probability of sampling any graph of size , whereas the CM is restricted to only graphs having precisely the prescribed connectivity structure.

In the context of quantum computing, the quantum walk search is a quantum algorithm for finding a marked node in a graph.

References

  1. Barabási, Albert-László; Albert, Réka (1999-10-15). "Emergence of Scaling in Random Networks". Science. 286 (5439): 509–512. arXiv: cond-mat/9910332 . Bibcode:1999Sci...286..509B. doi:10.1126/science.286.5439.509. ISSN   0036-8075. PMID   10521342. S2CID   524106.
  2. Albert, Réka; Barabási, Albert-László (2000-12-11). "Topology of Evolving Networks: Local Events and Universality" (PDF). Physical Review Letters. 85 (24): 5234–5237. arXiv: cond-mat/0005085 . Bibcode:2000PhRvL..85.5234A. doi:10.1103/physrevlett.85.5234. hdl:2047/d20000695. ISSN   0031-9007. PMID   11102229. S2CID   81784. Archived (PDF) from the original on 2018-07-21. Retrieved 2019-09-25.
  3. Dorogovtsev, S. N.; Mendes, J. F. F.; Samukhin, A. N. (2001-05-21). "Size-dependent degree distribution of a scale-free growing network". Physical Review E. 63 (6): 062101. arXiv: cond-mat/0011115 . Bibcode:2001PhRvE..63f2101D. doi:10.1103/physreve.63.062101. ISSN   1063-651X. PMID   11415146. S2CID   119063903.
  4. Pachon, Angelica; Sacerdote, Laura; Yang, Shuyi (2018). "Scale-free behavior of networks with the copresence of preferential and uniform attachment rules". Physica D: Nonlinear Phenomena. 371: 1–12. arXiv: 1704.08597 . Bibcode:2018PhyD..371....1P. doi:10.1016/j.physd.2018.01.005. S2CID   119320331.
  5. Broido, Anna D.; Clauset, Aaron (2019-03-04). "Scale-free networks are rare". Nature Communications. 10 (1): 1017. doi: 10.1038/s41467-019-08746-5 . ISSN   2041-1723. PMC   6399239 . PMID   30833554.
  6. Voitalov, Ivan; van der Hoorn, Pim; van der Hofstad, Remco; Krioukov, Dmitri (18 October 2019). "Scale-free networks well done". Physical Review Research. 1 (3): 033034. doi: 10.1103/PhysRevResearch.1.033034 .
  7. Holme, Petter (2019-03-04). "Rare and everywhere: Perspectives on scale-free networks". Nature Communications. 10 (1): 1016. Bibcode:2019NatCo..10.1016H. doi:10.1038/s41467-019-09038-8. ISSN   2041-1723. PMC   6399274 . PMID   30833568.
  8. Falkenberg, Max; Lee, Jong-Hyeok; Amano, Shun-ichi; Ogawa, Ken-ichiro; Yano, Kazuo; Miyake, Yoshihiro; Evans, Tim S.; Christensen, Kim (18 June 2020). "Identifying time dependence in network growth". Physical Review Research. 2 (2): 023352. arXiv: 2001.09118 . doi: 10.1103/PhysRevResearch.2.023352 .
  9. 1 2 3 4 Newman, Mark (2018-10-18). Networks. Vol. 1. Oxford University Press. doi:10.1093/oso/9780198805090.001.0001. ISBN   978-0-19-880509-0. Archived from the original on 2020-04-15. Retrieved 2020-04-19.
  10. 1 2 3 Newman, M. E. J.; Strogatz, S. H.; Watts, D. J. (2001-07-24). "Random graphs with arbitrary degree distributions and their applications". Physical Review E. 64 (2): 026118. arXiv: cond-mat/0007235 . doi: 10.1103/PhysRevE.64.026118 . ISSN   1063-651X. PMID   11497662.
  11. Saberi M, Khosrowabadi R, Khatibi A, Misic B, Jafari G (January 2021). "Topological impact of negative links on the stability of resting-state brain network". Scientific Reports. 11 (1): 2176. doi:10.1038/s41598-021-81767-7. PMC   7838299 . PMID   33500525.
  12. Ciotti V (2015). "Degree correlations in signed social networks". Physica A: Statistical Mechanics and Its Applications. 422: 25–39. arXiv: 1412.1024 . doi:10.1016/j.physa.2014.11.062. S2CID   4995458. Archived from the original on 2021-10-02. Retrieved 2021-02-10.