Maximum-entropy random graph model

Maximum-entropy random graph models are random graph models used to study complex networks subject to the principle of maximum entropy under a set of structural constraints, [1] which may be global, distributional, or local.

Overview

Any random graph model (at a fixed set of parameter values) results in a probability distribution on graphs, and those that are maximum entropy within the considered class of distributions have the special property of being maximally unbiased null models for network inference [2] (e.g. biological network inference). Each model defines a family of probability distributions on the set of graphs of size $n$ (for each $n > n_0$ for some finite $n_0$), parameterized by a collection of constraints on $J$ observables $\{Q_j(G)\}_{j=1}^{J}$ defined for each graph $G$ (such as fixed expected average degree, degree distribution of a particular form, or specific degree sequence), enforced in the graph distribution alongside entropy maximization by the method of Lagrange multipliers. Note that in this context "maximum entropy" refers not to the entropy of a single graph, but rather to the entropy of the whole probabilistic ensemble of random graphs.

Several commonly studied random network models are in fact maximum entropy, for example the ER graphs $G(n,m)$ and $G(n,p)$ (which each have one global constraint on the number of edges), as well as the configuration model (CM) [3] and soft configuration model (SCM) (which each have $n$ local constraints, one for each nodewise degree-value). In the two pairs of models mentioned above, an important distinction [4] [5] is in whether the constraint is sharp (i.e. satisfied by every element of the set of size-$n$ graphs with nonzero probability in the ensemble) or soft (i.e. satisfied on average across the whole ensemble). The former (sharp) case corresponds to a microcanonical ensemble, [6] the condition of maximum entropy yielding all graphs $G$ satisfying $Q_j(G) = q_j$ as equiprobable; the latter (soft) case is canonical, [7] producing an exponential random graph model (ERGM).

Model | Constraint type | Constraint variable | Probability distribution
$G(n,m)$ (ER) | Sharp, global | Total edge-count $m$ | $\mathbb{P}(G) = \binom{\binom{n}{2}}{m}^{-1}$
$G(n,p)$ (ER) | Soft, global | Expected total edge-count $\langle m \rangle$ | $\mathbb{P}(G) = p^{m(G)}(1-p)^{\binom{n}{2}-m(G)}$
Configuration model (CM) | Sharp, local | Degree of each vertex, $k_i$ | $\mathbb{P}(G) = |\mathbb{G}_{n,\vec{k}}|^{-1}$, uniform on the graphs with degree sequence $\vec{k}$
Soft configuration model (SCM) | Soft, local | Expected degree of each vertex, $\langle k_i \rangle$ | $\mathbb{P}(G) = \prod_{i<j} p_{ij}^{a_{ij}} (1-p_{ij})^{1-a_{ij}}$, with $p_{ij} = \left(e^{\psi_i + \psi_j} + 1\right)^{-1}$
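
As a concreteness check on the sharp/soft distinction in the table above, here is a minimal Python sketch (the values of n and m_target are illustrative, not from the source): sampling $G(n,m)$ yields exactly $m$ edges every time, while $G(n,p)$ with $p = m/\binom{n}{2}$ matches that count only on average.

```python
import math
import random

n = 30            # number of vertices (illustrative)
m_target = 100    # desired edge count (illustrative)
pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]

# Sharp constraint, G(n,m): every sample has exactly m_target edges.
def sample_gnm():
    return set(random.sample(pairs, m_target))

# Soft constraint, G(n,p): the edge count equals m_target only on average.
p = m_target / math.comb(n, 2)
def sample_gnp():
    return {e for e in pairs if random.random() < p}

sharp = [len(sample_gnm()) for _ in range(1000)]
soft = [len(sample_gnp()) for _ in range(1000)]
print(min(sharp), max(sharp))    # always exactly 100
print(sum(soft) / len(soft))     # fluctuates around 100
```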

Canonical ensemble of graphs (general framework)

Suppose we are building a random graph model consisting of a probability distribution $\mathbb{P}(G)$ on the set $\mathbb{G}_n$ of simple graphs with $n$ vertices. The Gibbs entropy $S[G]$ of this ensemble will be given by

$$S[G] = -\sum_{G \in \mathbb{G}_n} \mathbb{P}(G) \log \mathbb{P}(G).$$

We would like the ensemble-averaged values $\{\langle Q_j \rangle\}$ of the observables $\{Q_j(G)\}_{j=1}^{J}$ (such as average degree, average clustering, or average shortest path length) to be tunable, so we impose "soft" constraints on the graph distribution:

$$\langle Q_j \rangle = \sum_{G \in \mathbb{G}_n} \mathbb{P}(G) \, Q_j(G) = q_j,$$

where $j = 1, \ldots, J$ label the constraints. Application of the method of Lagrange multipliers to determine the distribution $\mathbb{P}(G)$ that maximizes $S[G]$ while satisfying $\langle Q_j \rangle = q_j$ and the normalization condition $\sum_{G \in \mathbb{G}_n} \mathbb{P}(G) = 1$ results in the following: [1]

$$\mathbb{P}(G) = \frac{1}{Z} \exp\!\left[-\sum_{j=1}^{J} \psi_j Q_j(G)\right],$$

where $Z = \sum_{G \in \mathbb{G}_n} \exp\!\left[-\sum_{j=1}^{J} \psi_j Q_j(G)\right]$ is a normalizing constant (the partition function) and $\{\psi_j\}$ are parameters (Lagrange multipliers) coupled to the correspondingly indexed graph observables, which may be tuned to yield graph samples with desired values of those properties on average; the result is an exponential family and canonical ensemble, specifically yielding an ERGM.
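
As an illustration (not part of the original derivation), the following Python sketch works out the simplest case of a single observable $Q(G) = m(G)$, the edge count: the canonical distribution then factorizes into independent Bernoulli edges with $p = e^{-\psi}/(1 + e^{-\psi})$, recovering $G(n,p)$, and the multiplier $\psi$ can be solved in closed form from a target $\langle m \rangle$. The numeric values are arbitrary.

```python
import math

# One observable Q(G) = m(G): P(G) ∝ exp(-psi * m(G)) factorizes into
# independent edges, each present with probability p = e^{-psi}/(1 + e^{-psi}).
def edge_probability(psi):
    return math.exp(-psi) / (1.0 + math.exp(-psi))

n = 30                    # illustrative graph size
target_mean_edges = 100   # desired <m> (illustrative)

# Invert <m> = C(n,2) * p to obtain the Lagrange multiplier psi.
p = target_mean_edges / math.comb(n, 2)
psi = math.log((1.0 - p) / p)

# Check: the tuned ensemble reproduces the target edge count on average.
print(edge_probability(psi) * math.comb(n, 2))   # ≈ 100.0
```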

The Erdős–Rényi model

In the canonical framework above, constraints were imposed on ensemble-averaged quantities $\langle Q_j \rangle$. Although these properties will on average take on values specifiable by appropriate setting of $\{\psi_j\}$, each specific instance $G$ may have $Q_j(G) \neq q_j$, which may be undesirable. Instead, we may impose a much stricter condition: every graph with nonzero probability must satisfy $Q_j(G) = q_j$ exactly. Under these "sharp" constraints, the maximum-entropy distribution is determined. We exemplify this with the Erdős–Rényi model $G(n,m)$.

The sharp constraint in $G(n,m)$ is that of a fixed number of edges $m$, [8] that is, $Q(G) = m$ for all graphs $G$ drawn from the ensemble (instantiated with a probability denoted $\mathbb{P}_{n,m}(G)$). This restricts the sample space from $\mathbb{G}_n$ (all graphs on $n$ vertices) to the subset $\mathbb{G}_{n,m} = \{G \in \mathbb{G}_n : Q(G) = m\} \subset \mathbb{G}_n$. This is in direct analogy to the microcanonical ensemble in classical statistical mechanics, wherein the system is restricted to a thin manifold in the phase space of all states of a particular energy value.

Upon restricting our sample space to $\mathbb{G}_{n,m}$, we have no external constraints (besides normalization) to satisfy, and thus we'll select $\mathbb{P}_{n,m}(G)$ to maximize $S[G]$ without making use of Lagrange multipliers. It is well known that the entropy-maximizing distribution in the absence of external constraints is the uniform distribution over the sample space (see maximum entropy probability distribution), from which we obtain:

$$\mathbb{P}_{n,m}(G) = \frac{1}{|\mathbb{G}_{n,m}|} = \binom{\binom{n}{2}}{m}^{-1},$$

where the last expression in terms of binomial coefficients is the number of ways to place $m$ edges among $\binom{n}{2}$ possible edges, and thus is the cardinality of $\mathbb{G}_{n,m}$.
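
A short Python sketch (with illustrative values) realizes this microcanonical ensemble directly: drawing $m$ distinct vertex pairs uniformly at random from the $\binom{n}{2}$ possibilities makes each graph in $\mathbb{G}_{n,m}$ equiprobable, with probability $\binom{\binom{n}{2}}{m}^{-1}$.

```python
import math
import random

n, m = 5, 4   # illustrative values
pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]

# |G_{n,m}|: choose which m of the C(n,2) possible pairs are edges.
size = math.comb(math.comb(n, 2), m)
print(size, 1.0 / size)   # cardinality, and the uniform probability of each graph

# Drawing m distinct pairs uniformly realizes the sharp constraint:
# every graph with exactly m edges is equally likely.
G = frozenset(random.sample(pairs, m))
print(sorted(G))
```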

Generalizations

A variety of maximum-entropy ensembles have been studied on generalizations of simple graphs. These include, for example, ensembles of simplicial complexes [9] and weighted random graphs with a given expected degree sequence. [10]

Related Research Articles

Entropy (information theory): Expected amount of information needed to specify the output of a stochastic data source

In information theory, the entropy of a random variable is the average level of "information", "surprise", or "uncertainty" inherent to the variable's possible outcomes. Given a discrete random variable $X$, which takes values in the alphabet $\mathcal{X}$ and is distributed according to $p : \mathcal{X} \to [0,1]$, the entropy is

$$H(X) = -\sum_{x \in \mathcal{X}} p(x) \log p(x).$$

The principle of maximum entropy states that the probability distribution which best represents the current state of knowledge about a system is the one with largest entropy, in the context of precisely stated prior data.

Random graph: Graph generated by a random process

In mathematics, random graph is the general term to refer to probability distributions over graphs. Random graphs may be described simply by a probability distribution, or by a random process which generates them. The theory of random graphs lies at the intersection between graph theory and probability theory. From a mathematical perspective, random graphs are used to answer questions about the properties of typical graphs. Its practical applications are found in all areas in which complex networks need to be modeled – many random graph models are thus known, mirroring the diverse types of complex networks encountered in different areas. In a mathematical context, random graph refers almost exclusively to the Erdős–Rényi random graph model. In other contexts, any graph model may be referred to as a random graph.

Quantum statistical mechanics is statistical mechanics applied to quantum mechanical systems. In quantum mechanics a statistical ensemble is described by a density operator S, which is a non-negative, self-adjoint, trace-class operator of trace 1 on the Hilbert space H describing the quantum system. This can be shown under various mathematical formalisms for quantum mechanics. One such formalism is provided by quantum logic.

In statistical mechanics, the microcanonical ensemble is a statistical ensemble that represents the possible states of a mechanical system whose total energy is exactly specified. The system is assumed to be isolated in the sense that it cannot exchange energy or particles with its environment, so that the energy of the system does not change with time.

In statistical mechanics, a canonical ensemble is the statistical ensemble that represents the possible states of a mechanical system in thermal equilibrium with a heat bath at a fixed temperature. The system can exchange energy with the heat bath, so that the states of the system will differ in total energy.

In information theory, the Rényi entropy is a quantity that generalizes various notions of entropy, including Hartley entropy, Shannon entropy, collision entropy, and min-entropy. The Rényi entropy is named after Alfréd Rényi, who looked for the most general way to quantify information while preserving additivity for independent events. In the context of fractal dimension estimation, the Rényi entropy forms the basis of the concept of generalized dimensions.

In statistics and information theory, a maximum entropy probability distribution has entropy that is at least as great as that of all other members of a specified class of probability distributions. According to the principle of maximum entropy, if nothing is known about a distribution except that it belongs to a certain class, then the distribution with the largest entropy should be chosen as the least-informative default. The motivation is twofold: first, maximizing entropy minimizes the amount of prior information built into the distribution; second, many physical systems tend to move towards maximal entropy configurations over time.

Degree distribution

In the study of graphs and networks, the degree of a node in a network is the number of connections it has to other nodes and the degree distribution is the probability distribution of these degrees over the whole network.

Giant component: Large connected component of a random graph

In network theory, a giant component is a connected component of a given random graph that contains a significant fraction of the entire graph's vertices.

In information theory, information dimension is an information measure for random vectors in Euclidean space, based on the normalized entropy of finely quantized versions of the random vectors. This concept was first introduced by Alfréd Rényi in 1959.

Thermal fluctuations: Random temperature-influenced deviations of particles from their average state

In statistical mechanics, thermal fluctuations are random deviations of an atomic system from its average state, that occur in a system at equilibrium. All thermal fluctuations become larger and more frequent as the temperature increases, and likewise they decrease as temperature approaches absolute zero.

In machine learning, the kernel embedding of distributions comprises a class of nonparametric methods in which a probability distribution is represented as an element of a reproducing kernel Hilbert space (RKHS). A generalization of the individual data-point feature mapping done in classical kernel methods, the embedding of distributions into infinite-dimensional feature spaces can preserve all of the statistical features of arbitrary distributions, while allowing one to compare and manipulate distributions using Hilbert space operations such as inner products, distances, projections, linear transformations, and spectral analysis. This learning framework is very general and can be applied to distributions over any space on which a sensible kernel function may be defined. For example, various kernels have been proposed for learning from data which are: vectors in $\mathbb{R}^d$, discrete classes/categories, strings, graphs/networks, images, time series, manifolds, dynamical systems, and other structured objects. The theory behind kernel embeddings of distributions has been primarily developed by Alex Smola, Le Song, Arthur Gretton, and Bernhard Schölkopf.

Maximal entropy random walk (MERW) is a popular type of biased random walk on a graph, in which transition probabilities are chosen according to the principle of maximum entropy, which says that the probability distribution which best represents the current state of knowledge is the one with largest entropy. While a standard random walk chooses, for every vertex, a uniform probability distribution among its outgoing edges, locally maximizing the entropy rate, MERW maximizes it globally by assuming a uniform probability distribution among all paths in a given graph. A sketch of the standard construction follows.
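
This minimal numpy sketch (the 4-node path graph is an arbitrary example, not from the source) builds the MERW transition matrix from the leading eigenvalue $\lambda$ and Perron eigenvector $\psi$ of the adjacency matrix, $P_{ij} = \frac{A_{ij}}{\lambda}\frac{\psi_j}{\psi_i}$.

```python
import numpy as np

# Arbitrary example graph: adjacency matrix of a path on 4 nodes.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

# MERW uses the leading eigenpair of A:
#   P[i, j] = (A[i, j] / lam) * (psi[j] / psi[i]),
# which makes all equal-length paths between fixed endpoints equiprobable.
eigvals, eigvecs = np.linalg.eigh(A)
lam = eigvals[-1]               # largest eigenvalue
psi = np.abs(eigvecs[:, -1])    # corresponding (Perron) eigenvector

P = (A / lam) * np.outer(1.0 / psi, psi)
print(P.sum(axis=1))            # each row sums to 1: a valid stochastic matrix
```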

Configuration model

In network science, the configuration model is a method for generating random networks from a given degree sequence. It is widely used as a reference model for real-life social networks, because it allows the modeler to incorporate arbitrary degree distributions.

Soft configuration model: Random graph model in applied mathematics

In applied mathematics, the soft configuration model (SCM) is a random graph model subject to the principle of maximum entropy under constraints on the expectation of the degree sequence of sampled graphs. Whereas the configuration model (CM) uniformly samples random graphs of a specific degree sequence, the SCM only retains the specified degree sequence on average over all network realizations; in this sense the SCM has very relaxed constraints relative to those of the CM ("soft" rather than "sharp" constraints). The SCM for graphs of size $n$ has a nonzero probability of sampling any graph of size $n$, whereas the CM is restricted to only graphs having precisely the prescribed connectivity structure.
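
As a sketch of the standard SCM form (the multiplier values below are arbitrary; in practice they are solved numerically so that expected degrees match a prescribed sequence), connection probabilities take the Fermi-Dirac form $p_{ij} = \left(e^{\lambda_i + \lambda_j} + 1\right)^{-1}$, and each edge is an independent Bernoulli trial:

```python
import numpy as np

# Arbitrary Lagrange multipliers, one per vertex; in practice these are
# fitted so that the expected degrees match a prescribed degree sequence.
lam = np.array([0.5, 0.8, 1.0, 1.2, 1.5])
n = len(lam)

# Fermi-Dirac connection probabilities: p_ij = 1 / (exp(lam_i + lam_j) + 1).
p = 1.0 / (np.exp(lam[:, None] + lam[None, :]) + 1.0)
np.fill_diagonal(p, 0.0)         # simple graphs: no self-loops

print(p.sum(axis=1))             # expected degrees <k_i> = sum_j p_ij

# Sampling one realization: each edge is an independent Bernoulli(p_ij) trial.
rng = np.random.default_rng(0)
upper = np.triu(rng.random((n, n)) < p, k=1)
A = (upper | upper.T).astype(int)
print(A)
```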

A set of networks that satisfies given structural characteristics can be treated as a network ensemble. Introduced by Ginestra Bianconi in 2007, the entropy of a network ensemble measures the level of order or uncertainty of a network ensemble.

In network science, the network entropy is a disorder measure derived from information theory to describe the level of randomness and the amount of information encoded in a graph. It is a relevant metric to quantitatively characterize real complex networks and can also be used to quantify network complexity.

A Stein discrepancy is a statistical divergence between two probability measures that is rooted in Stein's method. It was first formulated as a tool to assess the quality of Markov chain Monte Carlo samplers, but has since been used in diverse settings in statistics, machine learning and computer science.

In computational and mathematical biology, a biological lattice-gas cellular automaton (BIO-LGCA) is a discrete model for moving and interacting biological agents, a type of cellular automaton. The BIO-LGCA is based on the lattice-gas cellular automaton (LGCA) model used in fluid dynamics. A BIO-LGCA model describes cells and other motile biological agents as point particles moving on a discrete lattice, thereby interacting with nearby particles. Contrary to classic cellular automaton models, particles in BIO-LGCA are defined by their position and velocity. This makes it possible to model and analyze active fluids and collective migration mediated primarily through changes in momentum, rather than density. BIO-LGCA applications include cancer invasion and cancer progression.

References

  1. Park, Juyong; M.E.J. Newman (2004-05-25). "The statistical mechanics of networks". arXiv: cond-mat/0405566 .
  2. van der Hoorn, Pim; Gabor Lippner; Dmitri Krioukov (2017-10-10). "Sparse Maximum-Entropy Random Graphs with a Given Power-Law Degree Distribution". arXiv: 1705.10261 .
  3. Newman, Mark (2010). Networks: An Introduction. Oxford University Press. doi:10.1093/acprof:oso/9780199206650.001.0001. ISBN 9780199206650. Archived from the original on 2023-02-04. Retrieved 2018-09-13.
  4. Garlaschelli, Diego; den Hollander, Frank; Roccaverde, Andrea (2018-07-13). "Covariance Structure Behind Breaking of Ensemble Equivalence in Random Graphs". Journal of Statistical Physics. 173 (3–4): 644–662. arXiv: 1711.04273 . Bibcode:2018JSP...173..644G. doi:10.1007/s10955-018-2114-x. ISSN   0022-4715.
  5. Roccaverde, Andrea (August 2018). "Is breaking of ensemble equivalence monotone in the number of constraints?". Indagationes Mathematicae. 30: 7–25. arXiv: 1807.02791 . doi:10.1016/j.indag.2018.08.001. ISSN   0019-3577.
  6. Bianconi, G. (2018-08-21). Multilayer Networks: Structure and Function. Oxford University Press. ISBN   9780198753919. Archived from the original on 2023-02-04. Retrieved 2018-09-13.
  7. Anand, K.; Bianconi, G. (2009). "Entropy measures for networks: Toward an information theory of complex topologies". Physical Review E. 80 (4): 045102. arXiv: 0907.1514 . Bibcode:2009PhRvE..80d5102A. doi:10.1103/PhysRevE.80.045102. PMID   19905379.
  8. Erdős, P.; Rényi, A. (1959). "On Random Graphs. I" (PDF). Publicationes Mathematicae. 6 (3–4): 290–297. doi:10.5486/PMD.1959.6.3-4.12. Archived (PDF) from the original on 2020-08-07. Retrieved 2018-09-13.
  9. Zuev, Konstantin; Or Eisenberg; Dmitri Krioukov (2015-10-29). "Exponential Random Simplicial Complexes". arXiv: 1502.05032 .
  10. Hillar, Christopher; Andre Wibisono (2013-08-26). "Maximum entropy distributions on graphs". arXiv: 1301.3321 .