Exponential family random graph models

Last updated

Exponential Random Graph Models (ERGMs) are a family of statistical models for analyzing data from social and other networks. [1] [2] Examples of networks examined using ERGM include knowledge networks, [3] organizational networks, [4] colleague networks, [5] social media networks, networks of scientific development, [6] and others.

Contents

Background

Many metrics exist to describe the structural features of an observed network such as the density, centrality, or assortativity. [7] [8] However, these metrics describe the observed network which is only one instance of a large number of possible alternative networks. This set of alternative networks may have similar or dissimilar structural features. To support statistical inference on the processes influencing the formation of network structure, a statistical model should consider the set of all possible alternative networks weighted on their similarity to an observed network. However because network data is inherently relational, it violates the assumptions of independence and identical distribution of standard statistical models like linear regression. [9] [2] Alternative statistical models should reflect the uncertainty associated with a given observation, permit inference about the relative frequency about network substructures of theoretical interest, disambiguating the influence of confounding processes, efficiently representing complex structures, and linking local-level processes to global-level properties. [10] Degree-preserving randomization, for example, is a specific way in which an observed network could be considered in terms of multiple alternative networks.

Definition

The Exponential family is a broad family of models for covering many types of data, not just networks. An ERGM is a model from this family which describes networks.

Formally a random graph consists of a set of nodes and a collection of tie variables , indexed by pairs of nodes , where if the nodes are connected by an edge and otherwise. A pair of nodes is called a dyad and a dyad is an edge if .

The basic assumption of these models is that the structure in an observed graph can be explained by a given vector of sufficient statistics which are a function of the observed network and, in some cases, nodal attributes. This way, it is possible to describe any kind of dependence between the undyadic variables:

where is a vector of model parameters associated with and is a normalising constant.

These models represent a probability distribution on each possible network on nodes. However, the size of the set of possible networks for an undirected network (simple graph) of size is . Because the number of possible networks in the set vastly outnumbers the number of parameters which can constrain the model, the ideal probability distribution is the one which maximizes the Gibbs entropy. [11]

Example

Let be a set of three nodes and let be the set of all undirected, loopless graphs on . Loopless implies that for all it is and undirected implies that for all it is , so that there are three binary tie variables () and different graphs in this example.

Define a two-dimensional vector of statistics by , where is defined to be the number of edges in the graph and is defined to be the number of closed triangles in . Finally, let the parameter vector be defined by , so that the probability of every graph in this example is given by:

We note that in this example, there are just four graph isomorphism classes: the graph with zero edges, three graphs with exactly one edge, three graphs with exactly two edges, and the graph with three edges. Since isomorphic graphs have the same number of edges and the same number of triangles, they also have the same probability in this example ERGM. For a representative of each isomorphism class, we first compute the term , which is proportional to the probability of (up to the normalizing constant ).

If is the graph with zero edges, then it is and , so that

If is a graph with exactly one edge, then it is and , so that

If is a graph with exactly two edges, then it is and , so that

If is the graph with exactly three edges, then it is and , so that

The normalizing constant is computed by summing over all eight different graphs . This yields:

Finally, the probability of every graph is given by . Explicitly, we get that the graph with zero edges has probability , every graph with exactly one edge has probability , every graph with exactly two edges has probability , and the graph with exactly three edges has probability in this example.

Intuitively, the structure of graph probabilities in this ERGM example are consistent with typical patterns of social or other networks. The negative parameter () associated with the number of edges implies that - all other things being equal - networks with fewer edges have a higher probability than networks with more edges. This is consistent with the sparsity that is often found in empirical networks, namely that the empirical number of edges typically grows at a slower rate than the maximally possible number of edges. The positive parameter () associated with the number of closed triangles implies that - all other things being equal - networks with more triangles have a higher probability than networks with fewer triangles. This is consistent with a tendency for triadic closure that is often found in certain types of social networks. Compare these patterns with the graph probabilities computed above. The addition of every edge divides the probability by two. However, when going from a graph with two edges to the graph with three edges, the number of triangles increases by one - which additionally multiplies the probability by three.

We note that the explicit calculation of all graph probabilities is only possible since there are so few different graphs in this example. Since the number of different graphs scales exponentially in the number of tie variables - which in turn scales quadratic in the number of nodes -, computing the normalizing constant is in general computationally intractable, already for a moderate number of nodes.

Sampling from an ERGM

Exact sampling from a given ERGM is computationally intractable in general since computing the normalizing constant requires summation over all . Efficient approximate sampling from an ERGM can be done via Markov chains and is applied in current methods to approximate expected values and to estimate ERGM parameters. [12] Informally, given an ERGM on a set of graphs with probability mass function , one selects an initial graph (which might be arbitrarily, or randomly, chosen or might represent an observed network) and implicitly defines transition probabilities (or jump probabilities) , which are the conditional probabilities that the Markov chain is on graph after Step , given that it is on graph after Step . The transition probabilities do not depend on the graphs in earlier steps (), which is a defining property of Markov chains, and they do not depend on , that is, the Markov chain is time-homogeneous. The goal is to define the transition probabilities such that for all it is

independent of the initial graph . If this is achieved, one can run the Markov chain for a large number of steps and then returns the current graph as a random sample from the given ERGM. The probability to return a graph after a finite but large number of update steps is approximately the probability defined by the ERGM.

Current methods for sampling from ERGMs with Markov chains [12] usually define an update step by two sub-steps: first, randomly select a candidate in a neighborhood of the current graph and, second, to accept with a probability that depends on the probability ratio of the current graph and the candidate . (If the candidate is not accepted, the Markov chain remains on the current graph .) If the set of graphs is unconstrained (i.e., contains any combination of values on the binary tie variables), a simple method for candidate selection is to choose one tie variable uniformly at random and to define the candidate by flipping this single variable (i.e., to set ; all other variables take the same value as in ). A common way to define the acceptance probability is to accept with the conditional probability

where the graph probabilities are defined by the ERGM. Crucially, the normalizing constant cancels out in this fraction, so that the acceptance probabilities can be computed efficiently.

Related Research Articles

A likelihood function measures how well a statistical model explains observed data by calculating the probability of seeing that data under different parameter values of the model. It is constructed from the joint probability distribution of the random variable that (presumably) generated the observations. When evaluated on the actual data points, it becomes a function solely of the model parameters.

<span class="mw-page-title-main">Exponential distribution</span> Probability distribution

In probability theory and statistics, the exponential distribution or negative exponential distribution is the probability distribution of the distance between events in a Poisson point process, i.e., a process in which events occur continuously and independently at a constant average rate; the distance parameter could be any meaningful mono-dimensional measure of the process, such as time between production errors, or length along a roll of fabric in the weaving manufacturing process. It is a particular case of the gamma distribution. It is the continuous analogue of the geometric distribution, and it has the key property of being memoryless. In addition to being used for the analysis of Poisson point processes it is found in various other contexts.

In statistics, maximum likelihood estimation (MLE) is a method of estimating the parameters of an assumed probability distribution, given some observed data. This is achieved by maximizing a likelihood function so that, under the assumed statistical model, the observed data is most probable. The point in the parameter space that maximizes the likelihood function is called the maximum likelihood estimate. The logic of maximum likelihood is both intuitive and flexible, and as such the method has become a dominant means of statistical inference.

<span class="mw-page-title-main">Gamma distribution</span> Probability distribution

In probability theory and statistics, the gamma distribution is a versatile two-parameter family of continuous probability distributions. The exponential distribution, Erlang distribution, and chi-squared distribution are special cases of the gamma distribution. There are two equivalent parameterizations in common use:

  1. With a shape parameter k and a scale parameter θ
  2. With a shape parameter and an inverse scale parameter , called a rate parameter.

In mathematics, a Gaussian function, often simply referred to as a Gaussian, is a function of the base form and with parametric extension for arbitrary real constants a, b and non-zero c. It is named after the mathematician Carl Friedrich Gauss. The graph of a Gaussian is a characteristic symmetric "bell curve" shape. The parameter a is the height of the curve's peak, b is the position of the center of the peak, and c controls the width of the "bell".

In probability and statistics, an exponential family is a parametric set of probability distributions of a certain form, specified below. This special form is chosen for mathematical convenience, including the enabling of the user to calculate expectations, covariances using differentiation based on some useful algebraic properties, as well as for generality, as exponential families are in a sense very natural sets of distributions to consider. The term exponential class is sometimes used in place of "exponential family", or the older term Koopman–Darmois family. Sometimes loosely referred to as "the" exponential family, this class of distributions is distinct because they all possess a variety of desirable properties, most importantly the existence of a sufficient statistic.

In information geometry, the Fisher information metric is a particular Riemannian metric which can be defined on a smooth statistical manifold, i.e., a smooth manifold whose points are probability measures defined on a common probability space. It can be used to calculate the informational difference between measurements.

In numerical analysis and computational statistics, rejection sampling is a basic technique used to generate observations from a distribution. It is also commonly called the acceptance-rejection method or "accept-reject algorithm" and is a type of exact simulation method. The method works for any distribution in with a density.

In mathematical statistics, the Fisher information is a way of measuring the amount of information that an observable random variable X carries about an unknown parameter θ of a distribution that models X. Formally, it is the variance of the score, or the expected value of the observed information.

In statistics, a generalized linear model (GLM) is a flexible generalization of ordinary linear regression. The GLM generalizes linear regression by allowing the linear model to be related to the response variable via a link function and by allowing the magnitude of the variance of each measurement to be a function of its predicted value.

In mathematical statistics, the Kullback–Leibler (KL) divergence, denoted , is a type of statistical distance: a measure of how one reference probability distribution P is different from a second probability distribution Q. Mathematically, it is defined as

In statistics and information theory, a maximum entropy probability distribution has entropy that is at least as great as that of all other members of a specified class of probability distributions. According to the principle of maximum entropy, if nothing is known about a distribution except that it belongs to a certain class, then the distribution with the largest entropy should be chosen as the least-informative default. The motivation is twofold: first, maximizing entropy minimizes the amount of prior information built into the distribution; second, many physical systems tend to move towards maximal entropy configurations over time.

<span class="mw-page-title-main">Giant component</span> Large connected component of a random graph

In network theory, a giant component is a connected component of a given random graph that contains a significant fraction of the entire graph's vertices.

<span class="mw-page-title-main">Random geometric graph</span> In graph theory, the mathematically simplest spatial network

In graph theory, a random geometric graph (RGG) is the mathematically simplest spatial network, namely an undirected graph constructed by randomly placing N nodes in some metric space and connecting two nodes by a link if and only if their distance is in a given range, e.g. smaller than a certain neighborhood radius, r.

<span class="mw-page-title-main">Network science</span> Academic field

Network science is an academic field which studies complex networks such as telecommunication networks, computer networks, biological networks, cognitive and semantic networks, and social networks, considering distinct elements or actors represented by nodes and the connections between the elements or actors as links. The field draws on theories and methods including graph theory from mathematics, statistical mechanics from physics, data mining and information visualization from computer science, inferential modeling from statistics, and social structure from sociology. The United States National Research Council defines network science as "the study of network representations of physical, biological, and social phenomena leading to predictive models of these phenomena."

In statistics, identifiability is a property which a model must satisfy for precise inference to be possible. A model is identifiable if it is theoretically possible to learn the true values of this model's underlying parameters after obtaining an infinite number of observations from it. Mathematically, this is equivalent to saying that different values of the parameters must generate different probability distributions of the observable variables. Usually the model is identifiable only under certain technical restrictions, in which case the set of these requirements is called the identification conditions.

In statistics, the variance function is a smooth function that depicts the variance of a random quantity as a function of its mean. The variance function is a measure of heteroscedasticity and plays a large role in many settings of statistical modelling. It is a main ingredient in the generalized linear model framework and a tool used in non-parametric regression, semiparametric regression and functional data analysis. In parametric modeling, variance functions take on a parametric form and explicitly describe the relationship between the variance and the mean of a random quantity. In a non-parametric setting, the variance function is assumed to be a smooth function.

Computational anatomy (CA) is a discipline within medical imaging focusing on the study of anatomical shape and form at the visible or gross anatomical scale of morphology. The field is broadly defined and includes foundations in anatomy, applied mathematics and pure mathematics, including medical imaging, neuroscience, physics, probability, and statistics. It focuses on the anatomical structures being imaged, rather than the medical imaging devices. The central focus of the sub-field of computational anatomy within medical imaging is mapping information across anatomical coordinate systems most often dense information measured within a magnetic resonance image (MRI). The introduction of flows into CA, which are akin to the equations of motion used in fluid dynamics, exploit the notion that dense coordinates in image analysis follow the Lagrangian and Eulerian equations of motion. In models based on Lagrangian and Eulerian flows of diffeomorphisms, the constraint is associated to topological properties, such as open sets being preserved, coordinates not crossing implying uniqueness and existence of the inverse mapping, and connected sets remaining connected. The use of diffeomorphic methods grew quickly to dominate the field of mapping methods post Christensen's original paper, with fast and symmetric methods becoming available.

<span class="mw-page-title-main">Maximum-entropy random graph model</span>

Maximum-entropy random graph models are random graph models used to study complex networks subject to the principle of maximum entropy under a set of structural constraints, which may be global, distributional, or local.

<span class="mw-page-title-main">Autologistic actor attribute models</span>

Autologistic actor attribute models (ALAAMs) are a family of statistical models used to model the occurrence of node attributes in network data. They are frequently used with social network data to model social influence, the process by which connections in a social network influence the outcomes experienced by nodes. The dependent variable can strictly be binary. However, they may be applied to any type of network data that incorporates binary, ordinal or continuous node attributes as dependent variables.

References

  1. Lusher, Dean; Koskinen, Johan; Robins, Garry (2012). Exponential Random Graph Models for Social Networks: Theory, Methods, and Applications (Structural Analysis in the Social Sciences). doi:10.1017/CBO9780511894701. ISBN   9780521141383. OCLC   1120539699.
  2. 1 2 Harris, Jenine K (2014). An introduction to exponential random graph modeling. ISBN   9781452220802. OCLC   870698788.
  3. Brennecke, Julia; Rank, Olaf (2017-05-01). "The firm's knowledge network and the transfer of advice among corporate inventors—A multilevel network study". Research Policy. 46 (4): 768–783. doi:10.1016/j.respol.2017.02.002. ISSN   0048-7333.
  4. Harris, Jenine K (2013). "Communication Ties Across the National Network of Local Health Departments". AMEPRE American Journal of Preventive Medicine. 44 (3): 247–253. doi:10.1016/j.amepre.2012.10.028. ISSN   0749-3797. OCLC   4937103196. PMID   23415121.
  5. Brennecke, Julia (2019). "Dissonant Ties in Intraorganizational Networks: Why Individuals Seek Problem-Solving Assistance from Difficult Colleagues". AMJ Academy of Management Journal. ISSN   0001-4273. OCLC   8163488129.
  6. Harris, Jenine K; Luke, Douglas A; Shelton, Sarah C; Zuckerman, Rachael B (2009). "Forty Years of Secondhand Smoke Research. The Gap Between Discovery and Delivery". American Journal of Preventive Medicine. 36 (6): 538–548. doi:10.1016/j.amepre.2009.01.039. ISSN   0749-3797. OCLC   6980180781. PMID   19372026.
  7. Wasserman, Stanley; Faust, Katherine (1994). Social Network Analysis: Methods and Applications. ISBN   978-0-521-38707-1.
  8. Newman, M.E.J. (2003). "The Structure and Function of Complex Networks". SIAM Review. 45 (2): 167–256. arXiv: cond-mat/0303516 . Bibcode:2003SIAMR..45..167N. doi:10.1137/S003614450342480.
  9. Contractor, Noshir; Wasserman, Stanley; Faust, Katherine (2006). "Testing Multitheoretical, Multilevel Hypotheses About Organizational Networks: An Analytic Framework and Empirical Example" (PDF). Academy of Management Review. 31 (3): 681–703. doi:10.5465/AMR.2006.21318925. S2CID   10837327. Archived from the original (PDF) on 2020-02-25.
  10. Robins, G.; Pattison, P.; Kalish, Y.; Lusher, D. (2007). "An introduction to exponential random graph models for social networks". Social Networks. 29 (2): 173–191. doi:10.1016/j.socnet.2006.08.002. hdl: 1959.3/216571 .
  11. Newman, M.E.J. (2010-03-25). "Other Network Models". Networks. pp. 565–585. ISBN   978-0-19-920665-0.
  12. 1 2 Hunter, D. R; Handcock, M. S. (2006). "Inference in curved exponential family models for networks". Journal of Computational and Graphical Statistics. 15 (3): 565–583. CiteSeerX   10.1.1.205.9670 . doi:10.1198/106186006X133069.

Further reading