Autologistic actor attribute models

[Figure: Progression of the flu (contagion) on a social network]

Autologistic actor attribute models (ALAAMs) are a family of statistical models used to model the occurrence of node attributes (individual-level outcomes) in network data. They are frequently used with social network data to model social influence, the process by which connections in a social network influence the outcomes experienced by nodes. Strictly, the dependent variable is binary; however, extensions of the framework allow network data with binary, ordinal or continuous node attributes as dependent variables.


Background

Autologistic actor attribute models (ALAAMs) are a method for social network analysis. They were originally proposed as an adaptation of Exponential Random Graph Models (ERGMs) to allow for the study of social influence. [1] ERGMs are a family of statistical models for modeling social selection, the process by which ties within a network form on the basis of node attributes and other ties in the network. ALAAMs adapt the structure of ERGMs, but rather than predicting tie formation based on fixed node attributes, they predict node attributes based on fixed ties. This allows for the modeling of social influence processes, for instance how friendship among adolescents (network ties) may influence whether they smoke (node attributes), how networks influence other health-related practices, [2] and how attitudes or perceived attitudes may change. [3]

ALAAMs are distinct from other models of social influence on networks, such as epidemic/SIR models, because ALAAMs are used for the analysis of cross-sectional data, observed at only a single point in time.

Nodal attributes can be binary, ordinal, or even continuous. Recently, the software of a Melbourne-based research group has incorporated a multilevel approach for ALAAMs in their MPNet software for directed and undirected networks, as well as valued ties (dyadic attributes). The software strictly does not accept missing data: cases with a missing nodal variable must be deleted. The software is also unable to handle ties that reach outside the network cluster, for example when pupils name not only friends within their class, but also friends outside the class or school.

An alternative model for studying a nodal attribute as a dependent variable in cross-sectional data is the multiple membership model extension for network analysis (which can also be extended to longitudinal data). Unlike ALAAMs, it can be used with a continuous dependent variable, is able to handle missing data, can make use of multiple networks (multiplexity), and can take ties outside the cluster into account as well.

Definition

ALAAMs, like ERGMs, are part of the Exponential family of probability models. ALAAMs are exponential models that describe, for a network, a joint probability distribution for whether or not each node in the network exhibits a certain node-level attribute.

$$\Pr(Y = y) = \frac{1}{\kappa(\theta)} \exp\left(\theta^{\top} z(y)\right)$$

where $\theta$ is a vector of weights, the model parameters, associated with $z(y)$, the vector of model statistics computed on the attribute vector $y$ and the fixed network, and $\kappa(\theta)$ is a normalization constant to ensure that the probabilities of all possible combinations of node attributes sum to one. [4]
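This joint distribution can be computed exactly on a very small network by enumerating all attribute vectors. The sketch below is illustrative only, assuming two commonly used ALAAM effects (attribute density and contagion); the function names and parameter values are hypothetical, not part of any ALAAM software.

```python
import itertools
import math
import numpy as np

def alaam_statistics(y, adj):
    """Two common ALAAM effects: attribute density and contagion."""
    density = y.sum()  # number of nodes with the attribute
    # contagion: number of ties whose two endpoints both have the attribute
    contagion = (np.outer(y, y) * adj).sum() / 2
    return np.array([density, contagion])

def alaam_probability(y, adj, theta):
    """Exact Pr(Y = y) by enumerating all 2^n attribute vectors."""
    n = len(y)
    # kappa(theta): sum of unnormalized weights over every outcome
    kappa = sum(
        math.exp(theta @ alaam_statistics(np.array(c), adj))
        for c in itertools.product([0, 1], repeat=n)
    )
    return math.exp(theta @ alaam_statistics(y, adj)) / kappa

# Tiny triangle network; a positive contagion parameter favours
# configurations in which connected nodes share the attribute.
adj = np.array([[0, 1, 1],
                [1, 0, 1],
                [1, 1, 0]])
theta = np.array([-0.5, 1.0])  # hypothetical (density, contagion) parameters
p = alaam_probability(np.array([1, 1, 0]), adj, theta)
```

Because the normalizing constant requires summing over all $2^n$ attribute vectors, this brute-force approach is only feasible for toy networks, which is precisely why simulation-based estimation (described below) is needed in practice.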

Estimation

Estimation of model parameters, and evaluation of standard errors (for the purposes of hypothesis testing), is conducted using Markov chain Monte Carlo maximum likelihood estimation (MCMC-MLE), building on approaches such as the Metropolis–Hastings algorithm. Such approaches are required because the sample space is intractable even for moderately sized networks. [5] After model estimation, goodness-of-fit testing, through the sampling of random networks from the fitted model, should be performed to ensure that the model adequately fits the observed data. [6]
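The simulation step underlying both MCMC-MLE and goodness-of-fit testing can be sketched with a simple Metropolis–Hastings sampler that flips one node's attribute at a time. This is an illustrative sketch under the same hypothetical density-and-contagion specification as above, not the algorithm implemented in PNet or MPNet.

```python
import numpy as np

rng = np.random.default_rng(0)

def alaam_stats(y, adj):
    """Attribute density and contagion, two common ALAAM effects."""
    return np.array([y.sum(), (np.outer(y, y) * adj).sum() / 2])

def mh_sample(adj, theta, n_steps=10000):
    """Metropolis-Hastings over attribute vectors on a fixed network:
    propose flipping one node's attribute and accept with probability
    min(1, exp(theta . (z_proposed - z_current)))."""
    n = adj.shape[0]
    y = rng.integers(0, 2, size=n)          # random starting attributes
    z = alaam_stats(y, adj)
    samples = []
    for _ in range(n_steps):
        i = rng.integers(n)                  # pick a node to toggle
        y_prop = y.copy()
        y_prop[i] = 1 - y_prop[i]
        z_prop = alaam_stats(y_prop, adj)
        # acceptance ratio: the normalizing constant cancels
        if rng.random() < np.exp(theta @ (z_prop - z)):
            y, z = y_prop, z_prop
        samples.append(z.copy())
    return np.array(samples)

adj = np.array([[0, 1, 1],
                [1, 0, 1],
                [1, 1, 0]])
theta = np.array([-0.5, 1.0])  # hypothetical parameter values
sample_means = mh_sample(adj, theta).mean(axis=0)
```

In MCMC-MLE, parameters are adjusted until the mean statistics of simulated attribute vectors match those observed in the data; in goodness-of-fit testing, the observed statistics are compared against the distribution of simulated ones.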

ALAAM estimation, while not perfect, has been demonstrated to be relatively robust to partially missing data, due to random sampling or snowball sampling data collection techniques. [7]

Currently, these algorithms for estimating ALAAMs are implemented in the PNet [8] and MPNet software, published by MelNet, a research group at the University of Melbourne. [9]


References

  1. Daraganova, G., & Robins, G. (2013). Autologistic actor attribute models. Exponential random graph models for social networks: Theory, methods and applications, 102-114.
  2. Fujimoto, K., Wang, P., Flash, C. A., Kuhns, L. M., Zhao, Y., Amith, M., & Schneider, J. A. (2019). Network modeling of PrEP uptake on referral networks and health venue utilization among young men who have sex with men. AIDS and Behavior, 23(7), 1698-1707.
  3. Lusher, D., & Robins, G. (2013). Personal attitudes, perceived attitudes, and social structures: a social selection model. Exponential Random Graph Models for Social Networks: Theory, Methods and Applications. Cambridge University Press, New York, NY.
  4. Daraganova, G., & Robins, G. (2013). Autologistic actor attribute models. Exponential random graph models for social networks: Theory, methods and applications, 102-114.
  5. Snijders, T. A. (2002). Markov chain Monte Carlo estimation of exponential random graph models. Journal of Social Structure, 3(2), 1-40.
  6. Lusher, D., Koskinen, J., & Robins, G. (Eds.). (2013). Exponential random graph models for social networks: Theory, methods, and applications. Cambridge University Press.
  7. Stivala, A. D., Gallagher, H. C., Rolls, D. A., Wang, P., & Robins, G. L. (2020). Using Sampled Network Data With The Autologistic Actor Attribute Model. arXiv preprint arXiv:2002.00849.
  8. Peng Wang, Garry Robins, Philippa Pattison (2009) PNet: program for the simulation and estimation of exponential random graph models. Melbourne School of Psychological Sciences, The University of Melbourne.
  9. "PNet". MelNet. Retrieved 2020-04-29.