Mathematical models of social learning

Mathematical models of social learning aim to model opinion dynamics in social networks. Consider a social network in which people (agents) hold a belief or opinion about the state of something in the world, such as the quality of a particular product, the effectiveness of a public policy, or the reliability of a news agency. In all these settings, people learn about the state of the world via observation or communication with others. Models of social learning try to formalize these interactions to describe how agents process the information received from their friends in the social network. [1] Some of the main questions asked in the literature include: [2]

  1. whether agents reach a consensus;
  2. whether social learning effectively aggregates scattered information, or put differently, whether the consensus belief matches the true state of the world or not;
  3. how effective media sources, politicians, and prominent agents can be in shaping the beliefs of the entire network; in other words, how much room is there for belief manipulation and misinformation?

Bayesian learning

Bayesian learning is a model which assumes that agents update their beliefs using Bayes' rule. Indeed, each agent's belief about different states of the world can be seen as a probability distribution over a set of opinions, and Bayesian updating assumes that this distribution is updated in a statistically optimal manner using Bayes' rule. Moreover, Bayesian models typically make certain demanding assumptions about agents, e.g., that they have a reliable model of the world and that the social learning rule of each agent is common knowledge among all members of the community.

More rigorously, let the underlying state be θ. This parameter could correspond to an opinion among people about a certain social, economic, or political issue. At first, each individual holds a prior probability distribution over θ, denoted P(θ). This prior could be a result of the agents' personal observations of the world. Then each person updates their belief upon receiving a signal s. According to the Bayesian approach, the updating procedure follows the rule

P(θ | s) = P(s | θ) P(θ) / P(s),

where the term P(s | θ) is the conditional probability over the signal space given the true state of the world. [2]
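As an illustration, the update can be carried out numerically over a discrete set of states. This is a minimal sketch; the states, signals, and probabilities below are made up for demonstration and do not come from the literature cited here.

```python
# Discrete Bayesian update: P(theta | s) proportional to P(s | theta) P(theta).
# All states, signals, and probabilities are illustrative.

def bayes_update(prior, likelihood, signal):
    """Return the posterior over states after observing `signal`.

    prior      -- dict mapping state -> P(theta)
    likelihood -- dict mapping (state, signal) -> P(s | theta)
    """
    unnorm = {th: prior[th] * likelihood[(th, signal)] for th in prior}
    z = sum(unnorm.values())  # P(s), the marginal probability of the signal
    return {th: p / z for th, p in unnorm.items()}

# Two states of the world and a binary signal biased toward the truth.
prior = {"good": 0.5, "bad": 0.5}
likelihood = {("good", "hi"): 0.8, ("good", "lo"): 0.2,
              ("bad", "hi"): 0.3, ("bad", "lo"): 0.7}

posterior = bayes_update(prior, likelihood, "hi")
print(posterior["good"])  # 0.8*0.5 / (0.8*0.5 + 0.3*0.5) ≈ 0.727
```

The normalizing constant z is exactly the denominator P(s) in the rule above, so the returned distribution always sums to one.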

Non-Bayesian learning

Bayesian learning is often considered the benchmark model for social learning, in which individuals use Bayes' rule to incorporate new pieces of information into their beliefs. However, it has been shown that such Bayesian updating is fairly sophisticated and imposes an unreasonable cognitive load on agents, which might not be realistic for human beings. [3]

Therefore, scientists have studied simpler non-Bayesian models, most notably the DeGroot model, introduced by DeGroot in 1974, one of the first models describing how humans interact with each other in a social network. In this setting, there is a true state of the world, and each agent receives a noisy independent signal about this true value and communicates with other agents repeatedly. According to the DeGroot model, each agent updates their belief at each step by taking a weighted average of their neighbors' opinions.

The statistician George E. P. Box once said, "All models are wrong, but some are useful." Along the same lines, the DeGroot model is a fairly simple model, but it can provide useful insights about the learning process in social networks. Indeed, the simplicity of this model makes it tractable for theoretical study. Specifically, we can analyze different network structures to see for which of them these naive agents can successfully aggregate decentralized information. Since the DeGroot model can be viewed as a Markov chain, provided that the network is strongly connected (so there is a directed path from any agent to any other) and satisfies a weak aperiodicity condition, beliefs converge to a consensus. When consensus is reached, the belief of each agent is a weighted average of the agents' initial beliefs. These weights provide a measure of social influence.
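The weighted-averaging step can be simulated directly. The sketch below uses an illustrative strongly connected three-agent network (the weight matrix and initial beliefs are made up); repeated averaging drives all beliefs to a common consensus value.

```python
# DeGroot updating on a small, strongly connected network.
# Each row of W sums to 1: agent i's new belief is the
# W[i]-weighted average of the current beliefs.

W = [[0.6, 0.2, 0.2],
     [0.3, 0.4, 0.3],
     [0.1, 0.5, 0.4]]
beliefs = [0.0, 0.5, 1.0]  # initial noisy estimates of the true state

for _ in range(100):
    # Build the new belief vector from the old one (simultaneous update).
    beliefs = [sum(W[i][j] * beliefs[j] for j in range(3)) for i in range(3)]

print(beliefs)  # all three entries coincide: consensus is reached
```

The common limit is a weighted average of the initial beliefs, with weights determined by the network, as described above.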

In the case of converging opinion dynamics, the social network is called wise if the consensus belief equals the true state of the world. It can be shown that the necessary and sufficient condition for wisdom is that the influence of the most influential agent vanishes as the network grows. The speed of convergence is irrelevant to the wisdom of the social network. [4]
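The influence weights themselves are the stationary distribution of the weight matrix viewed as a Markov chain, and can be computed by power iteration. A sketch with an illustrative three-agent matrix:

```python
# Social influence in the DeGroot model: the consensus equals the
# dot product of initial beliefs with the stationary distribution pi
# of the row-stochastic weight matrix W (pi W = pi). The matrix is
# illustrative.

W = [[0.6, 0.2, 0.2],
     [0.3, 0.4, 0.3],
     [0.1, 0.5, 0.4]]

pi = [1/3, 1/3, 1/3]  # start from the uniform distribution
for _ in range(200):
    # One step of power iteration: pi <- pi W.
    pi = [sum(pi[i] * W[i][j] for i in range(3)) for j in range(3)]

print(pi)  # each entry is an agent's social influence weight
```

A network is wise, in the sense above, when the largest entry of this vector vanishes as the number of agents grows.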

Empirical evaluation of models

Along with the theoretical framework for modeling social learning phenomena, there has been a great amount of empirical research to assess the explanatory power of these models. In one such experiment, 665 subjects in 19 villages in Karnataka, India, were studied while exchanging information with each other to learn the true state of the world. This study attempted to distinguish between the two most prominent models of information aggregation in social networks, namely, Bayesian learning and DeGroot learning. The study showed that agents' aggregate behavior is statistically significantly better described by the DeGroot learning model. [3]

Related Research Articles

The likelihood function is the joint probability mass of observed data viewed as a function of the parameters of a statistical model. Intuitively, the likelihood function is the probability of observing the given data assuming θ is the actual parameter.

Bayesian inference is a method of statistical inference in which Bayes' theorem is used to update the probability for a hypothesis as more evidence or information becomes available. Fundamentally, Bayesian inference uses prior knowledge, in the form of a prior distribution in order to estimate posterior probabilities. Bayesian inference is an important technique in statistics, and especially in mathematical statistics. Bayesian updating is particularly important in the dynamic analysis of a sequence of data. Bayesian inference has found application in a wide range of activities, including science, engineering, philosophy, medicine, sport, and law. In the philosophy of decision theory, Bayesian inference is closely related to subjective probability, often called "Bayesian probability".

Pattern recognition is the task of assigning a class to an observation based on patterns extracted from data. While similar, pattern recognition (PR) is not to be confused with pattern machines (PM), which may possess PR capabilities but whose primary function is to distinguish and create emergent patterns. PR has applications in statistical data analysis, signal processing, image analysis, information retrieval, bioinformatics, data compression, computer graphics and machine learning. Pattern recognition has its origins in statistics and engineering; some modern approaches to pattern recognition include the use of machine learning, due to the increased availability of big data and a new abundance of processing power.

A Bayesian network is a probabilistic graphical model that represents a set of variables and their conditional dependencies via a directed acyclic graph (DAG). Bayesian networks are one of several forms of causal notation; causal networks are special cases of Bayesian networks. Bayesian networks are ideal for taking an event that occurred and predicting the likelihood that any one of several possible known causes was the contributing factor. For example, a Bayesian network could represent the probabilistic relationships between diseases and symptoms. Given symptoms, the network can be used to compute the probabilities of the presence of various diseases.
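The disease–symptom example can be worked through for the smallest possible network, a single edge from disease to symptom; the probabilities below are made up for illustration.

```python
# A two-node Bayesian network (disease -> symptom), used to invert
# the edge and compute P(disease | symptom). Probabilities are
# illustrative.

p_disease = 0.01                           # prior P(D = present)
p_symptom_given = {True: 0.9, False: 0.1}  # P(S = observed | D)

# Joint probability of each disease state together with the symptom.
joint = {d: (p_disease if d else 1 - p_disease) * p_symptom_given[d]
         for d in (True, False)}

p_disease_given_symptom = joint[True] / (joint[True] + joint[False])
print(p_disease_given_symptom)  # 0.009 / 0.108 ≈ 0.083
```

Even with a fairly reliable symptom, the rare prior keeps the posterior probability of disease low, which is exactly the kind of inference Bayesian networks automate on larger graphs.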

The posterior probability is a type of conditional probability that results from updating the prior probability with information summarized by the likelihood via an application of Bayes' rule. From an epistemological perspective, the posterior probability contains everything there is to know about an uncertain proposition, given prior knowledge and a mathematical model describing the observations available at a particular time. After the arrival of new information, the current posterior probability may serve as the prior in another round of Bayesian updating.

In mathematical optimization and decision theory, a loss function or cost function is a function that maps an event or values of one or more variables onto a real number intuitively representing some "cost" associated with the event. An optimization problem seeks to minimize a loss function. An objective function is either a loss function or its opposite, in which case it is to be maximized. The loss function could include terms from several levels of the hierarchy.

In statistics, an expectation–maximization (EM) algorithm is an iterative method to find (local) maximum likelihood or maximum a posteriori (MAP) estimates of parameters in statistical models, where the model depends on unobserved latent variables. The EM iteration alternates between performing an expectation (E) step, which creates a function for the expectation of the log-likelihood evaluated using the current estimate for the parameters, and a maximization (M) step, which computes parameters maximizing the expected log-likelihood found on the E step. These parameter estimates are then used to determine the distribution of the latent variables in the next E step. It can be used, for example, to estimate a mixture of Gaussians, or to solve the multiple linear regression problem.

A prior probability distribution of an uncertain quantity, often simply called the prior, is its assumed probability distribution before some evidence is taken into account. For example, the prior could be the probability distribution representing the relative proportions of voters who will vote for a particular politician in a future election. The unknown quantity may be a parameter of the model or a latent variable rather than an observable variable.

In statistics, Gibbs sampling or a Gibbs sampler is a Markov chain Monte Carlo (MCMC) algorithm for sampling from a specified multivariate probability distribution when direct sampling from the joint distribution is difficult, but sampling from the conditional distributions is more practical. The resulting sequence of samples can be used to approximate the joint distribution; to approximate the marginal distribution of one of the variables, or of some subset of the variables; or to compute an integral. Typically, some of the variables correspond to observations whose values are known, and hence do not need to be sampled.
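A standard textbook illustration is Gibbs sampling from a bivariate normal distribution, where each full conditional is itself normal and therefore easy to sample from; the correlation value below is illustrative.

```python
import random

# Gibbs sampler for a standard bivariate normal with correlation rho.
# Conditionals: x | y ~ N(rho*y, 1 - rho^2) and symmetrically for y | x.
random.seed(1)
rho = 0.8
sd = (1 - rho ** 2) ** 0.5

x, y = 0.0, 0.0
samples = []
for _ in range(20000):
    x = random.gauss(rho * y, sd)  # sample x from its full conditional
    y = random.gauss(rho * x, sd)  # sample y from its full conditional
    samples.append((x, y))

burned = samples[1000:]  # discard burn-in before summarizing
mean_x = sum(s[0] for s in burned) / len(burned)
print(mean_x)  # close to 0, the marginal mean of x
```

Marginal summaries such as this mean are computed from the chain itself, which is exactly how the sequence of samples approximates a distribution that was never sampled directly.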

Empirical Bayes methods are procedures for statistical inference in which the prior probability distribution is estimated from the data. This approach stands in contrast to standard Bayesian methods, for which the prior distribution is fixed before any data are observed. Despite this difference in perspective, empirical Bayes may be viewed as an approximation to a fully Bayesian treatment of a hierarchical model wherein the parameters at the highest level of the hierarchy are set to their most likely values, instead of being integrated out. Empirical Bayes, also known as maximum marginal likelihood, represents a convenient approach for setting hyperparameters, but has been mostly supplanted by fully Bayesian hierarchical analyses since the 2000s with the increasing availability of well-performing computation techniques. It is still commonly used, however, for variational methods in Deep Learning, such as variational autoencoders, where latent variable spaces are high-dimensional.

In statistics, a mixture model is a probabilistic model for representing the presence of subpopulations within an overall population, without requiring that an observed data set should identify the sub-population to which an individual observation belongs. Formally a mixture model corresponds to the mixture distribution that represents the probability distribution of observations in the overall population. However, while problems associated with "mixture distributions" relate to deriving the properties of the overall population from those of the sub-populations, "mixture models" are used to make statistical inferences about the properties of the sub-populations given only observations on the pooled population, without sub-population identity information. Mixture models are used for clustering, under the name model-based clustering, and also for density estimation.

A Boltzmann machine, named after Ludwig Boltzmann, is a stochastic spin-glass model with an external field, i.e., a Sherrington–Kirkpatrick model, that is a stochastic Ising model. It is a statistical physics technique applied in the context of cognitive science. It is also classified as a Markov random field.

In game theory, a Bayesian game is a strategic decision-making model which assumes players have incomplete information. Players hold private information relevant to the game, meaning that the payoffs are not common knowledge. Bayesian games model the outcome of player interactions using aspects of Bayesian probability. They are notable because they allowed, for the first time in game theory, for the specification of the solutions to games with incomplete information.

One-shot learning is an object categorization problem, found mostly in computer vision. Whereas most machine learning-based object categorization algorithms require training on hundreds or thousands of examples, one-shot learning aims to classify objects from one, or only a few, examples. The term few-shot learning is also used for these problems, especially when more than one example is needed.

Bayesian econometrics is a branch of econometrics which applies Bayesian principles to economic modelling. Bayesianism is based on a degree-of-belief interpretation of probability, as opposed to a relative-frequency interpretation.

Aumann's agreement theorem was stated and proved by Robert Aumann in a paper titled "Agreeing to Disagree", which introduced the set theoretic description of common knowledge. The theorem concerns agents who share a common prior and update their probabilistic beliefs by Bayes' rule. It states that if the probabilistic beliefs of such agents, regarding a fixed event, are common knowledge then these probabilities must coincide. Thus, agents cannot agree to disagree, that is have common knowledge of a disagreement over the posterior probability of a given event.

In Bayesian inference, the Bernstein–von Mises theorem provides the basis for using Bayesian credible sets for confidence statements in parametric models. It states that under some conditions, a posterior distribution converges in the limit of infinite data to a multivariate normal distribution centered at the maximum likelihood estimator, with covariance matrix given by n⁻¹ I(θ₀)⁻¹, where θ₀ is the true population parameter and I(θ₀) is the Fisher information matrix at the true population parameter value.

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that address the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
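For a Bernoulli bandit with Beta priors, the randomly drawn belief is simply a sample from each arm's posterior; the sketch below uses made-up success rates.

```python
import random

# Thompson sampling for a two-armed Bernoulli bandit with Beta(1, 1)
# priors. The true success rates are illustrative.
random.seed(2)
true_rates = [0.3, 0.6]
wins = [1, 1]    # Beta posterior parameters (successes + 1)
losses = [1, 1]  # Beta posterior parameters (failures + 1)

pulls = [0, 0]
for _ in range(2000):
    # Draw one belief per arm from its posterior, play the argmax.
    draws = [random.betavariate(wins[k], losses[k]) for k in range(2)]
    arm = draws.index(max(draws))
    pulls[arm] += 1
    if random.random() < true_rates[arm]:
        wins[arm] += 1
    else:
        losses[arm] += 1

print(pulls)  # the better arm (index 1) receives most of the pulls
```

Sampling from the posterior, rather than always playing the current best estimate, is what gives Thompson sampling its built-in exploration.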

DeGroot learning refers to a rule-of-thumb type of social learning process. The idea was stated in its general form by the American statistician Morris H. DeGroot; antecedents were articulated by John R. P. French and Frank Harary. The model has been used in physics, computer science and most widely in the theory of social networks.

Bayesian hierarchical modelling is a statistical model written in multiple levels that estimates the parameters of the posterior distribution using the Bayesian method. The sub-models combine to form the hierarchical model, and Bayes' theorem is used to integrate them with the observed data and account for all the uncertainty that is present. The result of this integration is the posterior distribution, also known as the updated probability estimate, as additional evidence on the prior distribution is acquired.

References

  1. Boroomand, Amin; Smaldino, Paul (2023). "Superiority bias and communication noise can enhance collective problem solving". Journal of Artificial Societies and Social Simulation. 26 (3). doi:10.18564/jasss.5154.
  2. Acemoglu, Daron; Ozdaglar, Asuman (2010). "Opinion Dynamics and Learning in Social Networks". Dynamic Games and Applications. 1 (1): 3–49. CiteSeerX 10.1.1.471.6097. doi:10.1007/s13235-010-0004-1.
  3. Chandrasekhar, Arun G.; Larreguy, Horacio; Xandri, Juan Pablo (August 2015). "Testing Models of Social Learning on Networks: Evidence from a Lab Experiment in the Field". NBER Working Paper No. 21468. doi:10.3386/w21468.
  4. Golub, Benjamin; Jackson, Matthew (2010). "Naïve Learning in Social Networks and the Wisdom of Crowds". American Economic Journal: Microeconomics. 2 (1): 112–149. CiteSeerX 10.1.1.304.7305. doi:10.1257/mic.2.1.112.