Rumor spread in social network

Last updated

The spread of rumors is an important form of communication in society. There are two approaches to investigating the rumor spreading process: microscopic models and the macroscopic models. The macroscopic models propose a macro view about this process and are mainly based on the widely-used Daley-Kendall and Maki-Thompson models. Particularly, rumor spread can be viewed as a stochastic process in social networks. By contrast, the microscopic models are more interested on micro-level interactions between individuals.

Contents

Rumor propagation models

In the last few years, there has been a growing interest in rumor propagation in online social networks problems where different approaches have been proposed.

Macroscopic models

The first category is mainly based on the epidemic models. Pioneering research on rumor propagation using these models started during the 1960s. [1]

Epidemic models

A standard model of rumor spreading was introduced by Daley and Kendall. [1] Assume there are N people in total and those people in the network are categorized into three groups: ignorants, spreaders and stiflers, which are denoted as S, I, and R respectively hereinafter:

The rumor is propagated through the population by pair-wise contacts between spreaders and others in the population. Any spreader involved in a pair-wise meeting attempts to “infect” the other individual with the rumor. In the case this other individual is an ignorant, he or she becomes a spreader. In the other two cases, either one or both of those involved in the meeting learn that the rumor is known and decided not to tell the rumor anymore, thereby turning into stiflers.

One variant is the Maki-Thompson model. [2] In this model, rumor is spread by directed contacts of the spreaders with others in the population. Furthermore, when a spreader contacts another spreader only the initiating spreader becomes a stifler. Therefore, three types of interactions can happen with certain rates.

 

 

 

 

(1)

which says when a spreader meet an ignorant, the ignorant will become a spreader.

 

 

 

 

(2)

which says when two spreaders meet with each other, one of them will become a stifler.

 

 

 

 

(3)

which says when a spreader meet a stifler, the spreader will lose the interest in spreading the rumor, so become a stifler.

Of course we always have conservation of individuals:

The change in each class in a small time interval is:

Since we know , and sum up to , we can reduce one equation from the above, which leads to a set of differential equations using relative variable and as follows

which we can write

Compared with the ordinary SIR model, we see that the only difference to the ordinary SIR model is that we have a factor in the first equation instead of just . We immediately see that the ignorants can only decrease since and . Also, if

which means

the rumor model exhibits an “epidemic” even for arbitrarily small rate parameters.

Epidemic models in social networks

We model the process introduced above on a network in discrete time, that is, we can model it as a DTMC. Say we have a network with N nodes, then we can define to be the state of node i at time t. Then is a stochastic process on . At a single moment, some node i and node j interact with each other, and then one of them will change its state. Thus we define the function so that for in , is when the state of network is , node i and node j interact with each other, and one of them will change its state. The transition matrix depends on the number of ties of node i and node j, as well as the state of node i and node j. For any , we try to find . If node i is in state I and node j is in state S, then ; if node i is in state I and node j is in state I, then ; if node i is in state I and node j is in state R, then . For all other , .

The procedure on a network is as follows: [3]

  1. We initial rumor to a single node ;
  2. We pick one of its neighbors as given by the adjacency matrix, so the probability we will pick node is

    where is from the adjacency matrix and if there is a tie from to , and is the degree for node ;
  3. Then have the choice:
    1. If node is an ignorant, it becomes a spreader at a rate ;
    2. If node is a spreader or stifler, then node becomes a stifler at a rate .
  4. We pick another node who is a spreader at random, and repeat the process.

We would expect that this process spreads the rumor throughout a considerable fraction of the network. Note however that if we have a strong local clustering around a node, what can happen is that many nodes become spreaders and have neighbors who are spreaders. Then, every time we pick one of those, they will recover and can extinguish the rumor spread. On the other hand, if we have a network that is small world, that is, a network in which the shortest path between two randomly chosen nodes is much smaller than that one would expect, we can expect the rumor spread far away.

Also we can compute the final number of people who once spread the news, this is given by

In networks the process that does not have a threshold in a well mixed population, exhibits a clear cut phase-transition in small worlds. The following graph illustrates the asymptotic value of as a function of the rewiring probability .

Microscopic models

The microscopic approaches are more focused on interactions between individuals: "who influenced whom."

Models include the independent cascade model, linear threshold model, [4] energy model, [5] HISBmodel, [6] and Galam's Model. [7]

Independent cascades models

Linear threshold models

Energy model

HISBmodel

The HISBmodel is a rumor propagation model that can reproduce a trend of this phenomenon and provide indicators to assess the impact of the rumor to effectively understand the diffusion process and reduce its influence. The variety that exists in human nature makes their decision-making ability pertaining to spreading information unpredictable, which is the primary challenge to model such a complex phenomenon. Hence, this model considers the impact of human individual and social behaviors in the spreading process of the rumors. The HISBmodel proposes an approach that is parallel to other models in the literature and concerned more with how individuals spread rumors. Therefore, it tries to understand the behavior of individuals, as well as their social interactions in OSNs, and highlight their impact on the dissemination of rumors. Thus, the model, attempts to answer the following question: When does an individual spread a rumor? When does an individual accept rumors? In which OSN does this individual spread the rumors?

First, it proposes a formulation of individual behavior towards a rumor analog to damped harmonic motion, which incorporates the opinions of individuals in the propagation process. Furthermore, it establishes rules of rumor transmission between individuals. As a result, it presents the HISBmodel propagation process, where new metrics are introduced to accurately assess the impact of a rumor spreading through OSNs.

Related Research Articles

The propagation constant of a sinusoidal electromagnetic wave is a measure of the change undergone by the amplitude and phase of the wave as it propagates in a given direction. The quantity being measured can be the voltage, the current in a circuit, or a field vector such as electric field strength or flux density. The propagation constant itself measures the dimensionless change in magnitude or phase per unit length. In the context of two-port networks and their cascades, propagation constant measures the change undergone by the source quantity as it propagates from one port to the next.

<span class="mw-page-title-main">Exponential distribution</span> Probability distribution

In probability theory and statistics, the exponential distribution or negative exponential distribution is the probability distribution of the distance between events in a Poisson point process, i.e., a process in which events occur continuously and independently at a constant average rate; the distance parameter could be any meaningful mono-dimensional measure of the process, such as time between production errors, or length along a roll of fabric in the weaving manufacturing process. It is a particular case of the gamma distribution. It is the continuous analogue of the geometric distribution, and it has the key property of being memoryless. In addition to being used for the analysis of Poisson point processes it is found in various other contexts.

<span class="mw-page-title-main">Beta distribution</span> Probability distribution

In probability theory and statistics, the beta distribution is a family of continuous probability distributions defined on the interval [0, 1] or in terms of two positive parameters, denoted by alpha (α) and beta (β), that appear as exponents of the variable and its complement to 1, respectively, and control the shape of the distribution.

<span class="mw-page-title-main">Gamma distribution</span> Probability distribution

In probability theory and statistics, the gamma distribution is a versatile two-parameter family of continuous probability distributions. The exponential distribution, Erlang distribution, and chi-squared distribution are special cases of the gamma distribution. There are two equivalent parameterizations in common use:

  1. With a shape parameter k and a scale parameter θ
  2. With a shape parameter and an inverse scale parameter , called a rate parameter.

The Lotka–Volterra equations, also known as the Lotka–Volterra predator–prey model, are a pair of first-order nonlinear differential equations, frequently used to describe the dynamics of biological systems in which two species interact, one as a predator and the other as prey. The populations change through time according to the pair of equations:

A continuous-time Markov chain (CTMC) is a continuous stochastic process in which, for each state, the process will change state according to an exponential random variable and then move to a different state as specified by the probabilities of a stochastic matrix. An equivalent formulation describes the process as changing state according to the least value of a set of exponential random variables, one for each possible state it can move to, with the parameters determined by the current state.

<span class="mw-page-title-main">Centrality</span> Degree of connectedness within a graph

In graph theory and network analysis, indicators of centrality assign numbers or rankings to nodes within a graph corresponding to their network position. Applications include identifying the most influential person(s) in a social network, key infrastructure nodes in the Internet or urban networks, super-spreaders of disease, and brain networks. Centrality concepts were first developed in social network analysis, and many of the terms used to measure centrality reflect their sociological origin.

Exponential smoothing or exponential moving average (EMA) is a rule of thumb technique for smoothing time series data using the exponential window function. Whereas in the simple moving average the past observations are weighted equally, exponential functions are used to assign exponentially decreasing weights over time. It is an easily learned and easily applied procedure for making some determination based on prior assumptions by the user, such as seasonality. Exponential smoothing is often used for analysis of time-series data.

<span class="mw-page-title-main">Simple linear regression</span> Linear regression model with a single explanatory variable

In statistics, simple linear regression (SLR) is a linear regression model with a single explanatory variable. That is, it concerns two-dimensional sample points with one independent variable and one dependent variable and finds a linear function that, as accurately as possible, predicts the dependent variable values as a function of the independent variable. The adjective simple refers to the fact that the outcome variable is related to a single predictor.

In natural language processing, latent Dirichlet allocation (LDA) is a Bayesian network for modeling automatically extracted topics in textual corpora. The LDA is an example of a Bayesian topic model. In this, observations are collected into documents, and each word's presence is attributable to one of the document's topics. Each document will contain a small number of topics.

<span class="mw-page-title-main">Beta-binomial distribution</span> Discrete probability distribution

In probability theory and statistics, the beta-binomial distribution is a family of discrete probability distributions on a finite support of non-negative integers arising when the probability of success in each of a fixed or known number of Bernoulli trials is either unknown or random. The beta-binomial distribution is the binomial distribution in which the probability of success at each of n trials is not fixed but randomly drawn from a beta distribution. It is frequently used in Bayesian statistics, empirical Bayes methods and classical statistics to capture overdispersion in binomial type distributed data.

<span class="mw-page-title-main">Assortativity</span> Tendency for similar nodes to be connected

Assortativity, or assortative mixing, is a preference for a network's nodes to attach to others that are similar in some way. Though the specific measure of similarity may vary, network theorists often examine assortativity in terms of a node's degree. The addition of this characteristic to network models more closely approximates the behaviors of many real world networks.

In probability theory and statistics, the Dirichlet-multinomial distribution is a family of discrete multivariate probability distributions on a finite support of non-negative integers. It is also called the Dirichlet compound multinomial distribution (DCM) or multivariate Pólya distribution. It is a compound probability distribution, where a probability vector p is drawn from a Dirichlet distribution with parameter vector , and an observation drawn from a multinomial distribution with probability vector p and number of trials n. The Dirichlet parameter vector captures the prior belief about the situation and can be seen as a pseudocount: observations of each outcome that occur before the actual data is collected. The compounding corresponds to a Pólya urn scheme. It is frequently encountered in Bayesian statistics, machine learning, empirical Bayes methods and classical statistics as an overdispersed multinomial distribution.

Cooperative diversity is a cooperative multiple antenna technique for improving or maximising total network channel capacities for any given set of bandwidths which exploits user diversity by decoding the combined signal of the relayed signal and the direct signal in wireless multihop networks. A conventional single hop system uses direct transmission where a receiver decodes the information only based on the direct signal while regarding the relayed signal as interference, whereas the cooperative diversity considers the other signal as contribution. That is, cooperative diversity decodes the information from the combination of two signals. Hence, it can be seen that cooperative diversity is an antenna diversity that uses distributed antennas belonging to each node in a wireless network. Note that user cooperation is another definition of cooperative diversity. User cooperation considers an additional fact that each user relays the other user's signal while cooperative diversity can be also achieved by multi-hop relay networking systems.

<span class="mw-page-title-main">Network science</span> Academic field

Network science is an academic field which studies complex networks such as telecommunication networks, computer networks, biological networks, cognitive and semantic networks, and social networks, considering distinct elements or actors represented by nodes and the connections between the elements or actors as links. The field draws on theories and methods including graph theory from mathematics, statistical mechanics from physics, data mining and information visualization from computer science, inferential modeling from statistics, and social structure from sociology. The United States National Research Council defines network science as "the study of network representations of physical, biological, and social phenomena leading to predictive models of these phenomena."

The purpose of this page is to provide supplementary materials for the ordinary least squares article, reducing the load of the main article with mathematics and improving its accessibility, while at the same time retaining the completeness of exposition.

<span class="mw-page-title-main">Multivariate stable distribution</span>

The multivariate stable distribution is a multivariate probability distribution that is a multivariate generalisation of the univariate stable distribution. The multivariate stable distribution defines linear relations between stable distribution marginals. In the same way as for the univariate case, the distribution is defined in terms of its characteristic function.

In applied mathematics and mathematical analysis, the fractal derivative or Hausdorff derivative is a non-Newtonian generalization of the derivative dealing with the measurement of fractals, defined in fractal geometry. Fractal derivatives were created for the study of anomalous diffusion, by which traditional approaches fail to factor in the fractal nature of the media. A fractal measure t is scaled according to tα. Such a derivative is local, in contrast to the similarly applied fractional derivative. Fractal calculus is formulated as a generalization of standard calculus.

<span class="mw-page-title-main">Multidimensional network</span> Networks with multiple kinds of relations

In network theory, multidimensional networks, a special type of multilayer network, are networks with multiple kinds of relations. Increasingly sophisticated attempts to model real-world systems as multidimensional networks have yielded valuable insight in the fields of social network analysis, economics, urban and international transport, ecology, psychology, medicine, biology, commerce, climatology, physics, computational neuroscience, operations management, and finance.

In machine learning, diffusion models, also known as diffusion probabilistic models or score-based generative models, are a class of latent variable generative models. A diffusion model consists of three major components: the forward process, the reverse process, and the sampling procedure. The goal of diffusion models is to learn a diffusion process that generates a probability distribution for a given dataset from which we can then sample new images. They learn the latent structure of a dataset by modeling the way in which data points diffuse through their latent space.

References

  1. 1 2 Daley, D.J., and Kendal, D.G. 1965 Stochastic rumors, J. Inst. Maths Applics 1, p. 42.
  2. Maki, D.P. 1973 Mathematical Models and Applications, With Emphasis on Social, Life, and Management Sciences, Prentice Hall.
  3. Brockmann, D. 2011 Complex Networks and Systems, Lecture Notes, Northwestern University
  4. [1] D. Kempe, J. Kleinberg, É. Tardos, Maximizing the spread of influence through a social network, Proc. Ninth ACM SIGKDD Int. Conf. Knowl. Discov. Data Min. - KDD ’03. (2003) 137. doi:10.1145/956755.956769.
  5. S. Han, F. Zhuang, Q. He, Z. Shi, X. Ao, Energy model for rumor propagation on social networks, Phys. A Stat. Mech. Its Appl. 394 (2014) 99–109. doi:10.1016/j.physa.2013.10.003.
  6. A.I.E. Hosni, K. Li, S. Ahmed, HISBmodel : A Rumor Diffusion Model Based on Human Individual and Social Behaviors in Online Social Networks, in: Springer, 2018.
  7. S. Galam, Modelling rumors: The no plane Pentagon French hoax case, Phys. A Stat. Mech. Its Appl. 320 (2003) 571–580. doi:10.1016/S0378-4371(02)01582-0.