Rumor spread in social network

The spread of rumors is an important form of communication in society. There are two approaches to investigating the rumor-spreading process: microscopic models and macroscopic models. The macroscopic models take a population-level view of the process and are mainly based on the widely used Daley-Kendall and Maki-Thompson models. In particular, rumor spread can be viewed as a stochastic process in social networks. By contrast, the microscopic models focus on micro-level interactions between individuals.

Rumor propagation models

In the last few years, there has been growing interest in rumor propagation in online social networks, and different approaches have been proposed.

Macroscopic models

The first category is mainly based on the epidemic models. Pioneering research on rumor propagation using these models started during the 1960s. [1]

Epidemic models

A standard model of rumor spreading was introduced by Daley and Kendall. [1] Assume there are N people in total, and that the people in the network are categorized into three groups: ignorants, spreaders and stiflers, denoted S, I, and R respectively hereinafter (in correspondence with the SIR model):

S: people who are ignorant of the rumor;
I: people who actively spread the rumor;
R: people who have heard the rumor but are no longer interested in spreading it.

The rumor is propagated through the population by pairwise contacts between spreaders and others in the population. Any spreader involved in a pairwise meeting attempts to “infect” the other individual with the rumor. If this other individual is an ignorant, he or she becomes a spreader. In the other two cases, either one or both of those involved in the meeting learn that the rumor is known and decide not to tell the rumor anymore, thereby turning into stiflers.

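One variant is the Maki-Thompson model. [2] In this model, the rumor is spread by directed contacts of the spreaders with others in the population. Furthermore, when a spreader contacts another spreader, only the initiating spreader becomes a stifler. Therefore, three types of interactions can happen, with a spreading rate $\alpha$ and a stifling rate $\beta$:

$$S + I \xrightarrow{\alpha} 2I$$

which says that when a spreader meets an ignorant, the ignorant becomes a spreader;

$$I + I \xrightarrow{\beta} I + R$$

which says that when two spreaders meet, one of them becomes a stifler;

$$I + R \xrightarrow{\beta} 2R$$

which says that when a spreader meets a stifler, the spreader loses interest in spreading the rumor and becomes a stifler.

Of course we always have conservation of individuals:

$$N = S + I + R$$

The change in each class in a small time interval $\Delta t$ is:

$$\Delta S \approx -\Delta t \, \alpha \frac{I S}{N}$$

$$\Delta I \approx \Delta t \left[ \alpha \frac{I S}{N} - \beta \frac{I^2}{N} - \beta \frac{I R}{N} \right]$$

$$\Delta R \approx \Delta t \left[ \beta \frac{I^2}{N} + \beta \frac{I R}{N} \right]$$

Since $S$, $I$ and $R$ sum to $N$, we can drop one equation, which leads to a set of differential equations in the relative variables $x = I/N$ and $y = S/N$:

$$\frac{dx}{dt} = \alpha x y - \beta x^2 - \beta x (1 - x - y)$$

$$\frac{dy}{dt} = -\alpha x y$$

which we can write as

$$\frac{dx}{dt} = (\alpha + \beta) x y - \beta x$$

$$\frac{dy}{dt} = -\alpha x y$$

Compared with the ordinary SIR model, the only difference is that we have a factor $\alpha + \beta$ in the first equation instead of just $\alpha$. We immediately see that the ignorants can only decrease, since $x, y \geq 0$ and $dy/dt \leq 0$. Also, if

$$R_0 = \frac{\alpha + \beta}{\beta} > 1$$

which means

$$\frac{\alpha}{\beta} > 0$$

the rumor model exhibits an “epidemic” even for arbitrarily small rate parameters.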
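The mean-field dynamics can be explored numerically. The following is a minimal sketch, assuming the reduced equations take the SIR-like form dx/dt = (alpha + beta)·x·y − beta·x and dy/dt = −alpha·x·y, where x is the spreader fraction and y the ignorant fraction. The names `alpha`, `beta`, `simulate_mean_field`, the initial condition and the step size are illustrative choices, not part of the original models.

```python
def simulate_mean_field(alpha, beta, x0=0.001, dt=0.01, t_max=200.0):
    """Forward-Euler integration of the assumed mean-field rumor equations.

    x: fraction of spreaders, y: fraction of ignorants,
    r = 1 - x - y: fraction of stiflers.
    Returns a list of (t, x, y, r) tuples.
    """
    x, y = x0, 1.0 - x0
    trajectory = [(0.0, x, y, 0.0)]
    steps = int(t_max / dt)
    for n in range(1, steps + 1):
        dx = (alpha + beta) * x * y - beta * x   # spreaders gained and lost
        dy = -alpha * x * y                      # ignorants can only decrease
        x, y = x + dt * dx, y + dt * dy
        trajectory.append((n * dt, x, y, 1.0 - x - y))
    return trajectory


# Illustrative run with equal spreading and stifling rates.
traj = simulate_mean_field(alpha=1.0, beta=1.0)
t_end, x_end, y_end, r_end = traj[-1]
print(f"spreaders={x_end:.4f}, ignorants={y_end:.4f}, stiflers={r_end:.4f}")
```

Forward Euler is used only for brevity; a higher-order integrator would be a reasonable substitute when accuracy matters.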

Epidemic models in social networks

We model the process introduced above on a network in discrete time, that is, as a discrete-time Markov chain (DTMC). Say we have a network with N nodes; then we can define $X_i(t)$ to be the state of node i at time t. Then $X(t) = (X_1(t), \dots, X_N(t))$ is a stochastic process on $\mathcal{S} = \{S, I, R\}^N$. At a single moment, some node i and node j interact with each other, and then one of them changes its state. Thus we define the function $f$ so that for $x$ in $\mathcal{S}$, $f(x, i, j)$ is the state of the network after node i and node j interact in network state $x$ and one of them changes its state. The transition matrix depends on the number of ties of node i and node j, as well as the states of node i and node j. For any $y = f(x, i, j)$, we try to find $P(x, y)$. Write $\alpha$ for the spreading rate, $\beta$ for the stifling rate, $A$ for the adjacency matrix and $k_i$ for the degree of node i. If node i is in state I and node j is in state S, then $P(x, y) = \alpha A_{ji} / k_i$; if node i is in state I and node j is in state I, then $P(x, y) = \beta A_{ji} / k_i$; if node i is in state I and node j is in state R, then $P(x, y) = \beta A_{ji} / k_i$. For all other $y$, $P(x, y) = 0$.

The procedure on a network is as follows: [3]

  1. We initially give the rumor to a single node $i$;
  2. We pick one of its neighbors as given by the adjacency matrix, so the probability that we pick node $j$ is

    $$p_j = \frac{A_{ji}}{k_i}$$

    where $A_{ji}$ is from the adjacency matrix, with $A_{ji} = 1$ if there is a tie from $i$ to $j$, and $k_i$ is the degree of node $i$;
  3. We then have a choice:
    1. If node $j$ is an ignorant, it becomes a spreader at a rate $\alpha$;
    2. If node $j$ is a spreader or stifler, then node $i$ becomes a stifler at a rate $\beta$.
  4. We pick another node that is a spreader at random, and repeat the process.
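The procedure above can be sketched as a small simulation. This is a minimal illustration under stated assumptions: the spreading and stifling rates are treated as per-contact probabilities `alpha` and `beta`, the network is an unweighted graph stored as a dict of neighbor lists, and an isolated spreader is simply turned into a stifler. The function names `ring_lattice` and `simulate_rumor` are hypothetical.

```python
import random
from collections import Counter


def ring_lattice(n, k):
    """Each node is linked to its k nearest neighbors on each side (n > 2k)."""
    return {i: [(i + d) % n for d in range(-k, k + 1) if d != 0] for i in range(n)}


def simulate_rumor(adjacency, alpha, beta, seed_node=0, rng=None, max_steps=1_000_000):
    """Run the contact process until no spreaders remain (or max_steps is hit).

    States: "S" = ignorant, "I" = spreader, "R" = stifler.
    """
    rng = rng or random.Random()
    state = {node: "S" for node in adjacency}
    state[seed_node] = "I"
    spreaders = [seed_node]
    for _ in range(max_steps):
        if not spreaders:
            break
        idx = rng.randrange(len(spreaders))     # pick a spreader i at random
        i = spreaders[idx]
        neighbors = adjacency[i]
        if neighbors:
            j = rng.choice(neighbors)           # uniform choice: prob. 1 / k_i
            if state[j] == "S":
                if rng.random() < alpha:        # ignorant j becomes a spreader
                    state[j] = "I"
                    spreaders.append(j)
                continue
            if rng.random() >= beta:            # contact with I or R, no stifling
                continue
        # Reached when i is isolated (assumption) or stifling succeeded.
        state[i] = "R"
        spreaders[idx] = spreaders[-1]          # O(1) removal from the list
        spreaders.pop()
    return state


adjacency = ring_lattice(200, 2)
final_state = simulate_rumor(adjacency, alpha=1.0, beta=1.0, rng=random.Random(42))
counts = Counter(final_state.values())
print(dict(counts))
```

Keeping the spreaders in a list makes step 4 (choosing another spreader at random) a constant-time operation.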

We would expect this process to spread the rumor throughout a considerable fraction of the network. Note, however, that if there is strong local clustering around a node, many nodes may become spreaders whose neighbors are also spreaders. Then, every time we pick one of those, it will recover, which can extinguish the rumor spread. On the other hand, in a small-world network, that is, a network in which the shortest path between two randomly chosen nodes is much shorter than one would expect, we can expect the rumor to spread far.

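We can also compute the final fraction of people who once spread the rumor, $r_\infty$, which is given by

$$r_\infty = 1 - e^{-\left(\frac{\alpha + \beta}{\beta}\right) r_\infty}$$

where $\alpha$ is the spreading rate and $\beta$ is the stifling rate.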
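The final-size relation is implicit, so it has to be solved numerically. The sketch below assumes the relation has the form r = 1 − exp(−((alpha + beta) / beta)·r) and finds its non-trivial root by fixed-point iteration starting from r = 1. The function name `final_stifler_fraction` and the tolerance are illustrative.

```python
import math


def final_stifler_fraction(alpha, beta, tol=1e-12, max_iter=10_000):
    """Solve r = 1 - exp(-c * r) with c = (alpha + beta) / beta by iteration.

    Starting from r = 1 the iterates decrease monotonically towards the
    largest root, which is the non-trivial one when c > 1 (alpha > 0).
    """
    c = (alpha + beta) / beta
    r = 1.0
    for _ in range(max_iter):
        r_next = 1.0 - math.exp(-c * r)
        if abs(r_next - r) < tol:
            return r_next
        r = r_next
    return r  # best estimate if the tolerance was not reached


r_inf = final_stifler_fraction(alpha=1.0, beta=1.0)
print(f"r_inf = {r_inf:.4f}")
```

Convergence slows down as alpha approaches zero, where the non-trivial root merges with the trivial root r = 0; a bracketing method may be preferable in that regime.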

In networks, the process, which has no threshold in a well-mixed population, exhibits a clear-cut phase transition in small worlds, visible in the asymptotic value of $r_\infty$ as a function of the rewiring probability $p$.
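The rewiring probability suggests a Watts-Strogatz-style small-world construction; this is an assumption, since the text does not name the network model. A minimal sketch of such a generator follows, with the hypothetical name `small_world` and a dict of neighbor lists as output.

```python
import random


def small_world(n, k, p, rng=None):
    """Ring lattice with k neighbors per side; each lattice edge is rewired
    to a uniformly chosen new endpoint with probability p (assumes n > 2k)."""
    rng = rng or random.Random()
    neighbors = {i: set() for i in range(n)}
    for i in range(n):
        for d in range(1, k + 1):
            j = (i + d) % n
            neighbors[i].add(j)
            neighbors[j].add(i)
    for i in range(n):
        for d in range(1, k + 1):
            j = (i + d) % n
            if j in neighbors[i] and rng.random() < p:
                # Avoid self-loops and duplicate edges when rewiring.
                candidates = [v for v in range(n) if v != i and v not in neighbors[i]]
                if candidates:
                    new_j = rng.choice(candidates)
                    neighbors[i].discard(j)
                    neighbors[j].discard(i)
                    neighbors[i].add(new_j)
                    neighbors[new_j].add(i)
    return {i: sorted(nbrs) for i, nbrs in neighbors.items()}


lattice = small_world(100, 2, p=0.0, rng=random.Random(1))
rewired = small_world(100, 2, p=0.2, rng=random.Random(1))
print(sum(len(v) for v in rewired.values()) // 2, "edges")
```

Sweeping `p` from 0 to 1 and running a rumor simulation on each resulting graph is one way to examine how the final fraction of stiflers depends on the rewiring probability.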

Microscopic models

The microscopic approaches are more focused on interactions between individuals: "who influenced whom."

Models include the independent cascade model, linear threshold model, [4] energy model, [5] HISBmodel, [6] and Galam's Model. [7]

Independent cascades models

Linear threshold models

Energy model

HISBmodel

The HISBmodel is a rumor propagation model that can reproduce the trend of this phenomenon and provide indicators to assess the impact of a rumor, in order to understand the diffusion process and reduce its influence. The variety in human nature makes individuals' decisions about spreading information unpredictable, which is the primary challenge in modeling such a complex phenomenon. Hence, this model considers the impact of individual and social human behaviors on the spreading process of rumors. The HISBmodel proposes an approach that is parallel to other models in the literature and is concerned more with how individuals spread rumors. It therefore tries to understand the behavior of individuals, as well as their social interactions in online social networks (OSNs), and to highlight their impact on the dissemination of rumors. Thus, the model attempts to answer the following questions: When does an individual spread a rumor? When does an individual accept a rumor? In which OSN does this individual spread the rumor?

First, it proposes a formulation of individual behavior towards a rumor analogous to damped harmonic motion, which incorporates the opinions of individuals in the propagation process. Furthermore, it establishes rules of rumor transmission between individuals. As a result, it presents the HISBmodel propagation process, where new metrics are introduced to accurately assess the impact of a rumor spreading through OSNs.

References

  1. Daley, D.J., and Kendall, D.G. 1965 Stochastic rumors, J. Inst. Maths Applics 1, p. 42.
  2. Maki, D.P. 1973 Mathematical Models and Applications, With Emphasis on Social, Life, and Management Sciences, Prentice Hall.
  3. Brockmann, D. 2011 Complex Networks and Systems, Lecture Notes, Northwestern University.
  4. D. Kempe, J. Kleinberg, É. Tardos, Maximizing the spread of influence through a social network, Proc. Ninth ACM SIGKDD Int. Conf. Knowl. Discov. Data Min. - KDD ’03. (2003) 137. doi:10.1145/956755.956769.
  5. S. Han, F. Zhuang, Q. He, Z. Shi, X. Ao, Energy model for rumor propagation on social networks, Phys. A Stat. Mech. Its Appl. 394 (2014) 99–109. doi:10.1016/j.physa.2013.10.003.
  6. A.I.E. Hosni, K. Li, S. Ahmed, HISBmodel : A Rumor Diffusion Model Based on Human Individual and Social Behaviors in Online Social Networks, in: Springer, 2018.
  7. S. Galam, Modelling rumors: The no plane Pentagon French hoax case, Phys. A Stat. Mech. Its Appl. 320 (2003) 571–580. doi:10.1016/S0378-4371(02)01582-0.