Preferential attachment

Last updated
Graph generated using preferential attachment. A small number of nodes have a large number of incoming edges, whereas a large number of nodes have a small number of incoming edges. Graph with Preferential attachment.png
Graph generated using preferential attachment. A small number of nodes have a large number of incoming edges, whereas a large number of nodes have a small number of incoming edges.

A preferential attachment process is any of a class of processes in which some quantity, typically some form of wealth or credit, is distributed among a number of individuals or objects according to how much they already have, so that those who are already wealthy receive more than those who are not. "Preferential attachment" is only the most recent of many names that have been given to such processes. They are also referred to under the names Yule process, cumulative advantage, the rich get richer, and the Matthew effect. They are also related to Gibrat's law. The principal reason for scientific interest in preferential attachment is that it can, under suitable circumstances, generate power law distributions. [1] If preferential attachment is non-linear, measured distributions may deviate from a power law. [2] These mechanisms may generate distributions which are approximately power law over transient periods. [3] [4]

Contents

Definition

A preferential attachment process is a stochastic urn process, meaning a process in which discrete units of wealth, usually called "balls", are added in a random or partly random fashion to a set of objects or containers, usually called "urns". A preferential attachment process is an urn process in which additional balls are added continuously to the system and are distributed among the urns as an increasing function of the number of balls the urns already have. In the most commonly studied examples, the number of urns also increases continuously, although this is not a necessary condition for preferential attachment and examples have been studied with constant or even decreasing numbers of urns.

A classic example of a preferential attachment process is the growth in the number of species per genus in some higher taxon of biotic organisms. [5] New genera ("urns") are added to a taxon whenever a newly appearing species is considered sufficiently different from its predecessors that it does not belong in any of the current genera. New species ("balls") are added as old ones speciate (i.e., split in two) and, assuming that new species belong to the same genus as their parent (except for those that start new genera), the probability that a new species is added to a genus will be proportional to the number of species the genus already has. This process, first studied by British statistician Udny Yule, is a linear preferential attachment process, since the rate at which genera accrue new species is linear in the number they already have.

Linear preferential attachment processes in which the number of urns increases are known to produce a distribution of balls over the urns following the so-called Yule distribution. In the most general form of the process, balls are added to the system at an overall rate of m new balls for each new urn. Each newly created urn starts out with k0 balls and further balls are added to urns at a rate proportional to the number k that they already have plus a constant a > k0. With these definitions, the fraction P(k) of urns having k balls in the limit of long time is given by [6]

for k  k0 (and zero otherwise), where B(x, y) is the Euler beta function:

with Γ(x) being the standard gamma function, and

The beta function behaves asymptotically as B(x, y) ~ xy for large x and fixed y, which implies that for large values of k we have

In other words, the preferential attachment process generates a "long-tailed" distribution following a Pareto distribution or power law in its tail. This is the primary reason for the historical interest in preferential attachment: the species distribution and many other phenomena are observed empirically to follow power laws and the preferential attachment process is a leading candidate mechanism to explain this behavior. Preferential attachment is considered a possible candidate for, among other things, the distribution of the sizes of cities, [7] the wealth of extremely wealthy individuals, [7] the number of citations received by learned publications, [8] and the number of links to pages on the World Wide Web. [1]

The general model described here includes many other specific models as special cases. In the species/genus example above, for instance, each genus starts out with a single species (k0 = 1) and gains new species in direct proportion to the number it already has (a = 0), and hence P(k) = B(k, γ)/B(k0, γ  1) with γ=2 + 1/m. Similarly the Price model for scientific citations [8] corresponds to the case k0 = 0, a = 1 and the widely studied Barabási-Albert model [1] corresponds to k0 = m, a = 0.

Preferential attachment is sometimes referred to as the Matthew effect, but the two are not precisely equivalent. The Matthew effect, first discussed by Robert K. Merton, [9] is named for a passage in the biblical Gospel of Matthew: "For everyone who has will be given more, and he will have an abundance. Whoever does not have, even what he has will be taken from him." (Matthew 25:29, New International Version.) The preferential attachment process does not incorporate the taking away part. This point may be moot, however, since the scientific insight behind the Matthew effect is in any case entirely different. Qualitatively it is intended to describe not a mechanical multiplicative effect like preferential attachment but a specific human behavior in which people are more likely to give credit to the famous than to the little known. The classic example of the Matthew effect is a scientific discovery made simultaneously by two different people, one well known and the other little known. It is claimed that under these circumstances people tend more often to credit the discovery to the well-known scientist. Thus the real-world phenomenon the Matthew effect is intended to describe is quite distinct from (though certainly related to) preferential attachment.

History

The first rigorous consideration of preferential attachment seems to be that of Udny Yule in 1925, who used it to explain the power-law distribution of the number of species per genus of flowering plants. [5] The process is sometimes called a "Yule process" in his honor. Yule was able to show that the process gave rise to a distribution with a power-law tail, but the details of his proof are, by today's standards, contorted and difficult, since the modern tools of stochastic process theory did not yet exist and he was forced to use more cumbersome methods of proof.

Most modern treatments of preferential attachment make use of the master equation method, whose use in this context was pioneered by Simon in 1955, in work on the distribution of sizes of cities and other phenomena. [7]

The first application of preferential attachment to learned citations was given by Price in 1976. [8] (He referred to the process as a "cumulative advantage" process.) His was also the first application of the process to the growth of a network, producing what would now be called a scale-free network. It is in the context of network growth that the process is most frequently studied today. Price also promoted preferential attachment as a possible explanation for power laws in many other phenomena, including Lotka's law of scientific productivity and Bradford's law of journal use.

The application of preferential attachment to the growth of the World Wide Web was proposed by Barabási and Albert in 1999. [1] Barabási and Albert also coined the name "preferential attachment" by which the process is best known today and suggested that the process might apply to the growth of other networks as well. For growing networks, the precise functional form of preferential attachment can be estimated by maximum likelihood estimation. [10]

See also

Related Research Articles

<span class="mw-page-title-main">Scale-free network</span> Network whose degree distribution follows a power law

A scale-free network is a network whose degree distribution follows a power law, at least asymptotically. That is, the fraction P(k) of nodes in the network having k connections to other nodes goes for large values of k as

<span class="mw-page-title-main">Unified neutral theory of biodiversity</span> Theory of evolutionary biology

The unified neutral theory of biodiversity and biogeography is a theory and the title of a monograph by ecologist Stephen P. Hubbell. It aims to explain the diversity and relative abundance of species in ecological communities. Like other neutral theories of ecology, Hubbell assumes that the differences between members of an ecological community of trophically similar species are "neutral", or irrelevant to their success. This implies that niche differences do not influence abundance and the abundance of each species follows a random walk. The theory has sparked controversy, and some authors consider it a more complex version of other null models that fit the data better.

<span class="mw-page-title-main">Udny Yule</span> British statistician and geneticist

George Udny Yule FRS, usually known as Udny Yule, was a British statistician, particularly known for the Yule distribution and proposing the preferential attachment model for random graphs.

<span class="mw-page-title-main">Dirichlet distribution</span> Probability distribution

In probability and statistics, the Dirichlet distribution (after Peter Gustav Lejeune Dirichlet), often denoted , is a family of continuous multivariate probability distributions parameterized by a vector of positive reals. It is a multivariate generalization of the beta distribution, hence its alternative name of multivariate beta distribution (MBD). Dirichlet distributions are commonly used as prior distributions in Bayesian statistics, and in fact, the Dirichlet distribution is the conjugate prior of the categorical distribution and multinomial distribution.

<span class="mw-page-title-main">Yule–Simon distribution</span> Discrete probability distribution

In probability and statistics, the Yule–Simon distribution is a discrete probability distribution named after Udny Yule and Herbert A. Simon. Simon originally called it the Yule distribution.

<span class="mw-page-title-main">Barabási–Albert model</span> Scale-free network generation algorithm

The Barabási–Albert (BA) model is an algorithm for generating random scale-free networks using a preferential attachment mechanism. Several natural and human-made systems, including the Internet, the World Wide Web, citation networks, and some social networks are thought to be approximately scale-free and certainly contain few nodes with unusually high degree as compared to the other nodes of the network. The BA model tries to explain the existence of such nodes in real networks. The algorithm is named for its inventors Albert-László Barabási and Réka Albert.

In applied probability theory, the Simon model is a class of stochastic models that results in a power-law distribution function. It was proposed by Herbert A. Simon to account for the wide range of empirical distributions following a power-law. It models the dynamics of a system of elements with associated counters. In this model the dynamics of the system is based on constant growth via addition of new elements as well as incrementing the counters at a rate proportional to their current values.

In statistics, the generalized Dirichlet distribution (GD) is a generalization of the Dirichlet distribution with a more general covariance structure and almost twice the number of parameters. Random vectors with a GD distribution are completely neutral.

<span class="mw-page-title-main">Network science</span> Academic field

Network science is an academic field which studies complex networks such as telecommunication networks, computer networks, biological networks, cognitive and semantic networks, and social networks, considering distinct elements or actors represented by nodes and the connections between the elements or actors as links. The field draws on theories and methods including graph theory from mathematics, statistical mechanics from physics, data mining and information visualization from computer science, inferential modeling from statistics, and social structure from sociology. The United States National Research Council defines network science as "the study of network representations of physical, biological, and social phenomena leading to predictive models of these phenomena."

In statistics, a Pólya urn model, named after George Pólya, is a family of urn models that can be used to interpret many commonly used statistical models.

In probability theory, a beta negative binomial distribution is the probability distribution of a discrete random variable  equal to the number of failures needed to get successes in a sequence of independent Bernoulli trials. The probability of success on each trial stays constant within any given experiment but varies across different experiments following a beta distribution. Thus the distribution is a compound probability distribution.

<span class="mw-page-title-main">Evolving network</span>

Evolving networks are networks that change as a function of time. They are a natural extension of network science since almost all real world networks evolve over time, either by adding or removing nodes or links over time. Often all of these processes occur simultaneously, such as in social networks where people make and lose friends over time, thereby creating and destroying edges, and some people become part of new social networks or leave their networks, changing the nodes in the network. Evolving network concepts build on established network theory and are now being introduced into studying networks in many diverse fields.

<span class="mw-page-title-main">Hierarchical network model</span>

Hierarchical network models are iterative algorithms for creating networks which are able to reproduce the unique properties of the scale-free topology and the high clustering of the nodes at the same time. These characteristics are widely observed in nature, from biology to language to some social networks.

Price's model is a mathematical model for the growth of citation networks. It was the first model which generalized the Simon model to be used for networks, especially for growing networks. Price's model belongs to the broader class of network growing models whose primary target is to explain the origination of networks with strongly skewed degree distributions. The model picked up the ideas of the Simon model reflecting the concept of rich get richer, also known as the Matthew effect. Price took the example of a network of citations between scientific papers and expressed its properties. His idea was that the way an old vertex gets new edges should be proportional to the number of existing edges the vertex already has. This was referred to as cumulative advantage, now also known as preferential attachment. Price's work is also significant in providing the first known example of a scale-free network. His ideas were used to describe many real-world networks such as the Web.

Robustness, the ability to withstand failures and perturbations, is a critical attribute of many complex systems including complex networks.

<span class="mw-page-title-main">Hub (network science)</span> Node with a number of links that greatly exceeds the average

In network science, a hub is a node with a number of links that greatly exceeds the average. Emergence of hubs is a consequence of a scale-free property of networks. While hubs cannot be observed in a random network, they are expected to emerge in scale-free networks. The uprise of hubs in scale-free networks is associated with power-law distribution. Hubs have a significant impact on the network topology. Hubs can be found in many real networks, such as the brain or the Internet.

In a scale-free network the degree distribution follows a power law function. In some empirical examples this power-law fits the degree distribution well only in the high degree region, however for small degree nodes the empirical degree-distribution deviates from it. See for example the network of scientific citations. This deviation of the observed degree-distribution from the theoretical prediction at the low-degree region is often referred as low-degree saturation.

The initial attractiveness is a possible extension of the Barabási–Albert model. The Barabási–Albert model generates scale-free networks where the degree distribution can be described by a pure power law. However, the degree distribution of most real life networks cannot be described by a power law solely. The most common discrepancies regarding the degree distribution found in real networks are the high degree cut-off and the low degree saturation. The inclusion of initial attractiveness in the Barabási–Albert model addresses the low-degree saturation phenomenon.

<span class="mw-page-title-main">Mediation-driven attachment model</span> Mathematics concept

In the scale-free network theory, a mediation-driven attachment (MDA) model appears to embody a preferential attachment rule tacitly rather than explicitly. According to MDA rule, a new node first picks a node from the existing network at random and connect itself not with that but with one of the neighbors also picked at random.

In the analysis of social networks, the uniform-preferential-attachment model, or UPA model is a variation of the Barabási–Albert model in which the preferential attachment is perceived as having a double nature. New nodes joining the network may either attach themselves with high-degree nodes or with most recently added nodes. This behaviour can be noticed in some examples of social networks, such as the citation network of scientific publications.

References

  1. 1 2 3 4 Barabási, A.-L.; R. Albert (1999). "Emergence of scaling in random networks". Science. 286 (5439): 509–512. arXiv: cond-mat/9910332 . Bibcode:1999Sci...286..509B. doi:10.1126/science.286.5439.509. PMID   10521342. S2CID   524106.
  2. Krapivsky, P. L.; Redner, S.; Leyvraz, F. (20 November 2000). "Connectivity of Growing Random Networks". Physical Review Letters. 85 (21): 4629–4632. arXiv: cond-mat/0005139 . doi:10.1103/PhysRevLett.85.4629. PMID   11082613. S2CID   16251662.
  3. Krapivsky, Paul; Krioukov, Dmitri (21 August 2008). "Scale-free networks as preasymptotic regimes of superlinear preferential attachment". Physical Review E. 78 (2): 026114. arXiv: 0804.1366 . doi:10.1103/PhysRevE.78.026114. PMID   18850904. S2CID   14292535.
  4. Falkenberg, Max; Lee, Jong-Hyeok; Amano, Shun-ichi; Ogawa, Ken-ichiro; Yano, Kazuo; Miyake, Yoshihiro; Evans, Tim S.; Christensen, Kim (18 June 2020). "Identifying time dependence in network growth". Physical Review Research. 2 (2): 023352. arXiv: 2001.09118 . doi: 10.1103/PhysRevResearch.2.023352 .
  5. 1 2 Yule, G. U. (1925). "A Mathematical Theory of Evolution, based on the Conclusions of Dr. J. C. Willis, F.R.S". Philosophical Transactions of the Royal Society B . 213 (402–410): 21–87. doi: 10.1098/rstb.1925.0002 .
  6. Newman, M. E. J. (2005). "Power laws, Pareto distributions and Zipf's law". Contemporary Physics. 46 (5): 323–351. arXiv: cond-mat/0412004 . Bibcode:2005ConPh..46..323N. doi:10.1080/00107510500052444. S2CID   202719165.
  7. 1 2 3 Simon, H. A. (1955). "On a class of skew distribution functions". Biometrika. 42 (3–4): 425–440. doi:10.1093/biomet/42.3-4.425.
  8. 1 2 3 Price, D. J. de S. (1976). "A general theory of bibliometric and other cumulative advantage processes" (PDF). J. Amer. Soc. Inform. Sci. 27 (5): 292–306. doi:10.1002/asi.4630270505. Archived (PDF) from the original on 2020-12-01. Retrieved 2008-07-19.
  9. Merton, Robert K. (1968). "The Matthew effect in science". Science. 159 (3810): 56–63. Bibcode:1968Sci...159...56M. doi:10.1126/science.159.3810.56. PMID   17737466. S2CID   3526819.
  10. Pham, Thong; Sheridan, Paul; Shimodaira, Hidetoshi (September 17, 2015). "PAFit: A Statistical Method for Measuring Preferential Attachment in Temporal Complex Networks". PLOS ONE. 10 (9): e0137796. Bibcode:2015PLoSO..1037796P. doi: 10.1371/journal.pone.0137796 . PMC   4574777 . PMID   26378457.