Part of a series on | ||||
Network science | ||||
---|---|---|---|---|
Network types | ||||
Graphs | ||||
| ||||
Models | ||||
| ||||
| ||||
A scale-free network is a network whose degree distribution follows a power law, at least asymptotically. That is, the fraction P(k) of nodes in the network having k connections to other nodes goes for large values of k as
where is a parameter whose value is typically in the range (wherein the second moment (scale parameter) of is infinite but the first moment is finite), although occasionally it may lie outside these bounds. [1] [2] The name "scale-free" could be explained by the fact that some moments of the degree distribution are not defined, so that the network does not have a characteristic scale or "size".
Many networks have been reported to be scale-free, although statistical analysis has refuted many of these claims and seriously questioned others. [3] [4] Additionally, some have argued that simply knowing that a degree-distribution is fat-tailed is more important than knowing whether a network is scale-free according to statistically rigorous definitions. [5] [6] Preferential attachment and the fitness model have been proposed as mechanisms to explain conjectured power law degree distributions in real networks. Alternative models such as super-linear preferential attachment and second-neighbour preferential attachment may appear to generate transient scale-free networks, but the degree distribution deviates from a power law as networks become very large. [7] [8]
In studies of the networks of citations between scientific papers, Derek de Solla Price showed in 1965 that the number of links to papers—i.e., the number of citations they receive—had a heavy-tailed distribution following a Pareto distribution or power law, and thus that the citation network is scale-free. He did not however use the term "scale-free network", which was not coined until some decades later. In a later paper in 1976, Price also proposed a mechanism to explain the occurrence of power laws in citation networks, which he called "cumulative advantage" but which is today more commonly known under the name preferential attachment.
Recent interest in scale-free networks started in 1999 with work by Albert-László Barabási and Réka Albert at the University of Notre Dame who mapped the topology of a portion of the World Wide Web, [9] finding that some nodes, which they called "hubs", had many more connections than others and that the network as a whole had a power-law distribution of the number of links connecting to a node. After finding that a few other networks, including some social and biological networks, also had heavy-tailed degree distributions, Barabási and Réka Albert coined the term "scale-free network" to describe the class of networks that exhibit a power-law degree distribution. However, studying seven examples of networks in social, economic, technological, biological, and physical systems, Amaral et al. were not able to find a scale-free network among these seven examples. Only one of these examples, the movie-actor network, had degree distribution P(k) following a power law regime for moderate k, though eventually this power law regime was followed by a sharp cutoff showing exponential decay for large k. [10]
Barabási and Réka Albert proposed a generative mechanism to explain the appearance of power-law distributions, which they called "preferential attachment" and which is essentially the same as that proposed by Price. Analytic solutions for this mechanism (also similar to the solution of Price) were presented in 2000 by Dorogovtsev, Mendes and Samukhin [11] and independently by Krapivsky, Redner, and Leyvraz, and later rigorously proved by mathematician Béla Bollobás. [12] Notably, however, this mechanism only produces a specific subset of networks in the scale-free class, and many alternative mechanisms have been discovered since. [13]
The history of scale-free networks also includes some disagreement. On an empirical level, the scale-free nature of several networks has been called into question. For instance, the three brothers Faloutsos believed that the Internet had a power law degree distribution on the basis of traceroute data; however, it has been suggested that this is a layer 3 illusion created by routers, which appear as high-degree nodes while concealing the internal layer 2 structure of the ASes they interconnect. [14]
On a theoretical level, refinements to the abstract definition of scale-free have been proposed. For example, Li et al. (2005) offered a potentially more precise "scale-free metric". Briefly, let G be a graph with edge set E, and denote the degree of a vertex (that is, the number of edges incident to ) by . Define
This is maximized when high-degree nodes are connected to other high-degree nodes. Now define
where smax is the maximum value of s(H) for H in the set of all graphs with degree distribution identical to that of G. This gives a metric between 0 and 1, where a graph G with small S(G) is "scale-rich", and a graph G with S(G) close to 1 is "scale-free". This definition captures the notion of self-similarity implied in the name "scale-free".
When the concept of "scale-free" was initially introduced in the context of networks, [9] it primarily referred to a specific trait: a power-law distribution for a given variable , expressed as . This property maintains its form when subjected to a continuous scale transformation , evoking parallels with the renormalization group techniques in statistical field theory. [15] [16]
However, there's a key difference. In statistical field theory, the term "scale" often pertains to system size. In the realm of networks, "scale" is a measure of connectivity, generally quantified by a node's degree—that is, the number of links attached to it. Networks featuring a higher number of high-degree nodes are deemed to have greater connectivity.
The power-law degree distribution enables us to make "scale-free" assertions about the prevalence of high-degree nodes. [17] For instance, we can say that "nodes with triple the average connectivity occur half as frequently as nodes with average connectivity." The specific numerical value of what constitutes "average connectivity" becomes irrelevant, whether it's a hundred or a million. [18]
The most notable characteristic in a scale-free network is the relative commonness of vertices with a degree that greatly exceeds the average. The highest-degree nodes are often called "hubs", and are thought to serve specific purposes in their networks, although this depends greatly on the domain.
Another important characteristic of scale-free networks is the clustering coefficient distribution, which decreases as the node degree increases. This distribution also follows a power law. This implies that the low-degree nodes belong to very dense sub-graphs and those sub-graphs are connected to each other through hubs. Consider a social network in which nodes are people and links are acquaintance relationships between people. It is easy to see that people tend to form communities, i.e., small groups in which everyone knows everyone (one can think of such community as a complete graph). In addition, the members of a community also have a few acquaintance relationships to people outside that community. Some people, however, are connected to a large number of communities (e.g., celebrities, politicians). Those people may be considered the hubs responsible for the small-world phenomenon.
At present, the more specific characteristics of scale-free networks vary with the generative mechanism used to create them. For instance, networks generated by preferential attachment typically place the high-degree vertices in the middle of the network, connecting them together to form a core, with progressively lower-degree nodes making up the regions between the core and the periphery. The random removal of even a large fraction of vertices impacts the overall connectedness of the network very little, suggesting that such topologies could be useful for security, while targeted attacks destroys the connectedness very quickly. Other scale-free networks, which place the high-degree vertices at the periphery, do not exhibit these properties. Similarly, the clustering coefficient of scale-free networks can vary significantly depending on other topological details.
The question of how to immunize efficiently scale free networks which represent realistic networks such as the Internet and social networks has been studied extensively. One such strategy is to immunize the largest degree nodes, i.e., targeted (intentional) attacks since for this case p is relatively high and less nodes are needed to be immunized. However, in many realistic cases the global structure is not available and the largest degree nodes are not known.
Properties of random graph may change or remain invariant under graph transformations. Mashaghi A. et al., for example, demonstrated that a transformation which converts random graphs to their edge-dual graphs (or line graphs) produces an ensemble of graphs with nearly the same degree distribution, but with degree correlations and a significantly higher clustering coefficient. Scale free graphs, as such, remain scale free under such transformations. [19]
Although many real-world networks are thought to be scale-free, the evidence often remains inconclusive, primarily due to the developing awareness of more rigorous data analysis techniques. [3] As such, the scale-free nature of many networks is still being debated by the scientific community. A few examples of networks claimed to be scale-free include:
Scale free topology has been also found in high temperature superconductors. [23] The qualities of a high-temperature superconductor — a compound in which electrons obey the laws of quantum physics, and flow in perfect synchrony, without friction — appear linked to the fractal arrangements of seemingly random oxygen atoms and lattice distortion. [24]
A space-filling cellular structure, weighted planar stochastic lattice (WPSL) has recently been proposed whose coordination number distribution follow a power-law. It implies that the lattice has a few blocks which have astonishingly large number neighbors with whom they share common borders. Its construction starts with an initiator, say a square of unit area, and a generator that divides it randomly into four blocks. The generator thereafter is sequentially applied over and over again to only one of the available blocks picked preferentially with respect to their areas. It results in the partitioning of the square into ever smaller mutually exclusive rectangular blocks. The dual of the WPSL (DWPSL), which is obtained by replacing each block with a node at its center, and each common border between blocks with an edge joining the two corresponding vertices, emerges as a network whose degree distribution follows a power-law. [25] [26] The reason for it is that it grows following mediation-driven attachment model rule which also embodies preferential attachment rule but in disguise.
Scale-free networks do not arise by chance alone. Erdős and Rényi (1960) studied a model of growth for graphs in which, at each step, two nodes are chosen uniformly at random and a link is inserted between them. The properties of these random graphs are different from the properties found in scale-free networks, and therefore a model for this growth process is needed.
The most widely known generative model for a subset of scale-free networks is Barabási and Albert's (1999) rich get richer generative model in which each new Web page creates links to existing Web pages with a probability distribution which is not uniform, but proportional to the current in-degree of Web pages. This model was originally invented by Derek J. de Solla Price in 1965 under the term cumulative advantage, but did not reach popularity until Barabási rediscovered the results under its current name (BA Model). According to this process, a page with many in-links will attract more in-links than a regular page. This generates a power-law but the resulting graph differs from the actual Web graph in other properties such as the presence of small tightly connected communities. More general models and network characteristics have been proposed and studied. For example, Pachon et al. (2018) proposed a variant of the rich get richer generative model which takes into account two different attachment rules: a preferential attachment mechanism and a uniform choice only for the most recent nodes. [27] For a review see the book by Dorogovtsev and Mendes.[ citation needed ] Some mechanisms such as super-linear preferential attachment and second neighbour attachment generate networks which are transiently scale-free, but deviate from a power law as networks grow large. [7] [8]
A somewhat different generative model for Web links has been suggested by Pennock et al. (2002). They examined communities with interests in a specific topic such as the home pages of universities, public companies, newspapers or scientists, and discarded the major hubs of the Web. In this case, the distribution of links was no longer a power law but resembled a normal distribution. Based on these observations, the authors proposed a generative model that mixes preferential attachment with a baseline probability of gaining a link.
Another generative model is the copy model studied by Kumar et al. [28] (2000), in which new nodes choose an existent node at random and copy a fraction of the links of the existent node. This also generates a power law.
There are two major components that explain the emergence of the power-law distribution in the Barabási–Albert model: the growth and the preferential attachment. [29] By "growth" is meant a growth process where, over an extended period of time, new nodes join an already existing system, a network (like the World Wide Web which has grown by billions of web pages over 10 years). Finally, by "preferential attachment" is meant that new nodes prefer to connect to nodes that already have a high number of links with others. Thus, there is a higher probability that more and more nodes will link themselves to that one which has already many links, leading this node to a hub in-fine. [9] Depending on the network, the hubs might either be assortative or disassortative. Assortativity would be found in social networks in which well-connected/famous people would tend to know better each other. Disassortativity would be found in technological (Internet, World Wide Web) and biological (protein interaction, metabolism) networks. [29]
However, the growth of the networks (adding new nodes) is not a necessary condition for creating a scale-free network (see Dangalchev [30] ). One possibility (Caldarelli et al. 2002) is to consider the structure as static and draw a link between vertices according to a particular property of the two vertices involved. Once specified the statistical distribution for these vertex properties (fitnesses), it turns out that in some circumstances also static networks develop scale-free properties.
There has been a burst of activity in the modeling of scale-free complex networks. The recipe of Barabási and Albert [31] has been followed by several variations and generalizations [32] [33] [34] [35] [27] and the revamping of previous mathematical works. [36]
In today's terms, if a complex network has a power-law distribution of any of its metrics, it's generally considered a scale-free network. Similarly, any model with this feature is called a scale-free model. [17]
Many real networks are (approximately) scale-free and hence require scale-free models to describe them. In Price's scheme, there are two ingredients needed to build up a scale-free model:
1. Adding or removing nodes. Usually we concentrate on growing the network, i.e. adding nodes.
2. Preferential attachment: The probability that new nodes will be connected to the "old" node.
Note that some models (see Dangalchev [30] and Fitness model below) can work also statically, without changing the number of nodes. It should also be kept in mind that the fact that "preferential attachment" models give rise to scale-free networks does not prove that this is the mechanism underlying the evolution of real-world scale-free networks, as there might exist different mechanisms at work in real-world systems that nevertheless give rise to scaling.
There have been several attempts to generate scale-free network properties. Here are some examples:
The Barabási–Albert model, an undirected version of Price's model has a linear preferential attachment and adds one new node at every time step.
(Note, another general feature of in real networks is that , i.e. there is a nonzero probability that a new node attaches to an isolated node. Thus in general has the form , where is the initial attractiveness of the node.)
Dangalchev (see [30] ) builds a 2-L model by considering the importance of each of the neighbours of a target node in preferential attachment. The attractiveness of a node in the 2-L model depends not only on the number of nodes linked to it but also on the number of links in each of these nodes.
where C is a coefficient between 0 and 1.
A variant of the 2-L model, the k2 model, where first and second neighbour nodes contribute equally to a target node's attractiveness, demonstrates the emergence of transient scale-free networks. [8] In the k2 model, the degree distribution appears approximately scale-free as long as the network is relatively small, but significant deviations from the scale-free regime emerge as the network grows larger. This results in the relative attractiveness of nodes with different degrees changing over time, a feature also observed in real networks.
In the mediation-driven attachment (MDA) model, a new node coming with edges picks an existing connected node at random and then connects itself, not with that one, but with of its neighbors, also chosen at random. The probability that the node of the existing node picked is
The factor is the inverse of the harmonic mean (IHM) of degrees of the neighbors of a node . Extensive numerical investigation suggest that for approximately the mean IHM value in the large limit becomes a constant which means . It implies that the higher the links (degree) a node has, the higher its chance of gaining more links since they can be reached in a larger number of ways through mediators which essentially embodies the intuitive idea of rich get richer mechanism (or the preferential attachment rule of the Barabasi–Albert model). Therefore, the MDA network can be seen to follow the PA rule but in disguise. [37]
However, for it describes the winner takes it all mechanism as we find that almost of the total nodes has degree one and one is super-rich in degree. As value increases the disparity between the super rich and poor decreases and as we find a transition from rich get super richer to rich get richer mechanism.
The Barabási–Albert model assumes that the probability that a node attaches to node is proportional to the degree of node . This assumption involves two hypotheses: first, that depends on , in contrast to random graphs in which , and second, that the functional form of is linear in .
In non-linear preferential attachment, the form of is not linear, and recent studies have demonstrated that the degree distribution depends strongly on the shape of the function
Krapivsky, Redner, and Leyvraz [34] demonstrate that the scale-free nature of the network is destroyed for nonlinear preferential attachment. The only case in which the topology of the network is scale free is that in which the preferential attachment is asymptotically linear, i.e. as . In this case the rate equation leads to
This way the exponent of the degree distribution can be tuned to any value between 2 and .[ clarification needed ]
Hierarchical network models are, by design, scale free and have high clustering of nodes. [38]
The iterative construction leads to a hierarchical network. Starting from a fully connected cluster of five nodes, we create four identical replicas connecting the peripheral nodes of each cluster to the central node of the original cluster. From this, we get a network of 25 nodes (N = 25). Repeating the same process, we can create four more replicas of the original cluster – the four peripheral nodes of each one connect to the central node of the nodes created in the first step. This gives N = 125, and the process can continue indefinitely.
The idea is that the link between two vertices is assigned not randomly with a probability p equal for all the couple of vertices. Rather, for every vertex j there is an intrinsic fitnessxj and a link between vertex i and j is created with a probability . [39] In the case of World Trade Web it is possible to reconstruct all the properties by using as fitnesses of the country their GDP, and taking
Assuming that a network has an underlying hyperbolic geometry, one can use the framework of spatial networks to generate scale-free degree distributions. This heterogeneous degree distribution then simply reflects the negative curvature and metric properties of the underlying hyperbolic geometry. [41]
Starting with scale free graphs with low degree correlation and clustering coefficient, one can generate new graphs with much higher degree correlations and clustering coefficients by applying edge-dual transformation. [19]
UPA model is a variant of the preferential attachment model (proposed by Pachon et al.) which takes into account two different attachment rules: a preferential attachment mechanism (with probability 1−p) that stresses the rich get richer system, and a uniform choice (with probability p) for the most recent nodes. This modification is interesting to study the robustness of the scale-free behavior of the degree distribution. It is proved analytically that the asymptotically power-law degree distribution is preserved. [27]
In the context of network theory a scale-free ideal network is a random network with a degree distribution following the scale-free ideal gas density distribution. These networks are able to reproduce city-size distributions and electoral results by unraveling the size distribution of social groups with information theory on complex networks when a competitive cluster growth process is applied to the network. [42] [43] In models of scale-free ideal networks it is possible to demonstrate that Dunbar's number is the cause of the phenomenon known as the 'six degrees of separation'.
For a scale-free network with nodes and power-law exponent , the induced subgraph constructed by vertices with degrees larger than is a scale-free network with , almost surely. [44]
Estimating the power-law exponent of a scale-free network is typically done by using the maximum likelihood estimation with the degrees of a few uniformly sampled nodes. [3] However, since uniform sampling does not obtain enough samples from the important heavy-tail of the power law degree distribution, this method can yield a large bias and a variance. It has been recently proposed to sample random friends (i.e., random ends of random links) who are more likely come from the tail of the degree distribution as a result of the friendship paradox. [45] [46] Theoretically, maximum likelihood estimation with random friends lead to a smaller bias and a smaller variance compared to classical approach based on uniform sampling. [46]
A small-world network is a graph characterized by a high clustering coefficient and low distances. On an example of social network, high clustering implies the high probability that two friends of one person are friends themselves. The low distances, on the other hand, mean that there is a short chain of social connections between any two people. Specifically, a small-world network is defined to be a network where the typical distance L between two randomly chosen nodes grows proportionally to the logarithm of the number of nodes N in the network, that is:
In the context of network theory, a complex network is a graph (network) with non-trivial topological features—features that do not occur in simple networks such as lattices or random graphs but often occur in networks representing real systems. The study of complex networks is a young and active area of scientific research inspired largely by empirical findings of real-world networks such as computer networks, biological networks, technological networks, brain networks, climate networks and social networks.
In the study of graphs and networks, the degree of a node in a network is the number of connections it has to other nodes and the degree distribution is the probability distribution of these degrees over the whole network.
A preferential attachment process is any of a class of processes in which some quantity, typically some form of wealth or credit, is distributed among a number of individuals or objects according to how much they already have, so that those who are already wealthy receive more than those who are not. "Preferential attachment" is only the most recent of many names that have been given to such processes. They are also referred to under the names Yule process, cumulative advantage, the rich get richer, and the Matthew effect. They are also related to Gibrat's law. The principal reason for scientific interest in preferential attachment is that it can, under suitable circumstances, generate power law distributions. If preferential attachment is non-linear, measured distributions may deviate from a power law. These mechanisms may generate distributions which are approximately power law over transient periods.
The Barabási–Albert (BA) model is an algorithm for generating random scale-free networks using a preferential attachment mechanism. Several natural and human-made systems, including the Internet, the World Wide Web, citation networks, and some social networks are thought to be approximately scale-free and certainly contain few nodes with unusually high degree as compared to the other nodes of the network. The BA model tries to explain the existence of such nodes in real networks. The algorithm is named for its inventors Albert-László Barabási and Réka Albert.
The Watts–Strogatz model is a random graph generation model that produces graphs with small-world properties, including short average path lengths and high clustering. It was proposed by Duncan J. Watts and Steven Strogatz in their article published in 1998 in the Nature scientific journal. The model also became known as the (Watts) beta model after Watts used to formulate it in his popular science book Six Degrees.
In complex network theory, the fitness model is a model of the evolution of a network: how the links between nodes change over time depends on the fitness of nodes. Fitter nodes attract more links at the expense of less fit nodes.
In applied probability theory, the Simon model is a class of stochastic models that results in a power-law distribution function. It was proposed by Herbert A. Simon to account for the wide range of empirical distributions following a power-law. It models the dynamics of a system of elements with associated counters. In this model the dynamics of the system is based on constant growth via addition of new elements as well as incrementing the counters at a rate proportional to their current values.
Network science is an academic field which studies complex networks such as telecommunication networks, computer networks, biological networks, cognitive and semantic networks, and social networks, considering distinct elements or actors represented by nodes and the connections between the elements or actors as links. The field draws on theories and methods including graph theory from mathematics, statistical mechanics from physics, data mining and information visualization from computer science, inferential modeling from statistics, and social structure from sociology. The United States National Research Council defines network science as "the study of network representations of physical, biological, and social phenomena leading to predictive models of these phenomena."
Evolving networks are networks that change as a function of time. They are a natural extension of network science since almost all real world networks evolve over time, either by adding or removing nodes or links over time. Often all of these processes occur simultaneously, such as in social networks where people make and lose friends over time, thereby creating and destroying edges, and some people become part of new social networks or leave their networks, changing the nodes in the network. Evolving network concepts build on established network theory and are now being introduced into studying networks in many diverse fields.
Hierarchical network models are iterative algorithms for creating networks which are able to reproduce the unique properties of the scale-free topology and the high clustering of the nodes at the same time. These characteristics are widely observed in nature, from biology to language to some social networks.
Price's model is a mathematical model for the growth of citation networks. It was the first model which generalized the Simon model to be used for networks, especially for growing networks. Price's model belongs to the broader class of network growing models whose primary target is to explain the origination of networks with strongly skewed degree distributions. The model picked up the ideas of the Simon model reflecting the concept of rich get richer, also known as the Matthew effect. Price took the example of a network of citations between scientific papers and expressed its properties. His idea was that the way an old vertex gets new edges should be proportional to the number of existing edges the vertex already has. This was referred to as cumulative advantage, now also known as preferential attachment. Price's work is also significant in providing the first known example of a scale-free network. His ideas were used to describe many real-world networks such as the Web.
In mathematical modeling of social networks, link-centric preferential attachment is a node's propensity to re-establish links to nodes it has previously been in contact with in time-varying networks. This preferential attachment model relies on nodes keeping memory of previous neighbors up to the current time.
Robustness, the ability to withstand failures and perturbations, is a critical attribute of many complex systems including complex networks.
In network science, preferential attachment means that nodes of a network tend to connect to those nodes which have more links. If the network is growing and new nodes tend to connect to existing ones with linear probability in the degree of the existing nodes then preferential attachment leads to a scale-free network. If this probability is sub-linear then the network’s degree distribution is stretched exponential and hubs are much smaller than in a scale-free network. If this probability is super-linear then almost all nodes are connected to a few hubs. According to Kunegis, Blattner, and Moser several online networks follow a non-linear preferential attachment model. Communication networks and online contact networks are sub-linear while interaction networks are super-linear. The co-author network among scientists also shows the signs of sub-linear preferential attachment.
The Bianconi–Barabási model is a model in network science that explains the growth of complex evolving networks. This model can explain that nodes with different characteristics acquire links at different rates. It predicts that a node's growth depends on its fitness and can calculate the degree distribution. The Bianconi–Barabási model is named after its inventors Ginestra Bianconi and Albert-László Barabási. This model is a variant of the Barabási–Albert model. The model can be mapped to a Bose gas and this mapping can predict a topological phase transition between a "rich-get-richer" phase and a "winner-takes-all" phase.
In network science, a hub is a node with a number of links that greatly exceeds the average. Emergence of hubs is a consequence of a scale-free property of networks. While hubs cannot be observed in a random network, they are expected to emerge in scale-free networks. The uprise of hubs in scale-free networks is associated with power-law distribution. Hubs have a significant impact on the network topology. Hubs can be found in many real networks, such as the brain or the Internet.
In a scale-free network the degree distribution follows a power law function. In some empirical examples this power-law fits the degree distribution well only in the high degree region; in some small degree nodes the empirical degree-distribution deviates from it. See for example the network of scientific citations. This deviation of the observed degree-distribution from the theoretical prediction at the low-degree region is often referred as low-degree saturation. The empirical degree-distribution typically deviates downward from the power-law function fitted on higher order nodes, which means low-degree nodes are less frequent in real data than what is predicted by the Barabási–Albert model.
The initial attractiveness is a possible extension of the Barabási–Albert model. The Barabási–Albert model generates scale-free networks where the degree distribution can be described by a pure power law. However, the degree distribution of most real life networks cannot be described by a power law solely. The most common discrepancies regarding the degree distribution found in real networks are the high degree cut-off and the low degree saturation. The inclusion of initial attractiveness in the Barabási–Albert model addresses the low-degree saturation phenomenon.
In the scale-free network theory, a mediation-driven attachment (MDA) model appears to embody a preferential attachment rule tacitly rather than explicitly. According to MDA rule, a new node first picks a node from the existing network at random and connect itself not with that but with one of the neighbors also picked at random.