Giant component

Last updated September 25, 2024

In network theory, a giant component is a connected component of a given random graph that contains a significant fraction of the entire graph's vertices.

More precisely, in graphs drawn randomly from a probability distribution over arbitrarily large graphs, a giant component is a connected component whose fraction of the overall number of vertices is bounded away from zero. In sufficiently dense graphs distributed according to the Erdős–Rényi model, a giant component exists with high probability.

Giant component in Erdős–Rényi model

Giant components are a prominent feature of the Erdős–Rényi (ER) model of random graphs, in which each possible edge connecting pairs of a given set of $n$ vertices is present, independently of the other edges, with probability $p$. In this model, if $p\leq {\frac {1-\epsilon }{n}}$ for any constant $\epsilon >0$, then with high probability (in the limit as $n$ goes to infinity) all connected components of the graph have size $O(\log n)$, and there is no giant component. However, for $p\geq {\frac {1+\epsilon }{n}}$ there is with high probability a single giant component, with all other components having size $O(\log n)$. For $p=p_{c}={\frac {1}{n}}$, intermediate between these two possibilities, the number of vertices in the largest component of the graph is with high probability of order $n^{2/3}$.^[1]
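The three regimes can be explored numerically. The following is a minimal Python sketch, not taken from the cited sources, that samples $G(n,p)$ with $p=c/n$ and reports the size of its largest component using a union-find structure. The function name `largest_component_gnp`, the fixed seed and the graph sizes are illustrative assumptions. For finite $n$ the transition is smoothed out, so the output only approximates the asymptotic statements above.

```python
import random


def largest_component_gnp(n: int, p: float, seed: int = 0) -> int:
    """Sample an Erdős–Rényi graph G(n, p) and return the number of
    vertices in its largest connected component."""
    rng = random.Random(seed)
    parent = list(range(n))  # union-find forest over the n vertices

    def find(v: int) -> int:
        # Follow parent pointers to the root, halving the path as we go.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    # Each of the n(n-1)/2 possible edges is present independently with probability p.
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                ru, rv = find(u), find(v)
                if ru != rv:
                    parent[ru] = rv  # merge the two components

    # Count how many vertices share each root.
    counts: dict[int, int] = {}
    for v in range(n):
        root = find(v)
        counts[root] = counts.get(root, 0) + 1
    return max(counts.values())


if __name__ == "__main__":
    n = 1000
    for c in (0.5, 1.0, 2.0):  # mean degree c = p * n
        print(f"c = {c}: largest component has "
              f"{largest_component_gnp(n, c / n)} of {n} vertices")
```

With $c=0.5$ the largest component should stay small (logarithmic in $n$), whereas with $c=2$ it should contain roughly 80% of the vertices.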

The giant component is also important in percolation theory.^[1]^[2] When a fraction $q=1-p$ of the nodes is removed at random from an ER network with mean degree $\langle k\rangle$, there is a critical threshold $p_{c}={\frac {1}{\langle k\rangle }}$. Above $p_{c}$ there is a giant component (the largest cluster), whose size $P_{\infty }$, measured as a fraction of the nodes of the original network, satisfies $P_{\infty }=p(1-\exp(-\langle k\rangle P_{\infty }))$. For $p<p_{c}$ the only solution of this equation is $P_{\infty }=0$, i.e., there is no giant component.
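The self-consistency equation for the giant-component fraction has no closed-form solution, but it is easy to solve numerically. A minimal sketch, assuming plain fixed-point iteration (one choice among several; a bracketing root finder would also work) and a hypothetical helper named `giant_fraction`:

```python
import math


def giant_fraction(p: float, mean_degree: float,
                   tol: float = 1e-12, max_iter: int = 100_000) -> float:
    """Solve P = p * (1 - exp(-<k> * P)) for the giant-component fraction.

    The right-hand side is increasing in P and never exceeds p, so starting
    at P = p the iterates decrease monotonically to the largest root, which
    is the physically relevant solution (0 below p_c = 1/<k>).
    """
    P = p
    for _ in range(max_iter):
        P_next = p * (1.0 - math.exp(-mean_degree * P))
        if abs(P_next - P) < tol:
            return P_next
        P = P_next
    return P  # very close to p_c convergence is slow; return the last iterate


if __name__ == "__main__":
    k = 4.0  # mean degree <k>, so p_c = 1/<k> = 0.25
    for p in (0.1, 0.2, 0.3, 0.5, 1.0):
        print(f"p = {p:.2f}  P_inf = {giant_fraction(p, k):.4f}")
```

Below $p_c$ the iteration collapses to (numerically) zero; above it the result is positive and grows with $p$. The iteration cap matters only in the immediate vicinity of $p_c$, where convergence is slow.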

At $p_{c}$, the distribution of cluster sizes follows a power law, $n(s)\sim s^{-5/2}$, which is a characteristic feature of a phase transition.

Alternatively, if one adds randomly selected edges one at a time, starting with an empty graph, then it is not until approximately $n/2$ edges have been added that the graph contains a large component, and soon after that the component becomes giant. Since $t$ edges give a mean degree of $2t/n$, this threshold corresponds to $p\approx 1/n$. More precisely, when $t$ edges have been added, for values of $t$ close to but larger than $n/2$, the size of the giant component is approximately $4t-2n$.^[1] However, according to the coupon collector's problem, $\Theta (n\log n)$ edges are needed for the whole random graph to be connected with high probability.
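The edge-by-edge process is straightforward to simulate with a union-find (disjoint-set) structure that tracks component sizes as edges are added. The sketch below is an illustration under stated assumptions: edges are drawn uniformly, without repetition and without self-loops, and the name `largest_after_random_edges` and the chosen $n$ and $t$ are hypothetical.

```python
import random


def largest_after_random_edges(n: int, t: int, seed: int = 0) -> int:
    """Add t distinct uniformly random edges to an empty graph on n vertices
    and return the size of the largest component.
    Requires t <= n(n-1)/2, otherwise the loop cannot finish."""
    rng = random.Random(seed)
    parent = list(range(n))
    size = [1] * n          # size[r] is the component size when r is a root
    largest = 1 if n > 0 else 0
    edges: set[tuple[int, int]] = set()

    def find(v: int) -> int:
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    while len(edges) < t:
        u, v = rng.randrange(n), rng.randrange(n)
        if u == v:
            continue  # no self-loops
        edge = (min(u, v), max(u, v))
        if edge in edges:
            continue  # no repeated edges
        edges.add(edge)
        ru, rv = find(u), find(v)
        if ru != rv:
            if size[ru] < size[rv]:
                ru, rv = rv, ru  # union by size: attach the smaller tree
            parent[rv] = ru
            size[ru] += size[rv]
            largest = max(largest, size[ru])
    return largest


if __name__ == "__main__":
    n = 50_000
    for t in (26_000, 27_500, 30_000):
        print(f"t = {t}: largest = {largest_after_random_edges(n, t)}, "
              f"4t - 2n = {4 * t - 2 * n}")
```

Because $4t-2n$ is a first-order approximation valid just above $t=n/2$, and finite graphs fluctuate, agreement with the simulated sizes is only rough and degrades as $t$ grows.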

Graphs with arbitrary degree distributions

A similar sharp threshold, separating parameters that lead to graphs in which all components are small from parameters that lead to a giant component, also occurs in tree-like random graphs with non-uniform degree distributions $P(k)$. The degree distribution does not define a graph uniquely. However, under the assumption that the graphs are entirely random in all respects other than their degree distribution, many results on finite and infinite component sizes are known. In this model, the existence of the giant component depends only on the first two (mixed) moments of the degree distribution. If a randomly chosen vertex has degree $k$, then a giant component exists^[3] if and only if $\langle k^{2}\rangle -2\langle k\rangle >0.$ This is known as the Molloy–Reed condition.^[4] The first moment of $P(k)$ is the mean degree of the network. In general, the $m$-th moment is defined as $\langle k^{m}\rangle =\mathbb {E} [k^{m}]=\sum _{k}k^{m}P(k)$.
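As an illustration, the Molloy–Reed quantity can be computed directly from a degree distribution. For a Poisson distribution with mean $c$ (the large-$n$ limit of the ER model with $p=c/n$), $\langle k^{2}\rangle =c^{2}+c$, so the condition becomes $c^{2}-c>0$, i.e. $c>1$, matching the ER threshold $p=1/n$. The following Python sketch checks this; the function names are hypothetical, and truncating the Poisson tail at `kmax` is an assumption that is harmless for small $c$.

```python
import math


def moment(P: dict[int, float], m: int) -> float:
    """m-th moment <k^m> = sum_k k^m P(k) of a degree distribution P."""
    return sum(k**m * pk for k, pk in P.items())


def molloy_reed(P: dict[int, float]) -> float:
    """Left-hand side <k^2> - 2<k>; positive means a giant component exists."""
    return moment(P, 2) - 2 * moment(P, 1)


def poisson(c: float, kmax: int = 100) -> dict[int, float]:
    """Poisson(c) degree distribution, truncated at kmax."""
    P: dict[int, float] = {}
    pk = math.exp(-c)  # P(0)
    for k in range(kmax + 1):
        P[k] = pk
        pk *= c / (k + 1)  # P(k+1) = P(k) * c / (k+1)
    return P


if __name__ == "__main__":
    for c in (0.5, 1.0, 2.0):
        # Should be close to c**2 - c: negative, zero, positive.
        print(c, molloy_reed(poisson(c)))
    print(molloy_reed({1: 0.5, 3: 0.5}))  # <k^2> = 5, <k> = 2, so 1.0
```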

When there is no giant component, the expected size of the component containing a randomly chosen vertex can also be determined from the first and second moments: it is $1+{\frac {\langle k\rangle ^{2}}{2\langle k\rangle -\langle k^{2}\rangle }}.$ The denominator is positive precisely when the Molloy–Reed condition fails, and the expression diverges as the threshold is approached. However, when there is a giant component, its size is harder to evaluate.^[2]
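A small sketch of the moment formula for the mean component size $1+\langle k\rangle ^{2}/(2\langle k\rangle -\langle k^{2}\rangle )$, assuming the degree distribution is given as a Python dict mapping degree to probability (the name `mean_component_size` is hypothetical). It refuses to evaluate the formula when $2\langle k\rangle -\langle k^{2}\rangle \leq 0$, where it does not apply. For a Poisson distribution with mean $c<1$ the expression simplifies to $1/(1-c)$.

```python
def mean_component_size(P: dict[int, float]) -> float:
    """Expected size of the component containing a randomly chosen vertex,
    1 + <k>^2 / (2<k> - <k^2>), for a degree distribution P with no giant
    component."""
    k1 = sum(k * pk for k, pk in P.items())      # <k>
    k2 = sum(k * k * pk for k, pk in P.items())  # <k^2>
    denom = 2 * k1 - k2
    if denom <= 0:
        # <k^2> - 2<k> >= 0: at or above the threshold, the formula does not apply.
        raise ValueError("a giant component exists or the graph is critical")
    return 1 + k1 * k1 / denom


if __name__ == "__main__":
    # Half the vertices have degree 1, half degree 2: <k> = 1.5, <k^2> = 2.5.
    print(mean_component_size({1: 0.5, 2: 0.5}))  # 5.5
```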

Criteria for giant component existence in directed and undirected configuration graphs

Similar expressions are also valid for directed graphs, in which case the degree distribution is two-dimensional.^[5] There are three types of connected components in directed graphs. For a randomly chosen vertex:

the out-component is the set of vertices that can be reached by recursively following all out-edges forward;
the in-component is the set of vertices that can be reached by recursively following all in-edges backward;
the weak component is the set of vertices that can be reached by recursively following all edges regardless of their direction.

Let a randomly chosen vertex have $k_{\text{in}}$ in-edges and $k_{\text{out}}$ out-edges. Because every edge is an out-edge of one vertex and an in-edge of another, the mean numbers of in- and out-edges coincide: $c=\mathbb {E} [k_{\text{in}}]=\mathbb {E} [k_{\text{out}}]$. If $G_{0}(x)=\sum _{k}P(k)x^{k}$ is the generating function of the degree distribution $P(k)$ of an undirected network, then $G_{1}(x)$ is defined as $G_{1}(x)=\sum _{k}{\frac {k}{\langle k\rangle }}P(k)x^{k-1}$. For directed networks, the generating function of the joint distribution $P(k_{\text{in}},k_{\text{out}})$ is written in two variables $x$ and $y$ as ${\mathcal {G}}(x,y)=\sum _{k_{\text{in}},k_{\text{out}}}P(k_{\text{in}},k_{\text{out}})x^{k_{\text{in}}}y^{k_{\text{out}}}$, and one can define $g(x)={\frac {1}{c}}{\partial {\mathcal {G}} \over \partial y}\vert _{y=1}$ and $f(y)={\frac {1}{c}}{\partial {\mathcal {G}} \over \partial x}\vert _{x=1}$, so that $g'(1)=f'(1)=\mathbb {E} [k_{\text{in}}k_{\text{out}}]/c$. The criteria for giant component existence in directed and undirected random graphs are given in the table below:


Type	Criteria
undirected: giant component	$\mathbb {E} [k^{2}]-2\mathbb {E} [k]>0$,^[3] or $G'_{1}(1)>1$^[5]
directed: giant in/out-component	$\mathbb {E} [k_{\text{in}}k_{\text{out}}]-\mathbb {E} [k_{\text{in}}]>0$,^[5] or $g'(1)=f'(1)>1$^[5]
directed: giant weak component	$2\mathbb {E} [k_{\text{in}}]\mathbb {E} [k_{\text{in}}k_{\text{out}}]-\mathbb {E} [k_{\text{in}}]\mathbb {E} [k_{\text{out}}^{2}]-\mathbb {E} [k_{\text{in}}]\mathbb {E} [k_{\text{in}}^{2}]+\mathbb {E} [k_{\text{in}}^{2}]\mathbb {E} [k_{\text{out}}^{2}]-\mathbb {E} [k_{\text{in}}k_{\text{out}}]^{2}>0$ ^[6]
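The moment-based directed criteria in the table can be evaluated mechanically from a joint degree distribution. The following is a minimal Python sketch, assuming the distribution is supplied as a dict keyed by $(k_{\text{in}},k_{\text{out}})$ pairs; the function name `directed_criteria` is hypothetical. It returns the left-hand sides of the two directed criteria, each of which indicates a giant component when positive.

```python
def directed_criteria(P: dict[tuple[int, int], float]) -> tuple[float, float]:
    """Return the left-hand sides of the directed criteria in the table
    (giant in/out-component, giant weak component) for a joint degree
    distribution P[(k_in, k_out)]. Each criterion holds when its value is > 0."""

    def expect(fn) -> float:
        # E[fn(k_in, k_out)] under the joint distribution P.
        return sum(fn(k_in, k_out) * prob for (k_in, k_out), prob in P.items())

    c = expect(lambda k_in, k_out: k_in)
    if abs(c - expect(lambda k_in, k_out: k_out)) > 1e-9:
        raise ValueError("mean in-degree and mean out-degree must coincide")
    e_io = expect(lambda k_in, k_out: k_in * k_out)  # E[k_in k_out]
    e_in2 = expect(lambda k_in, k_out: k_in ** 2)    # E[k_in^2]
    e_out2 = expect(lambda k_in, k_out: k_out ** 2)  # E[k_out^2]

    in_out = e_io - c
    weak = (2 * c * e_io - c * e_out2 - c * e_in2
            + e_in2 * e_out2 - e_io ** 2)
    return in_out, weak


if __name__ == "__main__":
    # Independent in- and out-degrees, each 0 or 2 with probability 1/2.
    P = {(0, 0): 0.25, (0, 2): 0.25, (2, 0): 0.25, (2, 2): 0.25}
    print(directed_criteria(P))  # (0.0, 1.0)
```

In this example $c=1$, $\mathbb {E} [k_{\text{in}}k_{\text{out}}]=1$ and $\mathbb {E} [k_{\text{in}}^{2}]=\mathbb {E} [k_{\text{out}}^{2}]=2$, so there is no giant in/out-component but there is a giant weak component. This agrees with the undirected criterion applied to the total degree $k_{\text{in}}+k_{\text{out}}$, whose first two moments are 2 and 6.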


References

[1] Bollobás, Béla (2001). "6. The Evolution of Random Graphs—The Giant Component". Random Graphs. Cambridge Studies in Advanced Mathematics, vol. 73 (2nd ed.). Cambridge University Press. pp. 130–159. ISBN 978-0-521-79722-1.
[2] Newman, M. E. J. (2010). Networks: An Introduction. New York: Oxford University Press. OCLC 456837194.
[3] Molloy, Michael; Reed, Bruce (1995). "A critical point for random graphs with a given degree sequence". Random Structures & Algorithms. 6 (2–3): 161–180. doi:10.1002/rsa.3240060204. ISSN 1042-9832.
[4] Molloy, Michael; Reed, Bruce (March 1995). "A critical point for random graphs with a given degree sequence". Random Structures & Algorithms. 6 (2–3): 161–180. doi:10.1002/rsa.3240060204. ISSN 1042-9832.
[5] Newman, M. E. J.; Strogatz, S. H.; Watts, D. J. (2001-07-24). "Random graphs with arbitrary degree distributions and their applications". Physical Review E. 64 (2): 026118. arXiv:cond-mat/0007235. Bibcode:2001PhRvE..64b6118N. doi:10.1103/physreve.64.026118. ISSN 1063-651X. PMID 11497662.
[6] Kryven, Ivan (2016-07-27). "Emergence of the giant weak component in directed random graphs with arbitrary degree distributions". Physical Review E. 94 (1): 012315. arXiv:1607.03793. Bibcode:2016PhRvE..94a2315K. doi:10.1103/physreve.94.012315. ISSN 2470-0045. PMID 27575156. S2CID 206251373.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.
