The clique percolation method [1] is a popular approach for analyzing the overlapping community structure of networks. The term network community (also called a module, cluster or cohesive group) has no single widely accepted definition; it is usually defined as a group of nodes that are more densely connected to each other than to other nodes in the network. There are numerous alternative methods for detecting communities in networks, [2] for example the Girvan–Newman algorithm, hierarchical clustering and modularity maximization.
The clique percolation method builds up the communities from k-cliques, which correspond to complete (fully connected) sub-graphs of k nodes. (For example, a k-clique with k = 3 is a triangle.) Two k-cliques are considered adjacent if they share k − 1 nodes. A community is defined as the maximal union of k-cliques that can be reached from each other through a series of adjacent k-cliques. Such communities are best interpreted with the help of a k-clique template (an object isomorphic to a complete graph of k nodes). Such a template can be placed onto any k-clique in the graph, and rolled to an adjacent k-clique by relocating one of its nodes and keeping its other k − 1 nodes fixed. Thus, the k-clique communities of a network are all those sub-graphs that can be fully explored by rolling a k-clique template in them, but cannot be left by this template.
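The definition above can be sketched in a few lines of Python. This is a minimal brute-force illustration, not the implementation used by any published tool: the function name `k_clique_communities` and the edge-list input format are assumptions made for this example, and enumerating every k-subset of nodes is only practical for very small graphs.

```python
from itertools import combinations

def k_clique_communities(edges, k):
    """Return the k-clique communities of an undirected graph as frozensets of nodes."""
    # Build an adjacency map from the (hypothetical) edge-list input.
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    # Enumerate every k-clique by brute force (small graphs only).
    cliques = [
        frozenset(c)
        for c in combinations(list(adj), k)
        if all(b in adj[a] for a, b in combinations(c, 2))
    ]

    # Union-find over cliques: two k-cliques are adjacent if they share k - 1 nodes.
    parent = list(range(len(cliques)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) == k - 1:
            parent[find(i)] = find(j)

    # A community is the union of the nodes of all cliques in one component.
    groups = {}
    for i, c in enumerate(cliques):
        groups.setdefault(find(i), set()).update(c)
    return [frozenset(g) for g in groups.values()]
```

For a graph made of the triangles {1, 2, 3}, {2, 3, 4} and {4, 5, 6}, calling the function with k = 3 groups the first two triangles into one community and leaves the third as a separate community; node 4 belongs to both, which is the kind of overlap the definition permits.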
This definition allows overlaps between the communities in a natural way, as illustrated in Fig. 1, which shows four k-clique communities at k = 4. The communities are color-coded and the overlap between them is emphasized in red. The definition above is also local: if a certain sub-graph fulfills the criteria to be considered a community, then it will remain a community regardless of what happens to another part of the network far away. In contrast, when searching for the communities by optimizing with respect to a global quantity, a change far away in the network can reshape the communities in the unperturbed regions as well. Furthermore, it has been shown that global methods can suffer from a resolution limit problem, [3] where the size of the smallest community that can be extracted depends on the system size. A local community definition such as the one used here circumvents this problem automatically.
Since even small networks can contain a vast number of k-cliques, the implementation of this approach is based on locating all maximal cliques rather than the individual k-cliques. [1] This inevitably requires finding the graph's maximum clique, which is an NP-hard problem. (Note that finding a maximum clique is much harder than finding a single maximal clique.) This means that although networks with a few million nodes have already been analyzed successfully with this approach, [4] the worst-case runtime complexity is exponential in the number of nodes.
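One standard way to list all maximal cliques is the Bron–Kerbosch algorithm with pivoting. The sketch below is offered only as an illustration of that general technique; the text above does not say which enumeration procedure the original implementation uses, and the function name `maximal_cliques` and the adjacency-dictionary input are assumptions of this example.

```python
def maximal_cliques(adj):
    """Yield every maximal clique of an undirected graph given as {node: set_of_neighbours}."""

    def expand(r, p, x):
        # r: current clique, p: candidates that can extend it, x: already-processed nodes.
        if not p and not x:
            yield frozenset(r)
            return
        # Pivot on the node with the most neighbours among the candidates to prune branches.
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    yield from expand(set(), set(adj), set())
```

Every k-clique is contained in at least one maximal clique of size k or larger, which is why working with maximal cliques avoids enumerating the k-cliques one by one. The worst-case running time of this sketch is still exponential in the number of nodes.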
On a network with directed links, a directed k-clique is a complete subgraph with k nodes fulfilling the following condition: the k nodes can be ordered such that between an arbitrary pair of them there exists a directed link pointing from the node with the higher rank towards the node with the lower rank. The directed Clique Percolation Method defines directed network communities as the percolation clusters of directed k-cliques.
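The ordering condition can be checked directly for a small candidate set of nodes. The following is a brute-force sketch that tries every possible ranking; the function name `is_directed_k_clique` and the representation of links as (source, target) pairs are assumptions made for illustration.

```python
from itertools import combinations, permutations

def is_directed_k_clique(nodes, arcs):
    """Check whether `nodes` can be ranked so that every pair has a link from higher to lower rank."""
    arcs = set(arcs)
    for order in permutations(nodes):
        # order[0] has the highest rank; every earlier node must point to every later one.
        if all((a, b) in arcs for a, b in combinations(order, 2)):
            return True
    return False
```

With three nodes, the links (1, 2), (1, 3), (2, 3) satisfy the condition with the ranking 1, 2, 3, whereas the directed cycle (1, 2), (2, 3), (3, 1) admits no such ranking.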
On a network with weighted links, a weighted k-clique is a complete subgraph with k nodes such that the geometric mean of the k(k − 1)/2 link weights within the k-clique is greater than a selected threshold value, I. The weighted Clique Percolation Method defines weighted network communities as the percolation clusters of weighted k-cliques. Note that the geometric mean of link weights within a subgraph is called the intensity of that subgraph. [5]
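The intensity test can be written out as follows. This is a small sketch under stated assumptions: link weights are positive, they are stored in a dictionary keyed by `frozenset({u, v})`, and the candidate set has at least two nodes. The names `intensity` and `is_weighted_k_clique` are hypothetical.

```python
import math
from itertools import combinations

def intensity(nodes, weight):
    """Geometric mean of the link weights among `nodes`, or None if a link is missing."""
    pairs = [frozenset(p) for p in combinations(nodes, 2)]
    if any(p not in weight for p in pairs):
        return None  # not a complete subgraph
    # Average the logarithms to avoid overflow when multiplying many weights.
    log_sum = sum(math.log(weight[p]) for p in pairs)
    return math.exp(log_sum / len(pairs))

def is_weighted_k_clique(nodes, weight, threshold):
    """True if `nodes` form a complete subgraph whose intensity exceeds `threshold`."""
    value = intensity(nodes, weight)
    return value is not None and value > threshold
```

For a triangle with link weights 1, 2 and 4 the intensity is the cube root of 8, that is 2, so the triangle counts as a weighted 3-clique for any threshold I below 2.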
Clique percolation methods may be generalized by recording different amounts of overlap between the various k-cliques. This then defines a new type of graph, a clique graph, [6] where each k-clique in the original graph is represented by a vertex in the new clique graph. The edges in the clique graph are used to record the strength of the overlap of cliques in the original graph. One may then apply any community detection method to this clique graph to identify the clusters in the original graph through the k-clique structure.
For instance, in a simple graph we can define the overlap between two k-cliques to be the number of vertices common to both k-cliques. The Clique Percolation Method is then equivalent to thresholding this clique graph, dropping all edges of weight less than k − 1, with the remaining connected components forming the communities of cliques found in CPM. For k = 2 the cliques are the edges of the original graph and the clique graph in this case is the line graph of the original network.
In practice, using the number of common vertices as a measure of the strength of clique overlap may give poor results, because large cliques in the original graph, those with many more than k vertices, will dominate the clique graph. The problem arises because a vertex that is in n different k-cliques will contribute to n(n − 1)/2 edges in such a clique graph. A simple solution is to let each vertex common to two overlapping k-cliques contribute a weight equal to 1/n when measuring the overlap strength of the two k-cliques.
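The two overlap measures can be compared side by side. In this sketch the function name `clique_graph_weights` and its return format are assumptions; cliques are identified by their position in the input list, and n is counted over the cliques supplied.

```python
from itertools import combinations

def clique_graph_weights(cliques):
    """Return {(i, j): (raw_overlap, normalised_overlap)} for each overlapping pair of cliques."""
    cliques = [frozenset(c) for c in cliques]
    # n[v] = number of cliques in the list that contain vertex v.
    n = {}
    for c in cliques:
        for v in c:
            n[v] = n.get(v, 0) + 1
    weights = {}
    for (i, a), (j, b) in combinations(enumerate(cliques), 2):
        shared = a & b
        if shared:
            # Raw weight counts shared vertices; normalised weight adds 1/n per shared vertex.
            weights[(i, j)] = (len(shared), sum(1.0 / n[v] for v in shared))
    return weights
```

Keeping only the pairs whose raw overlap is at least k − 1 recovers the thresholded clique graph used by the standard method, while the normalised weights can be passed to any weighted community detection method.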
In general, the clique graph viewpoint is a useful way of finding generalizations of standard clique-percolation methods to get around any problems encountered. It even shows how to describe extensions of these methods based on other motifs, that is, subgraphs other than k-cliques. In this case a clique graph is best thought of as a particular example of a hypergraph.
The Erdős–Rényi model shows a series of interesting transitions when the probability p of two nodes being connected is increased. For each k one can find a certain threshold probability p_c above which the k-cliques organize into a giant community. [7] [8] [9] (The size of the giant community is comparable to the system size; in other words, the giant community occupies a finite fraction of the system even in the thermodynamic limit.) This transition is analogous to the percolation transition in statistical physics. A similar phenomenon can be observed in many real networks as well: if k is large, only the most densely linked parts are accepted as communities, so they usually remain small and dispersed. When k is lowered, both the number and the size of the communities start to grow. However, in most cases a critical k value can be reached, below which a giant community emerges, smearing out the details of the community structure by merging (and making invisible) many smaller communities.
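The text above does not give the threshold explicitly. The estimate usually quoted in the literature on k-clique percolation in Erdős–Rényi graphs with N nodes is p_c(k) = [(k − 1)N]^(−1/(k − 1)); the sketch below simply evaluates that expression and should be read as an illustration under that assumption, with `critical_probability` being a hypothetical helper name.

```python
def critical_probability(k, n):
    """Estimated k-clique percolation threshold for an Erdős–Rényi graph with n nodes.

    Assumes the commonly quoted form p_c(k) = [(k - 1) * n] ** (-1 / (k - 1)).
    """
    if k < 2 or n < 1:
        raise ValueError("k must be at least 2 and n at least 1")
    return ((k - 1) * n) ** (-1.0 / (k - 1))
```

For k = 2 the expression reduces to 1/N, the familiar threshold for the appearance of a giant connected component, which is consistent with 2-clique communities being ordinary connected components.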
The clique percolation method has been used to detect communities in settings ranging from studies of cancer metastasis [10] [11] through various social networks [4] [12] [13] [14] [15] to document clustering [16] and economic networks. [17]
There are a number of implementations of clique percolation. The clique percolation method was first implemented and popularized by CFinder, a software package (freeware for non-commercial use) for detecting and visualizing overlapping communities in networks. The program enables customizable visualization and allows easy browsing of the communities found. The package also contains a command-line version of the program, which is suitable for scripting.
A faster implementation (available under the GPL) has been developed by another group. [18] Another example, which is also very fast in certain contexts, is the SCP algorithm. [19]
A parallel version of the clique percolation method was designed and developed by S. Mainardi et al. [20] By exploiting today's multi-core/multi-processor computing architectures, the method enables the extraction of k-clique communities from very large networks such as the Internet. [21] The authors released the source code of the method under the GPL and made it freely available to the community.