Extremal Ensemble Learning

Last updated

Extremal Ensemble Learning (EEL) is a machine learning algorithmic paradigm for graph partitioning. EEL creates an ensemble of partitions and then uses information contained in the ensemble to find new and improved partitions. The ensemble evolves and learns how to form improved partitions through extremal updating procedure. The final solution is found by achieving consensus among its member partitions about what the optimal partition is. [1] [2]

Reduced Network Extremal Ensemble Learning (RenEEL)

A particular implementation of the EEL paradigm is the Reduced Network Extremal Ensemble Learning (RenEEL) scheme for partitioning a graph. [1] RenEEL uses consensus across many partitions in an ensemble to create a reduced network that can be efficiently analyzed to find more accurate partitions. These better quality partitions are subsequently used to update the ensemble. An algorithm that utilizes the RenEEL scheme is currently the best algorithm for finding the graph partition with maximum modularity, which is an NP-hard problem. [3]

Related Research Articles

Clique (graph theory) subset of the vertices of a node-link graph that are all adjacent to each other

In the mathematical area of graph theory, a clique is a subset of vertices of an undirected graph such that every two distinct vertices in the clique are adjacent; that is, its induced subgraph is complete. Cliques are one of the basic concepts of graph theory and are used in many other mathematical problems and constructions on graphs. Cliques have also been studied in computer science: the task of finding whether there is a clique of a given size in a graph is NP-complete, but despite this hardness result, many algorithms for finding cliques have been studied.

In the mathematical discipline of graph theory, a matching or independent edge set in a graph is a set of edges without common vertices. Finding a matching in a bipartite graph can be treated as a network flow problem.

In graph theory, a cut is a partition of the vertices of a graph into two disjoint subsets. Any cut determines a cut-set, the set of edges that have one endpoint in each subset of the partition. These edges are said to cross the cut. In a connected graph, each cut-set determines a unique cut, and in some cases cuts are identified with their cut-sets rather than with their vertex partitions.

Community structure concept in graph theory

In the study of complex networks, a network is said to have community structure if the nodes of the network can be easily grouped into sets of nodes such that each set of nodes is densely connected internally. In the particular case of non-overlapping community finding, this implies that the network divides naturally into groups of nodes with dense connections internally and sparser connections between groups. But overlapping communities are also allowed. The more general definition is based on the principle that pairs of nodes are more likely to be connected if they are both members of the same community(ies), and less likely to be connected if they do not share communities. A related but different problem is community search, where the goal is to find a community that a certain vertex belongs to.

In mathematics, a graph partition is the reduction of a graph to a smaller graph by partitioning its set of nodes into mutually exclusive groups. Edges of the original graph that cross between the groups will produce edges in the partitioned graph. If the number of resulting edges is small compared to the original graph, then the partitioned graph may be better suited for analysis and problem-solving than the original. Finding a partition that simplifies graph analysis is a hard problem, but one that has applications to scientific computing, VLSI circuit design, and task scheduling in multiprocessor computers, among others. Recently, the graph partition problem has gained importance due to its application for clustering and detection of cliques in social, pathological and biological networks. For a survey on recent trends in computational methods and applications see Buluc et al. (2013). Two common examples of graph partitioning are minimum cut and maximum cut problems.

Network motif Type of sub-graph

Network motifs are recurrent and statistically significant sub-graphs or patterns of a larger graph. All networks, including biological networks, social networks, technological networks and more, can be represented as graphs, which include a wide variety of subgraphs.

Spectral clustering

In multivariate statistics and the clustering of data, spectral clustering techniques make use of the spectrum (eigenvalues) of the similarity matrix of the data to perform dimensionality reduction before clustering in fewer dimensions. The similarity matrix is provided as an input and consists of a quantitative assessment of the relative similarity of each pair of points in the dataset.

Mark Newman is an English-American physicist and Anatol Rapoport Distinguished University Professor of Physics at the University of Michigan, as well as an external faculty member of the Santa Fe Institute. He is known for his fundamental contributions to the fields of complex networks and complex systems, for which he was awarded the 2014 Lagrange Prize.

Modularity (networks)

Modularity is one measure of the structure of networks or graphs. It was designed to measure the strength of division of a network into modules. Networks with high modularity have dense connections between the nodes within modules but sparse connections between nodes in different modules. Modularity is often used in optimization methods for detecting community structure in networks. However, it has been shown that modularity suffers a resolution limit and, therefore, it is unable to detect small communities. Biological networks, including animal brains, exhibit a high degree of modularity.

In graph theory, the planar separator theorem is a form of isoperimetric inequality for planar graphs, that states that any planar graph can be split into smaller pieces by removing a small number of vertices. Specifically, the removal of O(√n) vertices from an n-vertex graph can partition the graph into disjoint subgraphs each of which has at most 2n/3 vertices.

Clustering is the problem of partitioning data points into groups based on their similarity. Correlation clustering provides a method for clustering a set of objects into the optimum number of clusters without specifying that number in advance.

Consensus clustering is a method of aggregating results from multiple clustering algorithms. Also called cluster ensembles or aggregation of clustering, it refers to the situation in which a number of different (input) clusterings have been obtained for a particular dataset and it is desired to find a single (consensus) clustering which is a better fit in some sense than the existing clusterings. Consensus clustering is thus the problem of reconciling clustering information about the same data set coming from different sources or from different runs of the same algorithm. When cast as an optimization problem, consensus clustering is known as median partition, and has been shown to be NP-complete, even when the number of input clusterings is three. Consensus clustering for unsupervised learning is analogous to ensemble learning in supervised learning.

In statistics and machine learning, ensemble methods use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms alone. Unlike a statistical ensemble in statistical mechanics, which is usually infinite, a machine learning ensemble consists of only a concrete finite set of alternative models, but typically allows for much more flexible structure to exist among those alternatives.

Tribe (internet) slang for an unofficial community of people who share a common interest

The term tribe or digital tribe is used as a slang term for an unofficial community or organization of people who share a common interest, and usually who are loosely affiliated with each other through social media or other Internet mechanisms. The term is related to "tribe", which traditionally refers to people closely associated in both geography and genealogy. Nowadays, it looks more like a virtual community or a personal network and it is often called global digital tribe. Most anthropologists agree that a tribe is a (small) society that practices its own customs and culture, and that these define the tribe. The tribes are divided into clans, with their own customs and cultural values that differentiate them from activities that occur in 'real life' contexts. People feel more inclined to share and defend their ideas on social networks than they would dare to say to someone face to face. For example, it would be ridiculous to 'poke' someone in real life.

Consensus dynamics or agreement dynamics is an area of research lying at the intersection of systems theory and graph theory. A major topic of investigation is the agreement or consensus problem in multi-agent systems that concerns processes by which a collection of interacting agents achieve a common goal. Networks of agents that exchange information to reach consensus include: physiological systems, gene networks, large-scale energy systems and fleets of vehicles on land, in the air or in space. The agreement protocol or consensus protocol is an unforced dynamical system that is governed by the interconnection topology and the initial condition for each agent. Other problems are the rendezvous problem, synchronization, flocking, formation control. One solution paradigm is distributed constraint reasoning.

In graph theory, the modular decomposition is a decomposition of a graph into subsets of vertices called modules. A module is a generalization of a connected component of a graph. Unlike connected components, however, one module can be a proper subset of another. Modules therefore lead to a recursive (hierarchical) decomposition of the graph, instead of just a partition.

Louvain modularity method to detect communities in graphs

The Louvain method for community detection is a method to extract communities from large networks created by Blondel et al. from the University of Louvain. The method is a greedy optimization method that appears to run in time in the number of nodes in the network.

The stochastic block model is a generative model for random graphs. This model tends to produce graphs containing communities, subsets characterized by being connected with one another with particular edge densities. For example, edges may be more common within communities than between communities. The stochastic block model is important in statistics, machine learning, and network science, where it serves as a useful benchmark for the task of recovering community structure in graph data.

Automatic clustering algorithms are algorithms that can perform clustering without prior knowledge of data sets. In contrast with other cluster analysis techniques, automatic clustering algorithms can determine the optimal number of clusters even in the presence of noise and outlier points.

References

  1. 1 2 J. Guo; P. Singh; K.E. Bassler (2019). "Reduced network extremal ensemble learning (RenEEL) scheme for community detection in complex networks". Scientific Reports. 9 (14234). doi: 10.1038/s41598-019-50739-3 .
  2. Polikar, R. (2006). "Ensemble based systems in decision making". IEEE Circuits and Systems Magazine. 6 (3): 21–45. doi:10.1109/MCAS.2006.1688199.
  3. Newman, M. E. J. (2006). "Modularity and community structure in networks". Proceedings of the National Academy of Sciences of the United States of America. 103 (23): 8577–8696. arXiv: physics/0602124 . Bibcode:2006PNAS..103.8577N. doi:10.1073/pnas.0601602103. PMC   1482622 . PMID   16723398.