In mathematics, a graph partition is the reduction of a graph to a smaller graph by partitioning its set of nodes into mutually exclusive groups. Edges of the original graph that cross between the groups will produce edges in the partitioned graph. If the number of resulting edges is small compared to the original graph, then the partitioned graph may be better suited for analysis and problem-solving than the original. Finding a partition that simplifies graph analysis is a hard problem, but one that has applications to scientific computing, VLSI circuit design, and task scheduling in multiprocessor computers, among others. [1] Recently, the graph partition problem has gained importance due to its application for clustering and detection of cliques in social, pathological and biological networks. For a survey on recent trends in computational methods and applications see Buluc et al. (2013). [2] Two common examples of graph partitioning are minimum cut and maximum cut problems.
Typically, graph partition problems fall under the category of NP-hard problems. Solutions to these problems are generally derived using heuristics and approximation algorithms. [3] However, uniform graph partitioning or a balanced graph partition problem can be shown to be NP-complete to approximate within any finite factor. [1] Even for special graph classes such as trees and grids, no reasonable approximation algorithms exist, [4] unless P=NP. Grids are a particularly interesting case since they model the graphs resulting from Finite Element Model (FEM) simulations. When not only the number of edges between the components is approximated, but also the sizes of the components, it can be shown that no reasonable fully polynomial algorithms exist for these graphs. [4]
Consider a graph G = (V, E), where V denotes the set of n vertices and E the set of edges. For a (k,v) balanced partition problem, the objective is to partition G into k components of at most size v · (n/k), while minimizing the capacity of the edges between separate components. [1] Also, given G and an integer k > 1, partition V into k parts (subsets) V1, V2, ..., Vk such that the parts are disjoint and have equal size, and the number of edges with endpoints in different parts is minimized. Such partition problems have been discussed in literature as bicriteria-approximation or resource augmentation approaches. A common extension is to hypergraphs, where an edge can connect more than two vertices. A hyperedge is not cut if all vertices are in one partition, and cut exactly once otherwise, no matter how many vertices are on each side. This usage is common in electronic design automation.
For a specific (k, 1 + ε) balanced partition problem, we seek to find a minimum cost partition of G into k components with each component containing a maximum of (1 + ε)·(n/k) nodes. We compare the cost of this approximation algorithm to the cost of a (k,1) cut, wherein each of the k components must have the same size of (n/k) nodes each, thus being a more restricted problem. Thus,
We already know that (2,1) cut is the minimum bisection problem and it is NP-complete. [5] Next, we assess a 3-partition problem wherein n = 3k, which is also bounded in polynomial time. [1] Now, if we assume that we have a finite approximation algorithm for (k, 1)-balanced partition, then, either the 3-partition instance can be solved using the balanced (k,1) partition in G or it cannot be solved. If the 3-partition instance can be solved, then (k, 1)-balanced partitioning problem in G can be solved without cutting any edge. Otherwise, if the 3-partition instance cannot be solved, the optimum (k, 1)-balanced partitioning in G will cut at least one edge. An approximation algorithm with a finite approximation factor has to differentiate between these two cases. Hence, it can solve the 3-partition problem which is a contradiction under the assumption that P = NP. Thus, it is evident that (k,1)-balanced partitioning problem has no polynomial-time approximation algorithm with a finite approximation factor unless P = NP. [1]
The planar separator theorem states that any n-vertex planar graph can be partitioned into roughly equal parts by the removal of O(√n) vertices. This is not a partition in the sense described above, because the partition set consists of vertices rather than edges. However, the same result also implies that every planar graph of bounded degree has a balanced cut with O(√n) edges.
Since graph partitioning is a hard problem, practical solutions are based on heuristics. There are two broad categories of methods, local and global. Well-known local methods are the Kernighan–Lin algorithm, and Fiduccia-Mattheyses algorithms, which were the first effective 2-way cuts by local search strategies. Their major drawback is the arbitrary initial partitioning of the vertex set, which can affect the final solution quality. Global approaches rely on properties of the entire graph and do not rely on an arbitrary initial partition. The most common example is spectral partitioning, where a partition is derived from approximate eigenvectors of the adjacency matrix, or spectral clustering that groups graph vertices using the eigendecomposition of the graph Laplacian matrix.
A multi-level graph partitioning algorithm works by applying one or more stages. Each stage reduces the size of the graph by collapsing vertices and edges, partitions the smaller graph, then maps back and refines this partition of the original graph. [6] A wide variety of partitioning and refinement methods can be applied within the overall multi-level scheme. In many cases, this approach can give both fast execution times and very high quality results. One widely used example of such an approach is METIS, [7] a graph partitioner, and hMETIS, the corresponding partitioner for hypergraphs. [8] An alternative approach originated from [9] and implemented, e.g., in scikit-learn is spectral clustering with the partitioning determined from eigenvectors of the graph Laplacian matrix for the original graph computed by LOBPCG solver with multigrid preconditioning.
Given a graph with adjacency matrix , where an entry implies an edge between node and , and degree matrix , which is a diagonal matrix, where each diagonal entry of a row , , represents the node degree of node . The Laplacian matrix is defined as . Now, a ratio-cut partition for graph is defined as a partition of into disjoint , and , minimizing the ratio
of the number of edges that actually cross this cut to the number of pairs of vertices that could support such edges. Spectral graph partitioning can be motivated [10] by analogy with partitioning of a vibrating string or a mass-spring system and similarly extended to the case of negative weights of the graph. [11]
In such a scenario, the second smallest eigenvalue () of , yields a lower bound on the optimal cost () of ratio-cut partition with . The eigenvector () corresponding to , called the Fiedler vector, bisects the graph into only two communities based on the sign of the corresponding vector entry. Division into a larger number of communities can be achieved by repeated bisection or by using multiple eigenvectors corresponding to the smallest eigenvalues. [12] The examples in Figures 1,2 illustrate the spectral bisection approach.
Minimum cut partitioning however fails when the number of communities to be partitioned, or the partition sizes are unknown. For instance, optimizing the cut size for free group sizes puts all vertices in the same community. Additionally, cut size may be the wrong thing to minimize since a good division is not just one with small number of edges between communities. This motivated the use of Modularity (Q) [13] as a metric to optimize a balanced graph partition. The example in Figure 3 illustrates 2 instances of the same graph such that in (a) modularity (Q) is the partitioning metric and in (b), ratio-cut is the partitioning metric.
Another objective function used for graph partitioning is Conductance which is the ratio between the number of cut edges and the volume of the smallest part. Conductance is related to electrical flows and random walks. The Cheeger bound guarantees that spectral bisection provides partitions with nearly optimal conductance. The quality of this approximation depends on the second smallest eigenvalue of the Laplacian λ2.
Graph partition can be useful for identifying the minimal set of nodes or links that should be immunized in order to stop epidemics. [14]
Spin models have been used for clustering of multivariate data wherein similarities are translated into coupling strengths. [15] The properties of ground state spin configuration can be directly interpreted as communities. Thus, a graph is partitioned to minimize the Hamiltonian of the partitioned graph. The Hamiltonian (H) is derived by assigning the following partition rewards and penalties.
Additionally, Kernel-PCA-based Spectral clustering takes a form of least squares Support Vector Machine framework, and hence it becomes possible to project the data entries to a kernel induced feature space that has maximal variance, thus implying a high separation between the projected communities. [16]
Some methods express graph partitioning as a multi-criteria optimization problem which can be solved using local methods expressed in a game theoretic framework where each node makes a decision on the partition it chooses. [17]
For very large-scale distributed graphs classical partition methods might not apply (e.g., spectral partitioning, Metis [7] ) since they require full access to graph data in order to perform global operations. For such large-scale scenarios distributed graph partitioning is used to perform partitioning through asynchronous local operations only.
scikit-learn implements spectral clustering with the partitioning determined from eigenvectors of the graph Laplacian matrix for the original graph computed by ARPACK, or by LOBPCG solver with multigrid preconditioning. [9]
METIS [7] is a graph partitioning family by Karypis and Kumar. Among this family, kMetis aims at greater partitioning speed, hMetis, [8] applies to hypergraphs and aims at partition quality, and ParMetis [7] is a parallel implementation of the Metis graph partitioning algorithm.
KaHyPar [18] [19] [20] is a multilevel hypergraph partitioning framework providing direct k-way and recursive bisection based partitioning algorithms. It instantiates the multilevel approach in its most extreme version, removing only a single vertex in every level of the hierarchy. By using this very fine grained n-level approach combined with strong local search heuristics, it computes solutions of very high quality.
Scotch [21] is graph partitioning framework by Pellegrini. It uses recursive multilevel bisection and includes sequential as well as parallel partitioning techniques.
Jostle [22] is a sequential and parallel graph partitioning solver developed by Chris Walshaw. The commercialized version of this partitioner is known as NetWorks.
Party [23] implements the Bubble/shape-optimized framework and the Helpful Sets algorithm.
The software packages DibaP [24] and its MPI-parallel variant PDibaP [25] by Meyerhenke implement the Bubble framework using diffusion; DibaP also uses AMG-based techniques for coarsening and solving linear systems arising in the diffusive approach.
Sanders and Schulz released a graph partitioning package KaHIP [26] (Karlsruhe High Quality Partitioning) that implements for example flow-based methods, more-localized local searches and several parallel and sequential meta-heuristics.
The tools Parkway [27] by Trifunovic and Knottenbelt as well as Zoltan [28] by Devine et al. focus on hypergraph partitioning.
In graph theory, a perfect matching in a graph is a matching that covers every vertex of the graph. More formally, given a graph G = (V, E), a perfect matching in G is a subset M of edge set E, such that every vertex in the vertex set V is adjacent to exactly one edge in M.
In mathematics, a hypergraph is a generalization of a graph in which an edge can join any number of vertices. In contrast, in an ordinary graph, an edge connects exactly two vertices.
In the mathematical field of graph theory, a bipartite graph is a graph whose vertices can be divided into two disjoint and independent sets and , that is, every edge connects a vertex in to one in . Vertex sets and are usually called the parts of the graph. Equivalently, a bipartite graph is a graph that does not contain any odd-length cycles.
In computer science, a graph is an abstract data type that is meant to implement the undirected graph and directed graph concepts from the field of graph theory within mathematics.
In graph theory, a vertex cover of a graph is a set of vertices that includes at least one endpoint of every edge of the graph.
In the mathematical discipline of graph theory, a matching or independent edge set in an undirected graph is a set of edges without common vertices. In other words, a subset of the edges is a matching if each vertex appears in at most one edge of that matching. Finding a matching in a bipartite graph can be treated as a network flow problem.
In computer science, a topological sort or topological ordering of a directed graph is a linear ordering of its vertices such that for every directed edge (u,v) from vertex u to vertex v, u comes before v in the ordering. For instance, the vertices of the graph may represent tasks to be performed, and the edges may represent constraints that one task must be performed before another; in this application, a topological ordering is just a valid sequence for the tasks. Precisely, a topological sort is a graph traversal in which each node v is visited only after all its dependencies are visited. A topological ordering is possible if and only if the graph has no directed cycles, that is, if it is a directed acyclic graph (DAG). Any DAG has at least one topological ordering, and algorithms are known for constructing a topological ordering of any DAG in linear time. Topological sorting has many applications, especially in ranking problems such as feedback arc set. Topological sorting is possible even when the DAG has disconnected components.
In graph theory, a domatic partition of a graph is a partition of into disjoint sets , ,..., such that each Vi is a dominating set for G. The figure on the right shows a domatic partition of a graph; here the dominating set consists of the yellow vertices, consists of the green vertices, and consists of the blue vertices.
In graph theory, a cut is a partition of the vertices of a graph into two disjoint subsets. Any cut determines a cut-set, the set of edges that have one endpoint in each subset of the partition. These edges are said to cross the cut. In a connected graph, each cut-set determines a unique cut, and in some cases cuts are identified with their cut-sets rather than with their vertex partitions.
In graph theory, a minimum cut or min-cut of a graph is a cut that is minimal in some metric.
Maximum cardinality matching is a fundamental problem in graph theory. We are given a graph G, and the goal is to find a matching containing as many edges as possible; that is, a maximum cardinality subset of the edges such that each vertex is adjacent to at most one edge of the subset. As each edge will cover exactly two vertices, this problem is equivalent to the task of finding a matching that covers as many vertices as possible.
In multivariate statistics, spectral clustering techniques make use of the spectrum (eigenvalues) of the similarity matrix of the data to perform dimensionality reduction before clustering in fewer dimensions. The similarity matrix is provided as an input and consists of a quantitative assessment of the relative similarity of each pair of points in the dataset.
The image segmentation problem is concerned with partitioning an image into multiple regions according to some homogeneity criterion. This article is primarily concerned with graph theoretic approaches to image segmentation applying graph partitioning via minimum cut or maximum cut. Segmentation-based object categorization can be viewed as a specific case of spectral clustering applied to image segmentation.
In graph theory, the planar separator theorem is a form of isoperimetric inequality for planar graphs, that states that any planar graph can be split into smaller pieces by removing a small number of vertices. Specifically, the removal of vertices from an n-vertex graph can partition the graph into disjoint subgraphs each of which has at most vertices.
In a graph, a maximum cut is a cut whose size is at least the size of any other cut. That is, it is a partition of the graph's vertices into two complementary sets S and T, such that the number of edges between S and T is as large as possible. Finding such a cut is known as the max-cut problem.
Consensus clustering is a method of aggregating results from multiple clustering algorithms. Also called cluster ensembles or aggregation of clustering, it refers to the situation in which a number of different (input) clusterings have been obtained for a particular dataset and it is desired to find a single (consensus) clustering which is a better fit in some sense than the existing clusterings. Consensus clustering is thus the problem of reconciling clustering information about the same data set coming from different sources or from different runs of the same algorithm. When cast as an optimization problem, consensus clustering is known as median partition, and has been shown to be NP-complete, even when the number of input clusterings is three. Consensus clustering for unsupervised learning is analogous to ensemble learning in supervised learning.
In mathematics, the minimum k-cut is a combinatorial optimization problem that requires finding a set of edges whose removal would partition the graph to at least k connected components. These edges are referred to as k-cut. The goal is to find the minimum-weight k-cut. This partitioning can have applications in VLSI design, data-mining, finite elements and communication in parallel computing.
The stochastic block model is a generative model for random graphs. This model tends to produce graphs containing communities, subsets of nodes characterized by being connected with one another with particular edge densities. For example, edges may be more common within communities than between communities. Its mathematical formulation was first introduced in 1983 in the field of social network analysis by Paul W. Holland et al. The stochastic block model is important in statistics, machine learning, and network science, where it serves as a useful benchmark for the task of recovering community structure in graph data.
In combinatorial optimization, the matroid parity problem is a problem of finding the largest independent set of paired elements in a matroid. The problem was formulated by Lawler (1976) as a common generalization of graph matching and matroid intersection. It is also known as polymatroid matching, or the matchoid problem.
In graph theory, the cutwidth of an undirected graph is the smallest integer with the following property: there is an ordering of the vertices of the graph, such that every cut obtained by partitioning the vertices into earlier and later subsets of the ordering is crossed by at most edges. That is, if the vertices are numbered , then for every , the number of edges with and is at most .
{{cite book}}
: CS1 maint: location missing publisher (link){{cite journal}}
: CS1 maint: multiple names: authors list (link)