Dense subgraph

Last updated
An example of a graph G with density dG = 1.375 and its densest subgraph induced by the vertices b, c, d, e and h in red with density 1.4 . Dense subgraph.png
An example of a graph G with density dG = 1.375 and its densest subgraph induced by the vertices b, c, d, e and h in red with density 1.4 .

In graph theory and computer science, a dense subgraph is a subgraph with many edges per vertex. This is formalized as follows: let G = (V, E) be an undirected graph and let S = (VS, ES) be a subgraph of G. Then the density of S is defined to be:

Contents

The density of the maximally dense subgraph of a graph is sometimes referred to as its subgraph density. A subgraph with maximal density can also be seen as a subgraph with maximal average degree in the graph.

Subgraph density is asymptotic to the related notion of arboricity and to graph degeneracy.

Densest subgraph

The densest subgraph problem is that of finding a subgraph of maximum density. In 1984, Andrew V. Goldberg developed a polynomial time algorithm to find the maximum density subgraph using a max flow technique. This has been improved by Gallo, Grigoriadis and Tarjan in 1989 [1] to run in O(nm log(n2/m)) time. A simple LP for finding the optimal solution was given by Charikar in 2000. [2]

Many of the exact algorithms for solving the densest subgraph problem are impractical on real-world data [3] , which has led to the study of approximation algorithms for the densest subgraph problem. A simple approximation for finding the densest subgraph was given by Charikar in 2000, based on a peeling procedure which was first proposed by Asahiro, Iwama, Tamaki, and Tokuyama in 1996 as an approximation algorithm for the densest subgraph problem. [4] In this algorithm, the vertex with the lowest degree is repeatedly removed, creating an ordering of vertices , where is the th vertex in the graph to be removed. The subgraph returned by the algorithm is the graph induced by the set with the highest density. By using the dual of the LP for the exact algorithm he provided, Charikar proved that this procedure runs in linear time and yields a subgraph with at least 50% of the optimal density. [2] Though 50% is a tight bound, in practice, this greedy peeling procedure yields about 80% of the optimal density on real-world graphs. [3]

In 2020, Boob et al. gave an iterative peeling algorithm that aims to get closer to the optimal subgraph by repeated the peeling procedure multiple times. [3] Instead of removing the vertices based on their current degree, a load is assigned to each vertex based on data from previous iterations, and vertices are peeled based on their loads. In 2022, Chekuri, Quanrud, and Torres proved that this procedure converges to a approximation for the densest subgraph problem after iterations of the algorithm, where is the optimal density and is the maximum degree in the graph. [5] They also showed that a similar algorithm could be used to find densest hypergraphs.

Densest k subgraph

There are many variations on the densest subgraph problem. One of them is the densest k subgraph problem, where the objective is to find the maximum density subgraph on exactly k vertices. This problem generalizes the clique problem and is thus NP-hard in general graphs. There exists a polynomial algorithm approximating the densest k subgraph within a ratio of for every , [6] while it does not admit an -approximation in polynomial time unless the exponential time hypothesis is false. [7] Under a weaker assumption that , no PTAS exists for the problem. [8]

The problem remains NP-hard in bipartite graphs and chordal graphs but is polynomial for trees and split graphs. [9] It is open whether the problem is NP-hard or polynomial in (proper) interval graphs and planar graphs; however, a variation of the problem in which the subgraph is required to be connected is NP-hard in planar graphs. [10]

Densest at most k subgraph

The objective of the densest at most problem is to find the maximum density subgraph on at most vertices. Andersen and Chellapilla showed that if there exists an -approximation for this problem then that will lead to an -approximation for the densest subgraph problem. [11] Later, this was improved by Khuller and Saha who showed that an -approximation for densest at most subgraph implies a -approximation for the densest subgraph problem. [12]

Densest at least k subgraph

The densest at least problem is defined similarly to the densest at most subgraph problem. The problem is NP-complete, [12] but admits 2-approximation in polynomial time. [13] Moreover, there is some evidence that this approximation algorithm is essentially the best possible: assuming the small set expansion hypothesis (a computational complexity assumption closely related to the unique games conjecture), then it is NP-hard to approximate the problem to within factor for every constant . [14]

K-clique densest subgraph

Charalampos Tsourakakis introduced the -clique densest subgraph problem. This variation of the densest subgraph problem aims to maximize the average number of induced cliques , where is the set of -cliques induced by . Notice that the densest subgraph problem is obtained as a special case for . This generalization provides an empirically successful poly-time approach for extracting large near-cliques from large-scale real-world networks.

Locally top-k densest subgraph

Qin et al. introduced the problem of top-k locally densest subgraphs discovery in a graph, each of which achieves the highest density in its local region in the graph: it is neither contained in any supergraph with the same or larger density, nor it contains subgraphs with density being loosely connected with the rest of the local densest subgraph. Note that the densest subgraph problem is obtained as a special case for . The set of locally densest subgraphs in a graph can be computed in polynomial time.

Related Research Articles

<span class="mw-page-title-main">Clique problem</span> Task of computing complete subgraphs

In computer science, the clique problem is the computational problem of finding cliques in a graph. It has several different formulations depending on which cliques, and what information about the cliques, should be found. Common formulations of the clique problem include finding a maximum clique, finding a maximum weight clique in a weighted graph, listing all maximal cliques, and solving the decision problem of testing whether a graph contains a clique larger than a given size.

<span class="mw-page-title-main">Steiner tree problem</span> On short connecting networks with added vertices

In combinatorial mathematics, the Steiner tree problem, or minimum Steiner tree problem, named after Jakob Steiner, is an umbrella term for a class of problems in combinatorial optimization. While Steiner tree problems may be formulated in a number of settings, they all require an optimal interconnect for a given set of objects and a predefined objective function. One well-known variant, which is often used synonymously with the term Steiner tree problem, is the Steiner tree problem in graphs. Given an undirected graph with non-negative edge weights and a subset of vertices, usually referred to as terminals, the Steiner tree problem in graphs requires a tree of minimum weight that contains all terminals and minimizes the total weight of its edges. Further well-known variants are the Euclidean Steiner tree problem and the rectilinear minimum Steiner tree problem.

<span class="mw-page-title-main">Independent set (graph theory)</span> Unrelated vertices in graphs

In graph theory, an independent set, stable set, coclique or anticlique is a set of vertices in a graph, no two of which are adjacent. That is, it is a set of vertices such that for every two vertices in , there is no edge connecting the two. Equivalently, each edge in the graph has at most one endpoint in . A set is independent if and only if it is a clique in the graph's complement. The size of an independent set is the number of vertices it contains. Independent sets have also been called "internally stable sets", of which "stable set" is a shortening.

<span class="mw-page-title-main">Vertex cover</span> Subset of a graphs vertices, including at least one endpoint of every edge

In graph theory, a vertex cover of a graph is a set of vertices that includes at least one endpoint of every edge of the graph.

In computer science and operations research, approximation algorithms are efficient algorithms that find approximate solutions to optimization problems with provable guarantees on the distance of the returned solution to the optimal one. Approximation algorithms naturally arise in the field of theoretical computer science as a consequence of the widely believed P ≠ NP conjecture. Under this conjecture, a wide class of optimization problems cannot be solved exactly in polynomial time. The field of approximation algorithms, therefore, tries to understand how closely it is possible to approximate optimal solutions to such problems in polynomial time. In an overwhelming majority of the cases, the guarantee of such algorithms is a multiplicative one expressed as an approximation ratio or approximation factor i.e., the optimal solution is always guaranteed to be within a (predetermined) multiplicative factor of the returned solution. However, there are also many approximation algorithms that provide an additive guarantee on the quality of the returned solution. A notable example of an approximation algorithm that provides both is the classic approximation algorithm of Lenstra, Shmoys and Tardos for scheduling on unrelated parallel machines.

<i>k</i>-minimum spanning tree

The k-minimum spanning tree problem, studied in theoretical computer science, asks for a tree of minimum cost that has exactly k vertices and forms a subgraph of a larger graph. It is also called the k-MST or edge-weighted k-cardinality tree. Finding this tree is NP-hard, but it can be approximated to within a constant approximation ratio in polynomial time.

<span class="mw-page-title-main">Dominating set</span> Subset of a graphs nodes such that all other nodes link to at least one

In graph theory, a dominating set for a graph G is a subset D of its vertices, such that any vertex of G is in D, or has a neighbor in D. The domination numberγ(G) is the number of vertices in a smallest dominating set for G.

<span class="mw-page-title-main">Feedback arc set</span> Edges that hit all cycles in a graph

In graph theory and graph algorithms, a feedback arc set or feedback edge set in a directed graph is a subset of the edges of the graph that contains at least one edge out of every cycle in the graph. Removing these edges from the graph breaks all of the cycles, producing an acyclic subgraph of the given graph, often called a directed acyclic graph. A feedback arc set with the fewest possible edges is a minimum feedback arc set and its removal leaves a maximum acyclic subgraph; weighted versions of these optimization problems are also used. If a feedback arc set is minimal, meaning that removing any edge from it produces a subset that is not a feedback arc set, then it has an additional property: reversing all of its edges, rather than removing them, produces a directed acyclic graph.

In computational complexity theory, the unique games conjecture is a conjecture made by Subhash Khot in 2002. The conjecture postulates that the problem of determining the approximate value of a certain type of game, known as a unique game, has NP-hard computational complexity. It has broad applications in the theory of hardness of approximation. If the unique games conjecture is true and P ≠ NP, then for many important problems it is not only impossible to get an exact solution in polynomial time, but also impossible to get a good polynomial-time approximation. The problems for which such an inapproximability result would hold include constraint satisfaction problems, which crop up in a wide variety of disciplines.

In graph theory and theoretical computer science, a maximum common induced subgraph of two graphs G and H is a graph that is an induced subgraph of both G and H, and that has as many vertices as possible.

In graph theory, a clique cover or partition into cliques of a given undirected graph is a collection of cliques that cover the whole graph. A minimum clique cover is a clique cover that uses as few cliques as possible. The minimum k for which a clique cover exists is called the clique cover number of the given graph.

In graph theory and theoretical computer science, the longest path problem is the problem of finding a simple path of maximum length in a given graph. A path is called simple if it does not have any repeated vertices; the length of a path may either be measured by its number of edges, or by the sum of the weights of its edges. In contrast to the shortest path problem, which can be solved in polynomial time in graphs without negative-weight cycles, the longest path problem is NP-hard and the decision version of the problem, which asks whether a path exists of at least some given length, is NP-complete. This means that the decision problem cannot be solved in polynomial time for arbitrary graphs unless P = NP. Stronger hardness results are also known showing that it is difficult to approximate. However, it has a linear time solution for directed acyclic graphs, which has important applications in finding the critical path in scheduling problems.

<span class="mw-page-title-main">Maximum cut</span> Problem of finding a maximum cut in a graph

In a graph, a maximum cut is a cut whose size is at least the size of any other cut. That is, it is a partition of the graph's vertices into two complementary sets S and T, such that the number of edges between S and T is as large as possible. Finding such a cut is known as the max-cut problem.

In computational complexity theory and the analysis of algorithms, an algorithm is said to take quasi-polynomial time if its time complexity is quasi-polynomially bounded. That is, there should exist a constant such that the worst-case running time of the algorithm, on inputs of size , has an upper bound of the form

In theoretical computer science, Baker's technique is a method for designing polynomial-time approximation schemes (PTASs) for problems on planar graphs. It is named after Brenda Baker, who announced it in a 1983 conference and published it in the Journal of the ACM in 1994.

<span class="mw-page-title-main">Planted clique</span> Complete subgraph added to a random graph

In computational complexity theory, a planted clique or hidden clique in an undirected graph is a clique formed from another graph by selecting a subset of vertices and adding edges between each pair of vertices in the subset. The planted clique problem is the algorithmic problem of distinguishing random graphs from graphs that have a planted clique. This is a variation of the clique problem; it may be solved in quasi-polynomial time but is conjectured not to be solvable in polynomial time for intermediate values of the clique size. The conjecture that no polynomial time solution exists is called the planted clique conjecture; it has been used as a computational hardness assumption.

<span class="mw-page-title-main">Matroid parity problem</span> Largest independent set of paired elements

In combinatorial optimization, the matroid parity problem is a problem of finding the largest independent set of paired elements in a matroid. The problem was formulated by Lawler (1976) as a common generalization of graph matching and matroid intersection. It is also known as polymatroid matching, or the matchoid problem.

<span class="mw-page-title-main">Odd cycle transversal</span>

In graph theory, an odd cycle transversal of an undirected graph is a set of vertices of the graph that has a nonempty intersection with every odd cycle in the graph. Removing the vertices of an odd cycle transversal from a graph leaves a bipartite graph as the remaining induced subgraph.

The twin-width of an undirected graph is a natural number associated with the graph, used to study the parameterized complexity of graph algorithms. Intuitively, it measures how similar the graph is to a cograph, a type of graph that can be reduced to a single vertex by repeatedly merging together twins, vertices that have the same neighbors. The twin-width is defined from a sequence of repeated mergers where the vertices are not required to be twins, but have nearly equal sets of neighbors.

A parameterized approximation algorithm is a type of algorithm that aims to find approximate solutions to NP-hard optimization problems in polynomial time in the input size and a function of a specific parameter. These algorithms are designed to combine the best aspects of both traditional approximation algorithms and fixed-parameter tractability.

References

  1. Gallo, Giorgio; Grigoriadis, Michael D.; Tarjan, Robert E. (1989), "A fast parametric maximum flow algorithm and applications", SIAM Journal on Computing, 18 (1): 30–55, doi:10.1137/0218003, MR   0978165
  2. 1 2 Charikar, Moses (2000), "Greedy approximation algorithms for finding dense components in a graph", in Jansen, Klaus; Khuller, Samir (eds.), Approximation Algorithms for Combinatorial Optimization, Third International Workshop, APPROX 2000, Saarbrücken, Germany, September 5-8, 2000, Proceedings, Lecture Notes in Computer Science, vol. 1913, Springer, pp. 84–95, doi:10.1007/3-540-44436-X_10, ISBN   978-3-540-67996-7
  3. 1 2 3 Boob, Digvijay; Gao, Yu; Peng, Richard; Sawlani, Saurabh; Tsourakakis, Charalampos; Wang, Di; Wang, Junxing (2020-04-20). "Flowless: Extracting Densest Subgraphs Without Flow Computations". Proceedings of The Web Conference 2020. WWW '20. New York, NY, USA: Association for Computing Machinery: 573–583. doi:10.1145/3366423.3380140. ISBN   978-1-4503-7023-3.
  4. Asahiro, Yuichi; Iwama, Kazuo; Tamaki, Hisao; Tokuyama, Takeshi (1996). Karlsson, Rolf; Lingas, Andrzej (eds.). "Greedily finding a dense subgraph". Algorithm Theory — SWAT'96. Berlin, Heidelberg: Springer: 136–148. doi:10.1007/3-540-61422-2_127. ISBN   978-3-540-68529-6.
  5. Chekuri, Chandra; Quanrud, Kent; Torres, Manuel R. (January 2022), "Densest Subgraph: Supermodularity, Iterative Peeling, and Flow", Proceedings of the 2022 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), Proceedings, Society for Industrial and Applied Mathematics, pp. 1531–1555, doi:10.1137/1.9781611977073.64 , retrieved 2024-12-12
  6. Bhaskara, Aditya; Charikar, Moses; Chlamtáč, Eden; Feige, Uriel; Vijayaraghavan, Aravindan (2010), "Detecting high log-densities—an O(n1/4) approximation for densest k-subgraph", STOC'10—Proceedings of the 2010 ACM International Symposium on Theory of Computing, ACM, New York, pp. 201–210, doi:10.1145/1806689.1806719, ISBN   9781450300506, MR   2743268, S2CID   1391318 .
  7. Manurangsi, Pasin (2017), "Almost-polynomial ratio ETH-hardness of approximating densest k-subgraph", STOC'17—Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, ACM, pp. 954–961, arXiv: 1611.05991 , doi:10.1145/3055399.3055412, ISBN   9781450345286, S2CID   1892186 .
  8. Khot, Subhash (2006), "Ruling out PTAS for graph min-bisection, dense k-subgraph, and bipartite clique", SIAM Journal on Computing , 36 (4): 1025–1071, CiteSeerX   10.1.1.114.3324 , doi:10.1137/S0097539705447037, MR   2272270, S2CID   16514252 .
  9. Corneil, D. G.; Perl, Y. (1984), "Clustering and domination in perfect graphs", Discrete Applied Mathematics , 9 (1): 27–39, doi: 10.1016/0166-218X(84)90088-X , MR   0754426 .
  10. Keil, J. Mark; Brecht, Timothy B. (1991), "The complexity of clustering in planar graphs" (PDF), Journal of Combinatorial Mathematics and Combinatorial Computing, 9: 155–159, MR   1111849 .
  11. Andersen, Reid; Chellapilla, Kumar (2009). "Finding Dense Subgraphs with Size Bounds". In Avrachenkov, Konstantin; Donato, Debora; Litvak, Nelly (eds.). Algorithms and Models for the Web-Graph. Lecture Notes in Computer Science. Berlin, Heidelberg: Springer. pp. 25–37. doi:10.1007/978-3-540-95995-3_3. ISBN   978-3-540-95995-3.
  12. 1 2 Khuller, Samir; Saha, Barna (2009), "On finding dense subgraphs" (PDF), Automata, Languages and Programming: 36th International Colloquium, ICALP 2009, Rhodes, Greece, July 5-12, 2009, Proceedings, Part I, Lecture Notes in Computer Science, vol. 5555, Berlin: Springer-Verlag, pp. 597–608, CiteSeerX   10.1.1.722.843 , doi:10.1007/978-3-642-02927-1_50, ISBN   978-3-642-02926-4, MR   2544878
  13. Andersen, Reid (2007), Finding large and small dense subgraphs, arXiv: cs/0702032 , Bibcode:2007cs........2032A
  14. Manurangsi, Pasin (2018), "Inapproximability of maximum biclique problems, minimum k-cut and densest at-least-k-subgraph from the small set expansion hypothesis", Algorithms, 11 (1): 10, arXiv: 1705.03581 , doi: 10.3390/a11010010 , MR   3758880

Further reading