Planted clique

Last updated
A 15-vertex planted clique (blue vertices and upper edges) in a 32-vertex random graph (all vertices and lower edges). Every pair of blue vertices is adjacent; the remaining pairs are adjacent randomly with probability 1/2. Planted clique 15,32.svg
A 15-vertex planted clique (blue vertices and upper edges) in a 32-vertex random graph (all vertices and lower edges). Every pair of blue vertices is adjacent; the remaining pairs are adjacent randomly with probability 1/2.

In computational complexity theory, a planted clique or hidden clique in an undirected graph is a clique formed from another graph by selecting a subset of vertices and adding edges between each pair of vertices in the subset. The planted clique problem is the algorithmic problem of distinguishing random graphs from graphs that have a planted clique. This is a variation of the clique problem; it may be solved in quasi-polynomial time but is conjectured not to be solvable in polynomial time for intermediate values of the clique size. The conjecture that no polynomial time solution exists is called the planted clique conjecture; it has been used as a computational hardness assumption.

Contents

Definition

A clique in a graph is a subset of vertices, all of which are adjacent to each other. A planted clique is a clique created from another graph by adding edges between all pairs of a selected subset of vertices.

The planted clique problem can be formalized as a decision problem over a random distribution on graphs, parameterized by two numbers, n (the number of vertices), and k (the size of the clique). These parameters may be used to generate a graph, by the following random process: [1]

  1. Create an Erdős–Rényi random graph on n vertices by choosing independently for each pair of vertices whether to include an edge connecting that pair, with probability 1/2 for each pair.
  2. Decide whether or not to add a clique to the graph, with probability 1/2; if not, return the graph formed in step 1.
  3. Choose randomly a subset of k of the n vertices and add an edge (if one is not already present) between each pair of the selected vertices.

The problem is then to determine algorithmically whether one of the graphs resulting from this process contains a clique of at least k vertices.

Upper and lower bounds

There exists a function such that asymptotically almost surely, the size of the largest clique in an n-vertex random graph is either or , [2] and there exists some constant such that the expected number of cliques of size converges to infinity. Consequently, one should expect that the planting a clique of size cannot be detected with high probability.

By the central limit theorem, the vertex degrees of the random graph would be distributed close to a standard normal distribution with mean and standard deviation . Consequently, when is on the order of it would create a detectable change in the shape of the distribution. Namely, if you plot out the vertex degree distribution, it would look like a deformed bell curve. Therefore, the most interesting range of values for the parameter k is between these two values, [1]

Algorithms

Large cliques

For sufficiently large values of the parameter k, the planted clique problem can be solved (with high probability) in polynomial time. [1]

Kučera (1995) observes that, when then almost surely all vertices of the planted clique have higher degree than all vertices outside the clique, making the clique very easy to find. He describes a modification to the random process for generating planted clique instances, that makes the vertex degrees more uniform even for large values of k, but shows that despite this modification the planted clique may still be found quickly. [3]

Alon, Krivelevich & Sudakov (1998) prove for a planted clique can be found with high probability by the following method:

  1. Compute the eigenvector of the adjacency matrix corresponding to its second highest eigenvalue.
  2. Select the k vertices whose coordinates in this eigenvector have the largest absolute values.
  3. Return the set of vertices that are adjacent to at least 3/4 of the selected vertices.

They show how to modify this technique so that it continues to work whenever k is at least proportional to some multiple of the square root of the number of vertices. [4] Large planted cliques can also be found using semidefinite programming. [5] A combinatorial technique based on randomly sampling vertices can achieve the same bound on k and runs in linear time. [6]

Quasipolynomial time

It is also possible to solve the planted clique problem, regardless of the choice of k, in quasi-polynomial time. [7] Because the largest clique in a random graph typically has size near 2 log2n, [8] a planted clique of size k (if it exists) can be found with high probability by the following method:

  1. Loop through all sets S of vertices.
  2. For each choice of S, test whether S is a clique. If it is, and , return S. Otherwise, find the set T of vertices that are adjacent to all vertices in S. If , return T.

The running time of this algorithm is quasipolynomial, because there are quasipolynomially many choices of S to loop over. This method is guaranteed to try a set S that is a subset of the planted clique; with high probability, the set T will consist only of other members of the planted clique.

As a hardness assumption

The planted clique conjecture is the conjecture that there is no polynomial time algorithm that takes as input graphs produced by the planted clique process and distinguishes the ones with planted cliques from the ones that don't have planted cliques with probability significantly better than random chance. [9]

Hazan & Krauthgamer (2011) used the assumption that finding planted cliques is hard as a computational hardness assumption to prove that, if so, it is also hard to approximate the best Nash equilibrium in a two-player game. [7] The planted clique conjecture has also been used as a hardness assumption to show the difficulty of property testing k-independence of random distributions, [10] finding clusters in social networks, [11] and machine learning. [12]

Related Research Articles

In graph theory, an expander graph is a sparse graph that has strong connectivity properties, quantified using vertex, edge or spectral expansion. Expander constructions have spawned research in pure and applied mathematics, with several applications to complexity theory, design of robust computer networks, and the theory of error-correcting codes.

<span class="mw-page-title-main">Component (graph theory)</span> Maximal subgraph whose vertices can reach each other

In graph theory, a component of an undirected graph is a connected subgraph that is not part of any larger connected subgraph. The components of any graph partition its vertices into disjoint sets, and are the induced subgraphs of those sets. A graph that is itself connected has exactly one component, consisting of the whole graph. Components are sometimes called connected components.

<span class="mw-page-title-main">Clique problem</span> Task of computing complete subgraphs

In computer science, the clique problem is the computational problem of finding cliques in a graph. It has several different formulations depending on which cliques, and what information about the cliques, should be found. Common formulations of the clique problem include finding a maximum clique, finding a maximum weight clique in a weighted graph, listing all maximal cliques, and solving the decision problem of testing whether a graph contains a clique larger than a given size.

<span class="mw-page-title-main">Graph coloring</span> Methodic assignment of colors to elements of a graph

In graph theory, graph coloring is a special case of graph labeling; it is an assignment of labels traditionally called "colors" to elements of a graph subject to certain constraints. In its simplest form, it is a way of coloring the vertices of a graph such that no two adjacent vertices are of the same color; this is called a vertex coloring. Similarly, an edge coloring assigns a color to each edge so that no two adjacent edges are of the same color, and a face coloring of a planar graph assigns a color to each face or region so that no two faces that share a boundary have the same color.

<span class="mw-page-title-main">Independent set (graph theory)</span> Unrelated vertices in graphs

In graph theory, an independent set, stable set, coclique or anticlique is a set of vertices in a graph, no two of which are adjacent. That is, it is a set of vertices such that for every two vertices in , there is no edge connecting the two. Equivalently, each edge in the graph has at most one endpoint in . A set is independent if and only if it is a clique in the graph's complement. The size of an independent set is the number of vertices it contains. Independent sets have also been called "internally stable sets", of which "stable set" is a shortening.

<span class="mw-page-title-main">Vertex cover</span> Subset of a graphs vertices, including at least one endpoint of every edge

In graph theory, a vertex cover of a graph is a set of vertices that includes at least one endpoint of every edge of the graph.

<span class="mw-page-title-main">Hadwiger conjecture (graph theory)</span> Unproven generalization of the four-color theorem

In graph theory, the Hadwiger conjecture states that if is loopless and has no minor then its chromatic number satisfies . It is known to be true for . The conjecture is a generalization of the four-color theorem and is considered to be one of the most important and challenging open problems in the field.

<span class="mw-page-title-main">Hadwiger number</span> Size of largest complete graph made by contracting edges of a given graph

In graph theory, the Hadwiger number of an undirected graph G is the size of the largest complete graph that can be obtained by contracting edges of G. Equivalently, the Hadwiger number h(G) is the largest number n for which the complete graph Kn is a minor of G, a smaller graph obtained from G by edge contractions and vertex and edge deletions. The Hadwiger number is also known as the contraction clique number of G or the homomorphism degree of G. It is named after Hugo Hadwiger, who introduced it in 1943 in conjunction with the Hadwiger conjecture, which states that the Hadwiger number is always at least as large as the chromatic number of G.

<span class="mw-page-title-main">Dominating set</span> Subset of a graphs nodes such that all other nodes link to at least one

In graph theory, a dominating set for a graph G is a subset D of its vertices, such that any vertex of G is either in D, or has a neighbor in D. The domination numberγ(G) is the number of vertices in a smallest dominating set for G.

<span class="mw-page-title-main">Maximal independent set</span> Independent set which is not a subset of any other independent set

In graph theory, a maximal independent set (MIS) or maximal stable set is an independent set that is not a subset of any other independent set. In other words, there is no vertex outside the independent set that may join it because it is maximal with respect to the independent set property.

<span class="mw-page-title-main">Feedback arc set</span> Edges that hit all cycles in a graph

In graph theory and graph algorithms, a feedback arc set or feedback edge set in a directed graph is a subset of the edges of the graph that contains at least one edge out of every cycle in the graph. Removing these edges from the graph breaks all of the cycles, producing an acyclic subgraph of the given graph, often called a directed acyclic graph. A feedback arc set with the fewest possible edges is a minimum feedback arc set and its removal leaves a maximum acyclic subgraph; weighted versions of these optimization problems are also used. If a feedback arc set is minimal, meaning that removing any edge from it produces a subset that is not a feedback arc set, then it has an additional property: reversing all of its edges, rather than removing them, produces a directed acyclic graph.

In computational complexity theory, a computational hardness assumption is the hypothesis that a particular problem cannot be solved efficiently. It is not known how to prove (unconditional) hardness for essentially any useful problem. Instead, computer scientists rely on reductions to formally relate the hardness of a new or complicated problem to a computational hardness assumption about a problem that is better-understood.

<span class="mw-page-title-main">Triangle-free graph</span> Graph without triples of adjacent vertices

In the mathematical area of graph theory, a triangle-free graph is an undirected graph in which no three vertices form a triangle of edges. Triangle-free graphs may be equivalently defined as graphs with clique number ≤ 2, graphs with girth ≥ 4, graphs with no induced 3-cycle, or locally independent graphs.

In graph theory, the planar separator theorem is a form of isoperimetric inequality for planar graphs, that states that any planar graph can be split into smaller pieces by removing a small number of vertices. Specifically, the removal of vertices from an n-vertex graph can partition the graph into disjoint subgraphs each of which has at most vertices.

In computer science and graph theory, the term color-coding refers to an algorithmic technique which is useful in the discovery of network motifs. For example, it can be used to detect a simple path of length k in a given graph. The traditional color-coding algorithm is probabilistic, but it can be derandomized without much overhead in the running time.

In the mathematical field of graph theory, the intersection number of a graph is the smallest number of elements in a representation of as an intersection graph of finite sets. In such a representation, each vertex is represented as a set, and two vertices are connected by an edge whenever their sets have a common element. Equivalently, the intersection number is the smallest number of cliques needed to cover all of the edges of .

In graph theory, a branch of mathematics, the Erdős–Hajnal conjecture states that families of graphs defined by forbidden induced subgraphs have either large cliques or large independent sets. It is named for Paul Erdős and András Hajnal.

In the mathematical fields of graph theory and finite model theory, the logic of graphs deals with formal specifications of graph properties using sentences of mathematical logic. There are several variations in the types of logical operation that can be used in these sentences. The first-order logic of graphs concerns sentences in which the variables and predicates concern individual vertices and edges of a graph, while monadic second-order graph logic allows quantification over sets of vertices or edges. Logics based on least fixed point operators allow more general predicates over tuples of vertices, but these predicates can only be constructed through fixed-point operators, restricting their power.

<span class="mw-page-title-main">Matroid parity problem</span> Largest independent set of paired elements

In combinatorial optimization, the matroid parity problem is a problem of finding the largest independent set of paired elements in a matroid. The problem was formulated by Lawler (1976) as a common generalization of graph matching and matroid intersection. It is also known as polymatroid matching, or the matchoid problem.

The small set expansion hypothesis or small set expansion conjecture in computational complexity theory is an unproven computational hardness assumption. Under the small set expansion hypothesis it is assumed to be computationally infeasible to distinguish between a certain class of expander graphs called "small set expanders" and other graphs that are very far from being small set expanders. This assumption implies the hardness of several other computational problems, and the optimality of certain known approximation algorithms.

References

  1. 1 2 3 Arora, Sanjeev; Barak, Boaz (2009), Computational Complexity: A Modern Approach, Cambridge University Press, pp. 362–363, ISBN   9780521424264 .
  2. Bollobas, B.; Erdös, P. (November 1976), "Cliques in random graphs", Mathematical Proceedings of the Cambridge Philosophical Society, 80 (3): 419–427, Bibcode:1976MPCPS..80..419B, doi:10.1017/S0305004100053056, ISSN   1469-8064, S2CID   16619643
  3. Kučera, Luděk (1995), "Expected complexity of graph partitioning problems", Discrete Applied Mathematics, 57 (2–3): 193–212, doi:10.1016/0166-218X(94)00103-K, hdl: 11858/00-001M-0000-0014-B73F-2 , MR   1327775 .
  4. Alon, Noga; Krivelevich, Michael; Sudakov, Benny (1998), "Finding a large hidden clique in a random graph", Random Structures & Algorithms, 13 (3–4): 457–466, CiteSeerX   10.1.1.24.6419 , doi:10.1002/(SICI)1098-2418(199810/12)13:3/4<457::AID-RSA14>3.3.CO;2-K, MR   1662795
  5. Feige, U.; Krauthgamer, R. (2000), "Finding and certifying a large hidden clique in a semirandom graph", Random Structures and Algorithms, 16 (2): 195–208, doi:10.1002/(SICI)1098-2418(200003)16:2<195::AID-RSA5>3.0.CO;2-A .
  6. Dekel, Yael; Gurel-Gurevich, Ori; Peres, Yuval (2014), "Finding hidden cliques in linear time with high probability", Combinatorics, Probability and Computing, 23 (1): 29–49, arXiv: 1010.2997 , doi:10.1017/S096354831300045X, MR   3197965, S2CID   14356678 .
  7. 1 2 Hazan, Elad; Krauthgamer, Robert (2011), "How hard is it to approximate the best Nash equilibrium?", SIAM Journal on Computing, 40 (1): 79–91, CiteSeerX   10.1.1.511.4422 , doi:10.1137/090766991, MR   2765712 .
  8. Grimmett, G. R.; McDiarmid, C. J. H. (1975), "On colouring random graphs", Mathematical Proceedings of the Cambridge Philosophical Society, 77 (2): 313–324, Bibcode:1975MPCPS..77..313G, doi:10.1017/S0305004100051124, MR   0369129, S2CID   3421302 .
  9. Braverman, Mark; Ko, Young Kun; Rubinstein, Aviad; Weinstein, Omri (2015), ETH hardness for densest-k-subgraph with perfect completeness, arXiv: 1504.08352 , Bibcode:2015arXiv150408352B .
  10. Alon, Noga; Andoni, Alexandr; Kaufman, Tali; Matulef, Kevin; Rubinfeld, Ronitt; Xie, Ning (2007), "Testing k-wise and almost k-wise independence", STOC'07—Proceedings of the 39th Annual ACM Symposium on Theory of Computing, New York: ACM, pp. 496–505, doi:10.1145/1250790.1250863, ISBN   9781595936318, MR   2402475, S2CID   5050980 .
  11. Balcan, Maria-Florina; Borgs, Christian; Braverman, Mark; Chayes, Jennifer; Teng, Shang-Hua (2013), "Finding Endogenously Formed Communities", Proceedings of the Twenty-Fourth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA '13), SIAM, pp. 767–783, ISBN   978-1-611972-51-1 .
  12. Berthet, Quentin; Rigollet, Philippe (2013), "Complexity theoretic lower bounds for sparse principal component detection", Conference on Learning Theory, Journal of Machine Learning Research, 30: 1046–1066.