Property testing

Property testing is a field of theoretical computer science, concerned with the design of super-fast algorithms for approximate decision making, where the decision refers to properties or parameters of huge objects. [1]

A property testing algorithm for a decision problem is an algorithm whose query complexity (the number of queries made to its input) is much smaller than the instance size of the problem. Typically, property testing algorithms are used to determine whether some combinatorial structure S (such as a graph or a boolean function) satisfies some property P, or is "far" from having this property (meaning that an ε-fraction of the representation of S must be modified to make S satisfy P), using only a small number of "local" queries to the object. [2] [3]

For example, the following promise problem admits an algorithm whose query complexity is independent of the instance size (for an arbitrary constant ε > 0):

"Given a graph on n vertices, decide whether it is bipartite, or cannot be made bipartite even after removing an arbitrary subset of at most εn2 edges."

Property testing algorithms are central to the definition of probabilistically checkable proofs, as a probabilistically checkable proof is essentially a proof that can be verified by a property testing algorithm.

Definition and variants

Formally, a property testing algorithm with query complexity q(n) and proximity parameter ε for a decision problem L is a randomized algorithm that, on input x (an instance of L), makes at most q(|x|) queries to x and behaves as follows: if x is in L, it accepts with probability at least 2/3, and if x is ε-far from L, it rejects with probability at least 2/3.

Here, "x is ε-far from L" means that the Hamming distance between x and any string in L is at least ε|x|.

A property testing algorithm is said to have one-sided error if it satisfies the stronger condition that the accepting probability for instances x ∈ L is 1 instead of 2/3.

A property testing algorithm is said to be non-adaptive if it performs all its queries before it "observes" any answers to previous queries. Such an algorithm can be viewed as operating in the following manner. First the algorithm receives its input. Before looking at the input, using its internal randomness, the algorithm decides which symbols of the input are to be queried. Next, the algorithm observes these symbols. Finally, without making any additional queries (but possibly using its randomness), the algorithm decides whether to accept or reject the input. [2]
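The following is a minimal Python sketch of this non-adaptive workflow, not an implementation from the literature; the helpers `choose_query_positions` and `decide` are hypothetical placeholders for the property-specific parts of a tester.

```python
import random

def non_adaptive_tester(x, epsilon, choose_query_positions, decide):
    """Sketch of a non-adaptive property tester on a string input x.

    `choose_query_positions` and `decide` are hypothetical placeholders:
    the former picks query positions from (n, epsilon) and randomness only,
    the latter accepts/rejects from the observed symbols alone.
    """
    n = len(x)
    # Step 1: fix all query positions before looking at any input symbol,
    # using only the input length, epsilon, and internal randomness.
    positions = choose_query_positions(n, epsilon, random)
    # Step 2: observe exactly those symbols (these are the only queries made).
    answers = {i: x[i] for i in positions}
    # Step 3: accept or reject based only on the observed symbols
    # (and possibly more randomness), with no further queries.
    return decide(answers, n, epsilon, random)
```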

Features and limitations

The main efficiency parameter of a property testing algorithm is its query complexity, which is the maximum number of input symbols inspected over all inputs of a given length (and all random choices made by the algorithm). Computer scientists are interested in designing algorithms whose query complexity is as small as possible. In many cases, the running time of property testing algorithms is sublinear in the instance length. Typically, the goal is first to make the query complexity as small as possible as a function of the instance size n, and then study the dependency on the proximity parameter ε.

Unlike other complexity-theoretic settings, the asymptotic query complexity of property testing algorithms is affected dramatically by the representation of instances. For example, when ε = 0.01, the problem of testing bipartiteness of dense graphs (which are represented by their adjacency matrix) admits an algorithm of constant query complexity. In contrast, sparse graphs on n vertices (which are represented by their adjacency list) require property testing algorithms of query complexity Ω(√n).

The query complexity of property testing algorithms grows as the proximity parameter ε becomes smaller for all non-trivial properties. This dependence on ε is necessary: a modification affecting an ε-fraction of the input cannot be detected with constant probability using o(1/ε) queries. Many interesting properties of dense graphs can be tested using query complexity that depends only on ε and not on the graph size n. However, the query complexity can grow enormously fast as a function of 1/ε. For example, for a long time the best known algorithm for testing whether a graph contains no triangle had a query complexity which is a tower function of poly(1/ε), and only in 2010 was this improved to a tower function of log(1/ε). One of the reasons for this enormous growth is that many of the positive results for property testing of graphs are established using the Szemerédi regularity lemma, which also has tower-type bounds in its conclusions. The connection of property testing to the Szemerédi regularity lemma and related graph removal lemmas is elaborated on below.

Testing graph properties

For a graph G with n vertices, the notion of distance we will use is the edit distance. That is, we say that the distance between two graphs is the smallest ε such that one can add and/or delete εn² edges and get from the first graph to the second. Under a reasonable representation of graphs, this is equivalent to the earlier Hamming distance definition (up to possibly a change of constants).
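As a concrete illustration of this notion of distance, here is a small Python sketch, under the assumption that both graphs are given as sets of unordered edges over the same n labelled vertices; it is not part of the original exposition.

```python
def graph_edit_distance_fraction(edges_G, edges_H, n):
    """Normalized edit distance between two graphs on the same n vertices.

    `edges_G` and `edges_H` are sets of frozenset({u, v}) edges.  The result
    is the smallest epsilon such that epsilon * n**2 edge insertions/deletions
    turn G into H, i.e. the size of the symmetric difference divided by n**2.
    """
    return len(edges_G ^ edges_H) / n ** 2

# Example: a triangle versus a path on 3 of the same 4 vertices.
triangle = {frozenset({0, 1}), frozenset({1, 2}), frozenset({0, 2})}
path = {frozenset({0, 1}), frozenset({1, 2})}
assert graph_edit_distance_fraction(triangle, path, 4) == 1 / 16
```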

To make precise the general notions of property testing in the context of graphs, we say a tester for a graph property P should distinguish with probability at least 2/3 between the case of G satisfying P and the case of G being ε-far in edit distance from satisfying P. The tester can access an oracle that, given a pair of vertices, answers whether there is an edge between them in G. The query complexity is the number of such oracle queries. The tester is said to have one-sided error if it may produce false positives but never false negatives, i.e. if G satisfies P, the tester always outputs the correct answer. [4] [5]

We can only differentiate between graphs that satisfy P and graphs that are far from P, as opposed to graphs that satisfy P and graphs that do not. In the latter case, consider two graphs: G satisfying P and H not satisfying P, obtained from G by changing only a few edges. One example is testing triangle-freeness with H a graph containing exactly one triangle and G the same graph with one of the triangle's edges removed. Then the tester cannot tell them apart unless it queries essentially every pair of vertices, which its sublinear query budget does not allow.

Short history

The field of graph property testing was first introduced by Goldreich, Goldwasser, and Ron. In their seminal paper, published in 1998, they analyze an abstract graph partition problem and provide testers for it. These include as special cases testers for several important graph properties, such as bipartiteness, k-colorability, having a large clique, and having a large cut. [4] In particular, the natural algorithms that sample a subgraph and check whether it satisfies the property are all correct, albeit with perhaps-suboptimal query complexities.

Since then, several related discoveries have been made.

Testing hereditary graph properties

A graph property is hereditary if it is preserved under deletion of vertices, or equivalently, if it is preserved under taking induced subgraphs. A few important hereditary properties are H-freeness (for some graph H), k-colorability, and planarity. All hereditary properties are testable.

Theorem (Alon & Shapira 2008). Every hereditary graph property is testable with one-sided error. [2]

The proof relies on a version of the graph removal lemma for infinite families of induced subgraphs. The query complexity using this regularity approach is large due to the tower function bound in the Szemerédi regularity lemma.

Theorem (Infinite graph removal lemma). For each (possibly infinite) set of graphs 𝓗 and ε > 0, there exist h₀ and δ > 0 such that, if G is an n-vertex graph with fewer than δn^{v(H)} copies of H for every H ∈ 𝓗 with at most h₀ vertices, then G can be made induced 𝓗-free by adding or removing fewer than εn² edges. [9]

Oblivious testers

Informally, an oblivious tester is oblivious to the size of its input. For a graph property P, it is an algorithm that takes as input a parameter ε and a graph G, and then runs as a property testing algorithm on G for the property P with proximity parameter ε, making exactly q(ε) queries to G.

Definition. An oblivious tester is an algorithm that takes as input a parameter ε. It computes an integer q(ε) and then asks an oracle for an induced subgraph H of G on exactly q(ε) vertices chosen uniformly at random. It then accepts or rejects (possibly randomly) according to ε and H. We say it tests for the property P if it accepts with probability at least 2/3 for G that has property P, and rejects with probability at least 2/3 for G that is ε-far from having property P. [2] [1] [10]

Crucially, the number of queries an oblivious tester makes is a constant dependent only on ε and not the size of the input graph G. In complete analogy with property testing algorithms, we can talk about oblivious testers with one-sided error.
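A minimal Python sketch of this definition follows; it is only an illustration, with `q` and `decide` as hypothetical placeholders for the property-specific parts, and `make_subgraph_oracle` simulating the oracle (the tester itself never looks at n).

```python
import random

def make_subgraph_oracle(adj, n):
    """Simulates the oracle of the definition: given k, return the induced
    subgraph (as an edge set on labels 0..k-1) of k vertices of G chosen
    uniformly at random.  Only this helper knows the graph size n."""
    def oracle(k):
        sample = random.sample(range(n), k)
        return {(i, j) for i in range(k) for j in range(i + 1, k)
                if adj(sample[i], sample[j])}
    return oracle

def oblivious_tester(epsilon, oracle, q, decide):
    """Sketch of an oblivious tester: the sample size q(epsilon) and the
    accept/reject rule `decide` depend only on epsilon and the sampled
    induced subgraph H, never on the size of the input graph."""
    k = q(epsilon)                 # number of sampled vertices, from epsilon alone
    H = oracle(k)                  # induced subgraph on k uniformly random vertices
    return decide(epsilon, k, H)   # accept or reject (possibly randomly)
```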

Testing semi-hereditary graph properties

We can contrive graph properties for which a tester must know the number of vertices of the input graph.

Example. A graph G satisfies property P if it is bipartite with an even number of vertices or perfect with an odd number of vertices. [2]

In this case, the tester cannot even differentiate which property (bipartiteness or perfectness) to test unless it knows the number of vertices. There are many examples of such unnatural properties. In fact, the characterization of graph properties testable by an oblivious tester with one-sided error leads to a class of natural properties.

Definition. A graph property P is semi-hereditary if there exists a hereditary graph property H such that any graph satisfying P satisfies H, and for every ε > 0, there is an M(ε) such that every graph on at least M(ε) vertices that is ε-far from satisfying P contains an induced subgraph that does not satisfy H. [2]

Trivially, hereditary properties are also semi-hereditary. This characterization partially answers the converse of the Alon & Shapira theorem above: the properties that are easy to test (those having oblivious testers with one-sided error) are almost hereditary. In the same paper, they showed the following.

Theorem (Alon & Shapira 2008). A graph property P has an oblivious one-sided error tester if and only if P is semi-hereditary. [2]

Examples: testing some graph properties

In this section, we give some natural oblivious testing algorithms with one-sided error for triangle-freeness, bipartiteness, and k-colorability. They are natural in the sense that they follow the natural idea of sampling a random subset X of vertices of G and checking, by brute-force search, whether the graph property holds on the subgraph spanned by X. The error is one-sided because these properties are hereditary: if G satisfies the property, so must the induced subgraph spanned by X, so the tester always accepts.

For triangle-freeness, the tester is an application of the triangle removal lemma. In particular, it tells us that if a graph G is ε-far from being triangle-free, then there is a (computable) constant δ = δ(ε) such that G has at least δn³ triangles.

Example (Triangle-freeness Testing Algorithm).

  1. Given a graph G and δ as above, choose a set X of q(ε) = 1/δ triples of vertices, each triple chosen independently and uniformly at random.
  2. For every triple of vertices in X, query whether all three pairs of vertices are adjacent in G.
  3. The algorithm accepts if no triple of vertices induces a triangle, and rejects otherwise. [1]
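A minimal Python sketch of this tester is given below, assuming an adjacency oracle `adj(u, v)` and that the constant δ(ε) from the triangle removal lemma is supplied by the caller; the rounding of 1/δ is an arbitrary illustrative choice.

```python
import random

def test_triangle_freeness(adj, n, delta):
    """One-sided tester for triangle-freeness following the steps above.

    Samples roughly 1/delta triples of vertices; the query complexity is
    about 3/delta, independent of n.  Never rejects a triangle-free graph.
    """
    num_triples = int(1 / delta) + 1
    for _ in range(num_triples):
        u, v, w = random.sample(range(n), 3)       # a uniformly random triple
        if adj(u, v) and adj(v, w) and adj(u, w):  # query the three pairs
            return False                           # triangle found: reject
    return True                                    # no triangle seen: accept
```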

For bipartiteness and k-colorability, let δ be the desired upper bound on the error probability of the following testers. Note that query complexity should not be confused with running time: the latter can be exponential, since for some properties (such as k-colorability for k ≥ 3) there is no known polynomial-time decision algorithm and the property is checked on the induced subgraph by brute-force search. [4]

Example (Bipartite Testing Algorithm).

  1. Given graph G, choose a random set X of q(ε) = O(log(1/(εδ))/ε²) vertices.
  2. For every pair of vertices in X, query whether they are adjacent in G.
  3. It accepts if the induced subgraph of G on X is bipartite and rejects otherwise. [4]
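Below is a minimal Python sketch of this tester; the constant factor in the sample size is an arbitrary illustrative choice, `adj(u, v)` is an assumed adjacency oracle, and the 2-colorability of the sample is checked by a simple greedy traversal (for bipartiteness this is faster than brute force and gives the same answer).

```python
import math
import random

def test_bipartiteness(adj, n, epsilon, delta):
    """One-sided tester for bipartiteness following the steps above.

    Samples q = O(log(1/(epsilon*delta)) / epsilon**2) vertices, queries
    all pairs among them, and accepts iff the induced subgraph is bipartite.
    """
    q = min(n, max(3, int(math.log(1.0 / (epsilon * delta)) / epsilon ** 2)))
    X = random.sample(range(n), q)
    edges = {frozenset({u, v}) for i, u in enumerate(X)
             for v in X[i + 1:] if adj(u, v)}       # all pairwise queries

    # Greedy 2-coloring of the induced subgraph; it fails iff there is an odd cycle.
    color = {}
    for start in X:
        if start in color:
            continue
        color[start] = 0
        stack = [start]
        while stack:
            u = stack.pop()
            for v in X:
                if frozenset({u, v}) in edges:
                    if v not in color:
                        color[v] = 1 - color[u]
                        stack.append(v)
                    elif color[v] == color[u]:
                        return False                # odd cycle: reject
    return True                                     # sample is bipartite: accept
```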

Example (k-colorability Testing Algorithm).

  1. Given graph G, choose a random set X of q(ε) = O(k⁴ log²(k/δ)/ε³) vertices.
  2. For every pair of vertices in X, query if they are adjacent in G.
  3. It accepts if the induced subgraph of G on X is k-colorable and rejects otherwise. [4]
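A corresponding Python sketch is below; again the constant factor is an arbitrary illustrative choice, `adj(u, v)` is an assumed adjacency oracle, and the brute-force search over all k-colorings makes the running time exponential in the sample size even though only O(q(ε)²) queries are made.

```python
import itertools
import math
import random

def test_k_colorability(adj, n, k, epsilon, delta):
    """One-sided tester for k-colorability following the steps above.

    Samples q = O(k**4 * log(k/delta)**2 / epsilon**3) vertices, queries all
    pairs among them, and accepts iff the induced subgraph is k-colorable
    (decided here by brute force over all k**q colorings).
    """
    q = min(n, max(3, int(k ** 4 * math.log(k / delta) ** 2 / epsilon ** 3)))
    X = random.sample(range(n), q)
    pairs = [(u, v) for i, u in enumerate(X)
             for v in X[i + 1:] if adj(u, v)]       # all pairwise queries

    # Brute-force search for a proper k-coloring of the sampled subgraph.
    for assignment in itertools.product(range(k), repeat=q):
        color = dict(zip(X, assignment))
        if all(color[u] != color[v] for u, v in pairs):
            return True                             # proper coloring found: accept
    return False                                    # not k-colorable: reject
```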

Related Research Articles

In graph theory, an expander graph is a sparse graph that has strong connectivity properties, quantified using vertex, edge or spectral expansion. Expander constructions have spawned research in pure and applied mathematics, with several applications to complexity theory, design of robust computer networks, and the theory of error-correcting codes.

<span class="mw-page-title-main">Clique problem</span> Task of computing complete subgraphs

In computer science, the clique problem is the computational problem of finding cliques in a graph. It has several different formulations depending on which cliques, and what information about the cliques, should be found. Common formulations of the clique problem include finding a maximum clique, finding a maximum weight clique in a weighted graph, listing all maximal cliques, and solving the decision problem of testing whether a graph contains a clique larger than a given size.

This is a glossary of graph theory. Graph theory is the study of graphs, systems of nodes or vertices connected in pairs by lines or edges.

In theoretical computer science, the subgraph isomorphism problem is a computational task in which two graphs G and H are given as input, and one must determine whether G contains a subgraph that is isomorphic to H. Subgraph isomorphism is a generalization of both the maximum clique problem and the problem of testing whether a graph contains a Hamiltonian cycle, and is therefore NP-complete. However certain other cases of subgraph isomorphism may be solved in polynomial time.

<span class="mw-page-title-main">Extremal graph theory</span>

Extremal graph theory is a branch of combinatorics, itself an area of mathematics, that lies at the intersection of extremal combinatorics and graph theory. In essence, extremal graph theory studies how global properties of a graph influence local substructure. Results in extremal graph theory deal with quantitative connections between various graph properties, both global and local, and problems in extremal graph theory can often be formulated as optimization problems: how big or small a parameter of a graph can be, given some constraints that the graph has to satisfy? A graph that is an optimal solution to such an optimization problem is called an extremal graph, and extremal graphs are important objects of study in extremal graph theory.

<span class="mw-page-title-main">Perfect graph</span> Graph with tight clique-coloring relation

In graph theory, a perfect graph is a graph in which the chromatic number equals the size of the maximum clique, both in the graph itself and in every induced subgraph. In all graphs, the chromatic number is greater than or equal to the size of the maximum clique, but they can be far apart. A graph is perfect when these numbers are equal, and remain equal after the deletion of arbitrary subsets of vertices.

<span class="mw-page-title-main">Edge coloring</span> Problem of coloring a graphs edges such that meeting edges do not match

In graph theory, a proper edge coloring of a graph is an assignment of "colors" to the edges of the graph so that no two incident edges have the same color. For example, the figure to the right shows an edge coloring of a graph by the colors red, blue, and green. Edge colorings are one of several different types of graph coloring. The edge-coloring problem asks whether it is possible to color the edges of a given graph using at most k different colors, for a given value of k, or with the fewest possible colors. The minimum required number of colors for the edges of a given graph is called the chromatic index of the graph. For example, the edges of the graph in the illustration can be colored by three colors but cannot be colored by two colors, so the graph shown has chromatic index three.

<span class="mw-page-title-main">Szemerédi regularity lemma</span> Concept in extremal graph theory

In extremal graph theory, Szemerédi’s regularity lemma states that a graph can be partitioned into a bounded number of parts so that the edges between parts are regular. The lemma shows that certain properties of random graphs can be applied to dense graphs like counting the copies of a given subgraph within graphs. Endre Szemerédi proved the lemma over bipartite graphs for his theorem on arithmetic progressions in 1975 and for general graphs in 1978. Variants of the lemma use different notions of regularity and apply to other mathematical objects like hypergraphs.

The Christofides algorithm or Christofides–Serdyukov algorithm is an algorithm for finding approximate solutions to the travelling salesman problem, on instances where the distances form a metric space. It is an approximation algorithm that guarantees that its solutions will be within a factor of 3/2 of the optimal solution length, and is named after Nicos Christofides and Anatoliy I. Serdyukov; the latter discovered it independently in 1976.

In graph theory, a graph is said to be a pseudorandom graph if it obeys certain properties that random graphs obey with high probability. There is no concrete definition of graph pseudorandomness, but there are many reasonable characterizations of pseudorandomness one can consider.

<span class="mw-page-title-main">Triangle-free graph</span> Graph without triples of adjacent vertices

In the mathematical area of graph theory, a triangle-free graph is an undirected graph in which no three vertices form a triangle of edges. Triangle-free graphs may be equivalently defined as graphs with clique number ≤ 2, graphs with girth ≥ 4, graphs with no induced 3-cycle, or locally independent graphs.

<span class="mw-page-title-main">Handshaking lemma</span> Every graph has evenly many odd vertices

In graph theory, the handshaking lemma is the statement that, in every finite undirected graph, the number of vertices that touch an odd number of edges is even. For example, if there is a party of people who shake hands, the number of people who shake an odd number of other people's hands is even. The handshaking lemma is a consequence of the degree sum formula, also sometimes called the handshaking lemma, according to which the sum of the degrees equals twice the number of edges in the graph. Both results were proven by Leonhard Euler in his famous paper on the Seven Bridges of Königsberg that began the study of graph theory.

In mathematics, a hereditary property is a property of an object that is inherited by all of its subobjects, where the meaning of subobject depends on the context. These properties are particularly considered in topology and graph theory, but also in set theory.

In theoretical computer science, the Aanderaa–Karp–Rosenberg conjecture is a group of related conjectures about the number of questions of the form "Is there an edge between vertex u and vertex v?" that have to be answered to determine whether or not an undirected graph has a particular property such as planarity or bipartiteness. They are named after Stål Aanderaa, Richard M. Karp, and Arnold L. Rosenberg. According to the conjecture, for a wide class of properties, no algorithm can guarantee that it will be able to skip any questions: any algorithm for determining whether the graph has the property, no matter how clever, might need to examine every pair of vertices before it can give its answer. A property satisfying this conjecture is called evasive.

<span class="mw-page-title-main">Well-covered graph</span> Graph with equal-size maximal independent sets

In graph theory, a well-covered graph is an undirected graph in which the minimal vertex covers all have the same size. Here, a vertex cover is a set of vertices that touches all edges, and it is minimal if removing any vertex from it would leave some edge uncovered. Equivalently, well-covered graphs are the graphs in which all maximal independent sets have equal size. Well-covered graphs were defined and first studied by Michael D. Plummer in 1970.

<span class="mw-page-title-main">Half graph</span> Type of graph in mathematics

In graph theory, a branch of mathematics, a half graph is a special type of bipartite graph. These graphs are called the half graphs because they have approximately half of the edges of a complete bipartite graph on the same vertices. The name was given to these graphs by Paul Erdős and András Hajnal.

<span class="mw-page-title-main">Odd cycle transversal</span>

In graph theory, an odd cycle transversal of an undirected graph is a set of vertices of the graph that has a nonempty intersection with every odd cycle in the graph. Removing the vertices of an odd cycle transversal from a graph leaves a bipartite graph as the remaining induced subgraph.

In graph theory, the graph removal lemma states that when a graph contains few copies of a given subgraph, then all of the copies can be eliminated by removing a small number of edges. The special case in which the subgraph is a triangle is known as the triangle removal lemma.

<span class="mw-page-title-main">Ruzsa–Szemerédi problem</span>

In combinatorial mathematics and extremal graph theory, the Ruzsa–Szemerédi problem or (6,3)-problem asks for the maximum number of edges in a graph in which every edge belongs to a unique triangle. Equivalently, it asks for the maximum number of edges in a balanced bipartite graph whose edges can be partitioned into a linear number of induced matchings, or the maximum number of triples one can choose from n points so that every six points contain at most two triples. The problem is named after Imre Z. Ruzsa and Endre Szemerédi, who first proved that its answer is smaller than n² by a slowly growing factor.

The blow-up lemma, proved by János Komlós, Gábor N. Sárközy, and Endre Szemerédi in 1997, is an important result in extremal graph theory, particularly within the context of the regularity method. It states that the regular pairs in the statement of Szemerédi's regularity lemma behave like complete bipartite graphs in the context of embedding spanning graphs of bounded degree.

References

  1. Goldreich, Oded (2017). Introduction to Property Testing. Cambridge University Press. ISBN 9781107194052.
  2. Alon, Noga; Shapira, Asaf (2008). "A characterization of the (natural) graph properties testable with one-sided error" (PDF). SIAM Journal on Computing. 37 (6): 1703–1727. doi:10.1137/06064888X.
  3. Goldreich, Oded (1999). "Combinatorial Property Testing (a survey)". Randomization Methods in Algorithm Design. DIMACS Series in Discrete Mathematics and Theoretical Computer Science. 43: 45–59. doi:10.1090/dimacs/043/04. ISBN 0821870874.
  4. Goldreich, Oded; Goldwasser, Shafi; Ron, Dana (1 July 1998). "Property testing and its connection to learning and approximation". Journal of the ACM. 45 (4): 653–750. doi:10.1145/285055.285060.
  5. Rubinfeld, Ronitt; Shapira, Asaf (2011). "Sublinear Time Algorithms". SIAM Journal on Discrete Mathematics. 25 (4): 1562–1588. CiteSeerX 10.1.1.221.1797. doi:10.1137/100791075. S2CID 1319122.
  6. Alon, N.; Duke, R. A.; Lefmann, H.; Rödl, V.; Yuster, R. (1 January 1994). "The Algorithmic Aspects of the Regularity Lemma". Journal of Algorithms. 16 (1): 80–109. doi:10.1006/jagm.1994.1005.
  7. Alon, Noga; Fischer, Eldar; Krivelevich, Michael; Szegedy, Mario (1 April 2000). "Efficient Testing of Large Graphs". Combinatorica. 20 (4): 451–476. doi:10.1007/s004930070001.
  8. Alon, Noga; Shapira, Asaf (22 May 2005). "Every monotone graph property is testable". Proceedings of the thirty-seventh annual ACM symposium on Theory of computing. pp. 128–137. doi:10.1145/1060590.1060611. ISBN 1581139608. S2CID 14096855.
  9. Fox, Jacob (2010). "A new proof of the graph removal lemma". arXiv:1006.1300 [math.CO].
  10. Ron, Dana (2000). Property Testing (Technical report).