Property testing

In computer science, a property testing algorithm for a decision problem is an algorithm whose query complexity to its input is much smaller than the instance size of the problem. Typically property testing algorithms are used to distinguish if some combinatorial structure S (such as a graph or a boolean function) satisfies some property P satisfies a "global" property P, or is "far" from having this property meaning an ε-fraction of the representation of S need be modified in order to make S satisfy P, using only a small number of "local" queries to the object. ^[1]^[2]

For example, the following promise problem admits an algorithm whose query complexity is independent of the instance size (for an arbitrary constant ε > 0):

"Given a graph G on n vertices, decide if G is bipartite, or G cannot be made bipartite even after removing an arbitrary subset of at most

\epsilon {\tbinom {n}{2}}

edges of G."

Property testing algorithms are central to the definition of probabilistically checkable proofs, as a probabilistically checkable proof is essentially a proof that can be verified by a property testing algorithm.

Definition and variants

Formally, a property testing algorithm with query complexity q(n) and proximity parameter ε for a decision problem L is a randomized algorithm that, on input x (an instance of L) makes at most q(|x|) queries to x and behaves as follows:

If x is in L, the algorithm accepts x with probability at least ⅔.
If x is ε-far from L, the algorithm rejects x with probability at least ⅔.

Here, "x is ε-far from L" means that the Hamming distance between x and any string in L is at least ε|x|.

A property testing algorithm is said to have one-sided error if it satisfies the stronger condition that the accepting probability for instances x ∈ L is 1 instead of ⅔.

A property testing algorithm is said be non-adaptive if it performs all its queries before it "observes" any answers to previous queries. Such an algorithm can be viewed as operating in the following manner. First the algorithm receives its input. Before looking at the input, using its internal randomness, the algorithm decides which symbols of the input are to be queried. Next, the algorithm observes these symbols. Finally, without making any additional queries (but possibly using its randomness), the algorithm decides whether to accept or reject the input. ^[1]

Features and limitations

The main efficiency parameter of a property testing algorithm is its query complexity, which is the maximum number of input symbols inspected over all inputs of a given length (and all random choices made by the algorithm). One is interested in designing algorithms whose query complexity is as small as possible. In many cases the running time of property testing algorithms is sublinear in the instance length. Typically, the goal is first to make the query complexity as small as possible as a function of the instance size n, and then study the dependency on the proximity parameter ε.

Unlike other complexity-theoretic settings, the asymptotic query complexity of property testing algorithms is affected dramatically by the representation of instances. For example, when ε = 0.01, the problem of testing bipartiteness of dense graphs (which are represented by their adjacency matrix) admits an algorithm of constant query complexity. In contrast, sparse graphs on n vertices (which are represented by their adjacency list) require property testing algorithms of query complexity $\Omega ({\sqrt {n}})$ .

The query complexity of property testing algorithms grows as the proximity parameter ε becomes smaller for all non-trivial properties. This dependence on ε is necessary as a change of fewer than ε symbols in the input cannot be detected with constant probability using fewer than O(1/ε) queries. Many interesting properties of dense graphs can be tested using query complexity that depends only on ε and not on the graph size n. However, the query complexity can grow enormously fast as a function of ε. For example, for a long time the best known algorithm for testing if a graph does not contain any triangle had a query complexity which is a tower function of poly(1/ε), and only in 2010 this has been improved to a tower function of log(1/ε). One of the reasons for this enormous growth in bounds is that many of the positive results for property testing of graphs are established using the Szemerédi regularity lemma, which also has tower-type bounds in its conclusions. The connection of property testing to the Szemerédi regularity lemma and related graph removal lemmas is elaborated on below.

Testing graph properties

For graph a graph G with n vertices, the notion of distance we will use is the edit distance. That is, we say that the distance between two graphs is the smallest ε such that one can add and/or delete $\varepsilon n^{2}$ edges and get from the first graph to the second. Under a reasonable representation of graphs, this is equivalent to the earlier Hamming distance definition (up to possibly a change of constants).

To make precise the general notions of property testing in the context of graphs, we say a tester for graph property P should distinguish with at least ⅔ probability between the cases of G satisfying P and the cases where G is ε-far in edit distance from satisfying P. The tester can access some oracle to query whether a pair of vertices has an edge between the in G or not. The query complexity is the number of such oracle queries. Say the tester has one-sided error if it has false positives and not false negatives, i.e. if G satisfies P, the tester always outputs the correct answer. ^[3]^[4]

Note, the notion of ε-far from P is essential: naively trying to distinguish between graphs that satisfy property P and those that do not fail since they can be one edge apart and the tester cannot tell unless all edges are queried. One example of this is testing triangle-freeness on a graph with exactly one triangle.

Short history

The field of graph property testing was first introduced in a seminal paper by Goldreich, Goldwasser, and Ron in 1998 in which an abstract graph partition problem is analyzed and testers provided, which include as special cases several important graph properties like bipartiteness, k-colorability, having a large clique, and having a large cut. ^[3] In particular, the natural algorithms that sample a subgraph and check whether it satisfy the property are all correct albeit with perhaps suboptimal query complexities.

Since then, several related discoveries have been made

In 1992, Alon, Duke, Lefmann, Rödl, and Yuster showed that for every graph H the property of not contain H as a subgraph is testable. ^[5]
In 1999, Alon, Fischer, Krivelevich, and Szegedy showed that for every graph H the property of not contain H as an induced subgraph subgraph is testable. ^[6]
In 2005, Alon and Shapira showed that any monotone graph property (one that is preserved under vertex and edge deletion) is testable with one-sided error. ^[7]

In 2008, Alon and Shapira exhibited testers with one-sided error for all hereditary graph properties. They also characterized properties are easy to test. Namely, these natural properties are semi-hereditary. These statements will be made clearer below. ^[1]

Testing hereditary graph properties

A graph property is hereditary if it is preserved under deletion of vertices, or equivalently, if it is preserved under taking induced subgraphs, hence the name hereditary. A few important hereditary properties are H-freeness (for some graph H), k-colorability, and planarity. All hereditary properties are testable.

Theorem (Alon & Shapira 2008). Every hereditary graph property is testable with one-sided error. ^[1]

The proof relies on a version of the graph removal lemma for infinite families of induced subgraphs. We reproduce the theorem here without proof. In particular, note that constant $h_{0}$ come up naturally as the size of the samples. Also, the query complexity using this regularity approach is large due to the tower function bound in Szemerédi regularity lemma.

Theorem (Infinite graph removal lemma). For each (possibly infinite) set of graphs

{\mathcal {H}}

and

\varepsilon >0

, there exist

h_{0}

and

\delta >0

so that if

G

is an

n

-vertex graph with fewer than

\delta n^{v(H)}

copies of

H

for every

H\in {\mathcal {H}}

with at most

h_{0}

vertices, then

G

can be made induced

{\mathcal {H}}

-free by adding/removing fewer than

\varepsilon n^{2}

edges. ^[8]

Oblivious testers

Informally, an oblivious tester is oblivious to the size of the input. For a graph property P, it is an algorithm that takes as input a parameter ε and graph G, and then runs as a property testing algorithm on G for the property P with proximity parameter ε that makes exactly q(ε) queries to G.

Definition. An oblivious tester is an algorithm that takes as input a parameter ε. It computes an integer q(ε) and then asks an oracle for an induced subgraph H on exactly q(ε) vertices from G chosen uniformly at random. It then accepts or rejects (possibly randomly) according to ε and H. We say it tests for the property P if it accepts with probability at least ⅔ for G that has property P, and rejects with probability at least ⅔ or G that is ε-far from having property P. ^[1]^[9]^[10]

Crucially, the number of queries an oblivious tester makes is a constant only dependent on ε and not the size of the input graph G. In complete analogy with property testing algorithms, we can talk about oblivious testers with one-sided error.

Testing semi-hereditary graph properties

We can certainly contrive unnatural graph properties such as a graph G satisfies property P if it is bipartite with an even number of vertices or perfect with an odd number of vertices, so a tester must access the number of vertices to decide. In fact, the characterization of graph properties testable by an oblivious tester with one-sided error leads to a class of natural properties.

Definition. A graph property H is semi-hereditary if there exists a hereditary graph property H such that any graph satisfying P satisfies H, and for every

\varepsilon >0

there is an

M(\varepsilon )

such that every graph of size at least

M(\varepsilon )

that is ε-far from satisfying P contains an induced subgraph that does not satisfy H. ^[1]

This characterization partially answers the converse to the other Alon & Shapira theorem above: the properties that easy to test properties (having a oblivious testers with one-sided error) are almost hereditary. In the same paper, they showed

Theorem (Alon & Shapira 2008). A graph property P has an oblivious one-sided error tester if and only if P is semi-hereditary. ^[1]

Example: testing bipartiteness

In this section, we will sketch a natural oblivious tester algorithm for bipartiteness. It has query complexity $O(\log ^{2}(1/(\varepsilon \delta ))/\varepsilon ^{4})$ . The algorithm is as follows:

Bipartite Testing Algorithm
Given graph G, choose a random set X of $q(\varepsilon )=O(\log(1/(\varepsilon \delta ))/\varepsilon ^{2})$ vertices.

For every pair of vertices in X, query if they are adjacent in G.

It accepts if the induced subgraph of G on X is bipartite and reject otherwise. ^[3]

If G is bipartite, then the induced subgraph must also be, so the algorithm always accepts, establishing one-sided error. It is oblivious since it never queries the size of the graph. The contrived number of queries is chosen so the error probability is small. Therefore, the algorithm above is an oblivious tester with one-sided error for bipartiteness.

Example: testing k-colorability

In this section, we will sketch a natural oblivious tester algorithm for k-colorability. It has query complexity $O(k^{4}\log ^{2}(k/\delta )/\varepsilon ^{6})$ . The algorithm is as follows:

k-colorability Testing Algorithm
Given graph G, choose a random set X of $q(\varepsilon )=O(k^{4}\log ^{2}(k/\delta )/\varepsilon ^{3})$ vertices.

For every pair of vertices in X, query if they are adjacent in G.

It accepts if the induced subgraph of G on X is k-colorable and reject otherwise. ^[3]

If G is k-colorable, then the induced subgraph must also be, so the algorithm always accepts, establishing one-sided error. It is oblivious since it never queries the size of the graph. The contrived number of queries is chosen so the error probability is small. Therefore, the algorithm above is an oblivious tester with one-sided error for k-colorability.

Note, query complexity should not be confused with running time. The latter is $\exp(O(k^{4}\log ^{2}(k/\delta )/\varepsilon ^{3}))$ due to a lack of polynomial time k-colorability decision algorithm to run on the induced subgraph. ^[3]

Example: testing triangle-freeness

In this section, we will sketch a proof of the existence of an oblivious tester for triangle-freeness; this proof is an application of the triangle removal lemma. In particular, it tells us that if graph G is ε-far from being triangle-free, there is a (computable) constant $\delta =\delta (\varepsilon )$ so that G has at least $\delta n^{3}$ triangles.

Triangle-freeness Testing Algorithm
Given graph G, choose a random set X of $q(\varepsilon )=1/\delta$ triples of vertices independently at random, where δ is as above.

The algorithm accepts if no triple of vertices induces a triangle, and rejects otherwise. ^[9]

If G is triangle-free, then we can never find a triple forming a triangle, so clearly this algorithm always accepts. If G is ε-far from being triangle-free, then a more than $6/\delta$ fraction of the triples of vertices in G induce a triangle, and then it is not difficult to see that there is greater than ⅔ chance that the algorithm finds at least one triangle. Therefore, the algorithm above is an oblivious tester with one-sided error for triangle-freeness.

References

^ ^a ^b ^c ^d ^e ^f ^g Alon, Noga; Shapira, Asaf (2008). "A characterization of the (natural) graph properties testable with one-sided error" (PDF). SIAM Journal on Computing. 37: 1703–1727.
^ Goldreich, Oded (1999). "Combinatorial Property Testing (a survey)". Randomization Methods in Algorithm Design. 43: 45–59. ISBN 0821870874.
^ ^a ^b ^c ^d ^e Goldreich, Oded; Goldwasser, Shari; Ron, Dana (1 July 1998). "Property testing and its connection to learning and approximation". Journal of the ACM. 45 (4): 653–750. doi:https://doi.org/10.1145/285055.285060. {{cite journal}}: Check |doi= value (help); External link in |doi= (help)
^ Rubinfeld, Ronitt; Shapira, Asaf (2011). "Sublinear Time Algorithms". SIAM Journal on Discrete Mathematics. 25 (4): 1562–1588. CiteSeerX 10.1.1.221.1797. doi:10.1137/100791075.
^ Alon, N.; Duke, R. A.; Lefmann, H.; Rodl, V.; Yuster, R. (1 January 1994). "The Algorithmic Aspects of the Regularity Lemma". Journal of Algorithms. 16 (1): 80–109. doi:https://doi.org/10.1006/jagm.1994.1005. {{cite journal}}: Check |doi= value (help); External link in |doi= (help)
^ Alon, Noga; Fischer, Eldar; Krivelevich, Michael; Szegedy, Mario (1 April 2000). "Efficient Testing of Large Graphs". Combinatorica. 20 (4): 451–476. doi:https://doi.org/10.1007/s004930070001. {{cite journal}}: Check |doi= value (help); External link in |doi= (help)
^ Alon, Noga; Shapira, Asaf (22 May 2005). "Every monotone graph property is testable". Proceedings of the thirty-seventh annual ACM symposium on Theory of computing: 128–137. doi:https://doi.org/10.1145/1060590.1060611. {{cite journal}}: Check |doi= value (help); External link in |doi= (help)
^ Fox, Jacob (2010). "A new proof of the graph removal lemma". arXiv:1006.1300. A bot will complete this citation soon. Click here to jump the queue
^ ^a ^b Goldreich, Oded (2017). Introduction to Property Testing. Cambridge University Press. ISBN 9781107194052.
^ Ron, Dana (2000). Property Testing (Technical report).

[as2008-1] ^ ^a ^b ^c ^d ^e ^f ^g Alon, Noga; Shapira, Asaf (2008). "A characterization of the (natural) graph properties testable with one-sided error" (PDF). SIAM Journal on Computing. 37: 1703–1727.

[g1999-2] Goldreich, Oded (1999). "Combinatorial Property Testing (a survey)". Randomization Methods in Algorithm Design. 43: 45–59. ISBN 0821870874.

[ggr1998-3] Goldreich, Oded; Goldwasser, Shari; Ron, Dana (1 July 1998). "Property testing and its connection to learning and approximation". Journal of the ACM. 45 (4): 653–750. doi:https://doi.org/10.1145/285055.285060. {{cite journal}}: Check |doi= value (help); External link in |doi= (help)

[rs2011-4] Rubinfeld, Ronitt; Shapira, Asaf (2011). "Sublinear Time Algorithms". SIAM Journal on Discrete Mathematics. 25 (4): 1562–1588. CiteSeerX 10.1.1.221.1797. doi:10.1137/100791075.

[adlry1992-5] Alon, N.; Duke, R. A.; Lefmann, H.; Rodl, V.; Yuster, R. (1 January 1994). "The Algorithmic Aspects of the Regularity Lemma". Journal of Algorithms. 16 (1): 80–109. doi:https://doi.org/10.1006/jagm.1994.1005. {{cite journal}}: Check |doi= value (help); External link in |doi= (help)

[afks1999-6] Alon, Noga; Fischer, Eldar; Krivelevich, Michael; Szegedy, Mario (1 April 2000). "Efficient Testing of Large Graphs". Combinatorica. 20 (4): 451–476. doi:https://doi.org/10.1007/s004930070001. {{cite journal}}: Check |doi= value (help); External link in |doi= (help)

[as2005-7] Alon, Noga; Shapira, Asaf (22 May 2005). "Every monotone graph property is testable". Proceedings of the thirty-seventh annual ACM symposium on Theory of computing: 128–137. doi:https://doi.org/10.1145/1060590.1060611. {{cite journal}}: Check |doi= value (help); External link in |doi= (help)

[f2010-8] Fox, Jacob (2010). "A new proof of the graph removal lemma". arXiv:1006.1300. A bot will complete this citation soon. Click here to jump the queue

[g2017-9] Goldreich, Oded (2017). Introduction to Property Testing. Cambridge University Press. ISBN 9781107194052.

[r2000-10] Ron, Dana (2000). Property Testing (Technical report).

[1]

[2]

[3]

[4]

[5]

[6]

[7]

[8]

[9]

[10]