Incompressibility method

In mathematics, the incompressibility method is a proof method like the probabilistic method, the counting method or the pigeonhole principle. To prove that an object in a certain class (on average) satisfies a certain property, select an object of that class which is incompressible. If it does not satisfy the property, it can be compressed by computable coding, which contradicts its incompressibility. Since it can generally be proven that almost all objects in a given class are incompressible, the argument demonstrates that almost all objects in the class have the property involved (not just the average). Selecting an incompressible object is not effective: it cannot be done by a computer program. However, a simple counting argument usually shows that almost all objects of a given class can be compressed by only a few bits at most (that is, they are incompressible).
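The counting argument mentioned above can be made concrete. The following is a minimal sketch, assuming objects are binary strings of length $n$ and that "compressible by $c$ bits" means having a binary description shorter than $n-c$ bits; the function names are illustrative and not taken from the literature.

```python
from fractions import Fraction


def short_descriptions(n: int, c: int) -> int:
    """Number of binary strings of length less than n - c (for 0 <= c <= n).

    There are 2**i strings of length i, so the lengths 0 .. n-c-1
    contribute 2**(n-c) - 1 candidate descriptions in total.
    """
    return 2 ** (n - c) - 1


def incompressible_fraction(n: int, c: int) -> Fraction:
    """Lower bound on the fraction of n-bit strings with no description
    shorter than n - c bits: each short description describes at most
    one string, so at most short_descriptions(n, c) strings are hit."""
    return 1 - Fraction(short_descriptions(n, c), 2 ** n)


if __name__ == "__main__":
    for c in (1, 3, 10):
        print(c, float(incompressible_fraction(64, c)))
```

In this sketch the bound always exceeds $1-2^{-c}$, independent of $n$, which is the sense in which almost all strings are incompressible up to a few bits.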

History

The incompressibility method depends on an objective, fixed notion of incompressibility. Such a notion was provided by the theory of Kolmogorov complexity, named for Andrey Kolmogorov.^[1]

The Kolmogorov complexity of an object, represented by a finite binary string, is the length of the shortest binary program from which the object can be computed on a fixed, optimal universal Turing machine. Since the machine is fixed and the program concerned is the shortest, the Kolmogorov complexity is a definite positive integer. Therefore, it is a lower bound on the length (in bits) of a computably compressed version of that object produced by any existing (or future) compression program.
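Because Kolmogorov complexity is not computable, practical compressors can only give upper bounds on it. The sketch below uses Python's standard `zlib` module as one such compressor, purely for illustration. The seeded pseudo-random bytes are a stand-in for an incompressible string (an assumption of this example, since a pseudo-random sequence is in fact described by its short seed), and the constant overhead of the decompressor is ignored.

```python
import random
import zlib


def compressed_bits(data: bytes) -> int:
    """Length in bits of the zlib-compressed form of `data`.

    This is an upper bound (up to an additive constant for the
    decompressor) on the Kolmogorov complexity of `data`.
    """
    return 8 * len(zlib.compress(data, 9))


# A highly regular string: 2048 repetitions of the two bytes "01".
regular_data = b"01" * 2048

# Pseudo-random bytes from a fixed seed, so the example is repeatable.
rng = random.Random(0)
random_data = bytes(rng.getrandbits(8) for _ in range(4096))

if __name__ == "__main__":
    print("original bits:", 8 * 4096)
    print("regular  ->", compressed_bits(regular_data))
    print("random   ->", compressed_bits(random_data))
```

The regular string shrinks to a small fraction of its length, while the random-looking one does not shrink noticeably, which is the behaviour the incompressibility method relies on.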

One of the first uses of the incompressibility method with Kolmogorov complexity in the theory of computation was to prove^[2] that the running time of a one-tape Turing machine accepting a palindromic language is quadratic, and that sorting algorithms require at least $n\log n$ time to sort $n$ items. The first influential paper using the incompressibility method was published in 1980.^[3] The method was applied to a number of fields, and its name was coined in a textbook.^[4]

Applications

Number theory

According to an elegant Euclidean proof, there are infinitely many prime numbers. Bernhard Riemann demonstrated that the number of primes less than a given number is connected with the zeros of the Riemann zeta function. Jacques Hadamard and Charles Jean de la Vallée-Poussin proved in 1896 that this number of primes is asymptotic to $n/\ln n$; see Prime number theorem (here $\ln$ denotes the natural logarithm and $\log$ the binary logarithm). Using the incompressibility method, G. J. Chaitin argued as follows: Each $n$ can be described by its prime factorization $n=p_{1}^{n_{1}}\cdots p_{k}^{n_{k}}$ (which is unique), where $p_{1},\ldots ,p_{k}$ are the first $k$ primes, all of which are (at most) $n$, and the exponents are possibly 0. Each exponent is (at most) $\log n$, and can be described by $\log \log n$ bits. The description of $n$ can be given in $k\log \log n$ bits, provided we know the value of $\log \log n$ (enabling one to parse the consecutive blocks of exponents). To describe $\log \log n$ requires only $\log \log \log n$ bits. Using the incompressibility of most positive integers, for each length $l>0$ there is a positive integer $n$ of binary length $l\approx \log n$ which cannot be described in fewer than $l$ bits. This shows that the number of primes less than $n$, denoted $\pi (n)$, satisfies

$$\pi (n)\geq {\frac {\log n}{\log \log n}}-o(1).$$
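Chaitin's description of $n$ by its exponents can be illustrated numerically. The sketch below, with illustrative function names, computes the exponent vector of $n$ over a list of primes and compares the actual prime count $\pi (n)$ with the quantity $\log n/\log \log n$ from the bound (binary logarithms, $o(1)$ term ignored). It checks the inequality on a few values; it is not a proof.

```python
import math


def primes_up_to(n: int) -> list[int]:
    """Sieve of Eratosthenes: all primes p <= n (for n >= 1)."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(n) + 1):
        if sieve[p]:
            # Cross out the multiples of p starting at p*p.
            sieve[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return [i for i, flag in enumerate(sieve) if flag]


def exponent_vector(n: int, primes: list[int]) -> list[int]:
    """Exponents of n over `primes`; together with the primes they
    determine the part of n composed of those primes."""
    exps = []
    for p in primes:
        e = 0
        while n % p == 0:
            n //= p
            e += 1
        exps.append(e)
    return exps


def chaitin_lower_bound(n: int) -> float:
    """log n / log log n with binary logarithms (for n >= 4)."""
    return math.log2(n) / math.log2(math.log2(n))


if __name__ == "__main__":
    for n in (2 ** 8, 2 ** 12, 2 ** 16):
        print(n, len(primes_up_to(n)), round(chaitin_lower_bound(n), 2))
```

As the printed values suggest, the bound is very weak compared with the prime number theorem; its interest lies in how short the argument is.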

A more sophisticated approach attributed to Piotr Berman (present proof partially by John Tromp) describes every incompressible $n$ by $k$ and $n/p_{k}$, where $p_{k}$ is the largest prime number dividing $n$. Since $n$ is incompressible, the length of this description must exceed $\log n$. To parse the first block of the description, $k$ must be given in prefix form $P(k)=\log k+\log \log k+\log \varepsilon (k)$, where $\varepsilon (k)$ is an arbitrary, small, positive function. Therefore, $\log p_{k}\leq P(k)$. Hence, $p_{k}\leq n_{k}$ with $n_{k}=\varepsilon (k)k\log k$ for a special sequence of values $n_{1},n_{2},\ldots$. This shows that the expression below holds for this special sequence, and a simple extension shows that it holds for every $n>0$:

$$\pi (n)\geq {\frac {n}{\varepsilon (n)\log n}}.$$

Both proofs are presented in more detail in ^[4].

Graph theory

A labeled graph $G=(V,E)$ with $n$ nodes can be represented by a string $E(G)$ of ${n \choose 2}$ bits, where each bit indicates the presence (or absence) of an edge between the pair of nodes in that position. If $G$ is incompressible, that is, $K(G)\geq {n \choose 2}$, then the degree $d$ of each vertex satisfies

$$|d-n/2|=O\left({\sqrt {n\log n}}\right).$$

To prove this by the incompressibility method, observe that if the deviation is larger we can compress the description of $G$ below $K(G)$; this provides the required contradiction. This theorem is required in a more complicated proof,^[5] where the incompressibility argument is used a number of times to show that the number of unlabeled graphs is

$$\sim {\frac {2^{n \choose 2}}{n!}}.$$
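The edge-string encoding and the degree bound for labeled graphs can be illustrated with a small experiment. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, and seeded pseudo-random bits stand in for an incompressible string $E(G)$ (they are in fact compressible to the seed, so this is not a Kolmogorov-random graph). The constant hidden in the $O(\cdot )$ is taken to be 1 for the comparison.

```python
import math
import random
from itertools import combinations


def random_edge_string(n: int, seed: int = 0) -> list[int]:
    """One bit per unordered pair of nodes, n*(n-1)/2 bits in total."""
    rng = random.Random(seed)
    return [rng.getrandbits(1) for _ in range(n * (n - 1) // 2)]


def degrees(n: int, bits: list[int]) -> list[int]:
    """Decode the edge string: the k-th bit belongs to the k-th pair
    (i, j), i < j, in lexicographic order."""
    deg = [0] * n
    for bit, (i, j) in zip(bits, combinations(range(n), 2)):
        if bit:
            deg[i] += 1
            deg[j] += 1
    return deg


def max_degree_deviation(n: int, bits: list[int]) -> float:
    """Largest |d - n/2| over all vertices."""
    return max(abs(d - n / 2) for d in degrees(n, bits))


if __name__ == "__main__":
    n = 200
    bits = random_edge_string(n)
    print(max_degree_deviation(n, bits), math.sqrt(n * math.log2(n)))
```

For a graph whose edge string looks random, every degree stays close to $n/2$; a vertex with a much larger deviation would make its row of the adjacency matrix cheaper to encode, which is the compression step in the proof.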

Combinatorics

A tournament is a complete directed graph $G=(V,E)$; it is transitive if $(i,j),(j,k)\in E$ implies $(i,k)\in E$. Consider the set of all tournaments on $n$ nodes. Since a tournament is a labeled, directed complete graph, it can be encoded by a string $E(G)$ of ${n \choose 2}$ bits where each bit indicates the direction of the edge between the pair of nodes in that position. Using this encoding, one can show that if every tournament on $n$ nodes contains a transitive subtournament on (at least) $v(n)$ vertices, then

$$v(n)\leq 1+\lfloor 2\log n\rfloor .$$

This was presented as the first problem in ^[6]. It is easily solved by the incompressibility method,^[7] as are the coin-weighing problem, the number of covering families and expected properties; for example, at least a fraction of $1-1/n$ of all tournaments on $n$ vertices have largest transitive subtournaments on not more than $1+2\lceil 2\log n\rceil$ vertices, for $n$ large enough.

If a number of events are independent (in probability theory) of one another, the probability that none of the events occur can be easily calculated. If the events are dependent, the problem becomes difficult. The Lovász local lemma^[8] is a principle that if events are mostly independent of one another and each has a small probability, there is a positive probability that none of them will occur.^[9] It was proven by the incompressibility method.^[10] Using the incompressibility method, several versions of expanders and superconcentrator graphs were shown to exist.^[11]

Topological combinatorics

In the Heilbronn triangle problem, place $n$ points in the unit square and determine the maximum of the minimal area of a triangle formed by three of the points over all possible arrangements. This problem was solved for small arrangements, and much work was done on the asymptotic expression as a function of $n$. The original conjecture of Heilbronn, from the early 1950s, was $O(1/n^{2})$. Paul Erdős proved that this bound is correct for $n$ a prime number. The general problem remains unsolved, apart from the best-known lower bound $\Omega ((\log n)/n^{2})$ (achievable; hence, Heilbronn's conjecture is not correct for general $n$) and upper bound $\exp(c{\sqrt {\log n}})/n^{8/7}$ (proven by Komlós, Pintz and Szemerédi in 1982 and 1981, respectively). Using the incompressibility method, the average case was studied. It was proven that if the area is too small (or too large), the arrangement can be compressed below the Kolmogorov complexity of a uniformly random arrangement (which has high Kolmogorov complexity). This proves that for the overwhelming majority of the arrangements (and in expectation), the area of the smallest triangle formed by three of $n$ points thrown uniformly at random in the unit square is $\Theta (1/n^{3})$. In this case, the incompressibility method proves both the lower and upper bounds of the property involved.^[12]

Probability

The law of the iterated logarithm, the law of large numbers and the recurrence property were shown to hold using the incompressibility method^[13] and Kolmogorov's zero-one law,^[14] with normal numbers expressed as binary strings (in the sense of E. Borel) and the distribution of 0s and 1s in binary strings of high Kolmogorov complexity.^[15]

Turing machine time complexity

The basic Turing machine, as conceived by Alan Turing in 1936, consists of a memory in the form of a potentially infinite tape of cells, each of which can hold a symbol, and a finite control with an attached read-write head which scans a cell on the tape. At each step the read-write head can change the symbol in the cell under scan and move one cell left, one cell right, or not at all, according to instructions from the finite control. For convenience, consider Turing machines with two tape symbols (this is not essential).

In 1968 F. C. Hennie showed that such a Turing machine requires order $n^{2}$ time to recognize the language of binary palindromes in the worst case. In 1977 W. J. Paul^[2] gave an incompressibility proof which showed that order $n^{2}$ time is required in the average case. For every integer $n$, consider all words of that length. For convenience, consider only words whose middle third consists of 0s. Moreover, assume the accepting Turing machine ends in an accept state on the left (at the beginning of the tape). A computation of a Turing machine on a given word gives, for each location (the boundary between adjacent cells), a sequence of crossings from left to right and from right to left, each crossing in a particular state of the finite control. Consider the positions in the middle third of a candidate word. Either they all have a crossing sequence of length $\Omega (n)$, in which case the total computation time is $\Omega (n^{2})$, or some position has a crossing sequence of length $o(n)$. In the latter case the word, if it is a palindrome, can be identified by that crossing sequence: if another palindrome (also ending in an accepting state on the left) had the same crossing sequence at that position, then the word consisting of a prefix (up to the position of the crossing sequence involved) of the original palindrome concatenated with a suffix (of the remaining length) of the other palindrome would be accepted as well, although it is not a palindrome. Taking a palindrome of $\Omega (n)$ Kolmogorov complexity, we have just described it by $o(n)$ bits: a contradiction. Since the overwhelming majority of binary palindromes have this high Kolmogorov complexity, this gives a lower bound on the average-case running time. The result in ^[3] is much more difficult and shows that Turing machines with $k+1$ work tapes are more powerful than those with $k$ work tapes in real time (here: one symbol per step).
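The quadratic cost of palindrome recognition on one tape can be seen in a small simulation. The sketch below is not Paul's lower-bound argument; it only counts the head moves of the familiar zig-zag strategy (compare the two outermost unmatched symbols, then move inward), which shows the matching upper bound. The function name is illustrative, and as a simplifying assumption the matched region is tracked with two indices rather than by marking tape cells as a real machine would.

```python
def palindrome_steps(word: str) -> tuple[bool, int]:
    """Decide whether `word` is a palindrome with a single head that
    moves one cell per step; return the verdict and the number of moves."""
    tape = list(word)
    left, right = 0, len(tape) - 1   # ends of the still-unmatched region
    head, steps = 0, 0
    while left < right:
        symbol = tape[head]          # head is on the leftmost unmatched cell
        while head < right:          # sweep right, one cell per step
            head += 1
            steps += 1
        if tape[head] != symbol:     # outermost symbols differ: reject
            return False, steps
        left, right = left + 1, right - 1
        while head > left:           # sweep back to the new left end
            head -= 1
            steps += 1
    return True, steps


if __name__ == "__main__":
    for n in (100, 200, 400):
        print(n, palindrome_steps("0" * n))
```

On an accepted word of length $n$ this strategy makes $n(n-1)/2$ moves, so doubling $n$ roughly quadruples the running time, in line with the $\Omega (n^{2})$ lower bound.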

In 1984 W. Maass^[16] and M. Li and P. M. B. Vitanyi^[17] showed that the simulation of two work tapes by one work tape (of a Turing machine) takes $\Theta (n^{2})$ time deterministically (this is optimal and solved a 30-year open problem) and $\Omega (n^{2}/(\log n\log \log n))$ time nondeterministically^[17] (in ^[16] this is $\Omega (n^{2}/(\log ^{2}n\log \log n))$). There are more results in ^[17] concerning tapes, stacks, and queues, both deterministic and nondeterministic.

Many more results in this area were proven using the incompressibility method.^[4]

Theory of computation

Heapsort is a sorting method invented by J. W. J. Williams and improved by R. W. Floyd. This method always runs in $O(n\log n)$ time. The question is whether Floyd's method is better than Williams's method on average (it is better in the worst case). Using the incompressibility method it was shown^[4] that Williams's method runs on average in $2n\log n+O(n)$ time and Floyd's method runs on average in $n\log n+O(n)$ time. The proof was suggested by Ian Munro.

Shellsort, discovered by Donald Shell in 1959, is a comparison sort which splits the list to be sorted into scattered sublists and sorts these separately. The sorted sublists are then merged to reconstitute a partially sorted list. This process repeats a number of times (the number of passes). The difficulty of analyzing the complexity of the sorting process is that it depends on the number $n$ of keys to be sorted, on the number $p$ of passes, and also on the increments governing the scattering in each pass: a sublist is the list of keys that are the increment parameter apart. Although this sorting method gave rise to a large number of papers, only the worst case was established. For the average-case running time only the best case for 2-pass Shellsort was established,^[18] and an upper bound on the best case for 3-pass Shellsort.^[19] A general lower bound on the average case of $p$-pass Shellsort was established in ^[20], making a first advance on this problem in four decades. The idea is as follows. In every pass the comparison sort moves a key to another place a certain distance away: a path length. Code all these path lengths logarithmically in their length, in the correct order (of passes and keys). This allows one to reconstruct the unsorted list from the sorted list. Now let the unsorted list be incompressible (or almost so). Since the sorted list has almost zero Kolmogorov complexity, and the path lengths together give a certain code length, the sum must be at least as large as the Kolmogorov complexity of the original list. The sum of the path lengths corresponds to the running time. It turns out that this argument gives a lower bound of $\Omega (pn^{1+1/p})$ on the running time.
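The quantity bounded in the Shellsort argument can be observed directly. The following is a minimal sketch, assuming each pass is an insertion sort on the sublists of keys that are `h` positions apart; the function name and the choice to count one unit per shift of a key within its sublist are assumptions of this example, used as a proxy for the running time.

```python
def shellsort_moves(keys, increments):
    """Sort a copy of `keys` with one insertion-sort pass per increment.

    Returns the sorted list and the total number of shifts, where one
    shift moves a key one position within its sublist (h cells apart).
    The last increment must be 1 for the result to be fully sorted.
    """
    a = list(keys)
    moves = 0
    for h in increments:
        for i in range(h, len(a)):
            key = a[i]
            j = i
            # Shift larger keys of the same sublist one place to the right.
            while j >= h and a[j - h] > key:
                a[j] = a[j - h]
                j -= h
                moves += 1
            a[j] = key
    return a, moves


if __name__ == "__main__":
    data = [5, 4, 3, 2, 1]
    print(shellsort_moves(data, [1]))     # a single pass: insertion sort
    print(shellsort_moves(data, [2, 1]))  # two passes
```

Recording, for every pass and key, how far the key moved is what allows the unsorted list to be reconstructed from the sorted one; the lower bound says that for an incompressible input these records cannot all be short.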

Assume $n,r,s$ are natural numbers and $2\log n\leq r,s\leq n/4$. It was shown by the incompressibility method that for every $n$ there is a Boolean $n\times n$ matrix such that every $s\times (n-r)$ submatrix has rank at least $s/2$.

Logic

The first of Gödel's incompleteness theorems states that in every formal system with computably enumerable theorems (or proofs) that is strong enough to contain Peano arithmetic, there are true but unprovable statements. This is proved by the incompressibility method as follows. Every formal system $F$ as above can be described finitely, say in $f$ bits. In such a formal system we can express $K(x)\geq |x|$, since it contains arithmetic. Given $F$ and $n\gg f$, we can search exhaustively for a proof that some string $y$ of length $n$ satisfies $K(y)\geq n$. In this way we obtain the first such string effectively. Therefore, $K(y)\leq \log n+f$: a contradiction.^[21] Hence $F$ cannot prove any such statement, although $K(y)\geq n$ is true for at least one string $y$ of each length $n$. (We have ignored some lower-order logarithmic terms, which do not matter.)

Comparison with other methods

While the probabilistic method generally shows the existence of an object with a certain property in a class, the incompressibility method tends to show that the overwhelming majority of objects in the class (and hence the average or the expectation) has that property. It is sometimes easy to turn a probabilistic proof into an incompressibility proof or vice versa. In some cases it seems hard or impossible to turn a proof by incompressibility into a probabilistic or counting proof. In virtually all cases of Turing machine time complexity cited above, the incompressibility method solved problems which had been open for decades, and no other proofs are known. Sometimes a proof by incompressibility can be turned into a proof by counting, as happened in the case of the general lower bound on the running time of Shellsort.^[20] Since this well-known problem was open for almost half a century, this suggests that thinking about coding, as in the incompressibility method, can be easier than thinking about probability or counting.

References

[1] A. N. Kolmogorov, "Three approaches to the definition of the concept 'quantity of information'", Probl. Peredachi Inf., 1:1(1965), 3–11.

[2] W. J. Paul, "Kolmogorov's complexity and lower bounds", pp. 325–333 in: L. Budach, ed., Proc. 2nd Int. Conf. Fund. Comput. Theory, 1979.

[3] W. J. Paul, J. I. Seiferas, J. Simon, "An information-theoretic approach to time bounds for on-line computation" (preliminary version), Proc. 12th ACM Symp. Theory Comput. (STOC), 357–367, 1980.

[4] M. Li, P. M. B. Vitanyi, An Introduction to Kolmogorov Complexity and Its Applications, Springer, 1993, 1997, 2008, Chapter 6.

[5] H. M. Buhrman, M. Li, J. T. Tromp, P. M. B. Vitanyi, "Kolmogorov random graphs and the incompressibility method", SIAM J. Comput., 29:2(1999), 590–599.

[6] P. Erdős, J. Spencer, Probabilistic methods in combinatorics, Academic Press, 1974.

[7] M. Li, P. M. B. Vitanyi, "Kolmogorov complexity arguments in combinatorics", J. Combinatorial Theory, Series A, 66:2(1994), 226–236.

[8] P. Erdős, L. Lovász, "Problems and results on 3-chromatic hypergraphs and some related questions", in A. Hajnal, R. Rado, and V. T. Sós, eds., Infinite and Finite Sets (to Paul Erdős on his 60th birthday), North-Holland, pp. 609–627.

[9] R. A. Moser, G. Tardos, "A constructive proof of the general Lovász local lemma", Journal of the ACM (JACM), 57:2(2010), 11.

[10] L. Fortnow, "A Kolmogorov Complexity Proof of the Lovász Local Lemma", Computational Complexity Weblog, 2 June 2009.

[11] U. Schöning, "Construction of expanders and superconcentrators using Kolmogorov complexity", Random Structures & Algorithms, 17:1(2000), 64–77.

[12] T. Jiang, M. Li, P. M. B. Vitanyi, "The average‐case area of Heilbronn‐type triangles", Random Structures & Algorithms, 20:2(2002), 206–219.

[13] V. G. Vovk, "The law of the iterated logarithm for random Kolmogorov, or chaotic, sequences", Theory Probab. Appl., 32:3(1988), 413–426.

[14] M. Zimand, "A high-low Kolmogorov complexity law equivalent to the 0–1 law", Inform. Process. Letters, 57:2(1996), 59–84.

[15] M. Li, P. M. B. Vitanyi, "Statistical properties of finite sequences with high Kolmogorov complexity", Mathematical Systems Theory, 27(1994), 365–376.

[16] W. Maass, "Combinatorial lower bound arguments for deterministic and nondeterministic Turing machines", Trans. Amer. Math. Soc., 292(1985), 675–693.

[17] M. Li, P. M. B. Vitanyi, "Tape versus queue and stacks: The lower bounds", Information and Computation, 78:1(1988), 56–85.

[18] D. E. Knuth, Sorting and Searching (Vol. 3 of The Art of Computer Programming), 2nd ed., Addison-Wesley, 1998, pp. 83–95.

[19] S. Janson, D. E. Knuth, "Shellsort with three increments", Random Structures & Algorithms, 10:1–2(1997), 125–142.

[20] T. Jiang, M. Li, P. M. B. Vitanyi, "A lower bound on the average-case complexity of Shellsort", Journal of the ACM (JACM), 47:5(2000), 905–911.

[21] G. J. Chaitin, Algorithmic Information Theory, Cambridge University Press, 1977.
