The Incompressibility Method

  • Ming Li
  • Paul Vitányi
Part of the Texts in Computer Science book series (TCS)


The incompressibility of random objects yields a simple but powerful proof technique. The incompressibility method is a general-purpose tool and should be compared with the pigeonhole principle or the probabilistic method. Whereas the older methods generally show the existence of an object with the required properties, the incompressibility argument shows that almost all objects have the required property. This follows immediately from the fact that the argument is typically used on a Kolmogorov random object. Since such objects are effectively indistinguishable, the proof holds for all such objects. Each class of objects has an abundance of objects that are Kolmogorov random in it.




The incompressibility method has been successfully applied to solve open problems and simplify existing proofs. We show its versatility and universal applicability by selecting examples from a wide range of applications. This includes combinatorics, random graphs, average-case analysis of Heapsort, Shellsort, routing in communication networks, formal language theory, time bounds on language recognition, string matching, Turing machine time complexity, and circuit complexity.

The method rests on a simple fact: a Kolmogorov random string cannot be compressed. Generally, a proof proceeds by showing that a certain property has to hold for some typical instance of a problem. Since typical instances are difficult to define and often impossible to construct, a classical proof usually involves all instances of a certain class.

By intention and definition, an individual Kolmogorov random object is a typical instance. These are the incompressible objects. Although individual objects cannot be proved to be incompressible in any given finite axiom system, a simple counting argument shows that almost all objects are incompressible, Theorem 2.2.1 on page 117. In a typical proof using the incompressibility method, one first chooses a random object from the class under discussion. This object is incompressible. Then one proves that the desired property holds for this object. The argument invariably says that if the property does not hold, then the object can be compressed. This yields the required contradiction.
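
The counting argument mentioned above can be made concrete. The following Python sketch (function names are illustrative and not from the text) counts the binary programs shorter than n − c bits and compares that number with the 2^n strings of length n. Since each program describes at most one string, this gives a lower bound on the fraction of strings x with C(x) ≥ n − c.

```python
def count_short_programs(n: int, c: int) -> int:
    """Number of binary strings (candidate programs) of length less than n - c."""
    # 2^0 + 2^1 + ... + 2^(n-c-1) = 2^(n-c) - 1
    return 2 ** (n - c) - 1 if n >= c else 0


def min_fraction_incompressible(n: int, c: int) -> float:
    """Lower bound on the fraction of n-bit strings x with C(x) >= n - c.

    Every string with C(x) < n - c needs its own program of length < n - c,
    so at most count_short_programs(n, c) strings can be that compressible.
    """
    return 1 - count_short_programs(n, c) / 2 ** n
```

For c = 0 the bound is positive, so at least one string of every length is incompressible; for c = 3 it already exceeds 7/8.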

Because we are dealing with only one fixed object, the resulting proofs tend to be simple and natural. They are natural in that they supply rigorous analogues for our intuitive reasoning. In many cases a proof using the incompressibility method implies an average-case result, since almost all strings are incompressible.

6.1 Three Examples

The proposed methodology is best explained by example. The first example contains one of the earliest lower-bound proofs by the incompressibility argument. The second example shows how to use incompressibility to analyze the average-case complexity of an algorithm. The third example was first proved using an incompressibility argument.

6.1.1 Computation Time of Turing Machines

Consider the basic Turing machine model in Figure 6.1. This is the model explained in Section 1.7. It has a finite control and a single tape, serving as input tape, output tape, and work tape. The tape is a one-way infinite linear array of squares, each of which can hold a symbol from a finite, nonempty alphabet. The leftmost square is the initial square.

Figure 6.1. Single-tape Turing machine

There is a two-way read/write head on the tape. The head movement is governed by the state of the finite control and the symbol in the tape square under scan. In one step, the head may print another symbol in the scanned tape square, move one square left or right (or not move at all), and the state of the finite control may change. At the start of the computation, the input occupies the initial tape segment (one symbol per square) and is delimited by a distinguished end marker. Initially, the tape head is on the leftmost tape square, and the finite control is in a distinguished initial state. If \(x = x_1 \ldots x_n\), then \(x^R = x_n \ldots x_1\).

Definition 6.1.1

Each pair of adjacent squares on the tape is separated by an intersquare boundary. Consider an intersquare boundary b and the sequence of states of T's finite control at the steps when the head crosses b, first from left to right, and then alternatingly in both directions. This ordered sequence of states is the crossing sequence at b.

Lemma 6.1.1

A Turing machine of the model above requires order \(n^2\) steps to recognize \(L = \left\{ {xx^R :x \in \left\{ {0,1} \right\}^*} \right\}.\)

Proof. By way of contradiction, assume that there is a Turing machine T of the above model that recognizes L in \(o(n^2)\) steps. Without loss of generality assume that the machine halts with the tape head scanning the input end marker.

Fix a string x of length n with \(C(x|T,n) \ge n\). Such strings exist by Theorem 2.2.1 on page 117. Consider the computation of T on \(x0^{2n}x^R\). Let l(T) and l(c.s.) denote the lengths of the descriptions of T and a crossing sequence c.s., respectively. If each crossing sequence associated with a square in the \(0^{2n}\) segment in the middle is longer than \(n/(2l(T))\), then T uses at least \(n^2/l(T)\) steps. Otherwise there is a crossing sequence of length less than \(n/(2l(T))\). Assume that this is the crossing sequence c.s. associated with the square in position \(c_0\). This c.s. is completely described by at most ½n bits. Using c.s., one can reconstruct x by exhaustively checking all binary strings of length n.

For each candidate binary string y of length n, put \(y0^{2n}\) on the leftmost 3n-length initial segment of the input tape and simulate T's computation from its initial configuration. Each time the head moves from square \(c_0\) to its right neighbor, skip the part of the computation of T with the head right of \(c_0\), and resume the computation starting from the next state q in c.s. with the head scanning square \(c_0\).

Suppose that in the computation with y, each time the head moves from square \(c_0\) to its right neighbor, the current state of T is the correct next state as specified in c.s. Then T accepts input \(y0^{2n}x^R\). Namely, the computation to the right of square \(c_0\) will simply be identical to the computation to the right of square \(c_0\) on input \(x0^{2n}x^R\). Since T halts with its head to the right of square \(c_0\), it must either accept both \(y0^{2n}x^R\) and \(x0^{2n}x^R\) or reject them both. Since T recognizes L, we must have y = x. Therefore, given c.s., we can reconstruct x by a fixed program from the data T, n, and c.s. This means that
$$C\left({x\left| {T,n} \right.} \right) \le l\left({c.s.} \right) + O\left(1 \right) \le \frac{1}{2}n + O\left(1 \right),$$
which contradicts \(C(x|T,n) \ge n\) for large n.

6.1.2 Adding Fast—On Average

In computer architecture design, efficient design of adders directly affects the length of the CPU clock cycle. Fifty years ago, Burks, Goldstine, and von Neumann obtained a log n expected upper bound on the longest carry sequence involved in the process of adding two n-bit binary numbers. This property suggests the design for an efficient adder hardware. We give a simple analysis using the incompressibility method. Let x and y be two n-bit binary numbers and let ⊕ denote the bitwise exclusive-or operator. The following algorithm adds x and y.
  • Step 1. S := x ⊕ y (add bitwise, ignoring carries); C := carry sequence;

  • Step 2. While C ≠ 0 do S := S ⊕ C; C := new carry sequence.

Let us call this the no-carry adder algorithm. The expected log n upper bound on carry sequence length implies that the algorithm runs in 1 + log n expected rounds (Step 2). This algorithm is on average the most efficient addition algorithm currently known. But it takes n steps in the worst case. On average, it is exponentially faster than the trivial linear-time ripple-carry adder, and it is twice as fast as the well-known carry-lookahead adder. In the ripple-carry adder, the carry ripples from right to left, bit by bit, and hence it takes Ω(n) steps to compute the sum of two n-bit numbers. The carry-lookahead adder is used in nearly all modern computers; it is based on a divide-and-conquer algorithm that adds two n-bit numbers in 1 + 2 log n steps. We give an easy proof of the 1 + log n average-case upper bound, using the incompressibility method.
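
The two steps above can be sketched in Python as follows. This is an illustration under assumptions, not the hardware design: numbers are Python integers, the carry out of the top bit is dropped (so the result is the sum modulo 2^n), and the names `no_carry_add` and `average_rounds` are invented here. The second function estimates the average number of Step 2 rounds on uniformly random inputs, which can be compared with the 1 + log n bound.

```python
import random


def no_carry_add(x: int, y: int, n: int) -> tuple[int, int]:
    """Add two n-bit numbers; return (sum mod 2^n, number of Step 2 rounds)."""
    mask = (1 << n) - 1
    s = x ^ y                   # Step 1: bitwise sum, ignoring carries
    c = ((x & y) << 1) & mask   # carries, shifted to the position they are added to
    rounds = 0
    while c != 0:               # Step 2: fold the carries in until none remain
        s, c = s ^ c, ((s & c) << 1) & mask
        rounds += 1
    return s, rounds


def average_rounds(n: int, trials: int = 2000, seed: int = 0) -> float:
    """Empirical average number of Step 2 rounds over random n-bit inputs."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        x, y = rng.getrandbits(n), rng.getrandbits(n)
        total += no_carry_add(x, y, n)[1]
    return total / trials
```

The input pair x = 2^n − 1, y = 1 forces a carry to travel through every position and shows the linear worst case.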

Lemma 6.1.2

The no-carry adder algorithm has an average running time of at most 1 + log n.

Proof. Assume that both inputs x and y have length l(x) = l(y) = n, with the lower-order bits on the right. If the computation takes precisely t steps (Step 2 loops t times), then some thinking shows that there exists a u such that x and y can be written as
$$x = x'bu1x'',\,y = y'b\neg u1y'',$$
where \(l\left(u \right) = t - 1\), \(l\left({x'} \right) = l\left({y'} \right)\), b is 0 or 1, and ¬u is the bitwise complement of u. Therefore, x can be described using y, n, and a program q of O(1) bits to reconstruct x from the concatenation of
  • the position of u in y encoded in exactly log n bits (padded with 0's if needed); and

  • the literal representation of x′x″.

Since the concatenation of the two strings has length n − t − 1 + log n, the value t can be deduced from n and this length. Therefore, t + 1 bits of x are saved at the cost of adding log n bits. (For x′ = ϵ, bit b may not exist. But then the algorithm also does not execute the last step because of overflow.)

This shows that C(x|n,y,q) ≤ n − t − 1 + log n. Hence, for each x with C(x|n,y,q) = n − i, the computation must terminate in at most log n + i − 1 steps. By simple counting as in Theorem 2.2.1 on page 117, there are at most \(2^{n-i}\) programs of length n − i, and hence at most \(2^{n-i}\) strings x of length n with C(x|n,y,q) = n − i. Let \(p_i\) denote the fraction of x's of length l(x) = n satisfying C(x|n,y,q) = n − i. Then \(p_i \le 2^{-i}\) and \(\sum_i p_i = 1\). Hence, averaging over all x's (by letting i range over all possible values) with y fixed, the average computation time for each y is bounded above by
$$\begin{array}{ll} \sum\limits_{i = 2 - \log \,n}^n p_i \left(i - 1 + \log \,n \right)=& \sum\limits_{i = 2 - \log \,n}^n p_i \left(i - 1 \right) + \sum\limits_{i = 2 - \log \,n}^n p_i \log \,n \\ &\le \log \,n + \sum\limits_{i = 1}^\infty \frac{{i - 1}}{{2^i }} = 1 + \log \,n.\\ \end{array}$$

Because this holds for every y, this is also the average running time of the algorithm. ◻
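
For completeness, the closed form used in the last step of the display follows from two standard series:
$$\sum\limits_{i = 1}^\infty \frac{i - 1}{2^i} = \sum\limits_{i = 1}^\infty \frac{i}{2^i} - \sum\limits_{i = 1}^\infty \frac{1}{2^i} = 2 - 1 = 1.$$
The terms with i ≤ 0 in the finite sum have i − 1 < 0 and nonnegative weights, so dropping them can only increase the sum, and the term multiplied by log n is at most log n because the fractions sum to at most 1.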

6.1.3 Boolean Matrix Rank

The rank of a matrix R is the least integer k such that each row of R can be written as a linear sum of k fixed rows. These k rows are linearly independent, which means that no row can be written as a linear sum of the others. Our problem is to show the existence of a Boolean matrix with all submatrices of high rank.

Such matrices were used to obtain an optimal lower bound \(TS = \Omega(n^3)\) time-space tradeoff for multiplication of two n by n Boolean matrices on random access machines (T = time and S = space). Even with integer entries, it is difficult to construct such a matrix. There is no known construction with Boolean values.

Let GF(2) be the Galois field over the two elements 0,1, with the usual Boolean multiplication and addition: 0 × 0 = 0 × 1 = 1 × 0 = 0, 1 × 1 = 1, 1 + 0 = 0 + 1 = 1, and 1 + 1 = 0 + 0 = 0.
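
Since addition in GF(2) is exclusive-or, the rank of a Boolean matrix can be computed with bitwise operations. The sketch below is one common way to do this and is not taken from the text: each row is assumed to be stored as an integer whose bit j is the entry in column j, and the names `gf2_rank` and `submatrix_rank` are illustrative.

```python
def gf2_rank(rows: list[int]) -> int:
    """Rank over GF(2) of a matrix whose rows are given as integer bitmasks."""
    basis: list[int] = []  # independent rows; each lacks the leading bits of earlier ones
    for row in rows:
        for b in basis:
            # row ^ b is smaller than row exactly when row has b's leading bit set,
            # so this clears that bit whenever it is present.
            row = min(row, row ^ b)
        if row:  # a nonzero remainder is independent of the current basis
            basis.append(row)
    return len(basis)


def submatrix_rank(rows: list[int], row_idx: list[int], col_idx: list[int]) -> int:
    """Rank over GF(2) of the submatrix picked out by the given rows and columns."""
    picked = []
    for r in row_idx:
        value = 0
        for t, c in enumerate(col_idx):
            value |= ((rows[r] >> c) & 1) << t  # copy column c into bit t
        picked.append(value)
    return gf2_rank(picked)
```

For example, the rows 110, 011, 101 have rank 2 because the third row is the sum of the first two.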

Lemma 6.1.3

Let n, r, s ∈ N with 2 log n ≤ r, s ≤ ¼n and s even. For each n there is an n × n matrix over GF(2) such that every submatrix of s rows and n − r columns has at least rank s/2.

Proof. Fix a binary string x of length \(n^2\), with \(C(x) \ge n^2\). This is possible by Theorem 2.2.1 on page 117. Arrange the bits of x into a square matrix R, one bit per entry in, say, row-major order. We claim that this matrix R satisfies the requirement.

Assume by way of contradiction that this were not true. Consider a submatrix of R of s rows and n − r columns, with r, s as in the condition in the lemma. There are at most (s/2) − 1 linearly independent rows in it. Therefore, each of the remaining (s/2) + 1 rows can be expressed as a linear sum of the other (s/2) − 1 rows. This can be used to describe R by the following items:
  • The characteristic sequence of the (s/2) − 1 independent rows out of the s rows of R in s bits.

  • A list of the (s/2) − 1 linearly independent rows in ((s/2) − 1)(n − r) bits.

  • List the remainder of (s/2) +1 rows in order. For each row give only the Boolean coefficients in the assumed linear sum. This requires ((s/2)−1)((s/2) + 1) bits.

To recover x, we need only the additional items below:
  • A description of this discussion in O(1) bits.

  • The values of n, r, s in self-delimiting form in 3 log n + 6 log log n + 3 bits. For large n, this is at most 4 log n bits including the O(1) bits above.

  • R without the bits of the submatrix in row-major order in \(n^2 - (n - r)s\) bits.

  • The indices of the columns and rows of the submatrix, in (n − r)logn + s log n bits.

To ensure unique decodability of these binary items, we concatenate them as follows: First list the self-delimiting descriptions of n, r, s, then append all other items in a fixed order. The length of each of the items can be calculated from n, r, s. Altogether, this is an effective description of x using a number of bits of at most
$$\begin{array}{l} n^2 - \left({n - r} \right)s + \left({n - r} \right)\log \,n + s\,\log \,n \\ + \left({\frac{s}{2} - 1} \right)\left({n - r} \right) + \left({\frac{s}{2} - 1} \right)\left({\frac{s}{2} + 1} \right) + s + 4\log \,n. \\ \end{array}$$

For large n, this quantity drops below \(n^2\). But we have assumed that \(C(x) \ge n^2\), which yields the required contradiction. ◻
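
To see how the displayed bound compares with n² for concrete parameters, the expression can be evaluated term by term. The Python sketch below copies the terms of the display; the function name and the choice of sample parameters are assumptions for illustration, and logarithms are taken base 2.

```python
import math


def description_bits(n: int, r: int, s: int) -> float:
    """Evaluate the upper bound on the description length of x from the proof."""
    log_n = math.log2(n)
    return (
        n * n - (n - r) * s            # R without the bits of the submatrix
        + (n - r) * log_n + s * log_n  # indices of the columns and rows
        + (s / 2 - 1) * (n - r)        # the (s/2) - 1 independent rows
        + (s / 2 - 1) * (s / 2 + 1)    # coefficients for the remaining rows
        + s                            # characteristic sequence of the independent rows
        + 4 * log_n                    # n, r, s and the O(1) overhead
    )
```

With n = 1024 and r = s = 2 log n = 20, the bound is 645 bits below n², so a matrix of complexity at least n² cannot contain such a low-rank submatrix.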

A proof obtained by the incompressibility method usually implies that the result holds for almost all strings, and hence it holds for the average case complexity.

6.2 Exercises

6.1.1. [26/M30] Let the Turing machine in Section 6.1.1 be probabilistic, which means that the machine can flip a fair coin to determine its next move.
  (a) Assume that the machine is not allowed to err. Prove that such a machine still requires on average order \(n^2\) steps to accept the palindrome language \(L = \left\{ {xx^R :x \in \left\{ {0,1} \right\}^*} \right\}\). The average is taken over the uniform distribution of all inputs of length n and all coin tosses of the algorithm.

  (b) Assume that the machine is allowed to err with probability ϵ. Show that the palindrome language L can be accepted in worst-case time O(n log n) by such a machine.


Comments. Hint for Item (a): use the symmetry of information theorem, Theorem 2.8.2, on page 190. With high uniform probability, a sequence of random coin tosses r and a random input x are random relative to each other. Thus, the deterministic argument given in the proof of Section 6.1.1 proceeds as before with r as an extra input or oracle. Hint for Item (b): generate random primes of size log n and check whether both sides are the same modulo these primes. Repeat this process to guarantee high accuracy. Source: [R. Freivalds, Information Processing 77, Proc. IFIP Congress 77, North-Holland, Amsterdam, 1977, 839–842].
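
The hint for Item (b) can be illustrated outside the Turing machine model. The Python sketch below compares the first half of the input with the reversal of the second half modulo a few random primes. The prime range (up to n², that is, about 2 log n bits), the number of rounds, and all function names are assumptions made for illustration; the sketch shows only the fingerprinting arithmetic, not the O(n log n) time bound on a single-tape machine.

```python
import random
from math import isqrt


def is_prime(m: int) -> bool:
    """Trial division; adequate for the small primes used here."""
    return m >= 2 and all(m % d for d in range(2, isqrt(m) + 1))


def random_prime(bound: int, rng: random.Random) -> int:
    """Rejection-sample a prime p with 2 <= p <= bound (requires bound >= 2)."""
    while True:
        p = rng.randint(2, bound)
        if is_prime(p):
            return p


def residue(bits: str, p: int) -> int:
    """Value of the binary string modulo p, computed one symbol at a time."""
    value = 0
    for b in bits:
        value = (2 * value + int(b)) % p
    return value


def probably_in_L(w: str, rounds: int = 10, seed: int = 0) -> bool:
    """One-sided test for w = x x^R: members are always accepted."""
    if len(w) % 2:
        return False
    n = len(w) // 2
    left, right_reversed = w[:n], w[n:][::-1]
    rng = random.Random(seed)
    bound = max(n * n, 3)  # assumed prime range, see the lead-in
    for _ in range(rounds):
        p = random_prime(bound, rng)
        if residue(left, p) != residue(right_reversed, p):
            return False   # a differing residue proves the halves differ
    return True
```

A string in L is never rejected; a string outside L is accepted only if every sampled prime divides the difference of the two halves, which is why repeating the test lowers the error probability.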

6.1.2. [10] (Converting NFA to DFA) A deterministic finite automaton (DFA) A has a finite number of states, including a distinguished start state and some distinguished accepting states. At every step, A reads the next input symbol and changes its state according to the current state and the input symbol. If A has more than one alternative at some step, then A is nondeterministic (NFA). If A is in an accepting state when it reads a distinguished end marker, then A accepts the input. Otherwise A rejects it. It is well known that every NFA can be converted to a DFA. Use an incompressibility argument to prove that there exists an NFA with n states such that the smallest DFA accepting the same language has \(\Omega(2^n)\) states.

Comments. Hint: use \(L_k\) = {x : the kth bit of x from the right is 1}. This problem can also be solved by a simple counting argument.
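
For the language in the hint, the blow-up can be observed experimentally. The sketch below assumes one natural NFA with k + 1 states (state 0 loops on every symbol and guesses that a 1 is the kth symbol from the right; states 1, …, k count the remaining symbols) and counts the subsets reached by the standard subset construction. The function name is illustrative. The count is only an upper bound on the size of the smallest DFA; the matching lower bound is what the exercise asks for.

```python
def reachable_dfa_states(k: int) -> int:
    """Number of subset-construction states reachable for the assumed NFA of L_k."""
    start = frozenset({0})
    seen = {start}
    frontier = [start]
    while frontier:
        subset = frontier.pop()
        for symbol in "01":
            nxt = set()
            for q in subset:
                if q == 0:
                    nxt.add(0)           # stay in the guessing state
                    if symbol == "1":
                        nxt.add(1)       # guess that this 1 is the kth symbol from the right
                elif q < k:
                    nxt.add(q + 1)       # count one more symbol after the guess
                # state k is accepting and has no outgoing transitions
            target = frozenset(nxt)
            if target not in seen:
                seen.add(target)
                frontier.append(target)
    return len(seen)
```

The reachable subsets correspond to the possible values of the last k input symbols, which gives 2^k states for an NFA with k + 1 states.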

6.1.3. [15] Give a simple algorithm that multiplies two n × n Boolean matrices in \(O(n^2)\) average time under uniform distribution. Use an incompressibility argument to show the time complexity.

Comments. Source: the original proof, without incompressibility, is given in [P.E. O'Neil, E.J. O'Neil, Inform. Contr., 22:2(1973), 132–138].
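
One natural candidate for such an algorithm, offered here as an assumption and not necessarily the algorithm of the cited source, is the schoolbook product with early exit: for each entry, stop scanning as soon as one index k with both bits set is found. On uniformly random inputs each probe succeeds with probability 1/4, so the expected number of probes per entry is at most 4. The Python sketch below counts the probes so that this can be checked empirically; the function names are illustrative.

```python
import random


def boolean_product(a: list[list[int]], b: list[list[int]]) -> tuple[list[list[int]], int]:
    """Boolean matrix product with early exit; returns (product, number of probes)."""
    n = len(a)
    c = [[0] * n for _ in range(n)]
    probes = 0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                probes += 1
                if a[i][k] and b[k][j]:
                    c[i][j] = 1
                    break  # the entry is decided, no need to look further
    return c, probes


def random_matrix(n: int, rng: random.Random) -> list[list[int]]:
    """n x n matrix of independent fair coin flips."""
    return [[rng.randint(0, 1) for _ in range(n)] for _ in range(n)]
```

The worst case remains cubic, for instance when one of the matrices is all zeros.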

6.3 High-Probability Properties

The theory of random individual objects, Sections 2.4 and 2.5, tells us that there is a close relation between high-probability properties and properties of incompressible objects. For infinite binary sequences \(\omega \in \{0,1\}^\infty\) and λ the uniform (coin-toss) measure, classic probabilistic laws are formulated in global form by
$${\rm{\lambda }}\left\{ {\omega :A\left(\omega \right)} \right\} = 1,$$
where A(ω) is some formula expressing some property. In contrast, in the algorithmic theory of random individual objects, the corresponding law is expressed in local form by

if ω is random then A(ω) holds.

The classical probabilistic laws as in the first displayed equation are uncountable. The properties tested by Martin-Löf tests to determine randomness as in the second displayed equation are the effectively testable properties and hence countable. Thus, there are classical probabilistic laws that do not hold in the pointwise sense of the second equation. On the other hand, a pointwise algorithmic law implies the corresponding classical probabilistic law: if the second displayed equation holds for formula A, then also the first displayed equation holds for A (by Theorem 2.5.3 on page 151). How do things work out quantitatively for finitely many finite objects? To fix our thoughts let us look at a simple example.

First we recall the notion of randomness deficiency of Section 2.2.1 on page 120. The randomness deficiency of an element in a certain class of objects is the difference between the maximal Kolmogorov complexity of an object in the class (typically the logarithm of the cardinality of the class) and that element's Kolmogorov complexity. Formally, if x is an element of a finite set of objects S, then by Theorem 2.1.3 on page 111 we have C(x|S) ≤ l(d(S)) + c for some c independent of x but possibly dependent on S. The randomness deficiency of x relative to S is defined as δ(x|S) = log d(S) − C(x|S).

6.3.1 Example 6.2.1

Let G = (V, E) be a graph on n nodes where every pair of nodes is or is not connected by an edge according to the outcome of a fair coin flip. The probability that a particular node is isolated (has no incident edges) is \(1/2^{n-1}\). Therefore, the probability that some node is isolated is at most \(n/2^{n-1}\). Consequently, the probability that the graph has no isolated nodes is at least \(1 - n/2^{n-1}\).

Using the incompressibility method, the proof that random graphs have this nonisolation property with high probability is as follows: Each labeled undirected graph G = (V, E) on n nodes can be described by giving a characteristic sequence χ of the lexicographic enumeration of V × V without repetition, namely, \(\chi = \chi_1\chi_2 \ldots \chi_e\) with \(e = \left({\begin{array}{*{20}c} n \\ 2 \\\end{array}} \right)\) and \(\chi_i = 1\) if the ith enumerated edge is in E and 0 otherwise. There are as many labeled n-node graphs as there are such characteristic sequences. Therefore, we can consider graphs G having randomness deficiency at most δ(n),
$$C\left({G\left| n \right.} \right) \ge \left({\begin{array}{*{20}c} n \\ 2 \\ \end{array}} \right) - \delta \left(n \right).$$

Assume by way of contradiction that there is an isolated node i. Add its identity in log n bits to the canonical description of G, and delete all n − 1 bits indicating presence or absence of edges incident on i, saving n − 1 bits. From the new description we can reconstruct G given n. Then the new description length cannot be smaller than C(G|n). Substitution shows that δ(n) ≥ n − 1 − log n. The number of programs of length at most \(\left({\begin{array}{*{20}c} n \\ 2 \\\end{array}} \right) - \delta \left(n \right)\) shows that at most a fraction of \(2^{-\delta(n)}\) of all n-node graphs contain an isolated node. Hence, the nonisolation property for n-node graphs holds with probability at least \(1 - n/2^{n-1}\). ◻
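
For small n, the bound can be checked against an exact count. The following Python sketch enumerates all labeled graphs on n nodes through their characteristic sequences, encoded as the bits of an integer, and reports the fraction that contain an isolated node. The function name is illustrative, and the enumeration is feasible only for very small n.

```python
from itertools import combinations


def fraction_with_isolated_node(n: int) -> float:
    """Exact fraction of labeled n-node graphs that have at least one isolated node."""
    pairs = list(combinations(range(n), 2))  # lexicographic enumeration of node pairs
    e = len(pairs)
    count = 0
    for code in range(1 << e):               # each code is one characteristic sequence
        degree = [0] * n
        for idx, (u, v) in enumerate(pairs):
            if (code >> idx) & 1:
                degree[u] += 1
                degree[v] += 1
        if 0 in degree:
            count += 1
    return count / (1 << e)
```

For n = 3 the fraction is 1/2 (the empty graph and the three single-edge graphs out of eight), which is below the bound 3/4.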

For every finite class of finite objects there is a close relation between properties that hold with high probability and properties that hold for objects with small randomness deficiency: the almost incompressible ones. However, the properties and the sets of objects concerned are not identical and should be carefully distinguished. In fact, the following distinctions also indicate in which cases use of which method is preferable:
  • In the probabilistic method, the subset of objects on which the probability of a property is based is the subset of all objects satisfying that property. As an example, consider the nonisolation property of labeled graphs again. The graphs satisfying this property include the complete graph on n nodes, the star graph on n nodes, and the binary hypercube on n nodes, provided n is a power of 2. These graphs are certainly not incompressible or random and in fact have complexity O(1) given n.

  • If each object with suitable randomness deficiency at most δ(n) has a certain property, then every such object is included in the subset of objects on which the high probability of the property is based.

  • If we prove that properties P and Q each hold with probability at least 1 − ϵ with the probabilistic method, then we can conclude that properties P and Q simultaneously hold with probability at least 1 − 2ϵ. In contrast, if both properties P and Q hold separately for objects with randomness deficiency at most δ(n), then they vacuously also hold simultaneously for objects with randomness deficiency at most δ(n).

More generally, suppose that every high-probability property separately holds for an overwhelming majority (say at least a (1 − 1/n)th fraction) of all objects. Now consider a situation of n different properties each of which holds for a (1 − 1/n)th fraction. Since the subsets on which the different properties fail may be disjoint, their union may constitute the set of all objects. Therefore it is possible that no object at all possesses all the high-probability properties simultaneously.
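
A minimal instance of this situation, constructed here for illustration, takes the objects 0, …, n − 1 and lets property i be "x is different from i". Each property fails on exactly one object, and the failure sets are disjoint and cover everything. The names in the Python sketch below are hypothetical.

```python
def build_properties(n: int):
    """Property i holds for every object in range(n) except object i."""
    return [lambda x, i=i: x != i for i in range(n)]


def fraction_satisfying(n: int, i: int) -> float:
    """Fraction of the n objects that satisfy property i."""
    prop = build_properties(n)[i]
    return sum(prop(x) for x in range(n)) / n


def objects_satisfying_all(n: int) -> list[int]:
    """Objects that satisfy all n properties at once."""
    properties = build_properties(n)
    return [x for x in range(n) if all(p(x) for p in properties)]
```

Every single property holds for a (1 − 1/n)th fraction of the objects, yet the list of objects satisfying all of them is empty.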

In contrast, if we prove properties separately for objects with randomness deficiency at most δ(n), then all these properties hold simultaneously for each of these objects.

These considerations show that high-probability properties and incompressibility properties are not a priori the same. However, we shall prove that they almost coincide under mild conditions on the properties considered. In fact, the objects with a certain small randomness deficiency satisfy all simply described properties that hold with high probability. This is not just terminology: If δ(x|S) is small enough, then x satisfies all properties of low Kolmogorov complexity that hold with high probability for the elements of S. To be precise: Consider strings of length n and let S be a subset of such strings. A property P represented by S is a subset of S, and we say that x satisfies property P if x ∈ P. (The lemma below can also be formulated in terms of probabilities instead of frequencies if we are talking about a probabilistic ensemble S.)

6.3.2 Lemma 6.2.1

Let \(S \subseteq \{0,1\}^n\) and let δ : N → N be such that δ(n) ≤ log d(S).
  (i) If P is a property satisfied by all x ∈ S with δ(x|S) ≤ δ(n), then P holds for a fraction of at least \(1 - 1/2^{\delta(n)}\) of the elements in S.

  (ii) Let P be a property that holds for a fraction of at least \(1 - 1/2^{\delta(n)}\) of the elements of S. Then there is a constant c such that P holds for every x ∈ S with δ(x|S) ≤ δ(n) − K(P|S) − c.

Proof. (i) There are only \(\sum\nolimits_{i = 0}^{\log d\left(S \right) - \delta \left(n \right)} 2^i\) programs of length not greater than log d(S) − δ(n) and there are d(S) elements in S.

(ii) Suppose, by way of contradiction, that P does not hold for an object x ∈ S whose randomness deficiency satisfies δ(x|S) ≤ δ(n) − K(P|S) − c. Then we can reconstruct x from a description of S and P, and x's index j in an effective enumeration of all objects in S − P. There are at most \(d(S)/2^{\delta(n)}\) such objects by assumption. Therefore there is a constant \(c_1\) such that
$$K\left({x\left| {S,P} \right.} \right) \le \log j + c_1 \le \log d\left(S \right) - \delta \left(n \right) + c_1.$$

Using the contradictory assumption, we obtain \(K(x|P,S) \le K(x|S) - K(P|S) - c + c_1\). Also, trivially, there is a constant \(c_2\) such that \(K(x|S) \le K(x|P,S) + K(P|S) + c_2\). Therefore, \(c \le c_1 + c_2\). Choosing \(c > c_1 + c_2\) we have the desired contradiction. ◻

These results mean that if we want to establish that a property holds with high probability or for objects with small randomness deficiency, then it suffices to establish either one to prove both. Moreover, the small-randomness-deficiency objects satisfy all highly probable simple properties simultaneously.

If a property P satisfies K(P|n) = O(1), that is, P is recursive in n, then P is simple. An example of such a property is the upper bound of 2 log n on the size of the largest complete subgraph in a graph on n nodes with randomness deficiency δ(n) = log n in Equation 6.1. The quantity K(P|n) grows unboundedly for more complex properties that require us to describe a number of parameters that grows unboundedly as n grows unboundedly. An example is the property of containing a labeled subgraph H on log n nodes with \(K\left({H\left| n \right.} \right) \ge \left({\begin{array}{*{20}c} {\log n} \\ 2 \\\end{array}} \right).\)

6.3.3 Corollary 6.2.1

  (i) The strings of length n of randomness deficiency at most δ(n) possess all properties P that hold with probability at least \(1 - 2^{ - \delta \left(n \right) - K\left({P\left| n \right.} \right) - O\left(1 \right)}.\)

  (ii) All recursive properties P with K(P|n) = O(1), each of which holds separately for strings of length n with probability tending to 1 as n grows unboundedly, hold simultaneously with probability tending to 1 as n grows unboundedly.


These results mean that if we want to establish that a property holds with high probability or for objects with high Kolmogorov complexity (which equals small randomness deficiency in the set of all such objects of the same length), then it suffices to establish either one to prove both. Moreover, the high-Kolmogorov-complexity objects satisfy all highly probable simple properties simultaneously.

6.4 Combinatorics

Combinatorial properties are traditionally established by counting arguments or by the probabilistic method. Probabilistic arguments are usually aimed at establishing the existence of an object in a nonconstructive sense. It is ascertained that a certain member of a class has a certain property without actually exhibiting that object. Usually, the method proceeds by exhibiting a random process that produces the object with positive probability. Alternatively, a quantitative property is determined from a bound on its average in a probabilistic situation.

We demonstrate the utility of the incompressibility method in combinatorial theory on several examples. The general pattern is as follows:

When we want to prove a certain property of a group of objects (such as graphs), we first fix an incompressible instance of the object, justified by Theorem 2.2.1 on page 117. It is always a matter of using the assumed regularity in this instance to compress the object to reach a contradiction.

6.4.1 Transitive Tournament

A tournament is defined to be a complete directed graph. That is, for every pair of nodes i and j, exactly one of the edges (i,j) and (j, i) is in T. The nodes of a tournament can be viewed as players in a game tournament. If (i,j) is in T, we say player j dominates player i. We call T transitive if (i,j), (j, k) in T implies (i, k) in T.

Let \(\Gamma = \Gamma_n\) be the set of all tournaments on N = {1,…, n}. Given a tournament T ∊ Γ, fix a standard encoding \(E:\Gamma \to \left\{ {0,1} \right\}^{n\left({n - 1} \right)/2}\), one bit for each edge. The bit for edge (i,j) is set to 1 if i < j (j dominates i) and 0 otherwise. There is a one-to-one correspondence between the members of Γ and the binary strings of length n(n − 1)/2.

Let v(n) be the largest integer such that every tournament on N contains a transitive subtournament on v(n) nodes.

Theorem 6.3.1

$$v\left(n \right) \le 1 + \left\lfloor {2\log n} \right\rfloor.$$
Proof. For n = 1, trivially v(n) = 1. Therefore, we can assume n ≥ 2. Fix T ∊ Γ such that
$$C\left({E\left(T \right)\left| {n,p} \right.} \right) \ge n\left({n - 1} \right)/2,$$
where p is a fixed program that on input n and E′(T) (below) outputs E(T). Let S be the transitive subtournament of T on v(n) nodes. We try to compress E(T), to an encoding E′(T), as follows:
  1. Prefix the list of nodes in S in order of dominance to E(T), every node using ⌊log n⌋ bits, by encoding the integers 2, 3, 4, … by the binary strings 0, 1, 00, …, as in Exercise 1.4.2 on page 14. This adds v(n)⌊log n⌋ bits.

  2. Delete all redundant bits from the E(T) part, representing the edges between nodes in S, saving v(n)(v(n) − 1)/2 bits.

Then
$$l\left({E'\left(T \right)} \right) = l\left({E\left(T \right)} \right) - \frac{{v\left(n \right)}}{2}\left({v\left(n \right) - 1 - 2\left\lfloor {\log n} \right\rfloor} \right).$$
Given n, the program p reconstructs E(T) from E′(T). Therefore,
$$C\left({E\left(T \right)\left| {n,p} \right.} \right) \le l\left({E'\left(T \right)} \right).$$

The three displayed equations are true only when v(n) ≤ 1 + 2⌊log n⌋. Since it is easy to verify that 2⌊log n⌋ ≤ ⌊2 log n⌋ for all n ≥ 1, this proves the theorem. ◻
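
For very small n the quantity v(n) can be computed by brute force and compared with the bound of the theorem. The Python sketch below is an illustration with invented names: it enumerates all tournaments through their bit-string encodings, uses the fact that a subtournament is transitive exactly when it contains no cyclic triple, and takes the minimum over all tournaments of the largest transitive subtournament. The running time grows so quickly that only n ≤ 5 or so is practical.

```python
from itertools import combinations


def largest_transitive(n: int, beats: list[list[bool]]) -> int:
    """Size of the largest transitive subtournament (no 3-cycle among its nodes)."""
    def cyclic(a: int, b: int, c: int) -> bool:
        return (beats[a][b] and beats[b][c] and beats[c][a]) or \
               (beats[b][a] and beats[c][b] and beats[a][c])

    for size in range(n, 0, -1):
        for nodes in combinations(range(n), size):
            if not any(cyclic(a, b, c) for a, b, c in combinations(nodes, 3)):
                return size
    return 0


def v(n: int) -> int:
    """Largest k such that every tournament on n nodes has a transitive subtournament on k nodes."""
    pairs = list(combinations(range(n), 2))
    best = n
    for code in range(1 << len(pairs)):     # one bit per edge, as in the encoding E
        beats = [[False] * n for _ in range(n)]
        for idx, (i, j) in enumerate(pairs):
            if (code >> idx) & 1:
                beats[i][j] = True          # the direction convention does not affect transitivity
            else:
                beats[j][i] = True
        best = min(best, largest_transitive(n, beats))
    return best


def upper_bound(n: int) -> int:
    """1 + floor(2 log n), computed exactly as 1 + floor(log2(n^2))."""
    return 1 + (n * n).bit_length() - 1
```

For n = 1, …, 5 the brute-force values are 1, 2, 2, 3, 3, all within the bound of the theorem.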

The general idea used in the incompressibility proof of Theorem 6.3.1 is the following: If every tournament contains a large transitive subtournament, or any other regular property for that matter, then also a tournament T of maximal complexity contains one. But the regularity induced by too large a transitive subtournament can be used to compress the description of T to below its complexity, leading to the required contradiction.

P. Stearns showed by induction that v(n) ≥ 1 + ⌊log n⌋. This is the first problem illustrating the probabilistic method in [P. Erdős and J.H. Spencer, Probabilistic Methods in Combinatorics, Academic Press, 1974]. They collected many combinatorial properties accompanied by elegant proofs using probabilistic arguments. The thrust was to show how to replace counting arguments by pleasant and short probabilistic arguments. For comparison with the incompressibility method, we include their proofs of Theorem 6.3.1 by counting and probabilistic methods.

Proof. (by counting) Let \(\Gamma = \Gamma_n\) be the class of all tournaments on {1,…, n} and Γ′ the class of tournaments on {1,…, n} that contain a transitive subtournament on v = 2 + ⌊2 log n⌋ players. Then
$$\Gamma ' = \bigcup\limits_A {\bigcup\limits_\sigma {\Gamma _{A,\sigma}}},$$
where A ⊆ {1,…, n}, d(A) = v, σ is a permutation on A, and \(\Gamma_{A,\sigma}\) is the set of T such that T|A is generated by σ. If \(T \in \Gamma_{A,\sigma}\), the \(\left({\begin{array}{*{20}c} v \\ 2 \\\end{array}} \right)\) games of T|A are determined. Thus,
$$d\left({\Gamma _{A,\sigma } } \right) = 2^{\left({\begin{array}{*{20}c} n \\ 2 \\ \end{array}} \right) - \left({\begin{array}{*{20}c} v \\ 2 \\ \end{array}} \right)},$$
and by elementary estimates
$$d\left({\Gamma '} \right) \le \sum\limits_{A,\sigma } {2^{\left({\begin{array}{*{20}c} n \\ 2 \\ \end{array}} \right) - \left({\begin{array}{*{20}c} v \\ 2 \\ \end{array}} \right)} } = \left({\begin{array}{*{20}c} n \\ v \\ \end{array}} \right)v!2^{\left({\begin{array}{*{20}c} n \\ 2 \\ \end{array}} \right) - \left({\begin{array}{*{20}c} v \\ 2 \\ \end{array}} \right)} < 2^{\left({\begin{array}{*{20}c} n \\ 2 \\ \end{array}} \right)} = d\left(\Gamma \right).$$

Thus, Γ − Γ′ ≠ Ø. That is, there exists T ∊ Γ − Γ′ not containing a transitive subtournament on v players.

Proof. (by the probabilistic method) Assume the same notation and suppositions as in the proof by counting. Let \(\mathbf{T} = \mathbf{T}_n\) be a random variable. Its values are the members of Γ, where for every T ∊ Γ, \(\Pr(\mathbf{T} = T) = 2^{ - \left({\begin{array}{*{20}c} n \\ 2 \\\end{array}} \right)}\). That is, all members of Γ are equally probable values of \(\mathbf{T}\). Then the probability that an outcome T of \(\mathbf{T}\) contains a transitive subtournament on v players is at most
$$\sum\limits_A {\sum\limits_\sigma {\Pr \left({{\mathbf{T}}\left| {A\,{\rm{generated}}\,{\rm{by}}\,\sigma } \right.} \right)} } = \left({\begin{array}{*{20}c} n \\ v \\ \end{array}} \right)v!\,2^{ - \left({\begin{array}{*{20}c} v \\ 2 \\ \end{array}} \right)} < 1.$$

Thus, some value T of \(\mathbf{T}\) does not contain a transitive subtournament on v players.
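The key inequality in both proofs, that the number of pairs (A, σ) times v! is smaller than the number of free games, can be checked exactly for small n. The following is a minimal sketch, not part of the original proofs; the function name `counting_bound_holds` is an assumption made for illustration, and integer arithmetic is used so that no rounding is involved.

```python
from math import comb, factorial

def counting_bound_holds(n: int) -> bool:
    """Check C(n, v) * v! * 2^(-C(v, 2)) < 1 for v = 2 + floor(2 log2 n)."""
    # floor(2 log2 n) = floor(log2(n^2)) = bit_length(n^2) - 1 for n >= 1
    v = 2 + (n * n).bit_length() - 1
    # Multiply through by 2^C(v, 2) to compare integers only.
    return comb(n, v) * factorial(v) < 2 ** comb(v, 2)

# The inequality is expected to hold for every n >= 1.
print(all(counting_bound_holds(n) for n in range(1, 2000)))
```

When v exceeds n the binomial coefficient is zero, so the bound holds trivially; the interesting cases are those with v ≤ n.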

6.4.2 Tournament with k-Dominators

Tournament T has property S(k) if for every subset A of k nodes (players) there is a node (player) in N − A that dominates (beats) all nodes (players) in A. Let s(k) be the minimum number of nodes (players) in a tournament with property S(k).

Theorem 6.3.2

$$s\left(k \right) \le 2^k k^2 \left({\ln 2 + o\left(1 \right)} \right).$$
Proof. Choose \(n = 2^k k^2 \left({\ln 2 + o\left(1 \right)} \right)\). Assume the notation of the previous example. Select T on n nodes such that
$$C\left({E\left(T \right)\left| {n,k,p} \right.} \right) \ge n\left({n - 1} \right)/2,$$
where p is a fixed program to compute E(T) from E′(T) (given below) and n, k. By way of contradiction, assume that S(k) is false for T. Fix a set A of k nodes of T with no common dominator in N − A. Describe T as follows by a compressed description E′(T):
  • List the nodes in A first, using ⌊log n⌋ bits each. As before, code integers n = 2, 3, 4, … by strings 0, 1, 00, ….

  • List E(T) with bits representing edges between N − A and A deleted (saving (n − k)k bits).

  • Code the edges between N − A and A. From every i ∊ N − A, there are \(2^k - 1\) possible ways of directing edges to A, in total \(t = (2^k - 1)^{n-k}\) possibilities. To encode the list of these edges, ⌊log t⌋ bits suffice.

This shows that C(E(T)|n,k,p) ≤ l(E′(T)). For large k, l(E′(T)) < n(n − 1)/2 bits, which is a contradiction. ◻
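To see where the choice of n comes from, one can compute the net saving of E′(T) numerically. The sketch below is an illustration under simplifying assumptions: it uses real-valued logarithms instead of the floors in the proof and ignores O(1) terms, and the function names are hypothetical. The saving is (n − k)(k − log(2^k − 1)) bits on the edges between N − A and A, minus k log n bits for listing A.

```python
import math

def compressed_saving(n: int, k: int) -> float:
    """Approximate l(E(T)) - l(E'(T)) in bits, with real-valued logs."""
    # Each node outside A has 2^k - 1 admissible patterns instead of 2^k,
    # which saves k - log2(2^k - 1) = -log2(1 - 2^-k) bits per node.
    per_node = -math.log1p(-(2.0 ** -k)) / math.log(2)
    cost_of_listing_a = k * math.log2(n)
    return (n - k) * per_node - cost_of_listing_a

def smallest_contradicting_n(k: int) -> int:
    """Smallest n with positive saving, found by doubling and bisection."""
    hi = k + 1
    while compressed_saving(hi, k) <= 0:
        hi *= 2
    lo = hi // 2
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if compressed_saving(mid, k) > 0:
            hi = mid
        else:
            lo = mid
    return hi

for k in (5, 10, 20):
    n = smallest_contradicting_n(k)
    print(k, n, n / (2 ** k * k * k * math.log(2)))
```

The printed ratio compares the threshold with \(2^k k^2 \ln 2\); the theorem says only that this ratio is 1 + o(1), so convergence to 1 for small k should not be expected.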

6.4.3 Ramsey Numbers

The previous examples demonstrate a general principle that a random graph (or its complement) cannot contain too large a subgraph that is easily describable. We apply the incompressibility method to obtain a lower bound on Ramsey numbers. A clique of a graph is a complete subgraph of that graph. The Ramsey number r(k,k) is the least integer such that for every graph G of size r(k,k), either G or G's complement contains a clique of size k. P. Erdős proved in 1947, using the probabilistic method, the following result:

Theorem 6.3.3

$$r\left({k,k} \right) \ge k2^{k/2} \left({\frac{1}{{e\sqrt 2 }} - o\left(1 \right)} \right).$$

Proof. To describe a clique (or empty subgraph) of size k in a graph G of r(k,k) vertices we need \(\log \left({\begin{array}{*{20}c} {r\left({k,k} \right)} \\ k \\\end{array}} \right) \le k\log r\left({k,k} \right) - \log k!\) bits. Choose G to be incompressible. Then we must have \(k\log r(k,k) - \log k! \ge k(k - 1)/2\), since otherwise we can compress G as in the proof of Theorem 6.3.1. Using Stirling's formula \(k! \approx k^k e^{ - k} \sqrt {2{\rm{\pi }}k}\), a simple calculation shows the theorem. ◻
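The calculation can be spelled out as follows. This is a sketch that starts from the inequality \(k\log r(k,k) - \log k! \ge k(k-1)/2\) and uses only the elementary bound \(k! \ge (k/e)^k\), which follows from \(e^k \ge k^k/k!\); the lower-order overhead of the description, omitted here, is what the o(1) term in the theorem absorbs.
$$k\log r\left({k,k} \right) \ge \frac{{k\left({k - 1} \right)}}{2} + \log k!\quad \Longrightarrow \quad r\left({k,k} \right) \ge 2^{\left({k - 1} \right)/2} \left({k!} \right)^{1/k},$$
$$\left({k!} \right)^{1/k} \ge \frac{k}{e}\quad \Longrightarrow \quad r\left({k,k} \right) \ge \frac{k}{e}2^{\left({k - 1} \right)/2} = \frac{{k2^{k/2} }}{{e\sqrt 2 }}.$$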

6.4.4 Coin-Weighing Problem

A family \(D = \{D_1, D_2, \ldots, D_j\}\) of subsets of N = {1, 2,…, n} is called a distinguishing family for N if for every two distinct subsets M and M′ of N there exists an i (1 ≤ i ≤ j) such that \(d(D_i \cap M)\) is different from \(d(D_i \cap M')\). Let f(n) denote the minimum of d(D) over all distinguishing families for N. To determine f(n) is commonly known as the coin-weighing problem. It is known that
$$f\left(n \right) = \frac{{2n}}{{\log n}} + O\left({\frac{{n\log \log n}}{{\log ^2 n}}} \right).$$

The ≤ side of this equation, with small-o instead of big-O, was independently established by [B. Lindström, Canad. Math. Bull, 8(1965), 477–490] and [D.G. Cantor, W.H. Mills, Canad. J. Math., 18(1966), 42–48]. The ≥ side, Theorem 6.3.4, was established by P. Erdős and A. Rényi [Publ. Hungar. Acad. Sci., 8(1963), 241–254], L. Moser [Combinatorial Structures and Their Applications, Gordon and Breach, 1970, pp. 283–384], and N. Pippenger [J. Combinat. Theory, Ser. A, 23(1977), 99–104] using probabilistic and information theory methods.

We prove the ≥ side using the incompressibility method. Encode every subset M of N by \(E(M) \in \{0,1\}^n\) such that the ith bit of E(M) is 1 if i is in M, and 0 otherwise.

Theorem 6.3.4

f(n) ≥ (2n/logn)[1 + O(loglogn/logn)].

Proof. Choose M such that
$$C\left({E\left(M \right)\left| D \right.} \right) \ge n.$$
Let \(d_i = d(D_i)\) and \(m_i = d(D_i \cap M)\). Let \(s_i\) be the subsequence of E(M) selected from the positions corresponding to 1's in \(E(D_i)\). Thus, \(l(s_i) = d_i\) and the number of 1's in \(s_i\) is precisely \(m_i\). Moreover,
$$C\left({s_i } \right) \ge d_i - O\left({\log i} \right),$$

since we can use D, i, the shortest program for \(s_i\), and E(M) minus the bits in \(s_i\) to reconstruct E(M).

By Equation 2.3 on page 167, the value \(m_i\) is within range \(d_i /2 \pm O\left({\sqrt {d_i \log i} } \right)\). Therefore, given \(d_i\), every \(m_i\) can be described by its discrepancy with \(d_i/2\), which gives
$$C\left({m_i \left| {D_i } \right.} \right) \le \frac{1}{2}\log d_i + O\left({\log \log i} \right).$$
Pad every description of \(m_i\), given \(D_i\), to a block of fixed length ½ log n + O(log log n). Since D is a distinguishing family for N, given D, the values \(m_1, \ldots, m_j\) determine M. Hence, by the established inequalities,
$$C\left({E\left(M \right)\left| D \right.} \right) \le C\left({m_1, \ldots,m_j \left| D \right.} \right) \le \sum\limits_{i = 1}^j {\left({\frac{1}{2}\log n + O\left({\log \log n} \right)} \right)}.$$

Together with Equation 6.2 this implies the theorem. ◻
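The definition of a distinguishing family is easy to test by brute force for small n. The sketch below is an illustration only; the function name `is_distinguishing` and the example families are assumptions, not taken from the text. It checks that the vectors \((d(D_1 \cap M), \ldots, d(D_j \cap M))\) are pairwise distinct over all subsets M of N.

```python
from itertools import combinations

def is_distinguishing(family, n):
    """True if the counts (|D_i & M|)_i differ for all distinct subsets M of {1..n}."""
    seen = set()
    for size in range(n + 1):
        for members in combinations(range(1, n + 1), size):
            m = set(members)
            signature = tuple(len(d & m) for d in family)
            if signature in seen:
                return False  # two subsets give the same weighing outcomes
            seen.add(signature)
    return True

singletons = [{i} for i in range(1, 5)]    # n weighings always suffice
three_sets = [{1, 2}, {1, 3}, {2, 3, 4}]   # a family of size 3 for n = 4
print(is_distinguishing(singletons, 4))
print(is_distinguishing(three_sets, 4))
print(is_distinguishing([{1, 2}], 2))      # {1} and {2} are confused
```

The second family shows that fewer than n sets can suffice already for n = 4, which is the effect the bound f(n) ≈ 2n/log n quantifies asymptotically.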

6.4.5 High-Probability Properties Revisited

Almost all strings have high complexity. Therefore, almost all tournaments and almost all undirected graphs have high complexity. Any combinatorial property proven about an arbitrary complex object in such a class will hold for almost all objects in the class. For example, the proof in Section 6.3.1 can trivially be strengthened as follows: By Theorem 2.2.1, page 117, there are at least \(2^{n(n-1)/2}(1 - 1/n)\) tournaments T on n nodes with
$$C\left({E\left(T \right)\left| {n,p} \right.} \right) \ge n\left({n - 1} \right)/2 - \log n.$$

This is a (1 − 1/n)th fraction of all tournaments on n nodes. Using the displayed equation in the proof yields the corollary below:

Corollary 6.3.1

For almost all tournaments on n nodes (at least a (1 − 1/n)th fraction), the largest transitive subtournament has at most 1 + 2⌊2logn⌋ nodes, from some n onward.

Similarly, choosing C(E(T)|n, k, p) ≥ n(n − 1)/2 − log n in the proof in Section 6.3.2 yields the following:

Corollary 6.3.2

For all large enough k, there is some n with \(n \le 2^k k^2(\ln 2 + o(1))\) such that almost all tournaments on n nodes (at least a (1 − 1/n)th fraction) have property S(k).

The Kolmogorov complexity argument generally yields results on expected and high-probability properties rather than worst-case properties, and is especially suited to obtaining results on random structures. Other such applications (such as the expected maximum vertex degree of randomly generated trees and a related result on random mappings) can be found in the exercises and in Section 6.4.

6.5 Exercises

6.3.1. [17] Let w(n) be the largest integer such that for every tournament T on N = {1,…, n} there exist disjoint sets A and B, each of cardinality w(n), in N such that A × B ⊆ T. Prove w(n) ≤ 2⌊log n⌋.

Comments. Hint: add 2w(n)⌊log n⌋ bits to describe nodes, and save \(w(n)^2\) bits on edges. Source of the problem: P. Erdős and J.H. Spencer, Probabilistic Methods in Combinatorics, Academic Press, 1974.

6.3.2. [25] Let T be a tournament on N = {1,…, n}. Define a ranking R as an ordering of N. For (i,j) ∊ T, if R(i) < R(j), we say that R agrees with (i,j). Otherwise, it disagrees with that edge. We are interested in a ranking that is most consistent with T, that is, such that the number of edges that agree with R is maximized. Show that for large enough n, there exist tournaments such that any ranking disagrees with at least 49% of its edges.

Comments. A simple incompressibility argument is given by M. Fouz and P. Nicholson, CS798 Course Report, University of Waterloo, December 2007. Relevant literature on this problem can be found in [N. Alon, J.H. Spencer, The Probabilistic Method, Wiley, 2000, p. 134].

6.3.3. [17] Let G = (V, E) with V = {1,…, n} be an undirected graph on n nodes with C(G|n, p) ≥ n(n − 1)/2, where p is a fixed program to be used to reconstruct G. A clique of a graph is a complete subgraph of that graph. Show that G does not contain a clique on more than 1 + ⌊2 log n⌋ nodes.

Comments. Hint: use Section 6.3.1. To compare this result with a similar one about randomly generated graphs, N. Alon, J.H. Spencer, P. Erdős, The Probabilistic Method, Wiley, 1992, pp. 86–87, show that a random graph with edge probability ½ contains a clique on 2logn nodes with probability at least \(1 - 1/e^{n^2 }.\)

6.3.4. [36] Let K(N) denote the complete undirected graph of n nodes N = {1,…,n}. If A and B are disjoint subsets of N, then K(A,B) denotes the complete bipartite graph on sets A and B. A set \(C = (K(A_1, B_1), \ldots, K(A_j, B_j))\) is called a covering family of K(N) if for every edge {u, v} ∊ K(N) there exists an i (1 ≤ i ≤ j) such that \(\left\{ {u,v} \right\} \in K\left({A_i,B_i } \right)\). Let g(n) denote the minimum of \(\sum\nolimits_{1 \le i \le j} d\left({A_i \cup B_i } \right)\) over all covering families for K(N). Prove by incompressibility that g(n)/n ≥ log n + O(log log n).

Comments. An information-theoretic proof appears in [N. Pippenger, J. Comb. Theory, Ser. A, 23(1977), 105–115]. Hint: use the symmetry of information, Theorem 2.8.2, on page 190. Source: M. Li and P.M.B. Vitányi, J. Comb. Theory, Ser. A, 66:2(1994), 226–236.

6.3.5. [25] Consider a random directed graph whose \(n^2\) nodes are on the intersections of a two-dimensional n by n grid. All vertical edges (the grid edges) are present and directed upward. For every pair of horizontally neighboring nodes, we flip a three-sided coin; with probability p < ½ we add an edge from left to right, with probability p we add an edge from right to left, and with probability 1 − 2p we add no edge. Use incompressibility to prove that the expected maximum path length over all such random graphs is bounded by O(n).

Comments. Source: T. Jiang and Z.Q. Luo, personal communication, 1992. This problem was studied in connection with communication networks.

6.3.6. [36] From among \(\left({\begin{array}{*{20}c} n \\ 3 \\\end{array}} \right)\) triangles with vertices chosen from n points in the unit square, let \(T_n\) be the one with the smallest area, and let \(A_n\) be the area of \(T_n\). Heilbronn's triangle problem asks for the maximum value \(\Delta_n\) assumed by \(A_n\) over all choices of n points. We consider the average case: Show that if the n points are chosen independently and at random (with a uniform distribution), then there exist positive constants c and C such that \(c/n^3 < \mu_n < C/n^3\) for all large enough values of n, where \(\mu_n\) is the expectation of \(A_n\). Moreover, \(c/n^3 < A_n < C/n^3\), with probability close to one.

Comments. Hint: put the n points on the intersections of a k × k grid and show that the description of the arrangement can be compressed significantly below the maximum, both if the smallest triangle has too large an area and if it has too small an area, independent of k. Source: T. Jiang, M. Li, P.M.B. Vitányi, Random Struct. Alg., 20:2(2002), 206–219, which contains literature pointers to other related results. A generalization of the average case result is given in [G. Grimmett, S. Janson, Random Struct. Alg., 23:2(2003), 206–223]. History: H.A. Heilbronn conjectured that \(\Delta_n = O(1/n^2)\) in 1950, and P. Erdős proved that \(\Delta_n = \Omega(1/n^2)\) in 1950. K.F. Roth proved that \(\Delta _n = O\left({1/n\sqrt {\log \log n} } \right)\) in 1951. W.M. Schmidt improved Roth's bound to \(O\left({1/n\sqrt {\log n} } \right)\) in 1972. Roth further improved this to \(O(1/n^{1.105})\) and \(O(1/n^{1.117})\) in 1972. J. Komlós, J. Pintz, and E. Szemerédi further improved this to \(O(1/n^{8/7 - \epsilon})\) in 1981 and they proved an \(\Omega(\log n/n^2)\) lower bound in 1982. The problem has many generalizations and several dedicated websites.

6.3.7. [35] Given an n-dimensional cube and a permutation π of its nodes, each node v wants to send an information packet to node π(v) as fast as possible. Label every edge in the cube with its dimension from {1,…, n}. A route \((v_1 \to v_2 \to \cdots \to v_k)\) is ascending if \((v_i, v_{i+1})\) has higher dimension than \((v_{i-1}, v_i)\) for all 2 ≤ i ≤ k − 1. If two packets use the same edge in the same direction at the same time, then a collision occurs, and one packet has to wait. How do we avoid too many collisions on each route? Consider the following probabilistic algorithm Aπ: Step 1. For every node v, choose randomly a node w. Node v sends its packet over the uniquely determined ascending route to w. Step 2. Send the packet from w to π(v) through the unique ascending route. Prove that for every constant c, algorithm Aπ finishes with probability greater than 1 − 2−(c−5nO(1))/2 after at most 2n + 2c steps.

Comments. Hint: show that the description of a route on which too many collisions occur can be compressed. Source: L.G. Valiant and G. Brebner, Proc. 13th ACM Symp. Theory Comput, 1981, pp. 263–277; S. Reisch and G. Schnitger give an incompressibility proof in [Proc. 23rd IEEE Found. Comput. Sci., 1982, pp. 45–52].

6.3.8. [39] Let \(L \subset \{0,1\}^{2n}\) be a language to be recognized by two parties P and Q with unlimited computation power. Party P knows the first n bits of the input and party Q knows the last n bits. P and Q exchange messages to recognize L according to some bounded-error two-way probabilistic protocol. An input is accepted if the probability of acceptance is at least 1 − ϵ for some fixed ϵ, 0 ≤ ϵ < ½; an input is rejected if the probability of rejection is at least 1 − ϵ; and every input must be either rejected or accepted. The probabilistic communication complexity of an input \((x_1, \ldots, x_{2n})\) is the worst case, over all sequences of fair coin tosses, of the number of bits exchanged. The probabilistic communication complexity of the language is the maximum of this over all inputs. The set intersection language SETIN is defined to be the set of all sequences \(a_1 \ldots a_n b_1 \ldots b_n\) over {0,1} with \(\sum\nolimits_{i = 1}^n a_i b_i \ge 1\). (P knows \(a_1, \ldots, a_n\) and Q knows \(b_1, \ldots, b_n\).) Prove that the probabilistic communication complexity of SETIN is Ω(n).

Comments. Source: B. Kalyanasundaram and G. Schnitger, SIAM J. Discrete Math., 5:4(1992), 545–557.

6.3.9. [37] An (n, d, m)-graph is a bipartite multigraph with n vertices on the left side and m vertices on the right side, with every vertex on the left having degree d, and every vertex on the right having degree dn/m (assuming m | dn). An (n, d, m)-graph is (α, β)-expanding if every subset S of αn vertices on the left has more than βm neighbors on the right, for 0 < α < β < 1. Prove that for every n, 0 < α < β < 1, λ > 0, there is a
$$d > \frac{{h\left(\alpha \right) + h\left(\beta \right){\rm{\lambda }}}}{{h\left(\alpha \right) - h\left({\alpha /\beta } \right)\beta }}$$

such that there is an (α, β)-expanding (n, d, λn)-graph.

Comments. Hint: take a (n, d, λn)-graph of maximal complexity. Source: U. Schöning, Random Struct. Alg., 17(2000), 64–77. The original probabilistic proof with λ = 1 is in [L.A. Bassalygo, Prob. Inform. Transmission, 17(1981), 206–211].

6.3.10. [25] An (n, d, α, c) OR-concentrator is a bipartite graph G(L + R, E) on the independent vertex sets L and R with d(L) = d(R) = n such that (i) every vertex in L has degree d, and (ii) every subset S ⊆ L with d(S) ≤ αn is connected to at least cn neighbors (in R). Show that there exist \(\left({n,9.48,\frac{1}{3},2} \right)\) OR-concentrators.

Comments. A simple incompressibility proof is given by M. Fouz, CS798 Course Report, University of Waterloo, December 2007. A probabilistic proof (with worse constants) is found in [R. Motwani, P. Raghavan, Randomized Algorithms, Cambridge Univ. Press, 1995, pp. 108–110].

6.3.11. [40] Let \(s_1 \ldots s_n\) be a string over an alphabet of cardinality c. A monochromatic arithmetic progression (m.a.p.) of length k is a subsequence \(s_i s_{i+t} s_{i+2t} \ldots s_{i+(k-1)t}\) with all characters equal. The van der Waerden number w(k; c) is the least number n such that every string of length n contains a m.a.p. of length k. Use the incompressibility method in the problems below.
  (a) Show that \(w\left({k;c} \right) > \sqrt {k - 1} \cdot c^{\frac{k}{2} - 1}.\)

  (b) Strengthen the bound to \(w\left({k;c} \right) > \frac{{c^{k - 2} }}{{4k}} \cdot \frac{{k - 1}}{k}.\)

Comments. The lower bound of Item (b) matches the one obtained originally by L. Lovász, and is worse than later applications of Lovász's local lemma by a factor 4/e; see for example [Z. Szabó, Random Struct. Alg., 1:3(1990), 343–360]. The method can also be used for other Ramsey-type lower bounds. Hint for Item (b):
  (i) Show that \(\left\lceil {\log _c \left({n \cdot k \cdot \frac{k}{{k - 1}}} \right)} \right\rceil + 1\) characters suffice to encode a m.a.p., if it is known to intersect some other fixed progression of length k.

  (ii) Consider a procedure which in a long incompressible string repeatedly does the following: Find a m.a.p. within the first w(k; c) characters. Encode it by some string, delete the corresponding characters, and replace them with characters from the end of the string.

  (iii) Use the following fact without proof: With \(\log_c 4\) additional characters in the encoding of every deleted progression one can guarantee that there is always a m.a.p. in the first w(k; c) characters that intersects the positions of a previously deleted progression (determinable without additional characters).


Implement Items (i) through (iii) as follows: maintain a stack whose elements each contain the positions of a progression that has been deleted. Upon deletion of a progression, push its positions onto the stack. If a progression in the current string intersects the positions on top of the stack, encode it this way; otherwise, delete the top of the stack. Encoding which case happened can be done with \(\log_c 2\) characters and every case happens at most once per deleted progression. Source: P. Schweitzer, Using the incompressibility method to obtain local lemma results for Ramsey-type problems, Inform. Process. Lett, to appear.

6.6 Kolmogorov Random Graphs

Statistical properties of strings with high Kolmogorov complexity were studied in Section 2.6. The interpretation of strings as more complex combinatorial objects leads to a completely new set of properties and problems that have no direct counterpart in the flatter string world. Here we derive topological, combinatorial, and statistical properties of graphs with high Kolmogorov complexity. Every such graph possesses simultaneously all properties that hold with high probability for randomly generated graphs. They constitute almost all graphs, and the derived properties a fortiori hold with probability that goes to 1 as the number of nodes grows unboundedly, in the sense of Section 6.2.

Definition 6.4.1

Every labeled graph G = (V, E) on n nodes V = {1, 2,…, n} can be coded (up to automorphism) by a binary string E(G) of length n(n − 1)/2. We enumerate the n(n − 1)/2 possible edges (i, j) in a graph on n nodes in standard lexicographic order without repetitions and set the ith bit in the string to 1 if the edge is present and to 0 otherwise. Conversely every binary string of length n(n − 1)/2 encodes a graph on n nodes. Hence we can identify every such graph with its corresponding binary string.
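The standard encoding can be sketched in a few lines. This is an illustrative sketch; the function names `encode_graph` and `decode_graph` are hypothetical, and the edge list is assumed to be given as pairs of node labels from {1,…, n}.

```python
from itertools import combinations

def encode_graph(n, edges):
    """Return E(G): one bit per pair (i, j) with i < j, in lexicographic order."""
    present = {tuple(sorted(e)) for e in edges}
    return "".join("1" if pair in present else "0"
                   for pair in combinations(range(1, n + 1), 2))

def decode_graph(n, bits):
    """Inverse of encode_graph: recover the edge set from n(n-1)/2 bits."""
    pairs = list(combinations(range(1, n + 1), 2))
    if len(bits) != len(pairs):
        raise ValueError("expected n(n-1)/2 bits")
    return {pair for pair, bit in zip(pairs, bits) if bit == "1"}

# The path 1-2-3-4: pairs are ordered (1,2),(1,3),(1,4),(2,3),(2,4),(3,4).
code = encode_graph(4, [(1, 2), (2, 3), (3, 4)])
print(code)
print(decode_graph(4, code))
```

Because `itertools.combinations` emits pairs in lexicographic order, encoding and decoding agree on which bit position belongs to which edge.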

Definition 6.4.2

A labeled graph G on n nodes has randomness deficiency at most δ(n), and is called δ(n)-random, if it satisfies
$$C\left({E\left(G \right)\left| {n,\delta } \right.} \right) \ge n\left({n - 1} \right)/2 - \delta \left(n \right).$$

Lemma 6.4.1

A fraction of at least \(1 - 1/2^{\delta(n)}\) of all labeled graphs G on n nodes is δ(n)-random.

This is a corollary of Lemma 6.2.1. For example, the c log n-random labeled graphs constitute a fraction of at least \(1 - 1/n^c\) of all graphs on n nodes, where c > 0 is an arbitrary constant.

High-complexity labeled graphs have many specific topological properties, which seems to contradict their randomness. However, randomness is not lawlessness but rather enforces strict statistical regularities, for example, to have diameter exactly two.

Lemma 6.4.2

The degree d of every node of a δ(n)-random labeled graph satisfies
$$\left| {d - \left({n - 1} \right)/2} \right| = O\left({\sqrt {\left({\delta \left(n \right) + \log n} \right)n} } \right).$$

Proof. Assume that there is a node such that the deviation of its degree d from (n− 1)/2 is greater than k. From the lower bound on C(E(G)∣n, δ) corresponding to the assumption that G is random, we can estimate an upper bound on k as follows:

In a description of G = (V, E) given n, δ we can indicate which edges are incident on node i by giving the index of the interconnection pattern (the characteristic sequence of the set \(V_i = \{j \in V - \{i\} : (i,j) \in E\}\) in n − 1 bits where the jth bit is 1 if \(j \in V_i\) and 0 otherwise) in the ensemble of
$$m = \sum\limits_{\left| {d - \left({n - 1} \right)/2} \right| > k} {\left({\begin{array}{*{20}c} {n - 1} \\ d \\ \end{array}} \right)} \le 2^n e^{ - 2k^2 /3\left({n - 1} \right)} $$
possibilities. The last inequality follows from a general estimate of the tail probability of the binomial distribution, with \(s_n\) the number of successful outcomes in n experiments with probability of success p = ½. Namely by Chernoff's bounds, Equation 2.4 on page 167,
$$\Pr \left({\left| {s_n - pn} \right| > k} \right) \le 2e^{ - k^2 /3pn}.$$
To describe G it then suffices to modify the old code of G by prefixing it with
  • A description of this discussion in O(1) bits;

  • the identity of node i in ⌊log(n + 1)⌋ bits;

  • the value of k in ⌊log(n + 1)⌋ bits, possibly adding nonsignificant 0's to pad up to this amount;

  • the index of the interconnection pattern in log m bits (we know n, k and hence logm); followed by

  • the old code for G with the bits in the code denoting the presence or absence of the possible edges that are incident on node i deleted.

Clearly, given n we can reconstruct the graph G from the new description. The total description we have achieved is an effective program of
$$\log m + 2\log n + n\left({n - 1} \right)/2 - n + O\left(1 \right)$$
bits. This must be at least the length of the shortest effective binary program, which is C(E(G)∣n, δ), satisfying Equation 6.3. Therefore,
$$\log m \ge n - 2\log n - O\left(1 \right) - \delta \left(n \right).$$
Since we have estimated in Equation 6.4 that
$$\log m \le n - \left({2k^2 /3\left({n - 1} \right)} \right)\log e,$$

it follows that \(k \le \sqrt {\frac{3}{2}\left({\delta \left(n \right) + 2\log n + O\left(1 \right)} \right)\left({n - 1} \right)/\log e}.\) ◻
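The estimate of m used in this proof can be compared with the exact count for moderate sizes. The sketch below is a numerical illustration, not part of the proof; the function names are hypothetical, and N stands for n − 1, the number of possible edges at node i. It sums the binomial coefficients outside the band of width k around N/2 and compares the result with \(2^{N+1} e^{-2k^2/3N}\).

```python
from math import comb, exp

def tail_count(N, k):
    """Number of 0/1 patterns of length N whose number of 1's d has |d - N/2| > k."""
    return sum(comb(N, d) for d in range(N + 1) if abs(2 * d - N) > 2 * k)

def chernoff_count_bound(N, k):
    """The bound 2^(N+1) * exp(-2 k^2 / (3 N)) on the same count."""
    return 2 ** (N + 1) * exp(-2 * k * k / (3 * N))

N = 200
for k in (10, 30, 60):
    print(k, tail_count(N, k) <= chernoff_count_bound(N, k))
```

The exact tail is typically far below the bound; the proof needs only the exponent, which turns a deviation k into a saving of order \(k^2/n\) bits.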

Lemma 6.4.3

All o(n)-random labeled graphs have ¼n + o(n) disjoint paths of length 2 between every pair of nodes i,j. In particular, all o(n)-random labeled graphs have diameter 2.

Proof. The only graphs with diameter 1 are the complete graphs that can be described in O(1) bits, given n, and hence are not random. It remains to consider an o(n)-random graph G = (V, E) with diameter greater than or equal to 2. Let i, j be a pair of nodes connected by r disjoint paths of length 2. Then we can describe G by modifying the old code for G as follows:
  • a program to reconstruct the object from the various parts of the encoding in O(1) bits;

  • the identities of i < j in 2logn bits;

  • the old code E(G) of G with the 2(n − 2) bits representing presence or absence of edges (j, k) and (i, k) for every k ≠ i, j deleted;

  • a shortest program for the string \(e_{i,j}\) consisting of the (reordered) n − 2 pairs of bits deleted above.

From this description we can reconstruct G in
$$O\left({\log n} \right) + \left({\begin{array}{*{20}c} n \\ 2 \\ \end{array}} \right) - 2\left({n - 2} \right) + C\left({e_{i,j} \left| n \right.} \right)$$

bits, from which we may conclude that \(C\left({e_{i,j} \left| n \right.} \right) \ge l\left({e_{i,j} } \right) - o\left(n \right).\) As shown in Lemma 2.6.1 this implies that the frequency of occurrence in \(e_{i,j}\) of the aligned 2-bit block 11, which by construction equals the number of disjoint paths of length 2 between i and j, is \(\frac{1}{4}n + o\left(n \right).\) ◻
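The quantity counted in this proof is the number of common neighbors of i and j, since every common neighbor k gives one path i, k, j and different k give disjoint paths. The sketch below illustrates this on a pseudo-randomly generated graph; it is an experiment under assumptions (nodes are labeled 0,…, n − 1, and a seeded generator stands in for a graph of high complexity), not a statement about any particular Kolmogorov random graph.

```python
import random

def random_graph(n, seed=0):
    """Adjacency sets on nodes 0..n-1, each edge present with probability 1/2."""
    rng = random.Random(seed)
    adj = [set() for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < 0.5:
                adj[i].add(j)
                adj[j].add(i)
    return adj

def disjoint_paths_of_length_two(adj, i, j):
    """Number of common neighbors of i and j."""
    return len(adj[i] & adj[j])

n = 120
adj = random_graph(n)
counts = [disjoint_paths_of_length_two(adj, i, j)
          for i in range(n) for j in range(i + 1, n)]
print(min(counts), n / 4, max(counts))
```

For a graph generated this way the counts are expected to cluster around n/4, in line with the lemma, with the spread shrinking relative to n as n grows.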

A graph is k-connected if there are at least k node-disjoint paths between every pair of nodes.

Corollary 6.4.1

All o(n)-random labeled graphs are (¼n + o(n))-connected.

Lemma 6.4.4

Let G = (V, E) be a graph on n nodes with randomness deficiency O(logn). Then the largest clique in G has at most ⌊2logn⌋ +O(1) nodes.

Proof. This is the same proof as that for the largest transitive subtournament in a high-complexity tournament, Theorem 6.3.1. ◻

With respect to the related property of random graphs, in [N. Alon, J.H. Spencer, and P. Erdős, The Probabilistic Method, 1992, pp. 86, 87] it is shown that a random graph with edge probability ½ contains a clique on asymptotically 2 log n nodes with probability at least \(1 - e^{ - n^2 }.\)

Lemma 6.4.5

Let c be a fixed constant. If G is a c log n-random labeled graph, then from every node i all other nodes are either directly connected to i or are directly connected to one of the least (c + 3) log n nodes directly adjacent to i.

Proof. Given i, let A be the set of the least (c + 3) log n nodes directly adjacent to i. Assume by way of contradiction that there is a node k of G that is not directly connected to a node in \(A \cup \left\{ i \right\}\). We can describe G as follows:
  • a description of this discussion in O(1) bits;

  • a literal description of i in logn bits;

  • a literal description of the presence or absence of edges between i and the other nodes in n − 1 bits;

  • a literal description of k and its incident edges in logn + n − 2 − (c + 3) logn bits;

  • the encoding E(G) with the edges incident with nodes i and k deleted, saving at least 2n − 2 bits.

Altogether the resultant description has
$$n\left({n - 1} \right)/2 + 2\log n + 2n - 3 - \left({c + 3} \right)\log n - 2n + 2$$
bits, which contradicts the c log n-randomness of G by Equation 6.3 on page 461. The lemma is proven. ◻

In the description we have explicitly added the adjacency pattern of node i, which we deleted later again. This zero-sum swap is necessary to be able to unambiguously identify the adjacency pattern of i in order to reconstruct G. Since we know the identities of i and the nodes adjacent to i (they are the prefix where no bits have been deleted), we can reconstruct G from this discussion and the new description, given n.

6.6.9 Statistics of Subgraphs

We start by defining the notion of labeled subgraph of a labeled graph. Let G = (V, E) be a labeled graph on n nodes. Consider a labeled graph H on k nodes {1,2,…,k}. Each subset of k nodes of G induces a subgraph \(G_k\) of G. The subgraph \(G_k\) is an ordered labeled occurrence of H when we obtain H by relabeling the nodes \(i_1 < i_2 < \cdots < i_k\) of \(G_k\) as 1,2,…,k.

It is easy to conclude from the statistics of high-complexity strings in Lemma 2.6.1 that the frequency of each of the two labeled two-node subgraphs in a δ(n)-random graph G is
$$\frac{{n\left({n - 1} \right)}}{4} \pm \sqrt {\frac{3}{4}\left({\delta \left(n \right) + O\left(1 \right)} \right)n\left({n - 1} \right)/\log e.} $$

This case is easy, since the frequency of such subgraphs corresponds to the frequency of 1's or 0's in the \(\left({\begin{array}{*{20}c} n \\ 2 \\\end{array}} \right)\)-length standard encoding E(G) of G. However, determining the frequencies of labeled subgraphs on k nodes (up to isomorphism) for k > 2 is more complicated than determining the frequencies of substrings of length k. Clearly, there are \(\left({\begin{array}{*{20}c} n \\ k \\\end{array}} \right)\) subsets of k nodes out of n and hence that many occurrences of subgraphs. Such subgraphs may overlap in more complex ways than substrings of a string. Let #H(G) be the number of times H occurs as an ordered labeled subgraph of G (possibly overlapping). Let p be the probability that we obtain H by flipping a fair coin to decide for every pair of nodes whether it is connected by an edge. Then, \(p = 2^{-k(k-1)/2}\). The proof of the following theorem is deferred to Exercise 6.4.2 on page 468.
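For small graphs, #H(G) can be computed directly from the definition of an ordered labeled occurrence. The sketch below is a brute-force illustration; the function name `count_occurrences` and the representation of graphs as lists of node pairs are assumptions.

```python
from itertools import combinations

def count_occurrences(n, g_edges, k, h_edges):
    """#H(G): number of k-subsets i_1 < ... < i_k of {1..n} whose induced
    subgraph, relabeled as 1..k in increasing order, equals H."""
    g = {frozenset(e) for e in g_edges}
    h = {frozenset(e) for e in h_edges}
    count = 0
    for nodes in combinations(range(1, n + 1), k):
        induced = {frozenset((a + 1, b + 1))
                   for a, b in combinations(range(k), 2)
                   if frozenset((nodes[a], nodes[b])) in g}
        if induced == h:
            count += 1
    return count

triangle = [(1, 2), (1, 3), (2, 3)]
path = [(1, 2), (2, 3)]
print(count_occurrences(3, triangle, 2, [(1, 2)]))      # single edge in a triangle
print(count_occurrences(3, path, 3, [(1, 2), (2, 3)]))  # same labeling as H
print(count_occurrences(3, path, 3, [(1, 2), (1, 3)]))  # isomorphic, different labeling
```

The last two calls show why the labeling matters: the two choices of H are isomorphic as unlabeled graphs, yet only one of them occurs as an ordered labeled subgraph of the path.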

Theorem 6.4.1

Assume the terminology above with G = (V, E) a labeled graph on n nodes, k a positive integer dividing n, and H a labeled graph on \(k \le \sqrt {2\log n} \) nodes. Let C(E(G)∣n) ≥ \(\left({\begin{array}{*{20}c} n \\ 2 \\\end{array}} \right) - \delta \left(n \right)\). Then
$$\begin{array}{l} \left| {\# H\left(G \right) - \left({\begin{array}{*{20}c} n \\ k \\ \end{array}} \right)p} \right| \le \left({\begin{array}{*{20}c} n \\ k \\ \end{array}} \right)\sqrt {\alpha \left({k/n} \right)p}, \\ {\rm{with}}\,\alpha = \left({K\left({H\left| n \right.} \right) + \delta \left(n \right) + \log \left({\begin{array}{*{20}c} n \\ k \\ \end{array}} \right)/\left({n/k} \right) + O\left(1 \right)} \right)3/\log e. \\ \end{array}$$

6.6.11 Unlabeled Graph Counting

An unlabeled graph is a graph with no labels. For convenience we can define this as follows: Call two labeled graphs equivalent (up to relabeling) if there is a relabeling that makes them equal. An unlabeled graph is an equivalence class of labeled graphs. An automorphism of G = (V, E) is a permutation π of V such that (π(u), π(v)) ∊ E iff (u, v) ∊ E. Clearly, the set of automorphisms of a labeled graph forms a group with group operation of function composition and the identity permutation as unity. It is easy to verify that π is an automorphism of G iff π(G) and G have the same binary string standard encoding, that is, E(G) = E(π(G)). This contrasts with the more general case of permutation relabeling, where the standard encodings may be different. A labeled graph is rigid if its only automorphism is the identity automorphism. It turns out that Kolmogorov random labeled graphs are rigid graphs. To obtain an expression for the number of unlabeled graphs we have to estimate the number of automorphisms of a graph in terms of its randomness deficiency. Below, ‘graph’ means ‘labeled graph’ unless indicated otherwise.
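The definitions of automorphism and rigidity can be tried out by exhaustive search on small graphs. The sketch below is illustrative only; the function name `automorphisms` and the example graphs are assumptions, and the search over all n! permutations is feasible only for very small n.

```python
from itertools import permutations

def automorphisms(n, edges):
    """All permutations pi of {1..n} with (pi(u), pi(v)) in E iff (u, v) in E."""
    e = {frozenset(edge) for edge in edges}
    result = []
    for image in permutations(range(1, n + 1)):
        pi = dict(zip(range(1, n + 1), image))
        # pi is a bijection, so mapping E onto E is equivalent to the iff condition.
        if {frozenset((pi[u], pi[v])) for u, v in e} == e:
            result.append(pi)
    return result

path3 = [(1, 2), (2, 3)]
asymmetric6 = [(1, 2), (2, 3), (3, 4), (4, 5), (2, 6), (3, 6)]
print(len(automorphisms(3, path3)))        # identity and the reflection
print(len(automorphisms(6, asymmetric6)))  # only the identity, so the graph is rigid
```

In the six-node example every node is pinned down by its degree and the degrees of its neighbors, which is why no nontrivial automorphism exists.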

In [F. Harary and E.M. Palmer, Graphical Enumeration, Academic Press, 1973] an asymptotic expression for the number of unlabeled graphs is derived using sophisticated methods. We give a new elementary proof by incompressibility. Denote by \(g_n\) the number of unlabeled graphs on n nodes, that is, the number of isomorphism classes in the set \(\mathcal{G}_n\) of undirected labeled graphs on nodes {0,1,…, n − 1}.

Theorem 6.4.2

$$g_n \sim \frac{{2^{\left({\begin{array}{*{20}c} n \\ 2 \\ \end{array}} \right)} }}{{n!}}.$$
Proof. Clearly,
$$g_n = \sum\limits_{G \in \mathcal{G}_n } {\frac{1}{{d\left({\left[ G \right]} \right)}},} $$
where [G] is the isomorphism class of graph G. By elementary group theory,
$$d\left({\left[ G \right]} \right) = \frac{{d\left({S_n } \right)}}{{d\left({{\rm{Aut}}\left(G \right)} \right)}} = \frac{{n!}}{{d\left({{\rm{Aut}}\left(G \right)} \right)}},$$

where \(S_n\) is the group of permutations on n elements, and Aut(G) is the automorphism group of G. Let us partition \(\mathcal{G}_n\) into \(\mathcal{G}_n = \mathcal{G}_n^0 \cup \cdots \cup \mathcal{G}_n^n\), where \(\mathcal{G}_n^m\) is the set of graphs for which m is the number of nodes moved (mapped to another node) by any of its automorphisms.

6.6.13 Claim 6.4.1

For \(G \in \mathcal{G}_n^m\), \(d(\mathrm{Aut}(G)) \le n^m = 2^{m\log n}\).

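Proof. For \(G \in \mathcal{G}_n^m\), every automorphism of G fixes all nodes except the m moved ones, so \(d(\mathrm{Aut}(G)) \le m! \le n^m\). ◻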

Consider every graph \(G \in \mathcal{G}_n\) as having probability \(\Pr(G) = 2^{-\binom{n}{2}}\).

6.6.14 Claim 6.4.2

\(\Pr\left(G \in \mathcal{G}_n^m\right) \le 2^{-m\left(\frac{1}{2}n - \frac{3}{8}m - \log n\right)}.\)

Proof. By Lemma 6.4.1 it suffices to show that if \(G \in \mathcal{G}_n^m\) and
$$C\left(E(G) \mid n, m\right) \ge \binom{n}{2} - \delta(n, m),$$
then δ(n, m) satisfies
$$\delta(n, m) \ge m\left(\frac{1}{2}n - \frac{3}{8}m - \log n\right). \tag{6.6}$$

Let π ∈ Aut(G) move m nodes. Suppose π is the product of k disjoint cycles of sizes \(c_1, \ldots, c_k\). Spend at most m log n bits describing π: For example, if the nodes \(i_1 < \cdots < i_m\) are moved, then list the sequence \(\pi(i_1), \ldots, \pi(i_m)\). Writing the nodes of the latter sequence in increasing order, we obtain \(i_1, \ldots, i_m\) again, that is, we execute permutation \(\pi^{-1}\), and hence we obtain π.
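The description of π can be made concrete. The following Python sketch lists the images of the moved nodes and recovers π by sorting that list. The helper names `encode_moved` and `decode_moved` and the numbering of nodes from 0 are assumptions made for illustration, not part of the original proof.

```python
def encode_moved(perm):
    """Return pi(i_1), ..., pi(i_m) for the moved nodes i_1 < ... < i_m.

    perm is a list with perm[i] = pi(i) on nodes 0..n-1.
    """
    return [perm[i] for i in range(len(perm)) if perm[i] != i]


def decode_moved(images, n):
    """Rebuild pi on n nodes from the list of images alone.

    Sorting the images yields the moved nodes i_1 < ... < i_m, because pi
    permutes the moved nodes among themselves; the j-th moved node is then
    mapped to the j-th listed image.
    """
    perm = list(range(n))
    for node, image in zip(sorted(images), images):
        perm[node] = image
    return perm
```

Each listed image is a node name of log n bits, so the list takes m log n bits in total, matching the cost stated above.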

Select one node from each cycle, say, the lowest-numbered one. Then for every unselected node on a cycle, we can delete the n − m bits corresponding to the presence or absence of edges to stable nodes, and m − k half-bits corresponding to presence or absence of edges to the other, unselected, cycle nodes. In total we delete
$$\sum\limits_{i = 1}^k {\left({c_i - 1} \right)\left({n - m + \frac{{m - k}}{2}} \right) = \left({m - k} \right)\left({n - \frac{{m + k}}{2}} \right)} $$
bits. Observing that k = ½m is the largest possible value for k, we arrive at the claimed δ(n, m) of G (difference between savings and spending is ½m(n − ¾m) − m log n) of Equation 6.6. ◻
We continue the proof of the main theorem:
$$g_n = \sum_{G \in \mathcal{G}_n} \frac{1}{d([G])} = \sum_{G \in \mathcal{G}_n} \frac{d(\mathrm{Aut}(G))}{n!} = \frac{2^{\binom{n}{2}}}{n!} E_n,$$
where \(E_n = \sum_{G \in \mathcal{G}_n} \Pr(G)\, d(\mathrm{Aut}(G))\) is the expected size of the automorphism group of a graph on n nodes. Clearly, \(E_n \ge 1\), yielding the lower bound on \(g_n\). For the upper bound on \(g_n\), noting that \(\mathcal{G}_n^1 = \emptyset\) and using the above claims, we find that
$$\begin{array}{l} E_n = \sum\limits_{m = 0}^n \Pr\left(G \in \mathcal{G}_n^m\right) {\rm Avg}_{G \in \mathcal{G}_n^m}\, d(\mathrm{Aut}(G)) \\ \le 1 + \sum\limits_{m = 2}^n 2^{-m\left(\frac{1}{2}n - \frac{3}{8}m - 2\log n\right)} \\ \le 1 + 2^{-(n - 4\log n - 2)}, \\ \end{array}$$

with Avg meaning ‘the average,’ which proves the theorem.◻
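For small n the identity \(g_n = 2^{\binom{n}{2}} E_n / n!\) can be checked by brute force. The sketch below averages the number of automorphisms over all labeled graphs on nodes 0..n−1. The function name `expected_automorphisms` is an assumption made for illustration, and the running time is exponential in n², so this is only a sanity check for very small n.

```python
from fractions import Fraction
from itertools import combinations, permutations


def expected_automorphisms(n):
    """Return E_n, the average of d(Aut(G)) over all labeled graphs on n nodes."""
    pairs = list(combinations(range(n), 2))
    total = 0
    for code in range(2 ** len(pairs)):            # every labeled graph
        edges = {p for bit, p in enumerate(pairs) if code >> bit & 1}
        for pi in permutations(range(n)):
            image = {tuple(sorted((pi[u], pi[v]))) for u, v in edges}
            if image == edges:                     # pi maps the edge set onto itself
                total += 1
    return Fraction(total, 2 ** len(pairs))
```

For n = 3 this gives E_3 = 3, so that g_3 = (8/6) · 3 = 4; for n = 4 it gives E_4 = 33/8, so that g_4 = (64/24) · (33/8) = 11.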

The proof of the theorem shows that the error in the asymptotic expression is very small:

6.6.15 Corollary 6.4.2

\(\frac{2^{\binom{n}{2}}}{n!} \le g_n \le \frac{2^{\binom{n}{2}}}{n!}\left(1 + \frac{4n^4}{2^n}\right).\)

Equation 6.6 yields the following (note that m = 1 is impossible):

6.6.16 Corollary 6.4.3

If a graph G has randomness deficiency slightly less than n (more precisely, \(C\left(E(G) \mid n\right) \ge \binom{n}{2} - n - \log n - 2\)), then G is rigid.

The expression for \(g_n\) can be used to determine the maximal complexity of an unlabeled graph on n nodes. Namely, we can effectively enumerate all unlabeled graphs as follows:
  • Step 1. Effectively enumerate all labeled graphs on n nodes by enumerating all binary strings of length n(n − 1)/2, and for every enumerated labeled graph G do Step 2.

  • Step 2. If G cannot be obtained by relabeling from any previously enumerated labeled graph then G is added to the set of unlabeled graphs.
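Steps 1 and 2 can be sketched directly for small n. In the Python sketch below, the function name `count_unlabeled_graphs` is hypothetical, and Step 2 is implemented by remembering every relabeling of each accepted graph, which is one of several equivalent ways to test "obtainable by relabeling from a previously enumerated graph". It is a brute-force illustration, not an efficient enumeration.

```python
from itertools import combinations, permutations
from math import comb, factorial


def count_unlabeled_graphs(n):
    """Count isomorphism classes of graphs on nodes 0..n-1 by brute force."""
    pairs = list(combinations(range(n), 2))
    seen = set()              # all relabelings of the representatives found so far
    representatives = 0
    for code in range(2 ** len(pairs)):        # Step 1: strings of length n(n-1)/2
        edges = frozenset(p for bit, p in enumerate(pairs) if code >> bit & 1)
        if edges in seen:                       # Step 2: relabeling of an earlier graph
            continue
        representatives += 1
        for pi in permutations(range(n)):       # record the whole isomorphism class
            seen.add(frozenset(tuple(sorted((pi[u], pi[v]))) for u, v in edges))
    return representatives
```

Evaluating `[count_unlabeled_graphs(n) for n in range(1, 6)]` gives `[1, 2, 4, 11, 34]`. For such small n the count is still well above the lower bound \(2^{\binom{n}{2}}/n!\); the ratio approaches 1 only as n grows.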

In this way we obtain every unlabeled graph by precisely one labeled graph representing it. Since we can describe every unlabeled graph by its index in this enumeration, we find by Theorem 6.4.2 and Stirling's formula that if G is an unlabeled graph, then
$$C\left({E\left(G \right)\left| n \right.} \right) \le \left({\begin{array}{*{20}c} n \\ 2 \\ \end{array}} \right) - n\log n + O\left(n \right).$$

6.6.17 Lemma 6.4.6

Let G be a labeled graph on n nodes and let \(G_0\) be the unlabeled version of G. There exists a graph G′ and a label permutation π such that G′ = π(G) and up to additional constant terms \(C(E(G')) = C(E(G_0))\) and \(C(E(G) \mid n) = C(E(G_0), \pi \mid n)\).

By Lemma 6.4.6, for every graph G on n nodes with maximum complexity there is a relabeling (permutation) that causes the complexity to drop by as much as n log n. Our proofs of topological properties by the incompressibility method required the graph G to be Kolmogorov random in the sense of \(C\left({E\left(G \right)\left| n \right.} \right) \ge \left({\begin{array}{*{20}c} n \\ 2 \\\end{array}} \right) - O\left({\log n} \right)\) or for some results \(C\left({E\left(G \right)\left| n \right.} \right) \ge \left({\begin{array}{*{20}c} n \\ 2 \\\end{array}} \right) - o\left(n \right).\) Hence by relabeling such a graph we can always obtain a labeled graph that has a complexity too low to use our incompressibility proof. Nonetheless, topological properties do not change under relabeling.

6.7 Exercises

6.4.1. [M40] Use the terminology of Theorem 6.4.1. A cover of G is a set \(C = \{S_1, \ldots, S_N\}\) with N = n/k, where the \(S_i\)'s are pairwise disjoint subsets of V and \(\cup_{i=1}^N S_i = V\). There is a partition of the \(\binom{n}{k}\) different k-node subsets into \(h = \binom{n}{k}/N = \binom{n-1}{k-1}\) distinct covers of G, every cover consisting of N = n/k disjoint subsets. That is, every subset of k nodes of V belongs to precisely one cover.

Comments. Source: Zs. Baranyai, pp. 91–108 in: A. Hajnal, R. Rado, and V.T. Sós, eds., Infinite and Finite Sets, Proc. Coll. Keszthely, Colloq. Math. Soc. János Bolyai, 10, Vol. 1, North-Holland, Amsterdam, 1975.

6.4.2. [27] Use Exercise 6.4.1 to prove Theorem 6.4.1.

Comments. Hint: similar to the proof of Theorem 2.6.1, with the labeled graph G taking the part of the overall string, and cover elements (subsets of labeled nodes inducing subgraphs) taking the part of the blocks. Source: H.M. Buhrman, M. Li, J.T. Tromp, and P.M.B. Vitányi, SIAM J. Comput, 29:2(1999), 590–599. This is also the source for the next exercise.

6.4.3. [20] In Section 2.6 we investigated up to which length l all blocks of length l occurred at least once in every δ(n)-random string of length n.

Let \(\delta \left(n \right) = 2^{\sqrt {2\log n/2} } /4\log n\) and G be a δ(n)-random graph on n nodes. Show that for sufficiently large n, the graph G contains all subgraphs on \(\sqrt {2\log n} \) nodes.

6.4.4. [26] Show that almost every labeled tree on n nodes has maximum degree of O(log n / log log n).

Comments. Hint: represent a labeled tree by a binary sequence of length (n − 2) log n (the Prüfer code). Prove a one-to-one correspondence between labeled trees and binary sequences of such length. Use incompressibility to show that if a tree has larger degree, then one can compress the corresponding binary sequence. Since most binary sequences cannot be compressed, most trees do not have larger degree. Source: W.W. Kirchherr, Inform. Process. Lett, 41(1992), 125–130.

6.8 Compact Routing

In very large networks such as the global telephone network or the Internet, the mass of messages being routed creates major bottlenecks, degrading performance. We analyze a tiny part of this issue by determining the optimal space to represent routing schemes in communication networks on average for all static networks.

A universal routing strategy for static communication networks will, for every network, generate a routing scheme for that particular network. Such a routing scheme comprises a local routing function for every node in this network. The routing function of node u returns for every destination v ≠ u an edge incident to u on a path from u to v. In this way, a routing scheme describes a path, called a route, between every pair of nodes u, v in the network.

It is easy to see that we can do shortest-path routing by entering a routing table in every node u that for every destination node v indicates to what adjacent node w a message to v should be routed first. If u has degree d, it requires a table of at most n log d bits, and the overall number of bits in all local routing tables never exceeds \(n^2 \log n\). Several factors may influence the cost of representing a routing scheme for a particular network. We use a basic model and leave variations to the exercises. Here we consider point-to-point communication networks on n nodes described by an undirected labeled graph G = (V, E), where V = {1,…, n}. Assume that nodes know the identities of their neighbors.
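The routing table of a single node can be sketched with a breadth-first search. In the Python sketch below, the function name `routing_table` and the representation of the graph as a dictionary of neighbor sets are assumptions made for illustration; the text does not prescribe how the table is computed.

```python
from collections import deque


def routing_table(adj, u):
    """Map every reachable destination v != u to the neighbor of u that is
    the first node on some shortest path from u to v.

    adj maps each node to the set of its neighbors (undirected graph).
    """
    first_hop = {}
    queue = deque()
    for w in sorted(adj[u]):
        first_hop[w] = w                 # a neighbor is reached directly
        queue.append(w)
    seen = {u} | set(adj[u])
    while queue:
        x = queue.popleft()
        for y in sorted(adj[x]):
            if y not in seen:
                seen.add(y)
                first_hop[y] = first_hop[x]   # same first edge as the path to x
                queue.append(y)
    return first_hop
```

The table has one entry per destination, and each entry names one of the d neighbors of u, which is where the bound of at most n log d bits per node comes from.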

In [H.M. Buhrman, J.H. Hoepman, and P.M.B. Vitányi, SIAM J. Comput, 28:4(1999), 1414–1432], it is shown that in most models, for almost all graphs (that is, networks), \(\Theta(n^2)\) bits are necessary and sufficient for shortest-path routing. By ‘almost all graphs’ we mean the Kolmogorov random graphs that constitute a fraction of \(1 - 1/n^c\) of all graphs on n nodes, where c ≥ 3 is an arbitrary fixed constant. In contrast, there is a model that causes the average-case lower bound to rise to \(\Omega(n^2 \log n)\) and another model whose average-case upper bound drops to \(O(n \log^2 n)\). This clearly exposes the sensitivity of such bounds to the model under consideration.

6.8.1 Upper Bound

In general (on almost all networks), one can use shortest-path routing schemes occupying at most \(O(n^2)\) bits. Relaxing the requirement of shortest path is expressed in the stretch factor of a routing scheme. This equals the maximum ratio between the length of a route it produces and the shortest path between the endpoints of that route. The stretch factor of a routing strategy equals the maximal stretch factor attained by any of the routing schemes it generates. The shortest-path routing strategy has stretch factor equal to 1. Allowing stretch factors larger than 1 reduces the space requirements, to as low as O(n) bits for stretch factors of O(log n) (Exercise 6.5.2).

Theorem 6.5.1

For shortest-path routing in O(log n)-random graphs, local routing functions can be stored in 6n bits per node. Hence the complete routing scheme is represented by \(6n^2\) bits.

Proof. Let G be an O(log n)-random graph on n nodes. By Lemma 6.4.5 we know that from every node u we can route via shortest paths to every node v through the O(log n) directly adjacent nodes of u that have the least indexes. By Lemma 6.4.3, G has diameter 2. Once the message has reached node v its destination is either node v or a direct neighbor of node v (which is known in node v by assumption). Therefore, routing functions of size O(n log log n) can be used to do shortest-path routing. We can do better than this.

Let \(A_0 \subseteq V\) be the set of nodes in G that are not directly connected to u. Let \(v_1, \ldots, v_m\) be the O(log n) least indexed nodes directly adjacent to node u (Lemma 6.4.5), through which we can route via shortest paths to all nodes in \(A_0\). For t = 1, 2, …, l define \(A_t = \left\{ w \in A_0 - \cup_{s=1}^{t-1} A_s : (v_t, w) \in E \right\}\). Let \(m_0 = d(A_0)\) and define \(m_{t+1} = m_t - d(A_{t+1})\). Let l be the first t such that \(m_t < n/\log\log n\). Then we claim that \(v_t\) is connected by an edge in E to at least 1/3 of the nodes not connected by edges in E to nodes \(u, v_1, \ldots, v_{t-1}\).

Claim 6.5.1

\(d(A_t) > m_{t-1}/3\) for 1 ≤ t ≤ l.

Proof. Suppose by way of contradiction that there exists a least t ≤ l such that \(\left| d(A_t) - m_{t-1}/2 \right| > m_{t-1}/6\). Then we can describe G, given n, as follows:
  • This discussion in O(1) bits.

  • Nodes u, \(v_t\) in 2 log n bits, padded with 0's if need be.

  • The presence or absence of edges incident with nodes \(u, v_1, \ldots, v_{t-1}\) in r = (n − 1) + … + (n − (t − 1)) bits. This gives us the characteristic sequences of \(A_0, \ldots, A_{t-1}\) in V, where a characteristic sequence of A in V is a string of d(V) bits such that for every v ∈ V, the vth bit equals 1 if v ∈ A and the vth bit is 0 otherwise.

  • A self-delimiting description of the characteristic sequence of \(A_t\) in \(A_0 - \cup_{s=1}^{t-1} A_s\), using Chernoff's bound, Equation 2.4 on page 167, in at most \(m_{t-1} - \frac{2}{3}\left(\frac{1}{6}\right)^2 m_{t-1} \log e + O(\log m_{t-1})\) bits.

  • The description E(G) with all bits corresponding to the presence or absence of edges between \(v_t\) and the nodes in \(A_0 - \cup_{s=1}^{t-1} A_s\) deleted, saving \(m_{t-1}\) bits. Furthermore, we also delete all bits corresponding to presence or absence of edges incident with \(u, v_1, \ldots, v_{t-1}\), saving a further r bits.

This description of G uses at most
$$\frac{1}{2}n\left({n - 1} \right) + O\left({\log n} \right) + m_{t - 1} - \frac{2}{3}\left({\frac{1}{6}} \right)^2 m_{t - 1} \log e - m_{t - 1} $$
bits, which contradicts the O(log n)-randomness of G by Equation 6.3 on page 461, because \(m_{t-1} > n/\log\log n\). ◻
Recall that l is the least integer such that \(m_l < n/\log\log n\). We construct the local routing function F(u) as follows:
  • A table of intermediate routing node entries for all the nodes in \(A_0\) in increasing order. For every node w in \(\cup_{s=1}^l A_s\) we enter in the wth position in the table the unary representation of the least intermediate node v, with (u,v), (v,w) ∈ E, followed by a 0. For the nodes that are not in \(\cup_{s=1}^l A_s\) we enter a 0 in their position in the table indicating that an entry for this node can be found in the second table. By Claim 6.5.1, the size of this table is bounded by
    $$n + \sum_{s=1}^{l} \frac{1}{3}\left(\frac{2}{3}\right)^{s-1} sn \le n + \sum_{s=1}^{\infty} \frac{1}{3}\left(\frac{2}{3}\right)^{s-1} sn \le 4n.$$
  • A table with explicitly binary-coded intermediate nodes on a shortest path for the ordered set of the remaining destination nodes. Those nodes have a 0 entry in the first table and there are at most \(m_l < n/\log\log n\) of them, namely the nodes in \(A_0 - \cup_{s=1}^l A_s\). Each entry consists of the code of length log log n + O(1) for the position in increasing order of a node out of \(v_1, \ldots, v_m\) with m = O(log n) by Lemma 6.4.5. Hence this second table requires at most 2n bits.

The routing algorithm is as follows: The direct neighbors of u are known in node u and are routed without the routing table. If we route from start node u to target node w that is not directly adjacent to u, then we do the following. If node w has an entry in the first table then route over the edge coded in unary; otherwise find an entry for node w in the second table.
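The first table and the lookup can be illustrated with a simplified Python sketch. It assumes a graph of diameter 2 given as a dictionary of neighbor sets, and for brevity it encodes every non-neighbor in unary, omitting the cutoff l and the second table. The function names `build_unary_table` and `route` are hypothetical.

```python
def build_unary_table(adj, u):
    """For each non-neighbor w of u, in increasing order, encode the index t of
    the least neighbor v_t of u adjacent to w as t ones followed by a zero.

    Assumes diameter 2: every non-neighbor of u shares a neighbor with u.
    """
    neighbors = sorted(adj[u])                       # v_1 < v_2 < ...
    bits = []
    for w in sorted(set(adj) - adj[u] - {u}):
        t = next(i for i, v in enumerate(neighbors, start=1) if w in adj[v])
        bits.append("1" * t + "0")
    return "".join(bits)


def route(adj, u, w, table):
    """Return the first hop from u toward w, using u's neighbor list and table."""
    if w in adj[u]:
        return w                                     # neighbors need no table entry
    neighbors = sorted(adj[u])
    entries = table.split("0")[:-1]                  # one run of ones per non-neighbor
    position = sorted(set(adj) - adj[u] - {u}).index(w)
    return neighbors[len(entries[position]) - 1]
```

On the 5-cycle with nodes 0..4, node 0 has neighbors 1 and 4 and non-neighbors 2 and 3; its table is `10110`, so messages to 2 go via node 1 and messages to 3 go via node 4.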

Altogether, we have d(F(u)) ≤ 6n. Slightly more precise counting and choosing l such that \(m_l\) is the first such quantity < n/log n shows that d(F(u)) ≤ 3n. ◻

6.8.2 Lower Bound

We show that \(\Omega(n^2)\) bits are required to perform routing on Kolmogorov random graphs. Hence the upper bound in Theorem 6.5.1 is tight up to order of magnitude.

Theorem 6.5.2

For shortest-path routing in o(n)-random graphs, every local routing function must be stored in at least ½n − o(n) bits per node. Hence the complete routing scheme requires at least \(n^2/2 - o(n^2)\) bits to be stored.

Proof. Let G be an o(n)-random graph. Let F(u) be the local routing function of node u of G, and let d(F(u)) be the number of bits used to store F(u). Let E(G) be the standard encoding of G in n(n − 1)/2 bits as in Definition 6.4.1. We now give another way to describe G using some local routing function F(u):
  • A description of this discussion in O(1) bits.

  • A description of u in exactly log n bits, padded with 0's if needed.

  • A description of the presence or absence of edges between u and the other nodes in V in n − 1 bits.

  • A self-delimiting description of F(u) in d(F(u)) + 2 log d(F(u)) bits.

  • The code E(G) with all bits deleted corresponding to edges (v, w) ∈ E for every v and w such that F(u) routes messages to w through the least intermediary node v. This saves at least ½n − o(n) bits, since there are at least ½n − o(n) nodes w such that (u, w) ∉ E by Lemma 6.4.2, and since the diameter of G is 2 by Lemma 6.4.3, there is a shortest path (u, v), (v, w) ∈ \(E^2\) for some v. Furthermore, we delete all bits corresponding to the presence or absence of edges between u and the other nodes in V, saving another n − 1 bits. This corresponds to the n − 1 bits for edges connected to u that we added in one connected block above.

In the description we have explicitly added the adjacency pattern of node u that we deleted elsewhere. This zero-sum swap is necessary in order to unambiguously identify the adjacency pattern of u to reconstruct G given n, as follows: Reconstruct the bits corresponding to the deleted edges using u and F(u) and subsequently insert them in the appropriate positions of the remnants of E(G). We can do so because these positions can be simply reconstructed in increasing order. In total, this new description has
$$\frac{1}{2}n\left({n - 1} \right) + O\left(1 \right) + O\left({\log n} \right) + d\left({F\left(u \right)} \right) - \frac{1}{2}n + o\left(n \right)$$
bits, which must be at least n(n − 1)/2 − o(n) by Equation 6.3. Hence, d(F(u)) ≥ ½n − o(n), which proves the theorem. ◻

6.8.3 Average Case

Consider the average cost, taken over all labeled graphs of n nodes, of representing a routing scheme for graphs over n nodes. For a graph G, let T(G) be the number of bits used to store its routing scheme. The average total number of bits to store the routing scheme for routing over labeled graphs on n nodes is \(\sum T(G)/2^{n(n-1)/2}\), with the sum taken over all graphs G on nodes {1,2,…, n}, that is, the uniform average over all the labeled graphs on n nodes.

The results on Kolmogorov random graphs above have the following corollaries. Consider the subset of (3 log n)-random graphs within the class of O(log n)-random graphs on n nodes. They constitute a fraction of at least \(1 - 1/n^3\) of the class of all graphs on n nodes. The trivial upper bound on the minimal total number of bits for all routing functions together is \(O(n^2 \log n)\) for shortest-path routing on all graphs on n nodes (or \(O(n^3)\) for full-information shortest-path routing as in Exercise 6.5.5). Simple computation of the average of the total number of bits used to store the routing scheme over all graphs on n nodes shows that Theorem 6.5.1, Theorem 6.5.2, and Exercise 6.5.2 all hold for the average case.

6.9 Exercises

6.5.1. [19] Show that there exist labeled graphs on n nodes such that each local routing function must be stored in at least ½n log ½n − O(n) bits per node (hence the complete routing scheme requires at least \((n^2/2) \log \frac{1}{2}n - O(n^2)\) bits to be stored).

Comments. Source: H.M. Buhrman, J.H. Hoepman, and P.M.B. Vitányi, SIAM J. Comput, 28:4(1999), 1414–1432. This is also the source for the next four exercises.

6.5.2. [22]
  (a) Show that routing with any stretch factor > 1 in c log n-random graphs can be done with n − 1 − (c + 3) log n nodes with local routing functions stored in at most log(n + 1) bits per node, and 1 + (c + 3) log n nodes with local routing functions stored in 6n bits per node (hence the complete routing scheme is represented by fewer than (6c + 20)n log n bits).

  (b) Show that routing with stretch factor 2 in c log n-random graphs can be done using n − 1 nodes with local routing functions stored in at most log log n bits per node and 1 node with its local routing function stored in 6n bits (hence the complete routing scheme is represented by n log log n + 6n bits).

  (c) Show that routing with stretch factor (c + 3) log n in c log n-random graphs can be done with local routing functions stored in O(1) bits per node (hence the complete routing scheme is represented by O(n) bits).


Comments. Hint: use Lemma 6.4.5 on page 464 and restricted use of tables (Items (a) and (b)) as in the proof of Theorem 6.5.1 and no tables in Item (c).

6.5.3. [31] Prove the following: for shortest-path routing on c log n-random graphs, if nodes know their neighbors and nodes may be relabeled by arbitrary identifiers (which therefore can code information), then with labels of size at most (1 + (c + 3) log n) log n bits the local routing functions can be stored in O(1) bits per node. Hence the complete routing scheme including the label information is represented by \((c + 3) n \log^2 n + n \log n + O(n)\) bits.

6.5.4. [34] Show that for shortest-path routing in graphs that are o(n)-random, if the neighbors are not known, then the complete routing scheme requires at least \(n^2/32 - o(n^2)\) bits to be stored. This holds also under a slightly weaker model.

6.5.5. [29] In a full-information shortest-path routing scheme, the routing function in u must, for every destination v, return all edges incident to u on shortest paths from u to v. These schemes allow alternative shortest paths to be taken whenever an outgoing link is down. Show that for full-information shortest-path routing on o(n)-random graphs, the local routing function requires \(n^2/4 - o(n^2)\) bits for every node (hence the complete routing scheme requires at least \(n^3/4 - o(n^3)\) bits to be stored). This is also the trivial upper bound.

6.5.6. [30] In interval routing on a graph G = (V, E), V = {1,…, n}, each node i has for each incident edge e a (possibly empty) set of pairs of node labels representing disjoint intervals with wraparound. Each pair indicates the initial edge on a shortest path from i to any node in the interval, and for every node j ≠ i there is such a pair. We are allowed to permute the labels of graph G to optimize the interval setting.
  (a) Show that there are graphs such that for each interval routing scheme some incident edge on each of Ω(n) nodes is labeled by Ω(n) intervals.

  (b) Show that for every d ≥ 3 there are graphs of maximal node degree d such that for each interval routing scheme some incident edge on each of Ω(n) nodes is labeled by Ω(n/log n) intervals.


Comments. Source: E. Kranakis and D. Krizanc, Proc. 13th Symp. Theoret. Aspects Comput. Sci., Lect. Notes Comput. Sci., Vol. 1046, Springer-Verlag, 1996, pp. 529–540. Item (b) is improved by C. Gavoille and S. Pérennès [Proc. 15th ACM Symp. Principles Distr. Comput, 1996, pp. 125–133], who showed that for every interval routing scheme, each of Ω(n) edges is labeled by Ω(n) intervals. This shows that interval routing can be worse than straightforward coding of routing tables, which can be done in \(O(n^2 \log d)\) bits total.

6.5.7. Consider routing schemes for n-node graphs G = (V, E), V = {1,…,n}, with maximal node degree d. Choose the most convenient labeling to facilitate compact routing schemes.
  (a) Show that for every d ≥ 3 there are networks for which any shortest-path routing scheme requires a total of \(\Omega(n^2/\log n)\) bits.

  (b) Same as Item (a) but now with stretch factor < 2 requiring a total of \(\Omega(n^2/\log^2 n)\) bits.


Comments. Source: E. Kranakis and D. Krizanc, Ibid. Item (a) is improved by C. Gavoille and S. Pérennès [Ibid.] for 3 ≤ d ≤ ϵn (0 < ϵ < 1) to \(\Theta(n^2 \log d)\). This is optimal, since straightforward coding of routing tables takes \(O(n^2 \log d)\) bits total.

6.5.8. Consider a computer network consisting of n computers connected in a ring by bidirectional communication channels. The message transmission takes unknown time, but messages do not overtake each other. The computers are anonymous, that is, they do not have unique identities. To be able to discuss them individually we number them 1,…, n. Let x be any string in \(\{0,1\}^n\). At the start of the computation every computer i in the ring owns a copy of x and a bit \(y_i\). Define y ≡ x if there is an s (0 ≤ s < n) such that \(y_{i+s \bmod n} = x_i\) for all i (1 ≤ i ≤ n). The problem is to compute a Boolean function \(f_x : \{0,1\}^n \to \{0,1\}\) defined by \(f_x(y) = 1\) if y ≡ x and 0 otherwise. Each computer executes the same algorithm to compute \(f_x\) and eventually outputs the value \(f_x(y)\). Show that there is an algorithm to compute \(f_x(\cdot)\), with C(x) ≥ n − O(log n), on an anonymous ring of n computers using O(n log n) bit exchanges for a fraction of at least 1 − 1/n of all \(2^n\) inputs, and hence Θ(n log n) bit exchanges on average.

Comments. S. Moran and M. Warmuth [SIAM J. Comput, 22:2(1993), 379–399] have shown that to compute nonconstant functions f, the computers need to exchange Ω(n log n) bits, and that this bound is tight. This creates a gap with the case of computing constant f requiring zero messages. Source: E. Kranakis, D. Krizanc, and F.L. Luccio, pp. 392–401 in: Proc. 13th Symp. Math. Found. Comput. Sci., Lect. Notes Comput. Sci., Vol. 969, Springer-Verlag, 1995.

6.10 Average-Case Analysis of Sorting

For many algorithms, it is very difficult to analyze their average-case complexity. In average-case analysis, the incompressibility method has an advantage over a probabilistic approach. In the latter approach, one deals with expectations or variances over some ensemble of objects changing over the course of the computation. Using Kolmogorov complexity, we can reason about a fixed incompressible individual object. Because it is incompressible, it has all statistical properties with certainty, rather than having them hold with some (high) probability as in a probabilistic analysis. This fact greatly simplifies the resulting analysis.

6.10.1 Heapsort

Heapsort is a widely used sorting algorithm. One reason for its prominence is that its running time is guaranteed to be of order n log n, and it does not require extra memory space. The method was first discovered by J.W.J. Williams [Comm. Assoc. Comp. Mach., 7(1964), 347–348], and subsequently improved by R.W. Floyd. Only recently has one succeeded in giving a precise analysis of its average-case performance. I. Munro has suggested the simple solution using incompressibility presented here.

A ‘heap’ can be visualized as a complete directed binary tree with possibly some rightmost nodes being removed from the deepest level. The tree has n nodes, each of which is labeled with a different key, taken from a linearly ordered domain. The largest key \(k_1\) is at the root (on top of the heap), and every other node is labeled with a key that is less than the key of its parent.

Definition 6.6.1

Let keys be elements of N. An array of keys \(k_1, \ldots, k_n\) is a heap if they are partially ordered such that \(k_{\lfloor j/2 \rfloor} \ge k_j\) for \(1 \le \lfloor j/2 \rfloor < j \le n\).

Thus, \(k_1 \ge k_2\), \(k_1 \ge k_3\), \(k_2 \ge k_4\), and so on. We consider in place sorting of n keys in an array A[1..n] without use of additional memory.
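The heap condition can be checked mechanically. The short Python sketch below reads the condition as "no key exceeds the key of its parent" and stores \(k_1, \ldots, k_n\) in a 0-based list; the function name `is_heap` is an assumption made for illustration.

```python
def is_heap(keys):
    """Check k[floor(j/2)] >= k[j] for 2 <= j <= n, where the keys
    k_1, ..., k_n are stored in the 0-based list keys."""
    n = len(keys)
    return all(keys[j // 2 - 1] >= keys[j - 1] for j in range(2, n + 1))
```

For instance, `is_heap([9, 7, 8, 3, 5, 6, 1])` holds, while `is_heap([9, 7, 8, 3, 10])` fails because the fifth key exceeds its parent, the second key.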

Heapsort {Initially A[1..n] contains n keys. After sorting is completed, the keys in A will be ordered as A[1] < A[2] < … < A[n]}

Heapify: {Regard A as a tree: the root is in A[1]; the two children of A[i] are at A[2i] and A[2i + 1], when 2i, 2i + 1 ≤ n. We convert the tree in A to a heap} Repeat for i = ⌊n/2⌋, ⌊n/2⌋ − 1,…, 1: {the subtree rooted at A[i] is now almost a heap except for A[i]} push the key, say k, at A[i] down the tree (determine which of the two children of A[i] possesses the greatest key, say k′ in child A[2i + j] with j equal to 0 or 1); if k′ > k then put k in A[2i + j] and repeat this process, pushing k′ at A[2i + j] down the tree until the process reaches a node that does not have a child whose key is greater than the key now at the parent node.

Sort: Repeat for i = n, n − 1,…, 2: {A[1..i] contains the remaining heap and A[i + 1..n] contains the already sorted list \(k_{i+1}, \ldots, k_n\) of largest elements; by definition, the element on top of the heap in A[1] must be \(k_i\)} switch the key \(k_i\) in A[1] with the key k in A[i], extending the sorted list to A[i..n]. Rearrange A[1..i − 1] to a heap with the largest element at A[1].

It is well known that the Heapify step can be performed in O(n) time. It is also known that the Sort step takes no more than O(n log n) time. We analyze the precise average-case complexity of the Sort step. There are two ways of rearranging the heap: Williams's method and Floyd's method.

  • Williams's Method: {Initially, A[1] = k}
    • Repeat: compare the keys of k's two direct children; if m is the larger of the two then compare k and m; if k < m then switch k and m in A[1..i − 1]; until k > m.

  • Floyd's Method: {Initially, A[1] is empty}
    • Set j := 1;

    • while A[j] is not a leaf do: if A[2j] > A[2j + 1] then j := 2j else j := 2j + 1;

    • while k > A[j] do: {back up the tree until the correct position for k} j := ⌊j/2⌋;

    • move keys of A[j] and each of its ancestors one node upward;

    • Set A[j] := k.
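Both rearrangement methods can be sketched in Python with a comparison counter. The names `push_down`, `williams`, `floyd`, `heapsort`, and `random_keys` are assumptions made for illustration, the array is kept 1-based by leaving index 0 unused, and the keys are assumed to be distinct. This is a sketch of the two methods as described above, not a tuned implementation.

```python
import random


def push_down(a, j, size, count):
    """Williams's method: push the key at a[j] down, two comparisons per level."""
    while 2 * j <= size:
        c = 2 * j
        if c + 1 <= size:
            count[0] += 1                  # compare the two children
            if a[c + 1] > a[c]:
                c += 1
        count[0] += 1                      # compare k with the larger child
        if a[j] >= a[c]:
            break
        a[j], a[c] = a[c], a[j]
        j = c


def williams(a, size, count):
    push_down(a, 1, size, count)


def floyd(a, size, count):
    """Floyd's method: descend to a leaf, back up to k's slot, then shift keys."""
    k, j, path = a[1], 1, [1]
    while 2 * j <= size:                   # one comparison per level going down
        c = 2 * j
        if c + 1 <= size:
            count[0] += 1
            if a[c + 1] > a[c]:
                c += 1
        j = c
        path.append(j)
    while j > 1:                           # one comparison per level going up
        count[0] += 1
        if k <= a[j]:
            break
        j //= 2
        path.pop()
    for parent, child in zip(path, path[1:]):
        a[parent] = a[child]               # move each key on the path one node up
    a[j] = k


def heapsort(keys, rearrange):
    """Return the sorted keys and the number of comparisons in the Sort step."""
    a = [None] + list(keys)                # 1-based array A[1..n]
    n = len(keys)
    for i in range(n // 2, 0, -1):         # Heapify
        push_down(a, i, n, [0])
    count = [0]
    for i in range(n, 1, -1):              # Sort
        a[1], a[i] = a[i], a[1]
        rearrange(a, i - 1, count)
    return a[1:], count[0]


def random_keys(n, seed):
    """Return a pseudo-random permutation of 0..n-1."""
    keys = list(range(n))
    random.Random(seed).shuffle(keys)
    return keys
```

Calling `heapsort(random_keys(1000, 1), williams)` and `heapsort(random_keys(1000, 1), floyd)` returns the same sorted list together with the two comparison counts, which can then be compared with the estimates in the discussion that follows.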

The difference between the two methods is as follows. Williams's method goes from the root at the top down the heap. It makes two comparisons with the child nodes and one data movement at every step until the key k reaches its final position. Floyd's method first goes from the root at the top down the heap to a leaf, making only one comparison at every step. Subsequently, it goes from the bottom of the heap up the tree, making one comparison at each step, until it finds the final position for key k. Then it moves the keys, shifting every ancestor of k one step up the tree. The final positions in the two methods are the same; therefore both algorithms make the same number of key movements. Note that in the last step of Floyd's algorithm, one needs to move the keys carefully up the tree, avoiding swaps that would double the number of moves.

The heap is of height log n. If Williams's method uses 2d comparisons, then Floyd's method uses d + 2δ comparisons, where δ = log n − d. Intuitively, δ is generally very small, since most elements tend to be near the bottom of the heap. This makes it likely that Floyd's method performs better than Williams's method. We analyze whether this is the case. Assume a uniform probability distribution over the lists of n keys, so that all input lists are equally likely.

Theorem 6.6.1

With probability going to 1 for n → ∞, and on average, Heapsort makes n log n + O(n) data movements. Williams's method makes 2n log n − O(n) comparisons on average. Floyd's method makes n log n + O(n) comparisons on average.

Proof. Given n keys, there are n! permutations. Hence we can choose a permutation π of n keys such that
$$C(\pi \mid n, A, P) \ge \log n! - n,$$
justified by Theorem 2.2.1, page 117. In fact, a \((1 - 1/2^n)\) fraction of all permutations of n keys satisfy this. Here A represents the Heapsort algorithms involved and P represents the reconstruction programs used below. Since \(n! \approx n^n e^{-n} \sqrt{2\pi n}\) by Stirling's formula, log n! > n log n − 2n.
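The Stirling estimate can be checked numerically. The Python sketch below computes log n! in base 2 through the log-gamma function and tests the bounds n log n − 2n < log n! < n log n; the function names `log2_factorial` and `within_bounds` are assumptions made for illustration.

```python
from math import lgamma, log, log2


def log2_factorial(n):
    """Return log2(n!), computed as ln Gamma(n + 1) / ln 2."""
    return lgamma(n + 1) / log(2)


def within_bounds(n):
    """Check n log n - 2n < log n! < n log n, with logarithms to base 2."""
    return n * log2(n) - 2 * n < log2_factorial(n) < n * log2(n)
```

The gap between log n! and n log n is about n log e ≈ 1.44n, which is why subtracting 2n is enough for the lower bound.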

Claim 6.6.1

Let h be the heap constructed by the Heapify step with input π that satisfies the last displayed equation. Then,
$$C\left({h\left| {n,A,P} \right.} \right) \ge \log n! - 5n.$$

Proof. Assume the contrary, C(h ∣ n, A, P) < log n! − 5n. Then we show how to describe π, using h and n, in fewer than log n! − n bits as follows. We will encode the Heapify process that constructs h from π. At each loop, when we push k = A[i] down the subtree, we record the path that key k traveled: 0 indicates a left branch, 1 means a right branch, 2 means halt. In total, this requires \((n \log 3) \sum_j j/2^{j+2} \le 2n \log 3\) bits. Given the final heap h and the above description of updating paths, we can reverse the procedure of Heapify and reconstruct π. Hence, C(π ∣ n, A, P) < C(h ∣ n, A, P) + 2n log 3 + O(1) < log n! − n, which is a contradiction. (The term 5n above can be reduced by a more careful encoding and calculation.) ◻

We give a description of h using the history of the n − 1 heap rearrangements during the Sort step. We need to record, for i := n − 1, …, 2, at the (n − i + 1)st round of the Sort step, only the final position where A[i] is inserted into the heap. Both algorithms insert A[i] into the same slot using the same number of data moves, but a different number of comparisons.

We encode such a final position by describing the path from the root to the position. A path can be represented by a sequence s of 0's and 1's, with 0 indicating a left branch and 1 indicating a right branch. Each path i is encoded in self-delimiting form by giving the value δ_i = log n − l(s_i) encoded in self-delimiting binary form, followed by the literal binary sequence s_i encoding the actual path. This description requires at most
$$l\left({s_i } \right) + 2\log \delta _i $$
bits. Concatenate the descriptions of all these paths into sequence H.

Claim 6.6.2

We can effectively reconstruct heap h from H and n.

Proof. Assume that H is known and the fact that h is a heap on n different keys. We simulate the Sort step in reverse. Initially, A[1..n] contains a sorted list with the least element in A[1].

for i := 2, …, n − 1 do: {now A[1..i − 1] contains the partially constructed heap and A[i..n] contains the remaining sorted list with the least element in A[i]} Put the key of A[i] into A[1], while shifting every key on the (n − i)th path in H one position down starting from the root at A[1]. The last key on this path has nowhere to go and is put in the empty slot in A[i].

termination {Array A[1..n] contains heap h}◻

It follows from Claim 6.6.2 that C(h∣n,A,P) ≤ l(H) + O(1). Therefore, by Equation 6.7, we have l(H) ≥ log n! − 5n − O(1). By the description in Equation 6.8, we therefore have
$$\sum\limits_{i = 1}^n {\left({l\left({s_i } \right) + 2\log \delta _i } \right) = \sum\limits_{i = 1}^n {\left({\left({\log n} \right) - \delta _i + 2\log \delta _i } \right)} \ge \log n! - 5n - O\left(1 \right).} $$

It follows that \(\sum_{i = 1}^n \left({\delta _i - 2\log \delta _i } \right) \le 5n\). This is possible only if \(\sum_{i = 1}^n \delta _i = O\left(n \right)\). Therefore, the average path length is at least log n − c, for some fixed constant c. In every round of the Sort step the path length equals the number of data moves. The combined total path length is at least n log n − nc.

It follows that starting with heap h, Heapsort performs at least n log n − O(n) data moves. Trivially, the number of data moves is at most n log n. Together this shows that Williams's method makes 2n log n − O(n) key comparisons, and Floyd's method makes n log n + O(n) key comparisons.

Since a (1 − 1/2^n) fraction of all permutations π on n keys satisfies C(π∣n,A,P) ≥ log n! − n, these bounds for one such permutation π also hold for all permutations on average. ◻

6.10.2 Shellsort

D.L. Shell [Comm. Assoc. Comp. Mach., 2:7(1959), 30–32] proposed the Shellsort algorithm in 1959. Since then, the question of the average-case complexity of Shellsort has been open. Recently, the first general lower bound for this problem was proven using the incompressibility method.

Shellsort sorts a list of n elements in p passes using a sequence of increments h_1, …, h_p. In the kth pass the main list is divided into h_k separate sublists, called h_k-chains, each of length ⌈n/h_k⌉, where the ith sublist consists of the elements at positions j with j mod h_k = i − 1 of the main list (i = 1, …, h_k). Every sublist is sorted using a straightforward Bubblesort or Insertion sort, and h_p = 1 to ensure sortedness of the final list.

In Bubblesort or Insertion sort we go from left to right over the list, comparing every key with its right neighbor and switching them if the left key is larger. At the end, the largest key is in the rightmost position. Then repeat this process with the remaining list, and so on.
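The scheme just described can be sketched in Python as follows. This is a minimal illustration, assuming Insertion sort is used inside every chain; the function name `shellsort` and the per-pass counter are hypothetical additions. For each pass the sketch records the total distance, measured in chain positions, that inserted keys move, which is the kind of quantity the lower-bound argument below adds up.

```python
def shellsort(keys, increments):
    """Sort a copy of keys with the given increments (the last one should be 1).

    Returns (sorted list, moves_per_pass), where moves_per_pass[k] is the total
    number of chain positions that keys were moved to the left in pass k.
    """
    a = list(keys)
    moves_per_pass = []
    for h in increments:
        moves = 0
        for j in range(h, len(a)):
            key = a[j]
            i = j
            # Insertion sort step inside the h-chain that contains position j.
            while i >= h and a[i - h] > key:
                a[i] = a[i - h]
                i -= h
            a[i] = key
            moves += (j - i) // h      # distance travelled by key within its chain
        moves_per_pass.append(moves)
    return a, moves_per_pass
```

With the single increment 1 this degenerates to plain Insertion sort, and the recorded count equals the number of inversions of the input.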

The efficiency of the Shellsort method is governed by the number of passes p and the selected increment sequence h_1, …, h_p. For example, the original log n-pass increment sequence ⌊n/2⌋, ⌊n/4⌋, …, 1 of Shell uses worst-case Θ(n^2) time. Many increment sequences have been proposed. The elegant method of V.R. Pratt uses all log^2 n increments of the form 2^i 3^j < ⌊n/2⌋ to obtain time O(n log^2 n) in the worst case. Moreover, since every pass takes at least n steps, the average-case time complexity using Pratt's increment sequence is Θ(n log^2 n). D.E. Knuth proved an average-case time complexity of Θ(n^{5/3}) for the best choice of increments in p = 2 passes; A.C.C. Yao analyzed the average case for p = 3 but did not obtain an analytic form; Yao's analysis was used by S. Janson and D.E. Knuth, who proved an O(n^{23/15}) average-case time-complexity upper bound for particular increments in p = 3 passes. Apart from this, no nontrivial results had been known for the average case.

The idea of the proof is simple. For every incompressible permutation π, encode the moves of Shellsort in the most compressed manner. If the used algorithm does not make a certain number of moves, then one obtains too short an encoding of π. Since most permutations are incompressible, like π, the particular bound for an incompressible π holds on average. The average is taken over the uniform distribution on all permutations of n keys.

Theorem 6.6.2

With probability going to 1 for n → ∞, and on average, p-pass Shellsort takes Ω(pn^{1+1/p}) steps, for every increment sequence.

Proof. Let A be a p-pass Shellsort algorithm with increments (h_1, …, h_p), where h_k is the increment in the kth pass and h_p = 1. Since the running time is at least pn (every key is compared in every pass), the theorem is true for p = Ω(log n). It remains to prove the theorem for p = o(log n). There are n! permutations of n keys. Choose a permutation π on n keys {1, …, n} such that C(π∣n,A,P) ≥ log n! − n,

where P is a constant-size reconstruction program to be specified later.

For all 1 ≤ i ≤ n and 1 ≤ k ≤ p, let m_{i,k} be the distance the ith key moves in the h_k-chain containing key i, in pass k. Then,
$$M = \sum\limits_{k = 1}^p {\sum\limits_{i = 1}^n {m_{i,k} } } $$
is precisely the number of data movements made by A to sort π, and therefore is a lower bound on the time complexity T of A.

Claim 6.6.3

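Given all the m_{i,k}'s in lexicographic order, we can reconstruct the original permutation π.

Proof. Given m_{i,k}, for i = 1, …, n, and the final permutation of pass k, we can reconstruct the initial permutation of pass k. Since the final permutation of pass p is the sorted list, working backward from pass p to pass 1 yields π. ◻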

The lexicographically ordered m_{i,k}'s can be described as a partition of M in nonnegative integer summands. There are
$$D\left(M \right) = \left({\begin{array}{*{20}c} {M + np - 1} \\ {np - 1} \\ \end{array}} \right)$$
distinct partitions of M into np ordered nonnegative integral m_{i,k}'s. Since every m_{i,k} ≤ n and p ≤ n, we have M ≤ n^3. Given n, we can first describe p and M self-delimitingly in O(log n) bits, and second describe the partition of M, yielding the lexicographically ordered m_{i,k}'s, in total O(log n) + log D(M) bits.
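The count D(M) is the usual "stars and bars" number of ordered tuples of np nonnegative integers with sum M. As a sanity check only, the following Python sketch compares a brute-force count with the closed form for tiny parameters; the function names `count_move_vectors` and `D` are hypothetical.

```python
from itertools import product
from math import comb


def count_move_vectors(total, slots):
    """Brute-force count of ordered tuples of `slots` nonnegative integers summing to `total`."""
    return sum(1 for t in product(range(total + 1), repeat=slots) if sum(t) == total)


def D(M, n, p):
    """Closed form from the text: binomial(M + np - 1, np - 1)."""
    return comb(M + n * p - 1, n * p - 1)
```

For example, with n = 2 keys and p = 2 passes there are np = 4 summands, and both functions give 20 ways to distribute M = 3.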

By Claim 6.6.3, and letting P be the program reconstructing π from this description, given n,p,A, we must have logD(M) + O(logn) ≥C(π∣n,A,P) ≥ logn! − n.

Rewriting log D(M) using the identity in Exercise 1.3.3 on page 10, we can estimate the resulting terms asymptotically for n → ∞ and p = o(log n), yielding M = Ω(pn^{1+1/p}).
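One way to carry out this estimate is sketched here; it is an outline under stated assumptions, not necessarily the authors' exact calculation. Assume M ≥ 1 and use the standard binomial estimate \(\log \binom{a}{b} = b\log \frac{a}{b} + (a - b)\log \frac{a}{a - b} + O(\log a)\) with a = M + np − 1 and b = np − 1 (assumed here to be the form of the identity referred to). Since M ≤ n^3, the error term is O(log n), and since \(\ln (1 + x) \le x\), the middle term is at most (np − 1) log e:
$$\log D(M) \le np\log \left( \frac{M}{np - 1} + 1 \right) + O(np).$$
Combining this with log D(M) + O(log n) ≥ log n! − n and log n! ≥ n log n − 2n, and dividing by np, gives
$$\log \left( \frac{M}{np - 1} + 1 \right) \ge \frac{\log n}{p} - O(1).$$
Hence \(M/(np - 1) + 1 = \Omega (n^{1/p})\). Because p = o(log n), the quantity n^{1/p} is unbounded, so the additive 1 can be absorbed and M = Ω(pn^{1+1/p}).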

The running time T of the algorithm A on permutation π satisfies T ≥ M. The number of permutations with C(π∣n,A,P) ≥ log n! − n is at least a (1 − 1/2^n) fraction of all permutations, which proves the theorem. ◻

Example 6.6.1

The question whether there exists an increment sequence for Shellsort to achieve O(n log n) average performance is still open. Theorem 6.6.2 implies that such an increment sequence, if it exists, must be of length Θ(logn).

Example 6.6.2

The initial idea to prove Theorem 6.6.2 was simply to describe the m_{i,k}'s by standard self-delimiting codes, giving a total length of
$$\sum\limits_{k = 1}^p {\sum\limits_{i = 1}^n {\left({\log m_{i,k} + 2\log \log m_{i,k} + 1} \right)} } $$
bits. Since π can be reconstructed from this description, we have
$$\sum\limits_{k = 1}^p {\sum\limits_{i = 1}^n {\left({\log m_{i,k} + 2\log \log m_{i,k} + 1} \right)} \ge C\left({{\rm{\pi }}\left| {n,A,P} \right.} \right) \ge \log n! - n.} $$
By the concavity of the logarithm function, the left-hand side of the above is maximized when all the m_{i,k}'s are equal, say m. Therefore,
$$np\log m + 2np\log \log m + np \ge \log n! - n,$$
and since log n! ≥ n log n − 2n, we have
$$m = \Omega \left({\frac{{n^{1/p} }}{{\left({\left({\log n} \right)/p} \right)^2 }}} \right)\quad {\rm{and}}\quad T \ge pnm \ge \Omega \left({\frac{{pn^{1 + 1/p} }}{{\left({\left({\log n} \right)/p} \right)^2 }}} \right).$$

This is the result of Theorem 6.6.2 for p = Θ(log n), but the less optimal code results in a slightly weaker result for p = o(log n).

6.11 Exercises

6.6.1. [25] Consider the following game. Carole chooses a number from {1, 2, …, n}. Paul has to guess the secret number using only “yes/no” questions. Prove the following lower bounds on the number of questions needed for Paul to determine the number: (i) log n if Carole answers every question truthfully; (ii) log n + log log n if Carole is allowed to lie at most once; (iii) \(\log n + \log \sum {_{i \le k} } \left({\begin{array}{*{20}c} n \\ i \\\end{array}} \right)\) if Carole is allowed to lie at most k times; (iv) log n + log log n + k if Carole is allowed to lie at most k times, but all lies (possibly fewer than k and possibly nonconsecutive) have to occur in k consecutive rounds.

Comments. Simple proofs using the incompressibility method are due to M. Fouz, CS798 Course Report, University of Waterloo, December 2007. The one-lie game was fully analyzed by A. Pelc in [J. Comb. Theory, Ser. A, 44:1(1987), 129–140]. J.H. Spencer generalized this result to the k-lies game in [Theoret. Comput. Sci., 95:2(1992), 307–321]. The interval variant was introduced and analyzed by B. Doerr, J. Lengler, D. Steurer in [Proc. 17th Int. Symp. Algor. Comput, Lect. Notes Comput. Sci., Vol. 4288, Springer-Verlag, Berlin, 2006, 318–327].

6.6.2. [40] In computational biology, evolutionary trees are represented by unrooted unordered binary trees with uniquely labeled leaves and unlabeled internal nodes. Measuring the distance between such trees is useful in biology. A nearest neighbor interchange (nni) operation swaps two subtrees that are separated by an internal edge (u,v), as shown in Figure 6.2.

FIGURE 6.2. The two possible nnis on (u,v): swap B ↔ C or B ↔ D

  (a) Show that in Figure 6.3 it takes 2 nni moves to convert (i) to (ii).

    FIGURE 6.3. The nni distance between (i) and (ii) is two

  (b) Show that n log n + O(n) nni moves are sufficient to transform a tree of n leaves to any other tree with the same set of leaves.

  (c) Prove an Ω(n log n) lower bound for Item (b), using the incompressibility method.


Comments. Item (b) is from [K. Culik II and D. Wood, Inform. Process. Lett, 15(1982), 39–42; M. Li, J.T. Tromp, and L. Zhang, J. Theoret. Biology, 182(1996), 463–467]. The latter paper contains principal references related to the nni metric. Item (c) is by D. Sleator, R.E. Tarjan, and W. Thurston [SIAM J. Discr. Math., 5(1992), 428–450], who proved the Ω(nlogn) lower bound for a more general graph transformation system.

6.6.3. [25] Improve the logn! − 5n bound in Equation 6.7, page 478, by reducing 5n via a better encoding and more precise calculation.

6.6.4. [41] Show that the worst-case time complexity of p-pass Shellsort of n items is at least Ω(n log^2 n/(log log n)^2) for every number p of passes and every increment sequence.

Comments. This shows that the best possible average-case time complexity of Shellsort for any number of passes and all increment sequences may be of larger order of magnitude than nlogn. Source: C.G. Plaxton, B. Poonen, T. Suel, Proc. 33rd IEEE Symp. Found. Comput. Sci., pp. 226–235, 1992.

6.6.5. [O48]

  (a) Prove or disprove that there is a number of passes p and an increment sequence such that Shellsort has average-case time complexity O(n log n).

  (b) Find a better lower bound on average-case time complexity of Shellsort than Theorem 6.6.2; give a good or optimal upper bound on average-case time complexity of p-pass Shellsort for the best increment sequences.


Comments. See Exercise 6.6.4 and the comment following the proof of Theorem 6.6.2. Source: M. Li and P.M.B. Vitányi, J. Assoc. Comp. Mach., 47:5(2000), 905–911.

6.6.6. [10] Use the idea in the proof for Theorem 6.6.2 to obtain Ω(n^2) average-case lower bounds for Bubblesort, Selection sort, and Insertion sort.

6.6.7. [22/O46] Sorting by stacks. The input is a permutation of n integers. These integers, one at a time, pass through a sequence of m first-in-last-out stacks S_1, …, S_m, from S_1 to S_m. If an integer k is to be pushed on S_i, then this stack can pop some integers from the top down, pushing them on S_{i+1} in that order, before pushing k into S_i. The output sequence from S_m gives the final permutation.

  (a) Show that we can sort integers with log n stacks.

  (b) Use the incompressibility method to show that ½ log n stacks are needed on average.

  (c) Open: Close the gap between Items (a) and (b).


Comments. The problem was investigated by R.E. Tarjan [J. Assoc. Comp. Mach., 19(1972), 341–346] and D.E. Knuth [The Art of Computer Programming, Vol. 3: Sorting and Searching, 1998 (2nd edition), Section 5.2.4, Exercises 19 and 20]. Item (b), and related studies such as sorting with parallel stacks and queues can be found in [T. Jiang, M. Li and P.M.B. Vitányi, J. Comput. Sci. Tech., 15:5(2000), 402–408].

6.6.8. [36/O39] Consider the following algorithm.

QuickSort(Array π[1..n]):
  If n ≤ 1 then return π;
  p := π[1];
  π_L := (x ∈ π, x < p) in stable order;
  π_R := (x ∈ π, x > p) in stable order;
  QuickSort(π_L); QuickSort(π_R);
  π := π_L p π_R

  (a) Use the incompressibility method to show that the average height of a Quicksort tree (its deepest recursion level), or equivalently a binary insertion tree, is O(log n). This also gives an alternative O(n log n) average-case analysis of Quicksort.

  (b) Obtain the 4.31107 log n upper bound using the incompressibility method.


Comments. Hint: Consider a pivot sequence (p_1, p_2, …, p_k), where p_{i+1} is a pivot for one of the subranges defined by p_i for all i. The longest such sequence corresponds to the binary search tree height. This sequence can be encoded efficiently. Suppose π has a pivot chain of length c log n, where c is sufficiently large. Let x be the string of length c log n such that x[i] = 1 iff p_i occurs in the middle half of its range. Let z be the string of length c log n such that z[i] = 1 iff p_{i+1} is the pivot for the smaller range defined by p_i. Then the number of ones in x is at most c′ log n, where \(c' = 1/\log \frac{4}{3}\), and the number of ones in z is at most log n, since otherwise the size of the ranges for the p_i will reach 1. Now note that if we are given x and z, we can save one bit for every entry p_i in π, since p_i begins in 01 or 10 iff x[i] = 1, and z tells us which of p_i's subpivots is p_{i+1}. Thus, given x and z, we save c log n bits from the encoding of π. Now π can be recursively encoded by \(\log \left({A!} \right) = \log \left({\begin{array}{*{20}c} A \\ B \\\end{array}} \right) + \log \left({B!} \right) + \log \left({\left({A - B} \right)!} \right)\) while compressing the pivots along the pivot sequence. Source: B. Lucier, T. Jiang, and M. Li, Inform. Process. Lett., 103:2(2007), 45–51. A tight bound for this problem by probabilistic analysis is given in [L. Devroye, J. Assoc. Comp. Mach. 33:3(1986), 480–498].
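To experiment with the quantity in this exercise, the height of the Quicksort tree can be computed directly. The Python sketch below is an illustration under the assumption of distinct keys and a first-element pivot, as in the pseudocode of the exercise; the names `quicksort_height` and `average_height` are hypothetical.

```python
import random


def quicksort_height(perm):
    """Deepest recursion level of stable first-element-pivot Quicksort on distinct keys."""
    if not perm:
        return 0
    pivot = perm[0]
    left = [x for x in perm if x < pivot]      # stable order is preserved
    right = [x for x in perm if x > pivot]
    return 1 + max(quicksort_height(left), quicksort_height(right))


def average_height(n, trials, seed=0):
    """Empirical mean height over `trials` uniformly random permutations of n keys."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        perm = list(range(n))
        rng.shuffle(perm)
        total += quicksort_height(perm)
    return total / trials
```

An already sorted input is the worst case (height n), while random permutations give heights that grow logarithmically in n.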

6.6.9. [22] Consider two variants of p-pass Shellsort. In each pass, instead of fully sorting every sublist, we make only one pass of Bubblesort, or two such passes in opposite directions, for every sublist. In both cases the sequence may stay unsorted, even if the last increment is 1. A final phase, a straight insertion sort, is required to guarantee a fully sorted list.

  (a) Prove an Ω(n^2/2^p) lower bound on the average-case running time for the one-pass variant of p-pass Shellsort.

  (b) Prove an Ω(n^2/4^p) lower bound on the average-case running time for the two-pass variant of p-pass Shellsort.


Comments. The one-pass variant of Shellsort is called Dobosiewicz sort by D.E. Knuth [The Art of Computer Programming, Vol. 3, Sorting and Searching, Addison-Wesley, 1973, Exercise, page 105]. Source: W. Dobosiewicz, Inform. Process. Lett. 11:1(1980), 5–6. The two-pass variant, proposed by J. Incerpi and R. Sedgewick [Inform. Process. Lett. 26:1(1980), 37–43], is called Shakersort. Solutions for both (a) and (b) were given by B. Brejová [Inform. Process. Lett., 79:5(2001), 223–227].

6.12 Longest Common Subsequence

Certain problems concerning subsequences and supersequences of a given set of sequences arise naturally in quite practical situations. For example, in molecular biology, the longest common subsequence of some DNA sequences is commonly used as a measure of similarity of these sequences. Other applications of longest common subsequences include data compression and syntactic pattern recognition.

Definition 6.7.1

If s = s_1 … s_m and t = t_1 … t_n are two sequences, then s is a subsequence of t, and equivalently, t is a supersequence of s, if for some sequence of indices i_1 < … < i_m, we have \(s_j = t_{i_j}\) for all j (1 ≤ j ≤ m). Given a finite set of sequences S, a shortest common supersequence (SCS) of S is a shortest sequence s such that each sequence in S is a subsequence of s. A longest common subsequence (LCS) of S is a longest sequence s such that each sequence in S is a supersequence of s.

It is well known that the SCS and LCS problems are NP-hard. In the worst case, the SCS and LCS problems cannot even be efficiently approximated unless P = NP. For example, the following is known for the LCS problem. If there is a polynomial-time algorithm that on some input sequences always finds a common subsequence of length c > 0 times the length of the longest common subsequence, then P = NP. This holds also for the problem as stated in the theorem below. However, many simple heuristic algorithms for SCS and LCS turn out to work well in practice. An incompressibility argument shows that indeed, these algorithms perform well on average.

Definition 6.7.2

Consider LCS problems on an alphabet Σ = {a_1, …, a_k}. Let lcs(S) denote the length of an LCS of a set S ⊆ Σ* of sequences.

Algorithm Long-Run
  • Step 1. Determine the greatest m such that a^m is a common subsequence of all input sequences, for some a ∈ Σ.

  • Step 2. Output a^m as a common subsequence.
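Algorithm Long-Run is simple enough to state in a few lines of code. The Python sketch below is one possible rendering, assuming the input sequences are given as a nonempty list of strings; the function name `long_run` is hypothetical. It relies on the observation that a^m is a common subsequence exactly when every input contains at least m occurrences of a.

```python
def long_run(sequences, alphabet):
    """Return a^m for the letter a and the greatest m such that a^m is a common subsequence."""
    best_letter, best_m = "", 0
    for a in alphabet:
        # a^m is a subsequence of seq iff seq contains at least m copies of a.
        m = min(seq.count(a) for seq in sequences)
        if m > best_m:
            best_letter, best_m = a, m
    return best_letter * best_m
```

For the inputs "aab", "aba", "baa" over the alphabet {a, b}, the sketch returns "aa".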

Theorem 6.7.1

Assume the notation above. Let S ⊆ Σ* be a set of n sequences each of length n, and let ϵ > 0 be a constant. Algorithm Long-Run outputs a common subsequence of S of length lcs(S) − O(lcs(S)^{1/2+ϵ}) for a fraction of at least 1 − 1/n^2 of all inputs, and hence on average.

Proof. Assume the notation above. Fix a string x of length n^2 over Σ with
$$C\left(x \right) \ge \left({n^2 - 2\log n} \right)\log k.$$

Divide x into n equal-length segments x_1, …, x_n. Choose the set S in the theorem as S = {x_1, …, x_n}.

The following claim is a corollary of the proof of Theorem 2.6.1 on page 170, counting each letter as a block of size log k, assuming that k is a power of 2.

Claim 6.7.1

Let a ∈ Σ, x_i ∈ S, and let ϵ > 0 be a constant. Denote the number of occurrences of a in x_i by m. If |m − n/k| > n^{1/2+ϵ}, then there is a constant δ > 0 such that
$$C\left({x_i \left| k \right.} \right) \le \left({n - \delta n^{2\epsilon } } \right)\log k.$$
A direct proof of this claim is also easy. There are only \(D = \left({\begin{array}{*{20}c} n \\ m \\\end{array}} \right)\left({k - 1} \right)^{n - m} \) strings of length n with m occurrences of a. Therefore, one can specify x_i by n, k, m and its index j, with l(j) = log D, in this ensemble. An elementary estimate by Stirling's formula yields, for some δ > 0,
$$\log \left({\begin{array}{*{20}c} n \\ m \\ \end{array}} \right)\left({k - 1} \right)^{n - m} \le \left({n - \delta n^{2\epsilon } } \right)\log k.$$
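The size of the saving can be checked numerically. The following Python sketch computes n log k − log D exactly for concrete values; the function name `bits_saved` is hypothetical, and the sample values n = 1000, k = 4 are chosen only for illustration.

```python
from math import comb, log2


def bits_saved(n, k, m):
    """Return n*log2(k) - log2(D), where D = C(n, m) * (k - 1)**(n - m) counts the
    strings of length n over k letters with exactly m occurrences of one fixed letter."""
    D = comb(n, m) * (k - 1) ** (n - m)
    return n * log2(k) - log2(D)
```

With n = 1000 and k = 4, the saving is only a few bits at the balanced value m = n/k = 250, and it grows to several tens of bits once m deviates from n/k by 100 in either direction.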

Claim 6.7.2

$${\rm{lcs}}\left(S \right) < \frac{n}{k} + n^{1/2 + \epsilon }.$$

Proof. Let s be an LCS of S. Then l(s) = lcs(S) ≤ n by definition. Assume, by way of contradiction, that l(s) is greater than claimed. We give a short description of x, for some fixed δ > 0, by saving n^δ log k bits on the description of every x_i through the use of s.

Let s = s_1 s_2 … s_p, with p = l(s). Fix an x_i. We will do another encoding of x_i. We align s with the corresponding letters in x_i as far to the left as possible, and rewrite
$$x_i = \alpha _1 s_1 \alpha _2 s_2 \ldots \alpha _p s_p z.$$

Here α_1 is the longest prefix of x_i containing no s_1; α_2 is the longest substring of x_i starting from s_1 containing no s_2; and so on. The string z is the remaining part of x_i after s_p. In this way, α_j does not contain an occurrence of letter s_j, for j = 1, …, p. That is, every α_j contains only letters in Σ − {s_j}.

Then x_i can be considerably compressed with the help of s. Divide x_i = yz such that the prefix y is
$$y = \alpha _1 s_1 \alpha _2 s_2 \ldots \alpha _p s_p.$$

From s we can reconstruct which k − 1 letters from Σ appear in α_i, for every i. We map y to an equally long string y′ as follows: For i = 1, …, p, change s_i to a_k in y. Moreover, change each occurrence of a_k in α_j to the letter s_j. We can map y′ back to y, using s, by reversing this process (because the original α_j did not contain an occurrence of s_j).

The letter a_k occurs at least (n/k) + n^{1/2+ϵ} times in y′, since l(s) is at least this long. Then, by Claim 6.7.1, for some constant δ > 0, we have
$$C\left({y'\left| k \right.} \right) \le \left({l\left({y'} \right) - \delta n^{2\epsilon } } \right)\log k.$$
From y′, s, z, k we can reconstruct x_i. (We have defined x_i = yz.) Giving also the lengths of y′, s, z in self-delimiting format in O(log n) bits, we can describe x_i, given k and s, by the number of bits in the right side of the equation below (using l(y′) + l(z) ≤ n):
$$C\left({x_i \left| {k,s} \right.} \right) \le \left({n - \delta n^{2\epsilon } } \right)\log k + O\left({\log n} \right).$$
We repeat this for every x_i. In total, we save Ω(n^{1+2ϵ} log k) bits to encode x. Thus,
$$\frac{{C\left({x\left| k \right.} \right)}}{{\log k}} \le n^2 - \Omega \left({n^{1 + 2\epsilon } } \right) + l\left(s \right) + O\left({n\log n} \right) < n^2 - 2\log n.$$

This contradicts the incompressibility of x asserted in Equation 6.9. ◻

It follows from Claim 6.7.1 and Equation 6.9, by repeating the argument following Equation 6.10 in the proof of Claim 6.7.2, that for some ϵ > 0, each a ∈ Σ occurs in each x_1, …, x_n at least (n/k) − O(n^{1/2+ϵ}) times. This means that a^m with
$$m = \frac{n}{k} - O\left({n^{1/2 + \epsilon } } \right)$$
is a common subsequence of x_1, …, x_n. By Claim 6.7.2, lcs(S) − m = O(n^{1/2+ϵ}).

Altogether, we have shown that the statement in the theorem is true for this particular input x_1, …, x_n. The fraction of strings of length n^2 satisfying the theorem is at least 1 − 1/n^2, since that many strings satisfy Equation 6.9. The theorem follows by taking the average. ◻

6.13 Exercises

6.7.1. [35/O41]

  (a) Prove that the expected length of the longest common subsequence of two random binary sequences of length n is bounded above by 0.867n.

  (b) Open: Obtain tight bounds on expected length of the longest common subsequence of two random binary sequences of length n.


Comments. Hint: use the same encoding scheme as in Section 6.7 and count the number of encodings. The number 0.867 is roughly the root of the equation x−2x logx−2(1−x) log(1−x) = 2. Source: T. Jiang, M. Li, and P.M.B. Vitányi, Comput. J., 42:4(1999), 287–293. R.A. Baeza-Yates, R. Gavaldá, G. Navarro, and R. Scheihing [Theor. Comput. Syst., 32:4(1999), 435–452] generalized the above analysis to alphabet size k > 2, and improved the constant to 0.860. This bound was first proved by V. Chvátal and D. Sankoff in [J. Appl. Probab., 12(1975), 306–315]. The current best lower and upper bounds are 0.7739n and 0.8376n, respectively, due to V. Dančik and M. Paterson [Random Struct. Alg., 6(1995), 449–458; Proc. 19th Symp. Math. Found. Comput. Sci., 1994, pp. 127–142].

6.7.2. [39] Consider the SCS problem defined in Section 6.7. Prove by incompressibility the following: Let S ⊆ Σ* be a set of n sequences of length n, and let \(\delta = \sqrt 2 /2 \approx 0.707.\) Let scs(S) be the length of an SCS of S. The algorithm Majority-Merge below produces a common supersequence of length scs(S) + O(scs(S)^δ) on the average.

Algorithm Majority-Merge {Input: n sequences, each of length n}
  • Step 1. Set supersequence s := ϵ. {ϵ is the null string}

  • Step 2. {Let the letters a form a majority among the leftmost letters of the remaining sequences} Set s := sa and delete the front a from these sequences. Repeat this step until no sequences are left.

  • Step 3. Output s.

Comments. Source: T. Jiang and M. Li, SIAM J. Comput, 24:5(1995), 1122–1139. Part of the proof was from [D. Foulser, M. Li, and Q. Yang, Artificial Intelligence, 57(1992), 143–181].
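For experimentation, Algorithm Majority-Merge can be sketched in Python as below. This is an illustrative rendering only: the function names `majority_merge` and `is_subsequence` are hypothetical, sequences are assumed to be strings, and ties for the majority letter are broken by whichever letter is encountered first, a choice the exercise leaves open.

```python
from collections import Counter


def majority_merge(sequences):
    """Greedy common supersequence: repeatedly emit the most frequent leftmost letter."""
    remaining = [s for s in sequences if s]
    out = []
    while remaining:
        # Letter that forms a majority among the current leftmost letters.
        a = Counter(s[0] for s in remaining).most_common(1)[0][0]
        out.append(a)
        # Delete the front a from the sequences that start with it.
        remaining = [s[1:] if s[0] == a else s for s in remaining]
        remaining = [s for s in remaining if s]
    return "".join(out)


def is_subsequence(s, t):
    """True iff s is a subsequence of t."""
    it = iter(t)
    return all(ch in it for ch in s)
```

Every input is a subsequence of the output, since each emitted letter only removes matching front letters, and the loop ends when all inputs are exhausted.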

6.14 Formal Language Theory

Part of formal language theory consists in establishing a hierarchy of language families. The main division is the Chomsky hierarchy, with regular languages, context-free languages, context-sensitive languages, and recursively enumerable languages.

A ‘pumping’ lemma (for regular languages) shows that some languages are not regular, but often does not decide which languages are regular and which languages are not. There are many different pumping lemmas, each of them appropriate for limited use. Therefore, some effort has been made to present pumping lemmas that are exhaustive, in the sense that they characterize the regular languages [J. Jaffe, SIGACT News, 10:2(1978), 48–49; D. Stanat and S. Weiss, SIGACT News, 14:1(1982), 36–37; A. Ehrenfeucht, R. Parikh, and G. Rozenberg, SIAM J. Comput., 10(1981), 536–541]. These pumping lemmas are complicated and hard to use, while the last reference uses Ramsey theory. Using incompressibility we find a characterization of the regular languages that makes our intuition about the finite stateness of these languages rigorous and that is easy to apply.

Definition 6.8.1

Let Σ be a finite nonempty alphabet, and let Q be a (possibly infinite) nonempty set of states. A transition function is a function δ : Σ × Q → Q. We extend δ to δ′ on Σ* by δ′(ϵ, q) = q and
$$\delta '\left({a_1 \ldots a_n,q} \right) = \delta \left({a_n,\delta '\left({a_1 \ldots a_{n - 1},q} \right)} \right).$$

Clearly, if δ′ is not one-to-one, then the automaton forgets because some x and y from Σ* drive δ′ into the same memory state. An automaton A is a quintuple (Σ, Q, δ, q_0, Q_f), where everything is as above, and q_0 ∈ Q is a distinguished initial state and Q_f ⊆ Q is a set of final states. We call A a finite automaton (FA) if Q is finite.

An alternative way of looking at it is as follows: We denote ‘indistinguishability’ of a pair of histories x, y ∈ Σ* by x ~ y, defined as δ′(x, q_0) = δ′(y, q_0). ‘Indistinguishability’ of strings is reflexive, symmetric, transitive, and right-invariant (δ′(xz, q_0) = δ′(yz, q_0) for all z). Thus, ‘indistinguishability’ is a right-invariant equivalence relation on Σ*. It is a simple matter to ascertain this formally.

Definition 6.8.2

The language accepted by automaton A as above is the set L = {x : δ′(x, q_0) ∈ Q_f}. A regular language is a language accepted by a finite automaton.

It is a straightforward exercise to verify from the definitions the following fact (which will be used later):

Theorem 6.8.1

(Myhill, Nerode) The following statements are equivalent.

  (i) L ⊆ Σ* is accepted by some finite automaton.

  (ii) L is the union of equivalence classes of a right-invariant equivalence relation of finite index on Σ*.

  (iii) For all x, y ∈ Σ* define right-invariant equivalence x ~ y by the following: for all z ∈ Σ* we have xz ∈ L iff yz ∈ L. Then the number of ~ equivalence classes is finite.


Subsequently, closure of finite automaton languages under complement, union, and intersection follows by simple construction of the appropriate δ functions from given ones. Details can be found in any textbook on the subject.

Example 6.8.1

Consider the language {0^k 1^k : k ≥ 1}. If it were regular, then the state q of the accepting finite automaton A, subsequent to processing 0^k, together with A, is a description of k. Namely, by running A, initialized in state q, on input consisting of only 1's, the first time A enters an accepting state is after precisely k consecutive 1's. The size of the description of A and q is bounded by a constant, say c, that is independent of k. Altogether, it follows that C(k) ≤ c + O(1). But choosing k with C(k) ≥ log k we obtain a contradiction for all large enough k. We generalize this observation in the lemma below.

Lemma 6.8.1

(KC-regularity) Let L ⊆ Σ* be regular, L_x = {y : xy ∈ L}. There is a constant c such that for every x, if y is the nth string in L_x, then C(y) ≤ C(n) + c.

Proof. Let L be a regular language. A string y such that xy ∈ L for some x can be described by
  • this discussion, and a description of the automaton that accepts L;

  • the state of the automaton after processing x, and the number n.

The first item requires O(1) bits, as does the state, since the automaton has only finitely many states. Thus C(y) ≤ C(n) + O(1). ◻
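The reconstruction used in this proof can be made concrete. The Python sketch below is an illustration only: it assumes the automaton is given as a transition dictionary and that strings are enumerated in length-increasing lexicographic order; the names `run` and `nth_in_Lx` are hypothetical. Only the state reached after x and the number n enter the search, which is the point of the lemma.

```python
from itertools import count, product


def run(delta, state, word):
    """Extended transition function: state reached from `state` after reading `word`."""
    for a in word:
        state = delta[(state, a)]
    return state


def nth_in_Lx(delta, start, finals, alphabet, x, n):
    """Return the nth string y (n >= 1) with xy accepted, in length-lexicographic order.

    Loops forever if L_x contains fewer than n strings.
    """
    q = run(delta, start, x)            # all that is remembered about x
    seen = 0
    for length in count(0):
        for letters in product(alphabet, repeat=length):
            y = "".join(letters)
            if run(delta, q, y) in finals:
                seen += 1
                if seen == n:
                    return y
```

For instance, for the two-state automaton accepting binary strings with an even number of 1's and x = 1, the first three strings in L_x are 1, 01, and 10.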

Example 6.8.2

Prove that {1^p : p is prime} is not regular. Consider the string xy = 1^p with p the (k + 1)th prime. Set x = 1^{p′}, with p′ the kth prime. Then y = 1^{p−p′}, and y is the lexicographically first element in L_x. Hence, by Lemma 6.8.1, C(p − p′) = O(1). But the difference between two consecutive primes grows unboundedly. Since there are only O(1) descriptions of length O(1), we have a contradiction.

Example 6.8.3

Prove that L = {x x^R w : x, w ∈ {0,1}* − {ϵ}} is not regular (if x = x_1 … x_m, then x^R = x_m … x_1). Set x = (01)^m, where C(m) ≥ log m. Then the lexicographically first word in L_x is y = (10)^m 0. Hence, C(y) = Ω(log m), contradicting the KC-regularity lemma.

Example 6.8.4

Prove that L = {0^i 1^j : gcd(i, j) = 1} is not regular. Set x = 0^{(p−1)!} 1, where p > 3 is a prime, l(p) = n, and C(p) ≥ log n − log log n. Then the lexicographically first word in L_x is 1^{p−1}, contradicting the KC-regularity lemma.

Example 6.8.5

Prove that {p : p is the standard binary representation of a prime} is not regular. Suppose the contrary, and p_i denotes the ith prime, i ≥ 1. Consider the least binary p_m = uv (= u 2^{l(v)} + v), with u = ∏_{i<k} p_i and v not in {0}*{1}. Such a prime p_m exists, since every interval [n, n + n^{11/20}] of the natural numbers contains a prime [D. Heath-Brown and H. Iwaniec, Invent. Math., 55(1979), 49–69].

Consider p_m now as an integer, p_m = 2^{l(v)} ∏_{i<k} p_i + v. Since the integer v > 1, and v is not divisible by any prime less than p_k (because p_m is prime), the binary length l(v) is at least l(p_k). Because p_k goes to infinity with k, the value C(v) ≥ C(l(v)) also goes to infinity with k. But since v is the lexicographically first suffix, with integer v > 1 such that uv ∈ L, we have C(v) = O(1) by the KC-regularity lemma, which is a contradiction.

Characterizations (such as the Myhill-Nerode theorem, Theorem 6.8.1) of regular languages seem to be practically useful only to show regularity. The need for pumping lemmas stems from the fact that characterizations tend to be very hard to use to show nonregularity. In contrast, the compressibility characterization below is useful for both purposes.

6.14.10 Definition 6.8.3

Enumerate \(\Sigma^* = \{y_1, y_2, \ldots\}\) with \(y_i\) the ith element in the total order. For L ⊆ Σ* and x ∈ Σ*, let \(\chi = \chi_1\chi_2\ldots\) be the characteristic sequence of \(L_x = \{y : xy \in L\}\), defined by \(\chi_i = 1\) if \(xy_i \in L\), and \(\chi_i = 0\) otherwise. We denote \(\chi_1 \ldots \chi_n\) by \(\chi_{1:n}\).
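To make the definition concrete, the sketch below computes \(\chi_{1:n}\) for a toy regular language. The language (binary strings with an even number of 1s), the length-then-dictionary enumeration order, and all function names are assumptions made for illustration.

```python
from itertools import count, product


def strings_in_order(alphabet="01"):
    """Yield all strings over the alphabet, shorter first, then dictionary order."""
    for length in count(0):
        for letters in product(alphabet, repeat=length):
            yield "".join(letters)


def chi_prefix(in_language, x, n):
    """Return chi_{1:n} for L_x = {y : xy in L} as a string of 0s and 1s."""
    ys = strings_in_order()
    return "".join("1" if in_language(x + next(ys)) else "0" for _ in range(n))


def even_ones(s):
    """Hypothetical regular language: binary strings with an even number of 1s."""
    return s.count("1") % 2 == 0


xs = ["", "0", "1", "10", "11", "0101", "111"]
distinct = {chi_prefix(even_ones, x, 15) for x in xs}
print(len(distinct))  # prints 2
```

Only two distinct prefixes appear, one per parity of x, which matches the idea that a regular language has finitely many characteristic sequences.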

6.14.11 Theorem 6.8.2

(Regular KC-characterization) There is a constant \(c_L\) depending only on L ⊆ Σ* such that the following statements are equivalent:
  (i) L is regular;

  (ii) for all x ∈ Σ*, for all n, \(C(\chi_{1:n}|n) \le c_L\);

  (iii) for all x ∈ Σ*, for all n, \(C(\chi_{1:n}) \le C(n) + c_L\);

  (iv) for all x ∈ Σ*, for all n, \(C(\chi_{1:n}) \le \log n + c_L\).


Proof. (i) → (ii). By a proof similar to that of the KC-regularity lemma.

(ii) →(iii) → (iv). Trivial.

(iv) → (i). By (iv) and Claim 6.8.1 below, there are only finitely many distinct χ's associated with the x's in Σ*. Define the right-invariant equivalence relation ~ by x ~ x′ if χ = χ′. This relation induces a partition of Σ* into equivalence classes [x] = {y : y ~ x}. Since there is a one-to-one correspondence between the [x]'s and the χ's, and there are only finitely many distinct χ's, there are also only finitely many [x]'s. This implies that L is regular by the Myhill-Nerode theorem: define a finite automaton using one state for each equivalence class, and define the transition function accordingly. The proof of the theorem is finished, apart from proving Claim 6.8.1.

6.14.12 Claim 6.8.1

For each constant c there are only finitely many sequences \(\omega \in \{0,1\}^\infty\) such that for all n, we have \(C(\omega_{1:n}) \le \log n + c\).

This claim is a weaker version of Item (e) of Exercise 2.3.4, page 131, which recalls that D.W. Loveland in [Inform. Contr., 15(1969), 510–526] credits the following result to A.R. Meyer: For each constant c there are only finitely many \(\omega \in \{0,1\}^\infty\) with \(C(\omega_{1:n}|n) \le c\) for all n, and each such ω is a recursive real. G.J. Chaitin [Theoret. Comput. Sci., 2(1976), 45–48] improves the condition first to \(C(\omega_{1:n}) \le C(n) + c\), and then further to \(C(\omega_{1:n}) \le \log n + c\). We provide an alternative and simpler proof, which is sufficient for our purpose and avoids establishing that the ω's are recursive reals.

Proof. Let c be a positive constant, and let
$$\begin{array}{l} A_n = \left\{ {x \in \left\{ {0,1} \right\}^n :C\left(x \right) \le \log n + c} \right\}, \\ A = \left\{ {\omega \in \left\{ {0,1} \right\}^\infty :\forall _{n \in N} \left[ {C\left({\omega _{1:n} } \right) \le \log n + c} \right]} \right\}. \\ \end{array}$$

If the cardinality \(d(A_n)\) of \(A_n\) dips below a fixed constant c′ for infinitely many n, then c′ is an upper bound on d(A). This is because it is an upper bound on the cardinality of the set of prefixes of length n of the elements in A for all n.

Fix any l ∈ N. Choose a binary string y of length 2l + c + 1 satisfying
$$C\left(y \right) \ge 2l + c + 1.$$
Choose i maximal such that for the division of y into mn with l(m) = i we have
$$m \le d\left({A_n } \right).$$
(This holds at least for i = 0 = m.) Define similarly a division y = sr with l(s) = i + 1. By maximality of i, we have s > d(A r ). From the easily proven s ≤ 2m + 1, it then follows that
$$d\left({A_r } \right) \le 2m.$$
We prove l(r) ≥ l. Since by Equations 6.14 and 6.12 we have
$$m \le d\left({A_n } \right) \le 2^c n,$$
it follows that l(m) ≤ l(n) + c. Therefore,
$$2l + c + 1 = l\left(y \right) = l\left(n \right) + l\left(m \right) \le 2l\left(n \right) + c,$$

which implies that l(n) > l. Consequently, l(r) = l(n) − 1 ≥ l.

We prove d(A r ) = O(1). By dovetailing the computations of the reference universal machine U (Theorem 2.1.1, page 105) for all programs p with l(p) ≤ log n + c, we can enumerate all elements of A n . We can reconstruct y from the mth element, say y0, of this enumeration. Namely, from y0 we reconstruct n, since l(y0) = n, and we obtain m by enumerating A n until y0 is generated. By concatenation we obtain y = mn. Therefore,
$$C\left(y \right) \le C\left({y_0 } \right) + O\left(1 \right) \le \log n + c + O\left(1 \right).$$
From Equation 6.13 we have
$$C\left(y \right) \ge \log n + \log m.$$
Combining Equations 6.16 and 6.17, it follows that log m ≤ c + O(1). Therefore, by Equation 6.15,
$$d\left({A_r } \right) \le 2^{c + O\left(1 \right)}.$$

Here, c is a fixed constant independent of n and m. Since l(r) ≥ l and we can choose l arbitrarily, d(A r ) ≤ c0 for a fixed constant c0 and infinitely many r, which implies d(A) ≤ c0, and hence the claim. ◻ ◻

The KC-regularity lemma may be viewed as a corollary of the KC-characterization theorem. If L is regular, then trivially L x is regular. It follows immediately that there are only finitely many associated χ's, and each can be specified in at most c bits, c a constant depending only on L. If y is, say, the mth string in L x , then we can specify y as the string corresponding to the mth 1 in χ, using only C(m) + O(1) bits to specify y (absorbing c in the O(1) term). Hence, C(y) ≤ C(m) + O(1).

6.15 Exercises

6.8.1. [10] The KC-regularity lemma can be generalized in several ways. Prove the following version. Let L be regular and L x = {y :xy∈L}. Let ϕ be a partial recursive function depending only on L that enumerates strings in Σ*. For each x, if y is the nth string in the complement of L x enumerated by ϕ, then C(y) ≤ C(n) + c, with c a constant depending only on L. Use this generalization to give an alternative proof of Example 6.8.4.

Comments. Source: M. Li and P. Vitányi, SIAM J. Comput, 24:2(1995), 398–410.

6.8.2. [10] Prove that {0 n 1 m : m > n} is not regular.

6.8.3. [18] Prove that L = {x#y : x appears (possibly nonconsecutively) in y} is not regular.

6.8.4. [20] Prove that L = {x#y : at least half of x is a substring in y} is not regular.

6.8.5. [20] Prove that L = {x#y#z : xy = z} is not regular.

6.8.6. [37] A DCFL language is a language that is accepted by a deterministic pushdown automaton.
  (a) Show that \(\{xx^R : x \in \Sigma^*\}\) and {xx : x ∈ Σ*} are not DCFL languages, using an incompressibility argument.

  (b) Similar to Lemma 6.8.1, the following is a criterion separating DCFL from CFL. Prove it. Let L ⊆ Σ* be a DCFL and c a constant. Let x and y be fixed finite words over Σ and ω a recursive sequence over Σ. Let u be a suffix of yy…yx, v a prefix of ω, and w ∈ Σ* such that

    1. v can be described in c bits given \(L_u\) in lexicographic order;

    2. w can be described in c bits given \(L_{uv}\) in lexicographic order; and

    3. C(v) ≥ 2 log log l(u).


Then there is a constant c′ depending only on L, c, x, y, ω such that C(w) ≤ c′.

(c) Use (b) to prove (a).

Comments. Source: M. Li and P. Vitányi, SIAM J. Comput, 24:2(1995), 398–410. In this paper, an incompressibility criterion more general than Item (b) is given for separating DCFL from CFL. See also [S. Yu, Inform. Process. Lett., 31(1989), 47–51] and [M.A. Harrison, Introduction to Formal Language Theory, Addison-Wesley, 1978] for basics of formal language theory and traditional approaches to this problem such as iteration lemmas.

6.8.7. [35] We have characterized the regular languages using Kolmogorov complexity. It is immediately obvious how to characterize recursive languages in terms of Kolmogorov complexity. If L ⊆ Σ* and \(\Sigma^* = \{v_1, v_2, \ldots\}\) is an effective enumeration, then we define the characteristic sequence \(\chi = \chi_1\chi_2\ldots\) of L by \(\chi_i = 1\) if \(v_i \in L\) and \(\chi_i = 0\) otherwise. A language L is recursive if χ is a recursive sequence.
  (a) If a set L ⊆ Σ* is recursive then there exists a constant \(c_L\) (depending only on L) such that for all n, we have \(C(\chi_{1:n}|n) < c_L\).

  (b) If L is recursively enumerable, then there is a constant \(c_L\) such that for all n, we have \(C(\chi_{1:n}|n) \le \log n + c_L\).

  (c) There exists a recursively enumerable set L such that \(C(\chi_{1:n}) > \log n\), for all n.


Comments. Item (a) is straightforward. Its converse is hard: see the text preceding the proof of Claim 6.8.1 on page 493. This converse is given by Item (e) of Exercise 2.3.4 on page 131. Items (b) and (c) are Barzdins's lemma, Theorem 2.7.2, restated. It quantitatively characterizes all recursively enumerable languages in terms of Kolmogorov complexity. Hint for Item (c): Exercise 2.3.4. With L as in Item (c), Σ* − L also satisfies Item (b), so Item (b) cannot be extended to a Kolmogorov complexity characterization of recursively enumerable sets.

6.8.8. [23] Assume the terminology in Exercise 6.8.7. Consider χ defined in the proof for Item (ii) of Barzdins's lemma, Theorem 2.7.2. Essentially, χ i = 1 if the ith program started on the ith input string halts and outputs 0, and χ i = 0 otherwise. Let A be the language with χ as its characteristic sequence.
  (a) Show that A is a recursively enumerable set and its characteristic sequence satisfies \(C(\chi_{1:n}) \ge \log n\), for all n.

  (b) Let χ be as in Item (a). Define a sequence h by
    $$h = \chi_1 0^2 \chi_2 0^{2^2 } \cdots \chi_i 0^{2^i } \chi_{i + 1} \ldots \,.$$

    Prove that \(C(h_{1:n}) = O(C(n)) + \Theta(\log\log n)\). Therefore, if h is the characteristic sequence of a set B, then B is not recursive, but more sparsely nonrecursive than A.

Comments. Item (a) follows from the proof of Barzdins's lemma, Theorem 2.7.2. Source: J.M. Barzdins, Soviet. Math. Dokl, 9(1968), 1251–1254; D.W. Loveland, Proc. 1st ACM Symp. Theory Comput, 1969, pp. 61–66.

6.8.9. [19] The probability that the universal prefix machine U halts on self-delimiting binary input p, randomly supplied by tosses of a fair coin, is Ω (0 < Ω < 1). Let \(v_1, v_2, \ldots\) be an effective enumeration without repetitions of Σ*. Define L ⊆ Σ* such that \(v_i \in L\) iff \(\Omega_i = 1\). Section 3.6.2 implies that \(K(\Omega_{1:n}) > n\) for all but finitely many n. Show that L and its complement are not recursively enumerable.

Comments. It can be proved that \(L \in \Delta_2^0 - \left(\Sigma_1^0 \cup \Pi_1^0\right)\) in the arithmetic hierarchy. See Section 3.6.2, page 225, and Exercise 1.7.21, page 46.

6.16 Online CFL Recognition

The incompressibility proof below demonstrates a lower bound on the time for language recognition by a multitape Turing machine, as shown in Figure 6.4. A multitape Turing machine recognizes a language online if before reading each successive input symbol, it decides whether the partial input string scanned so far belongs to the language.

Figure 6.4. Multitape Turing machine

A context-free language is linear if it is generated by a linear context-free grammar in which no production rule contains more than one nonterminal symbol on the right-hand side. The known upper bound on the time required for online recognition of a linear context-free language by a multitape Turing machine is \(O(n^2)\), even if only one work tape is available. We prove a corresponding \(\Omega(n^2/\log n)\) lower bound. Let \(x_i^R\) denote \(x_i\) written in reverse, and let \(y, x_1, \ldots, x_k \in \{0,1\}^*\). Define a language L as
$$L = \left\{ {y\# x_1 @x_2 @ \ldots @x_k :{\rm{for}}\,{\rm{some}}\,i,\,x_i^R \,{\rm{is}}\,{\rm{a}}\,{\rm{substring}}\,{\rm{of}}\,y} \right\}.$$

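The language L is linear context-free, since it is generated by the following linear grammar, with starting symbol S: \(S \to S_1 \mid S@ \mid S0 \mid S1\); \(S_1 \to 0S_1 \mid 1S_1 \mid S_2\); \(S_2 \to 0S_20 \mid 1S_21 \mid S_3@\); \(S_3 \to S_30 \mid S_31 \mid S_3@ \mid S_4\#\); \(S_4 \to 0S_4 \mid 1S_4 \mid \epsilon\).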
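For experimenting with the definition of L, a direct (offline) membership test can be sketched as follows. It follows the set-builder definition rather than the grammar, and it ignores empty blocks, which is an assumption made here so that a trailing @ does not trivially accept; the function name is hypothetical.

```python
def in_online_language(word):
    """Membership in L = { y#x_1@x_2@...@x_k : some x_i^R is a substring of y }."""
    if word.count("#") != 1:
        return False
    y, rest = word.split("#")
    blocks = rest.split("@")
    # Every part must be a binary string.
    if any(set(part) - {"0", "1"} for part in [y] + blocks):
        return False
    # Assumption: empty blocks are skipped rather than counted as matches.
    return any(block and block[::-1] in y for block in blocks)


print(in_online_language("0110#00@011"))   # reverse of 011 is 110, a substring of 0110
print(in_online_language("0110#00@111"))   # neither 00 nor 111 occurs in 0110
```

This test is quadratic in the worst case and is meant only to clarify which inputs belong to L, not to model the Turing machine computation.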

6.16.1 Theorem 6.9.1

A multitape Turing machine that online recognizes L requires time \(\Omega(n^2/\log n)\).

Proof. Assume that a multitape Turing machine T accepts L online in \(o(n^2/\log n)\) steps. Choose y such that C(y) ≥ l(y) = n. Using y, we will construct a hard input of length O(n) for T. The idea of the proof is to construct an input
$$y\# x_1 @ \ldots @x_k @$$

such that no \(x_i\) is the reverse of a substring of y and yet each \(x_i\) is hard enough to make T use ϵn steps, for some ϵ > 0 not depending on n. If k = Ω(n/log n), then T will be forced to take \(\Omega(n^2/\log n)\) steps. Our task is to prove the existence of such \(x_i\)'s. We need two lemmas:

6.16.2 Lemma 6.9.1

Let n = l(x), and let p be a program described in the proof below. Assume that C(x|n,p) ≥ n. Then no substring of length longer than 2 log n occurs, possibly overlapping, in x more than once.

Proof. Let x = uvw, with v of length greater than 2 log n occurring exactly twice in uv. Let this discussion be formulated in terms of a program p that reconstructs x from the description below using the value of n (given for free). To describe x, given p and n, we need only to concatenate the following information:
  • the locations of the start bits of the two occurrences of v in uv, using log n(n − 1) bits;

  • the literal word uw, using exactly l(uw) bits.

Altogether, this description requires fewer than n − 2 log n + log n(n − 1) bits, since l(uw) = n − l(v) < n − 2 log n. Because log n(n − 1) < 2 log n, this is fewer than n bits. Since C(x|n,p) is the length of the shortest such description, we have C(x|n,p) < n. This contradicts the assumption in the lemma. ◻
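The description used in this proof can be sketched as an explicit encoder and decoder. This is an illustration under assumptions: positions are passed as plain integers rather than packed into log n(n − 1) bits, and the names `encode` and `decode` are hypothetical. The decoder uses only n, the two start positions, and uw.

```python
def encode(x, p, q, length):
    """Describe x by (p, q, uw), given x[p:p+length] == x[q:q+length] and p < q."""
    assert p < q and x[p:p + length] == x[q:q + length]
    return p, q, x[:q] + x[q + length:]   # drop the second occurrence of v


def decode(n, p, q, uw):
    """Rebuild x of length n from the two start positions and the word uw."""
    length = n - len(uw)                  # l(v) follows from n and l(uw)
    u, w = uw[:q], uw[q:]
    v = []
    for t in range(length):
        pos = p + t                       # where v[t] sits in the first occurrence
        # Either inside u, or inside the part of v already rebuilt (overlap case).
        v.append(u[pos] if pos < q else v[pos - q])
    return u + "".join(v) + w


x = "01101001010011"                      # v = 10100 occurs at positions 2 and 7
p, q, uw = encode(x, 2, 7, 5)
print(uw, decode(len(x), p, q, uw) == x)
```

The overlapping case works because each symbol of v is copied from a position strictly to its left, so it is always available when needed.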

6.16.3 Lemma 6.9.2

If a string has no repetition of length m, then it is uniquely determined by the set of its substrings of length m + 1; that is, it is the unique string with no repetition of length m and with precisely this set of substrings of length m + 1.

Proof. Let S be the set of substrings of x of length m + 1. Let a, b ∈ {0,1}, and u, v, w ∈ {0,1}*. The prefix of x of length m + 1 corresponds uniquely to the ua ∈ S such that for no b is bu in S. For any prefix vw of x with l(w) = m, there is a unique b such that wb ∈ S. Hence, the unique prefix of length l(vw) + 1 is vwb. The lemma follows by induction. ◻
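The induction in this proof is effectively a reconstruction procedure. A minimal sketch, assuming m ≥ 1, l(x) > m, and that no substring of length m occurs twice in x (function names are hypothetical):

```python
def substrings(x, length):
    """Set of all substrings of x of the given length."""
    return {x[i:i + length] for i in range(len(x) - length + 1)}


def reconstruct(S, m):
    """Rebuild x from S, its set of substrings of length m + 1.

    Assumes m >= 1, len(x) > m, and no length-m substring occurs twice in x.
    """
    # The prefix is the unique ua in S whose first m symbols u have no predecessor.
    tails = {s[1:] for s in S}
    (start,) = [s for s in S if s[:m] not in tails]
    x = start
    while True:
        w = x[-m:]                          # last m symbols of the current prefix
        extensions = [s for s in S if s[:m] == w]
        if not extensions:                  # nothing follows w: x is complete
            return x
        (s,) = extensions                   # uniqueness is guaranteed by the lemma
        x += s[-1]


x = "0010111"                               # all substrings of length 3 are distinct
print(reconstruct(substrings(x, 4), 3) == x)
```

The tuple unpacking `(s,) = ...` raises an error if the uniqueness assumption fails, which makes violations of the no-repetition hypothesis visible.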

We continue to prove the theorem. By Lemmas 6.9.1 and 6.9.2 we let m = 3 log n, so that y is uniquely determined by its set of substrings of length m. For i = 1,…, k, assume inductively that \(x_1, \ldots, x_{i-1}\), each of length m, have been chosen so that the input prefix \(y\#x_1@ \cdots @x_{i-1}@\) does not yet belong to L, and T spends at least ϵn steps on each \(x_j@\) block for j < i.

We claim that for each i (1 ≤ i ≤ k), there is an x i of length m that is not a reverse substring of y such that appending x i @ to the input requires at least t = ϵn additional steps by T, where ϵ > 0 does not depend on n. Setting k = n/m, this proves the theorem.
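To see why the claim suffices, the following worked computation (a sketch using only the choices m = 3 log n and k = n/m stated above) bounds the input length and the total running time:

$$l\left(y\#x_1@\cdots@x_k@\right) = n + 1 + k(m+1) = 2n + 1 + \frac{n}{3\log n} = O(n), \qquad \text{time} \ge k \cdot \epsilon n = \frac{\epsilon n^2}{3\log n} = \Omega\left(\frac{n^2}{\log n}\right).$$

Since the input length is Θ(n), the same bound holds when expressed in terms of the input length.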

Assume, by way of contradiction, that this is not the case. We devise a short description of all the length-m substrings of y, and hence, by Lemma 6.9.2, of y. Simulate T with input y#x1@ ⋯ @x i −1@, and record the following information at the time t0 when T reads the last @ sign:
  • this discussion;

  • the work tape contents within distance t = ϵn of the tape heads;

  • the specification of T, length n, current state of T, and locations of T's heads.

With this information, one can easily search for all x i 's such that l(x i ) = m and \(x_i^R \) is a substring of y as follows: Simulate T from time t0, using the above information and with input suffix x i @, for t steps. By assumption, if T accepts or uses more than t steps, then x i is a reverse substring of y. If ϵ is sufficiently small and n is large, then all the above information adds up to fewer than n bits, a contradiction. ◻

6.17 Exercises

6.9.1. [28] A k-head deterministic finite automaton, abbreviated k-DFA, is similar to a deterministic finite automaton except that it has k, rather than one, one-way read-only input heads. In each step, depending on the current state and the k symbols read by the k heads, the machine changes its state and moves some of its heads one step to the right. It stops when all heads reach the end of input, and at this time it accepts the input if it is in a final state. Use the incompressibility method to show that for each k ≥ 1 there is a language L that is accepted by a (k + 1)-DFA but is not accepted by any k-DFA.

Comments. Hint: use the language
$$L_b = \left\{ {w_1 \# \cdots \# w_b @w_b \# \cdots \# w_1 :w_i \in \left\{ {0,1} \right\}^* } \right\}$$

with \(b = \binom{k}{2} + 1\). Intuitively, when the \(w_i\)'s are all random, for each pair of \(w_i\)'s we must have two heads matching them concurrently. But a k-DFA can match only \(\binom{k}{2}\) pairs. This result was first conjectured in 1965 by A. Rosenberg [IBM J. Res. Develop., 10(1966), 388–394]. The case k = 2 was settled by H. Sudborough [Inform. Contr., 30(1976), 1–20] and O.H. Ibarra and C.E. Kim [Acta Informatica, 4(1975), 193–200]. The case k > 2 took a decade to settle [A.C.C. Yao and R. Rivest, J. Assoc. Comp. Mach., 25(1978), 337–340; C.G. Nelson, Technical Report, 14–76(1976), Harvard University]. These proofs use more complicated counting arguments. The proof by Kolmogorov complexity is folklore; it can be found in Section 6.4.3 of the first edition of this book.
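For readers who want to experiment with the hint language, here is a small membership test for \(L_b\) with \(b = \binom{k}{2} + 1\). It is an ordinary program, not a k-head automaton, and the function name is an assumption of this sketch.

```python
from math import comb


def in_L_b(word, k):
    """Membership in L_b = { w_1#...#w_b @ w_b#...#w_1 } with b = C(k, 2) + 1."""
    b = comb(k, 2) + 1
    if word.count("@") != 1:
        return False
    left, right = (half.split("#") for half in word.split("@"))
    binary = all(set(w) <= {"0", "1"} for w in left + right)
    # The right half must list the same b blocks in reverse order.
    return binary and len(left) == b and right == left[::-1]


print(in_L_b("01#1@1#01", 2))   # b = 2 and the blocks mirror: accepted
print(in_L_b("01#1@01#1", 2))   # blocks repeated in the same order: rejected
```

With k heads there are only \(\binom{k}{2}\) pairs of heads available, one fewer than the number of block pairs that must be compared.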

6.9.2. [40/O45] Referring to Exercise 6.9.1 for the definition of k-DFAs, prove the following. Let L = {x#y : x is a substring of y}.
  (a) No 2-DFA can do string-matching, that is, no 2-DFA accepts L.

  (b) No 3-DFA accepts L.

  (c) No k-DFA accepts L, for any integer k.

  (d) [Open] No k-DFA with sensing heads accepts L, for any k, where the term sensing means that the heads can detect each other when they meet.


Comments. The results in this exercise were motivated by a conjecture of Z. Galil and J. Seiferas [J. Comput. System Sci., 26:3(1983), 280–294] that no k-DFA can do string-matching, for any k. Galil and Seiferas proved that 6-head two-way DFA can do string-matching in linear time. Item (a) was first proved in [M. Li and Y. Yesha, Inform. Process. Lett, 22(1986), 231–235]; item (b) in [M. Geréb-Graus and M. Li, J. Comput. System Sci., 48(1994), 1–8]. Both of these papers provided useful tools for the final solution, item (c), by T. Jiang and M. Li [Proc. 25th ACM Symp. Theory Comput, 1993, pp. 62–70].

6.9.3. [38] A k-head PDA (k-PDA) is similar to a pushdown automaton except that it has k input heads. Prove that k + 1 heads are better than k heads for PDAs. That is, prove that there is a language that is accepted by a (k + 1)-PDA but not by any k-PDA.

Comments. Conjectured by M.A. Harrison and O.H. Ibarra [Inform. Contr., 13(1968), 433–470] in analogy to Exercise 6.9.1. Partial solutions were obtained by S. Miyano [Acta Informatica, 17(1982), 63–67; J. Comput. System Sci., 27(1983), 116–124] and M. Chrobak [Theoret. Comput. Sci., 48(1986), 153–181]. The complete solution, using incompressibility, is in [M. Chrobak and M. Li, J. Comput. System Sci., 37:2(1988), 144–155].

6.9.4. [35]
  (a) A k-pass DFA is just like a usual DFA except that the input head reads the input k times, from the first symbol to the last symbol, moving right only during each pass. Use incompressibility to show that a k-pass DFA is exponentially more succinct than a (k − 1)-pass DFA. In other words, for each k, there is a language \(L_k\) such that \(L_k\) can be accepted by a k-pass DFA with O(kn) states, but the smallest (k − 1)-pass DFA accepting \(L_k\) requires \(\Omega(2^n)\) states.

  (b) A sweeping two-way DFA is again just like a usual finite automaton except that its input head may move in two directions with the restriction that it can reverse direction only at the two ends of the input. If a sweeping two-way DFA makes k − 1 reversals during its computation, we call it a k-sweep two-way DFA. Show that there is a language \(R_k\) that can be accepted by a 2k-sweep two-way DFA with p(k, n) states for some polynomial p, but the smallest (2k − 1)-sweep two-way DFA accepting \(R_k\) requires an exponential number of states in terms of k and n.


Comments. W.J. Sakoda and M. Sipser studied sweeping automata in [Proc. 10th ACM Symp. Theory Comput, 1978, pp. 275–286]. They called the k-pass DFA by the name ‘k-component series FA.’ Source: T. Jiang, e-mail, 1992.

6.9.5. [32] Consider a singly linked list L of n items, where the ith item has a pointer pointing to the (i + 1)st item, with the last pointer being nil. Let ϵ > 0. Prove:
  (a) Every sequence of t(n) ≥ n steps of going backward on L can be done in \(O(t(n)n^\epsilon)\) steps, without modifying L or using extra memory other than O(1) extra pointers or counters.

  (b) Any program using \(O(t(n)n^\epsilon)\) steps to go back t(n) steps on L requires at least k − 1 pointers.


Comments. Hint: Item (a) does not need Kolmogorov complexity. You can use O(n) initial start time. For Item (b), if a region passed by does not get visited by a pointer during this process, then it can be compressed. Source: A.M. Ben-Amram and H. Petersen, Proc. 31st ACM Symp. Theory Comput, pp. 780–786, 1999.

6.9.6. [31] Let I be an index structure supporting text search in O(l(P))-bit probes to find pattern P in text T as a substring.
  (a) If each query requires the location of P, then the size of I is Ω(l(T)).

  (b) Even if each query asks only whether a substring P is in T, the size of I is still Ω(l(T)).


Comments. Item (a) is by E.D. Demaine and A. Lopez-Ortiz, J. Alg., 48:1(2003), 2–15. Item (b) is due to M. Li and P.M.B. Vitányi [Unpublished, 2006]. Hint for (b): use Lemma 6.9.2. An upper bound for Item (b) is O(w log N) bits for w query words, by M.L. Fredman, J. Komlós, and E. Szemerédi [J. Assoc. Comp. Mach. 31:3(1984), 538–544].

6.18 Turing Machine Time Complexity

The incompressibility method has been quite successfully applied in solving open problems. We give one such example proving a lower bound on the time required to simulate a multitape Turing machine by a 1-(work)tape Turing machine (with a one-way input tape).

A k-tape Turing machine with one-way input, as shown in Figure 6.4, has k work tapes and a one-way read-only input tape that contains the input. Initially, the input is written on the leftmost tape segment, one symbol per tape square, and the input head scans the leftmost input symbol. The end of the input is delimited by a distinguished end marker. Observe that this model with k = 1 is far more powerful than the single-tape Turing machine model of Figure 6.1, page 443, where the single tape serves both as input tape and work tape. For instance, a Turing machine with one work tape apart from the input tape can recognize \(L = \{w\#w^R : w \in \{0,1\}^*\}\) in real time, in contrast to the \(\Omega(n^2)\) lower bound of Section 6.1.1. The additional marker # allows the positive result and does not change the lower bound result.

In the literature, an online model is also used. In this model, the input tape is one-way, and on every prefix of the input the Turing machine writes the output, accepting or rejecting the prefix, on a write-only one-way output tape. Proving lower bounds is easier with the online model than with the one-way input model we use in this section. We use the latter model and thus prove stronger results.

A basic question in Turing machine complexity is whether additional work tapes add power. It is known that one tape can online simulate n steps of k tapes in \(O(n^2)\) steps. It has been a two-decade-long open question whether the known simulation is tight. Kolmogorov complexity has helped to settle this question by an \(\Omega(n^2)\) lower bound; see Exercise 6.10.2.

The tight \(\Omega(n^2)\) lower bound requires a lengthy proof. We provide a weaker form of the result whose simpler proof still captures the central ideas of the full version.

6.18.1 Theorem 6.10.1

It requires \(\Omega(n^{3/2}/\log n)\) time to deterministically simulate a linear-time 2-tape Turing machine with one-way input by a 1-tape Turing machine with one-way input.

Proof. We first prove a useful lemma. Let T be a 1-tape Turing machine with input tape head \(h_1\) and work tape head \(h_2\). Let s be a segment of T's input, and R a tape segment on its work tape. We say that T maps s into R if \(h_2\) never leaves tape segment R while \(h_1\) is reading s, and T maps s onto R if \(h_2\) traverses the entire tape segment R while \(h_1\) reads s. The crossing sequence (c.s.) at position p of the work tape is a sequence of pairs of the form

(state of T, position of h1),

which records the status of T when h2 enters p each time.

The following lemma states that a tape segment bordered by short c.s.'s cannot receive a lot of information without losing some. We assume the following situation: Let the input string start with x#, where \(x = x_1x_2 \ldots x_k\) with \(l(x_i) = l(x)/k\) for all i. Let R be a segment of T's storage tape such that T maps all blocks in \(S = \{x_{i_1}, \ldots, x_{i_l}\}\) into tape segment R, where \(S \subseteq \{x_i : 1 \le i \le k\}\).

6.18.2 Lemma 6.10.1

(Jamming lemma) The contents of the storage tape of T at the time when \(h_1\) moves to the # marker can be reconstructed using only the sequence of blocks \(\bar S = \{x_i : 1 \le i \le k\} - S\), the final contents of R, the two final c.s.'s on the left and right boundaries of R, a description of T, and a description of this discussion.

Roughly speaking, if the number of missing bits \(\sum_{j=1}^{l} l(x_{i_j})\) exceeds the number of added description bits (those for R and the two crossing sequences around R), then the jamming lemma implies that either \(x = x_1 \ldots x_k\) is not incompressible or some information about x has been lost.

Proof. Let the two positions at the left boundary and the right boundary of R be l R and r R , respectively. Subdivide the input tape into c-sized slots with c = l(x)/k. Put the blocks x j of S¯ in their correct positions on the input tape in the following way: In the simulation, h1 reads the input from left to right without backing up. We have a list S¯ of c-sized blocks available. We put the consecutive blocks of S¯ on the c-sized slots on the input tape such that the slots, which are traversed by h1 with h2 all the time positioned on R, are left empty. This can be easily determined from the left and right crossing sequences of R.

Simulate T with h2 staying to the left of R using the c.s. at l R to construct the work tape contents to the left of R. Also simulate T with h2 staying to the right of R using the c.s. at r R to construct the work tape contents to the right of R. Such a simulation is standard.

We now have obtained the contents of T's work tape at the end of processing x#, apart from the contents of R. The final contents of R are given and put in position. Together, we now have T's work tape contents at the time when h1 reaches #.

Notice that although there are many unknown x i 's (in S), they are never read, since h1 skips over them because h2 never goes into R. ◻

To prove the theorem we use the witness language L defined by
$$L = \left\{ {x_1 @x_2 @ \ldots @x_k \# y_1 @ \ldots @y_l \# 0^i 1^j :x_i = y_j } \right\}.$$
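A direct membership test makes the role of the final index \(0^i1^j\) explicit. This is a sketch of the language definition only (not of the 2-tape machine), and the function name is hypothetical.

```python
def in_witness_language(word):
    """Membership in L = { x_1@...@x_k # y_1@...@y_l # 0^i 1^j : x_i = y_j }."""
    parts = word.split("#")
    if len(parts) != 3:
        return False
    xs, ys, index = parts[0].split("@"), parts[1].split("@"), parts[2]
    if not all(set(block) <= {"0", "1"} for block in xs + ys):
        return False
    i = len(index) - len(index.lstrip("0"))      # number of leading 0s
    j = len(index) - i                           # the remainder must be all 1s
    if index != "0" * i + "1" * j:
        return False
    # Blocks are numbered from 1, so x_i is xs[i - 1] and y_j is ys[j - 1].
    return 1 <= i <= len(xs) and 1 <= j <= len(ys) and xs[i - 1] == ys[j - 1]


print(in_witness_language("01@10@11#11@01#0001"))  # x_3 = 11 = y_1
print(in_witness_language("01@10@11#11@01#01"))    # x_1 = 01 differs from y_1 = 11
```

Because the index arrives last, a machine reading the input one way cannot know in advance which pair of blocks it will have to compare.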

Clearly, L can be accepted in linear time by a 2-tape machine. Assume, by way of contradiction, that a deterministic 1-tape machine T accepts L in \(T(n) < c^{-5}n^{3/2}/\log n\) time, for some fixed new constant c and n large enough. We derive a contradiction by showing that then some incompressible string must have too short a description.

Assume, without loss of generality, that T writes only 0's and 1's in its work squares and that l(T) = O(1) is the number of states of T. Fix the new constant c and take the word length n as large as needed to derive the desired contradictions below and such that the formulas in the sequel are meaningful.

First, choose an incompressible string x ∈ {0,1}* of length l(x) = n with C(x) ≥ n. Let x consist of the concatenation of \(k = \sqrt n\) substrings \(x_1, x_2, \ldots, x_k\), each \(\sqrt n\) bits long. Let
$$x_1 @x_2 @ \ldots @x_k \# $$
be the initial input segment on T's input tape. Let time \(t_\#\) be the step at which \(h_1\) reads #. If more than k/2 of the \(x_i\)'s are mapped onto a contiguous tape segment of size at least \(n/c^3\), then T requires \(\Omega(n^{3/2}/\log n)\) time, which is a contradiction. Therefore, there is a set S consisting of k/2 blocks \(x_i\) such that for every \(x_i \in S\) there is a tape segment of at most \(n/c^3\) contiguous tape squares into which \(x_i\) is mapped. In the remainder of the proof we restrict attention to the \(x_i\)'s in this set S. Order the elements of S according to the order of the left boundaries of the tape segments into which they are mapped. Let \(x_m\) be the median.

The idea of the remainder of the proof is as follows: Intuitively, the only thing T can do before input head \(h_1\) crosses # is somehow copy the \(x_i\)'s onto its work tape, and afterward copy the \(y_j\)'s onto the work tape. There must be a pair of these \(x_i\) and \(y_j\) that are separated by Ω(n) distance, since all these blocks together must occupy Ω(n) space. At this time, head \(h_1\) still has to read the \(0^i1^j\) part of the tape. Hence, we can force T to check whether \(x_i = y_j\), which means that it has to spend about \(\Omega(n^{3/2}/\log n)\) time. To convert this intuition into a rigorous proof we distinguish two cases:

In the first case we assume that many x i 's in S are mapped (jammed) into a small tape segment R. That is, when h1 (the input tape head) is reading them, h2 (the work tape head) is always in this small tape segment R. We show that then, contrary to assumption, x can be compressed (by the jamming lemma). Intuitively, some information must have been lost.

In the second case, we assume there is no such jammed tape segment and that the records of the \(x_i \in S\) are spread evenly over the work tape. In that case, we will arrange the \(y_j\)'s so that there exists a pair \((x_i, y_j)\) such that \(x_i = y_j\) and \(x_i\) and \(y_j\) are mapped into tape segments that are far apart, at distance Ω(n). Then we complete T's input with final index \(0^i1^j\) so as to force T to match \(x_i\) against \(y_j\). As in Section 6.1.1, page 442, either T spends too much time or we can compress x again, yielding a second contradiction and proving, for large enough n,
$$T\left(n \right) \ge \frac{{n^{3/2} }}{{c^5 \log n}}.$$

Case 1 (Jammed) Assume there are k/c blocks \(x_i \in S\) and a fixed tape segment R of length \(n/c^2\) on the work tape such that T maps all of these \(x_i\)'s into R. Let S′ be the set of such blocks.

We will construct a short program that prints x. Consider the two tape segments of length l(R) to the left and to the right of R on the work tape. Call them \(R_l\) and \(R_r\), respectively. Choose positions \(p_l\) in \(R_l\) and \(p_r\) in \(R_r\) with the shortest c.s.'s in their respective tape segments. Both c.s.'s must be shorter than \(\sqrt n/(c^2 \log n)\). Namely, if the shortest c.s. in either tape segment is at least \(\sqrt n/(c^2 \log n)\) long, then T uses at least
$$\frac{\sqrt n}{c^2 \log n} \cdot \frac{n}{c^2}$$
steps, and there is nothing to prove. Let tape segment \(R_l'\) (\(R_r'\)) be the portion of \(R_l\) (\(R_r\)) right (left) of \(p_l\) (\(p_r\)).
Now, using the description of
  • this discussion (including the text of the program below) and simulator T in O(1) bits;

  • the values of n, k, c, and the positions of \(p_l\), \(p_r\) in O(log n) bits;

  • the literal concatenated list \(\{x_1, \ldots, x_k\} - S'\), using n − n/c bits;

  • the state of T and the position of \(h_2\) at time \(t_\#\) in O(log n) bits;

  • the two c.s.'s at positions \(p_r\) and \(p_l\) at time \(t_\#\) in at most \(O(\sqrt n \log n)\) bits; and

  • the contents at time \(t_\#\) of tape segment \(R_l' R R_r'\) in at most \(3n/c^2 + O(\log n)\) bits;

we can construct a program to check whether a candidate string y equals x by running T as follows:

Check whether l(y) = l(x). By the jamming lemma (using the above information related to T's processing of the initial input segment \(x_1@ \ldots @x_k\#\)), reconstruct the contents of T's work tape at time \(t_\#\), the time \(h_1\) gets to the first # sign. Divide y into k equal pieces and form \(y_1@ \ldots @y_k\). Run T, initialized in the appropriate state, head positions, and work tape contents (at time \(t_\#\)), as the starting configuration, on each input suffix of the form \(y_1@ \ldots @y_k\#0^i1^i\), for 1 ≤ i ≤ k.

By definition of L, the machine T can accept for all i iff y = x.

This description of x requires not more than
$$n - \frac{n}{c} + \frac{{3n}}{{c^2 }} + O\left({\sqrt n \log n} \right) + O\left({\log n} \right) \le {\rm{\gamma }}n$$
bits, for some constant 0 < γ < 1 and large enough c and n. However, this contradicts the incompressibility of x (C(x) ≥ n).
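The role of the condition on c can be made explicit with a short worked computation (a sketch that only rearranges the leading terms of the bound above):

$$n - \frac{n}{c} + \frac{3n}{c^2} = \left(1 - \frac{c - 3}{c^2}\right) n .$$

For c > 3 the factor in parentheses is strictly less than 1, and the remaining terms \(O(\sqrt n \log n) + O(\log n)\) are o(n). Hence any γ with \(1 - (c-3)/c^2 < \gamma < 1\) works for large enough n; for example, c = 6 gives leading factor 11/12.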

Case 2 (Not Jammed) Assume that for each fixed tape segment R, with \(l(R) = n/c^2\), there are at most k/c blocks \(x_i \in S\) mapped into R.

Fix a tape segment of length \(n/c^2\) into which the median \(x_m\) is mapped. Call this segment \(R_m\). Then, at most k/c strings \(x_i\) in set S are mapped into \(R_m\). Therefore, for large enough c (and c > 3), at least k/6 of the \(x_i\)'s in S are mapped into the tape right of \(R_m\). Let the set of those \(x_i\)'s be \(S_r = \{x_{i_1}, \ldots, x_{i_{k/6}}\} \subseteq S\). Similarly, let \(S_l = \{x_{j_1}, \ldots, x_{j_{k/6}}\} \subseteq S\) consist of k/6 strings \(x_i\) that are mapped into the tape left of \(R_m\). Without loss of generality, assume \(i_1 < i_2 < \cdots < i_{k/6}\) and \(j_1 < j_2 < \cdots < j_{k/6}\).

Set \(y_1 = x_{i_1}\), \(y_2 = x_{j_1}\), \(y_3 = x_{i_2}\), \(y_4 = x_{j_2}\), and so forth. In general, for all integers s, \(1 \le s \le k/6\),
$$y_{2s} = x_{j_s} \,{\rm{and}}\,y_{2s - 1} = x_{i_s }.$$
Using this relationship, we now define an input prefix for T to be
$$x_1 @ \ldots @x_k \# y_1 @ \ldots @y_{k/3} \#.$$
There exists a pair \(y_{2s-1}, y_{2s}\) that is mapped into a segment of size less than \(n/(4c^2)\). Otherwise, T uses at least
$$\frac{k}{6} \cdot \frac{n}{{4c^2 }} = \frac{{n^{3/2} }}{{24c^2 }}$$
steps, and there is nothing to prove. Now this pair \(y_{2s-1}, y_{2s}\) is mapped into a segment with distance at least \(n/c^3\) either to \(x_{i_s}\) or to \(x_{j_s}\). Without loss of generality, let \(y_{2s-1}, y_{2s}\) be mapped \(n/c^3\) away from \(x_{i_s}\). So \(y_{2s-1}\) and \(x_{i_s}\) are separated by a region R of size \(n/c^3\). Attach suffix \(0^{i_s} 1^{2s - 1}\) to the initial input segment of Equation 6.20 to complete the input to T to
$$x_1 @ \ldots @x_k \# y_1 @ \ldots @y_{k/3} \# 0^{i_s } 1^{2s - 1}.$$

So at the time when T reads the second # sign, \(x_{i_s}\) is mapped into the tape left of R, and \(y_{2s - 1}\), which is equal to \(x_{i_s}\), is mapped into the tape right of R.

Determine position p in R that has the shortest c.s. of T's computation on the input of Equation 6.21. If this c.s. is longer than \(\sqrt n /\left({c^2 \log \,n} \right)\), then T uses at least
$$\frac{n}{{c^3 }} \cdot \frac{{\sqrt n }}{{c^2 \log n}}$$
steps, and there is nothing to prove. Therefore, assume that the shortest c.s. has length at most \(\sqrt n /\left({c^2 \log \,n} \right).\) Then again we can construct a short program P to accept only x by a cut-and-paste argument, and show that it yields too short a description of x. Using the description of
  • this discussion (including the text of the program P below) and simulator T in O(1) bits;

  • the values of n, k, c, and the position p in O(log n) bits;

  • \(n - \sqrt n\) bits for \(S - \{x_{i_s}\}\);

  • O(log n) bits for the index \(i_s\) of \(x_{i_s}\) to place it correctly on the input tape; and

  • \(\le \sqrt n /c\) bits to describe the c.s. of length \(\le \sqrt n /\left({c^2 \log n} \right)\) at p (assuming c is sufficiently large relative to l(T));

we can construct a program to reconstruct x as follows:

Construct the input of Equation 6.21 on T's input tape with the two blocks \(x_{i_s}\) and \(y_{2s-1}\) filled with blanks. Now we search for \(x_{i_s}\) as follows. For each candidate z with \(l(z) = \sqrt n\), put z in \(y_{2s-1}\)'s position and do the following simulation:

Using the c.s. at point p, we run T such that \(h_2\) always stays at the right of p (\(y_{2s-1}\)'s side). Whenever \(h_2\) encounters p, we check whether the current status matches the corresponding ID in the c.s. If it does, then we use the next ID of the c.s. to continue. If in the course of this simulation process T rejects or there is a mismatch (that is, when \(h_2\) gets to p, machine T is not in the same state or \(h_1\)'s position is not as indicated in the c.s.), then \(z \ne x_{i_s}\). If the crossing sequence at p of T's computation for candidate z matches the prescribed c.s., then we know that T would accept the input of Equation 6.21 with \(y_{2s-1}\) replaced by z. Therefore, \(z = x_{i_s}\).

The description of x requires not more than
$$n - \sqrt n + \frac{1}{c}\sqrt n + O\left({\log n} \right) \le n - \gamma \sqrt n $$
bits for some positive γ > 0 and large enough c and n. This contradicts the incompressibility of x (C(x) ≥ n) again.

Case 1 and Case 2 complete the proof that \(T(n) \ge c^{-5} n^{3/2}/\log n\). ◻

6.19 Exercises

6.10.1. [33] Consider the 1-tape Turing machine as in Section 6.1.1, page 442. Let the input be n/log n integers each of size O(log n), separated by # signs. The element-distinctness problem is to decide whether all these integers are distinct. Prove that the element-distinctness problem requires \(\Omega(n^2/\log n)\) time on such a 1-tape Turing machine.

Comments. A similar bound also holds for 1-tape nondeterministic Turing machines. Source: [A. López-Ortiz, Inform. Process. Lett., 51:6(1994), 311–314].

6.10.2. [42] Extend the proof of Theorem 6.10.1 to prove the following: Simulating a linear-time 2-tape deterministic Turing machine by a 1-tape deterministic Turing machine requires \(\Omega(n^2)\) time. (Both machines are of the one-way input model.)

Comments. Hint: set the block size for \(x_i\) to be a large constant, and modify the language to one that requires comparison of Ω(n) pairs of \(x_i\)'s and \(y_j\)'s. The lower bound is optimal, since it meets the \(O(n^2)\) upper bound of [J. Hartmanis and R. Stearns, Trans. Amer. Math. Soc., 117(1969), 285–306]. Source: W. Maass, Trans. Amer. Math. Soc., 292(1985), 675–693; M. Li and P.M.B. Vitányi, Inform. Comput., 78(1988), 56–85.

6.10.3. [38] A k-pushdown store machine is similar to a k-tape Turing machine with one-way input except that the k work tapes are replaced by k pushdown stores. Prove: simulating a linear-time 2-pushdown store deterministic machine with one-way input by a 1-tape nondeterministic Turing machine with one-way input requires \(\Omega \left({n^{3/2} /\sqrt {\log \,n} } \right)\) time.

Comments. Source: M. Li and P.M.B. Vitányi, Inform. Comput., 78(1988), 56–85. This bound is optimal, since it is known that simulating a linear-time 2-pushdown store deterministic machine with one-way input by a 1-tape nondeterministic Turing machine with one-way input can be done in \(O\left({n^{3/2} \sqrt {\log \,n} } \right)\) time [M. Li, J. Comput. System Sci., 7:1(1988), 101–116].

6.10.4. [44] Show that simulating a linear-time 2-tape deterministic Turing machine with one-way input by a 1-tape nondeterministic Turing machine with one-way input requires \(\Omega(n^2/((\log n)^2 \log\log n))\) time.

Comments. Hint: let S be a sequence of numbers from {0, …, k−1}, where \(k = 2^l\) for some l. Assume that each number b ∈ {0, …, k−1} is somewhere in S adjacent to the numbers 2b (mod k) and 2b+1 (mod k). Then for every partition of {0, …, k−1} into two sets G and R such that d(G), d(R) > k/4 there are at least k/(c log k) (for some fixed c) elements of G that occur somewhere in S adjacent to a number from R. Subsequently prove the lower bound using the language L ⊆ {0,1}* defined as follows. Let \(u = u_1 \ldots u_k\), where the \(u_i\)'s are of equal length. Form \(uu = u_1 \ldots u_{2k}\) with \(u_{k+i} = u_i\). Then inserting \(u_i\) between \(u_{2i-1}\) and \(u_{2i}\) for \(1 \le i \le k\) results in a member of L. These are the only members of L. Source: W. Maass, Trans. Amer. Math. Soc., 292(1985), 675–693. The language L defined in this hint will not allow us to obtain an \(\Omega(n^2)\) lower bound. Define a graph \(G = (Z_n, E_{ab})\), where \(Z_n = \{0, 1, \ldots, n-1\}\), \(E_{ab} = \{(i, j) : j \equiv (ai + b) \bmod n \text{ for } i \in Z_n\}\), and a and b are fixed positive integers. Then G has a separator (a set of nodes whose removal separates G into two disconnected, roughly equal-sized components) of size \(O\left({n/\sqrt {\log _a \,n} } \right)\). Using such a separator, L can be accepted in subquadratic time by a 1-tape online deterministic machine [M. Li, J. Comput. System Sci., 7:1(1988), 101–116].

6.10.5. [46] Prove that simulating a linear-time 2-tape deterministic Turing machine with one-way input by a 1-tape nondeterministic Turing machine with one-way input requires \(\Omega(n^2/\log^{(k)} n)\) time, for any k, where \(\log^{(k)} = \log\log\cdots\log\) is the k-fold iterated logarithm. This improves the result in Exercise 6.10.4.

Comments. Source: Z. Galil, R. Kannan, and E. Szemerédi [J. Comput. System Sci., 38(1989), 134–149; Combinatorica, 9(1989), 9–19].

6.10.6. [O47] Does simulating a linear-time 2-tape deterministic Turing machine with one-way input by a 1-tape nondeterministic Turing machine with one-way input require \(\Omega(n^2)\) time?

6.10.7. [46] A k-queue machine is similar to a k-tape Turing machine with one-way input except with the k work tapes replaced by k work queues. A queue is a first-in first-out (FIFO) device. Prove (with one-way input understood):

  (a) Simulating a linear-time 1-queue machine by a 1-tape Turing machine requires \(\Omega(n^2)\) time.

  (b) Simulating a linear-time 1-queue machine by a 1-tape nondeterministic Turing machine requires \(\Omega(n^{4/3}/\log^{2/3} n)\) time.

  (c) Simulating a linear-time 1-pushdown store machine (which accepts precisely CFLs) by a 1-queue machine, deterministically or nondeterministically, requires \(\Omega(n^{4/3}/\log n)\) time.


Comments. Items (a) and (b) are from [M. Li and P.M.B. Vitányi, Inform. Comput., 78(1988), 56–85]; Item (c) is from [M. Li, L. Longpré, and P.M.B. Vitányi, SIAM J. Comput., 21:4(1992), 697–712]. The bound in Item (a) is tight. The bound in Item (b) is not tight; the best upper bound is \(O\left({n^{3/2} \sqrt {\log \,n} } \right)\), in [M. Li, J. Comput. System Sci., 7:1(1988), 101–116]. The bound in Item (c) is not tight; the upper bound is known to be \(O(n^2)\) (also to simulate a 2-pushdown store machine).

6.10.8. [43] Use the terminology of Exercise 6.10.7, with one-way input understood.

  (a) Show that simulating a linear-time deterministic 2-queue machine by a deterministic 1-queue machine takes \(\Omega(n^2)\) time.

  (b) Show that simulating a linear-time deterministic 2-queue machine by a nondeterministic 1-queue machine takes \(\Omega(n^2/(\log^2 n \log\log n))\) time.

  (c) Show that simulating a linear-time deterministic 2-tape Turing machine by a nondeterministic 1-queue machine takes \(\Omega(n^2/(\log^2 n \log\log n))\) time.


Comments. The upper bounds in all cases are \(O(n^2)\) time. Source: M. Li, L. Longpré, and P.M.B. Vitányi, SIAM J. Comput., 21:4(1992), 697–712. For additional results on simulating (k + 1)-queue machines and 2-tape or multitape machines by k-queue machines see [M. Hühne, Theoret. Comput. Sci., 113:1(1993), 75–91].

6.10.9. [38] Consider the stronger offline deterministic Turing machine model with a two-way read-only input tape. Given an l × l matrix A, with \(l = \sqrt {n/\log n} \) and element size O(log n), arranged in row-major order on the two-way (1-dimensional) input tape,

  (a) Show that one can transpose A (that is, write \(A^T\) on a work tape in row-major form) in O(n log n) time on such a Turing machine with two work tapes.

  (b) Show that it requires \(\Omega \left({n^{3/2} /\sqrt {\log \,n} } \right)\) time on such a Turing machine with one work tape to transpose A.

  (c) From Items (a) and (b), obtain a lower bound on simulating two work tapes by one work tape for the above machines.


Comments. Source: M. Dietzfelbinger, W. Maass, and G. Schnitger, Theoret. Comput. Sci., 82:1(1991), 113–129.

6.10.10. [37] We analyze the speed of copying strings for Turing machines with a two-way input tape and one or more work tapes.
  (a) Show that such a Turing machine with one work tape can copy a string of length s, initially positioned on the work tape, to a work tape segment that is d tape cells removed from the original position in O(d + sd/log min(n, d)) steps. Here n denotes the length of the input.

  (b) Show (by the incompressibility method) that the upper bound in Item (a) is optimal. For d = Ω(log n), such a Turing machine with one work tape requires Ω(sd/log min(n, d)) steps to copy a string of length s across d tape cells.

  (c) Use Item (a) to show that such Turing machines can simulate f(n)-time bounded multitape Turing machines in \(O(f(n)^2/\log n)\) steps. This is faster by a multiplicative factor log n than the straightforward simulations.


Comments. Source: M. Dietzfelbinger, Inform. Process. Lett., 33(1989/1990), 83–90.

6.10.11. [38] Show that it takes \(\Theta(n^{5/4})\) time to transpose a Boolean matrix on a Turing machine with a two-way read-only input tape, a work tape, and a one-way write-only output tape. That is, the input is a \(\sqrt n \times \sqrt n \) matrix A that is initially given on the input tape in row-major order. The Turing machine must output A in column-major order on the output tape.

Comments. Hint: for the upper bound, partition columns of A into n¼ groups each with n¼ consecutive columns. Process each group separately (you need to put each group into a smaller region first). The lower-bound proof is harder. Fix a random matrix A. Let the simulation time be T. Split T into O(n¾) printing intervals. Within each interval, O(n¼) entries of A are printed and half such intervals last fewer than n½ steps. Also, split the work tape into (disjoint) intervals of size O(n½), so that one quarter of the printing intervals do not overlap with two work tape intervals. We say that an input bit is mapped to a work tape interval if, while the input head is reading that bit, the work tape head is in this interval. A work tape interval is underinformed if many bits in the printing interval it corresponds to are not mapped into this work tape interval before they are printed. Show that if there are many underinformed work tape intervals, A is compressible. Then show that if there are not many underinformed intervals, there must be many overburdened intervals, that is, more bits than the length of such intervals are mapped into each interval. This also implies the compressibility of A.

Source: M. Dietzfelbinger, W. Maass, Theoret. Comput. Sci., 108(1993), 271–290.

6.10.12. [O44] Obtain a tight bound for simulating two work tapes by one work tape for Turing machines with a two-way input tape.

Comments. W. Maass, G. Schnitger, E. Szemerédi, and G. Turan, [Computational Complexity, 3(1993), pp. 392-401] proved (not using Kolmogorov complexity) the following: Let \(L = \{A\#B : A = B^t\) and \(a_{ij} \ne 0\) only when i, j ≡ 0 mod (log m), where \(m = 2^k\), for some k, is the size of matrices}. Accepting L requires Ω(n log n) time on a Turing machine with a two-way input tape and one work tape. Since L can be accepted in O(n) time by a similar machine with two work tapes, this result implies that two tapes are better than one for deterministic Turing machines with a two-way input tape. An upper bound to this question is given in Exercise 6.10.10, Item (c).

6.10.13. [40] Consider an online deterministic Turing machine with a one-way input tape, some work tapes/pushdown stores and a one-way output tape. The result of computation is written on the output tape. ‘Online simulation’ means that after reading a new input symbol the simulating machine must write down precisely the output of the simulated machine for the processed initial input segment before it goes on to read the next input symbol. Prove the following:

  (a) It requires \(\Omega(n(\log n)^{1/(k+1)})\) time to online simulate k+1 pushdown stores by k tapes.

  (b) Online simulating one tape plus k−1 pushdown stores by k pushdown stores requires \(\Omega(n(\log n)^{1/(k+1)})\) time.

  (c) Each of the above lower bounds holds also for a probabilistic simulation where the probabilistic simulator flips a random coin to decide the next move. (No error is allowed. The simulation time is the average taken over all coin-tossing sequences.)


Comments. Item (a) is from [W.J. Paul, Inform. Contr., 53(1982), 1–8]. Item (b) is due to P. Ďuriš, Z. Galil, W.J. Paul, and R. Reischuk [Inform. Contr., 60(1984), 1–11]. Item (c) is from [R. Paturi, J. Simon, R. Newman-Wolfe, and J. Seiferas, Inform. Comput., 88(1990), 88–104]. The last paper also includes proofs for Items (a) and (b).

6.10.14. [40] Consider the machine model in Exercise 6.10.13, except that the work tapes are two-dimensional. Such a machine works in real time if at each step it reads a new input symbol and is online. (Then it processes and decides each initial m-length segment in precisely m steps.) Show that for such machines, two work tapes with one head each cannot real-time simulate one work tape with two independent heads.

Comments. Source: W.J. Paul, Theoret. Comput. Sci., 28(1984), 1–12.

6.10.15. [48] As in Exercise 6.10.14, consider the Turing machine model of Exercise 6.10.13 but this time with 1-dimensional tapes. Show that a Turing machine with two single-head one-dimensional tapes cannot recognize the set {x2x′ : x ∊ {0,1}* and x′ is a prefix of x} in real time, although it can do so with three tapes, two two-dimensional tapes, or one two-head tape, or in linear time with just one tape.

Comments. This is considerably more difficult than the problem in Exercise 6.10.14. In particular, this settles the longstanding conjecture that a two-head Turing machine can recognize more languages in real time if its heads are on the same one-dimensional tape than if they are on separate one-dimensional tapes. Source: partial results in [W.J. Paul, Ibid.; P.M.B. Vitányi, J. Comput. System Sci., 29(1984), 303–311]. This forty-year open question was finally settled by T. Jiang, J.I. Seiferas, and P.M.B. Vitányi in [J. Assoc. Comp. Mach., 44:2(1997), 237–256].

6.10.16. [38] A tree work tape is a complete, infinite, rooted binary tree used as storage medium (instead of a linear tape). A work tape head starts at the root and can in each step move to the direct ancestor of the currently scanned node (if it is not the root) or to either one of the direct descendants. A multihead tree machine is a Turing machine with a one-way linear input tape, one-way linear output tape, and several tree work tapes each with k ≥ 1 heads. We assume that the finite control knows whether two work tape heads are on the same node or not. A d-dimensional work tape consists of nodes corresponding to d-tuples of integers, and a work tape head can in each step move from its current node to a node with each coordinate ±1 of the current coordinates. Each work tape head starts at the origin, which is the d-tuple with all zeros. A multihead d-dimensional machine is like the multihead tree machine but with d-dimensional work tapes.
  (a) Show that simulating a multihead tree machine online by a multihead d-dimensional machine requires time \(\Omega(n^{1+1/d}/\log n)\) in the worst case. Hint: prove this for a tree machine with one tree tape with a single head that runs in real time.

  (b) Prove the same lower bound as in Item (a), where the multihead d-dimensional machine is made more powerful by allowing the work tape heads also to move from their current node to the current node of any other work tape head in a single step.


Comments. Source: M.C. Loui, SIAM J. Comput., 12(1983), 463–472. The lower bound in Item (a) is optimal, since it can be shown that every multihead tree machine of time complexity t(n) can be simulated online by a multihead d-dimensional machine in time \(O(t(n)^{1+1/d}/\log t(n))\). It is known that every log-cost RAM (Exercise 6.10.17) can be simulated online in real time by a tree machine with one multihead tree tape [W.J. Paul and R. Reischuk, J. Comput. System Sci., 22(1981), 312–327]. Hence, we can simulate RAMs online by d-dimensional machines in time that is bounded above and below by the same bounds as the simulation of tree machines. See also [M.C. Loui, J. Comput. System Sci., 28(1984), 359–378].

6.10.17. [37] A log-cost random access machine (log-cost RAM) has the following components: an infinite number of registers each capable of holding an integer and a finite sequence of labeled instructions including ‘output,’ ‘branch,’ ‘load/store,’ and ‘add/subtract between two registers.’

The time cost for execution of each instruction is the sum of the lengths of the integers involved.
  (a) Every tree machine with several tree tapes, each with one head, of time complexity t can be simulated online by a log-cost RAM of time complexity O(t log t/log log t). Show that this is optimal.

  (b) Show that online simulating a linear-time log-cost RAM by a d-dimensional Turing machine requires \(\Omega(n^{1+1/d}/(\log n (\log\log n)^{1+1/d}))\) time.


Comments. Source: D.R. Luginbuhl, Ph.D. thesis, 1990, Univ. Illinois, Urbana-Champaign; M.C. Loui and D.R. Luginbuhl, [SIAM J. Comput. 21:5(1992), 959–971; Math. Systems Theory, 25:4(1992), 293–308].

6.10.18. [38] Consider the machine models in Exercise 6.10.13, Item (c). All machines below have one multidimensional tape with one head.
  (a) Show that an l-dimensional machine running in time T can be simulated by a probabilistic k-dimensional machine running in time \(O(T^r (\log T)^{1/k})\), where r = 1 + 1/k − 1/l.

  (b) Show that a probabilistic k-dimensional machine requires time \(\Omega(T^r)\) to simulate an l-dimensional machine running in time T, where r = 1 + 1/k − 1/l.


Comments. Source: [N. Pippenger, Proc. 14th ACM Symp. Theory Comput., 1982, pp. 17–26]. Pippenger used Shannon's information measure to prove Item (b).

6.10.19. [30] Prove that if the number of states is fixed, then a 1-tape nondeterministic Turing machine with no separate input tape (with only one read/write two-way tape) can accept more sets within time bound \(a_2 n^a\) than within \(a_1 n^a\), for \(0 < a_1 < a_2\) and 1 < a < 2.

Comments. Source: K. Kobayashi, Theoret. Comput. Sci., 40(1985), 175–193.

6.10.20. [30] A parallel random access machine (PRAM), also called a ‘concurrent-read and concurrent-write priority PRAM,’ consists of a finite number of processors, each with an infinite local memory and infinite computing power, indexed as P(1), P(2), P(3), …, and an infinite number of shared memory cells c(i), i = 1, 2, …, each capable of holding any integer. Initially, the input is contained in the first n memory cells. The number of processors is polynomial in n. Each step of the computation consists of all processors in parallel executing three phases as follows. Each processor (i) reads from a shared memory cell; (ii) performs any deterministic computation; and (iii) may attempt writing into some shared memory cell.

At each step, every processor is in some state. The actions and the next state of each processor at each step depend on the current state and the value read. In case of a write conflict, that is, more than one processor tries to write to the same memory cell, the processor with the minimum index succeeds in writing. By the end of computing, the shared memory contains the output. Prove that adding (or multiplying) n integers, each of at least \(n^{\epsilon}\) bits for a fixed ϵ > 0, requires Ω(log n) parallel steps on a PRAM.
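The three-phase step and the priority rule for write conflicts described above can be sketched as follows. This is only an illustration under stated assumptions: the names `pram_step`, `read_addr`, and `compute` are hypothetical, processors and cells are indexed from 0 rather than 1, and unwritten cells are assumed to hold 0.

```python
def pram_step(memory, states, read_addr, compute):
    """One synchronous step of a priority PRAM (illustrative encoding).

    memory    : dict mapping cell index -> integer (shared memory)
    states    : list of per-processor states, lowest index first
    read_addr : (i, state) -> cell index that processor i reads
    compute   : (i, state, value) -> (new_state, write), where write is
                None or a pair (cell index, value to write)
    """
    # Phase (i): every processor reads before anybody writes.
    reads = [memory.get(read_addr(i, s), 0) for i, s in enumerate(states)]
    new_states, writes = [], {}
    for i, (s, v) in enumerate(zip(states, reads)):
        # Phase (ii): arbitrary deterministic local computation.
        s2, write = compute(i, s, v)
        new_states.append(s2)
        # Phase (iii): processors are visited in index order, so the
        # first writer to a cell is the one with the minimum index.
        if write is not None:
            addr, val = write
            writes.setdefault(addr, val)
    memory.update(writes)
    return new_states


# Example: every processor holding a 1 tries to write its own index
# into one output cell; the minimum index wins the conflict.
bits = [0, 0, 1, 0, 1]
memory = {i: b for i, b in enumerate(bits)}
OUT = len(bits)  # hypothetical output cell
states = pram_step(
    memory,
    [None] * len(bits),
    read_addr=lambda i, s: i,
    compute=lambda i, s, v: (s, (OUT, i) if v == 1 else None),
)
```

After the call, the output cell holds the index of the lowest-numbered processor whose input bit is 1, which shows how much a single priority write can accomplish in this model.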

Comments. Hint: Cut a random string x into segments \(x_1, \ldots, x_n\), the segments being used as inputs for the processors. Define an input to be ‘not useful’ if it does not ‘influence’ the final output of the sum (in some precise way). Then show that there is an input \(x_i\) that is not useful; hence we can compress x using the rest of the inputs and the output. Source: Independently proved by P. Beame [Inform. Comput., 76(1988), 13–28] without using Kolmogorov complexity and by M. Li and Y. Yesha [J. Assoc. Comp. Mach., 36:3(1989), 671–680]. Slightly weaker versions of Exercise 6.10.20 were proved by F. Meyer auf der Heide and A. Wigderson [SIAM J. Comput., 16(1987), 100–107] using a Ramsey theorem, by A. Israeli and S. Moran [private communication, 1985], and by I. Parberry [Ph.D. thesis, 1984, Warwick University, UK]. In the last three proofs, one needs to assume that the integers have arbitrarily (or exponentially) many bits.

6.10.21. [15] Consider the following ‘proof’ for Exercise 6.10.20 without using Kolmogorov complexity: Assume that a PRAM M adds n numbers in o(log n) time. Take any input \(x_1, \ldots, x_n\). Then there is an input \(x_k\) that is ‘not useful’ as in the hint in Exercise 6.10.20. If we change \(x_k\) to \(x_k + 1\), then the output should still be the same, since \(x_k\) is not useful, a contradiction. What is wrong with this proof?

6.10.22. [26] A function \(f(x_1, \ldots, x_n)\) is called invertible if for each i, argument \(x_i\) can be computed from \(\{x_1, \ldots, x_n\} - \{x_i\}\) and \(f(x_1, \ldots, x_n)\). Use the PRAM model with q processors defined in this section. Show that it requires Ω(min{log(b(n)/log q), log n}) time to compute any invertible function \(f(x_1, \ldots, x_n)\), where \(l(x_i) \ge b(n)\), for all i, and log n = o(b(n)).

Comments. Source: M. Li and Y. Yesha, J. Assoc. Comp. Mach., 36:3(1989), 671–680.

6.10.23. [36] Computing the minimum index: Modify the PRAM model as follows. We now have n processors P(1), …, P(n) and only one shared memory cell, c(1). Each processor knows one input bit. If several processors attempt to write into c(1) at the same time, then they must all write the same data; otherwise each write fails. This PRAM version requires Ω(log n) time to find the smallest index i such that P(i) has input bit 1. Can you give two proofs, one using incompressibility arguments and the other not?

Comments. The original proof without using Kolmogorov complexity is due to F. Fich, P. Ragde, and A. Wigderson [SIAM J. Comput., 17:3(1988), 606–627].

6.20 Communication Complexity

Suppose Alice has input x, Bob has input y, and they want to compute a function f(x,y) by communicating information and by performing local computation according to a fixed protocol. Assume that Alice outputs f(x, y). Local computation costs are ignored; we are interested only in minimizing the number of bits communicated between Alice and Bob. Usually, one considers the worst-case or average-case over all inputs x, y of given length n. But in many situations, for example replicated file systems and cache coherence algorithms in multiprocessor systems and computer networks, the worst-case and average-case are not necessarily significant. From the individual communication complexities we can always obtain the worst-case complexity and the average-case complexity.

6.20.1 Definition 6.11.1

The individual communication complexity CC(x,y|P) is defined as the number of bits Alice with input x and Bob with input y need to exchange, both using a communication protocol P. We assume that the protocol is deterministic, possibly partial, and it knows parameter n. A protocol is total if it gives a definite result for all inputs, and it is partial if it computes correctly on input (x, y) (on other inputs P may output incorrect results or not halt). Note that a protocol implicitly specifies the function being computed in an operational manner.

Let f be a function defined on pairs of strings of the same length. Assume that Alice has x, Bob has y, and Alice wants to compute f(x,y). A (total) communication protocol P over domain X with range Z is a finite rooted binary tree, whose internal nodes are divided into two parts, A and B, called Alice's nodes and Bob's nodes. (They indicate the turn of move.) Each internal node v is labeled by a function \(r_v : X \to \{0,1\}\) and each leaf v is labeled by a function \(r_v : X \to Z\). A node reached by a protocol P on inputs x, y is the leaf reached by starting at the root of P and walking toward leaves, where in each encountered internal node we go left if \(r_v(x) = 0\), and we go right otherwise. This leaf is called the conversation on x, y. Using P on input x and y, Alice computes z ∈ Z if the leaf v reached on x and y satisfies \(z = r_v(x)\). We say that a protocol computes a function f : X × X → Z if Alice computes f(x,y) for all x, y ∈ X. The domain X of protocols considered is always equal to the set \(\{0, 1\}^n\) of binary strings of certain length n. As Z we will take either {0,1} or \(\{0,1\}^n\).

The length of communication \(CC_P(x,y)\) of the protocol P on inputs x, y is the length of the path from the root of P to the leaf reached on x, y. By the complexity of a protocol P we mean C(P|n). Formally, a partial protocol is a protocol, as defined above, but the functions \(r_v\) may be partial. The complexity C(P|n) of a partial protocol P is defined as the minimal Kolmogorov complexity of a program that given n, v determines whether v is a leaf or Alice's internal node or Bob's internal node, and given n, v, x computes \(r_v(x)\). If a partial protocol happens to be total (all \(r_v\) are total functions) then the new definition of C(P|n) coincides with the old one.
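To make the tree definition concrete, here is a minimal sketch of a protocol as a labeled binary tree, together with a routine that walks it. The class and function names (`Node`, `run`, `identity_protocol`) are hypothetical, strings are Python `str` values over "0" and "1", and the sketch assumes that the label at one of Bob's nodes is applied to Bob's input y.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Node:
    owner: str                       # "A" (Alice), "B" (Bob), or "leaf"
    r: Callable[[str], object]       # bit function (internal) or output function (leaf)
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def run(protocol: Node, x: str, y: str):
    """Walk from the root to a leaf; return (conversation bits, Alice's output)."""
    v, bits = protocol, []
    while v.owner != "leaf":
        own_input = x if v.owner == "A" else y   # assumption: Bob's nodes read y
        b = v.r(own_input)
        bits.append(b)
        v = v.left if b == 0 else v.right        # go left on 0, right otherwise
    return bits, v.r(x)                          # the leaf label is applied to x


def identity_protocol(n: int, prefix: str = "") -> Node:
    """Toy total protocol for I(x, y) = (x, y): Bob sends y one bit at a time.

    Each leaf corresponds to one received string, so the tree has 2**n leaves.
    """
    if len(prefix) == n:
        return Node("leaf", lambda x, p=prefix: (x, p))
    i = len(prefix)
    return Node("B", lambda y, i=i: int(y[i]),
                identity_protocol(n, prefix + "0"),
                identity_protocol(n, prefix + "1"))


P = identity_protocol(3)
bits, out = run(P, "101", "011")
cc = len(bits)   # length of communication CC_P(x, y) for this input pair
```

In this toy protocol every conversation has length n, so the individual communication complexity does not depend on the input; protocols of higher complexity C(P|n) can do better on compressible y.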

6.20.2 Identity Function

Let I(x, y) = (x, y) be the identity function: Alice has to learn Bob's string. If Alice can compute I, then she can compute every computable function f. In the following theorem we consider protocols that compute I on all strings x, y of length n. By definition of the Kolmogorov complexity the lower bound for the number of bits to be transmitted is C(y|x,P). Since the number m of halting programs of length n + O(1) satisfies \(m \le 2^{n+O(1)}\), we can determine the halting of all programs of length up to n + O(1) if we are given m: Run all programs up to that length in dovetail fashion; by the time m of them have halted we know that the remainder will never halt. In this way we can determine the shortest program, given n, for every string y of length n, since C(y) ≤ n + O(1). Thus, Bob using a protocol P, containing m, can find the shortest program for y and send it to Alice, who computes y. Thus, CC(x,y|P) ≤ C(y|n) + O(1) with C(P) ≤ n + O(1). By Theorem 3.8.1 on page 242 we see that C(P) ≥ n − O(log n) if it is supposed to work for every y of length n.
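The dovetailing step in this argument can be illustrated with a small sketch. Here "programs" are modeled as Python generators (one `yield` per computation step), which is only a stand-in for programs of a reference universal machine; the names `dovetail`, `halts_after`, and `loops_forever` are hypothetical.

```python
def dovetail(programs, m):
    """Run all programs in dovetail fashion until exactly m have halted.

    Each program is a zero-argument function returning a generator that
    yields once per step and returns its output when it halts.  If m is
    the true number of halting programs, everything still running when
    we stop is known never to halt.  (If m were too large, this loop
    would not terminate, which is why the exact count m is needed.)
    """
    running = {i: p() for i, p in enumerate(programs)}
    halted = {}
    while len(halted) < m:
        for i in list(running):
            try:
                next(running[i])              # advance program i by one step
            except StopIteration as stop:
                halted[i] = stop.value        # program i halted with this output
                del running[i]
                if len(halted) == m:
                    break
    return halted, set(running)


def halts_after(k, out):
    def prog():
        for _ in range(k):
            yield
        return out
    return prog


def loops_forever():
    while True:
        yield


programs = [halts_after(3, "a"), loops_forever, halts_after(1, "b"), loops_forever]
halted, never_halt = dovetail(programs, m=2)
```

With the outputs of all halting programs in hand, the shortest program for any given y can be found by inspection, which is what Bob does in the protocol above.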

6.20.3 Theorem 6.11.1

For every protocol P for the identity function I, and every x, y, we have CC(x,y|P) ≥ C(y|P) − O(1) ≥ C(y|n) − C(P|n) − O(log C(P|n)).

Proof. Let c be the conversation between Alice and Bob on inputs x, y. It suffices to show that given P, c we can find y. We call a set R ⊂ A × A a rectangle if whenever both \((x_1, y_1)\) and \((x_2, y_2)\) are in R, then so is \((x_1, y_2)\). The definition of a communication protocol implies that the set of all pairs (x′, y′) such that the conversation between Alice and Bob on input (x′, y′) is equal to c is a ‘rectangle’, that is, has the form X × Y, for some X, Y ⊂ \(\{0,1\}^n\). The set Y is a one-element set, since for every y′ ∊ Y Alice outputs y also on the input (x, y′) (the output of Alice depends on c, P, x only). We can find Y given P, c, and since Y = {y}, we are done. ◻
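The rectangle property used in this proof can be checked by brute force on a small example. The three-round protocol below is hypothetical and chosen only for illustration; the one feature that matters is that each message depends solely on the speaker's own input and the transcript so far.

```python
from itertools import product


def conversation(x, y):
    """Transcript of a toy protocol on bit tuples x, y (illustrative only)."""
    a1 = x[0]            # Alice: depends on x only
    b1 = y[0] ^ a1       # Bob: depends on y and the transcript
    a2 = x[1] & b1       # Alice: depends on x and the transcript
    return (a1, b1, a2)


def is_rectangle(pairs):
    """True iff the set of pairs equals X x Y for its two projections."""
    X = {x for x, _ in pairs}
    Y = {y for _, y in pairs}
    return set(pairs) == set(product(X, Y))


n = 3
strings = list(product((0, 1), repeat=n))

# Group all input pairs by the conversation they produce.
classes = {}
for x in strings:
    for y in strings:
        classes.setdefault(conversation(x, y), []).append((x, y))

all_rectangles = all(is_rectangle(pairs) for pairs in classes.values())
```

Every class of inputs sharing a conversation turns out to be a product set, as the definition of a protocol guarantees. For the identity function the argument above additionally forces the second projection of each class to be a single string.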

6.20.4 Example 6.11.1

We look at the special case x = y. For every P, there are x, y with CC(x,y|P) ≥ C(y|x) + n − O(1). This holds in particular for y = x with C(y) ≥ n. Then C(y|x) = O(1) and by Theorem 6.11.1 we have CC(x, y|P) ≥ C(y|P) − O(1) ≥ C(y|x) + n − O(1). ◻

6.20.5 Inner Product Function

Alice has a string \(x = x_1, \ldots, x_n\) and Bob has a string \(y = y_1, \ldots, y_n\) with x, y ∈ \(\{0,1\}^n\). Alice and Bob compute
$$f\left({x,y} \right) \equiv \left({\sum\limits_{i = 1}^n {x_i } \cdot y_i } \right)\bmod 2,$$
with Alice ending up with the result.

6.20.6 Lemma 6.11.1

Every protocol P computing the inner product function f requires at least CC(x,y|P) ≥ C(x,y|P) − n − O(1) bits of communication for every pair x, y.

Proof. Fix a communication protocol P that computes the inner product. Let Alice's and Bob's input be as above. Run the communication protocol P on x, y and let c(x, y) be a record of the communication between Alice and Bob. Consider the set S = S(x, y) defined by

$$S := \{(a, b) : c(a, b) = c(x, y), \text{ and Alice outputs } f(x,y) \text{ on conversation } c(x,y) \text{ and input } a\}.$$

We claim that \(d(S) \le 2^n\). To prove this, assume first that f(x,y) = 0. Let X = {a : (a, b) ∈ S} be the first projection of S and let Y = {b : (a, b) ∈ S} be the second projection of S. Since P computes f we know that f(a, b) = 0 for all (a, b) ∈ S. In other words, every element of X is orthogonal to every element in Y, and therefore rank(X) + rank(Y) ≤ n. Thus,
$$d\left(S \right) = d\left(X \right) \cdot d\left(Y \right) \le 2^{{\rm{rank}}\left(X \right) + {\rm{rank}}\left(Y \right)} \le 2^n.$$

Assume now that f(x,y) = 1. Again S = X × Y for some X, Y and f(a, b) = 1 for all (a, b) ∈ S. Subtracting x from the first component of all pairs in S, we obtain a rectangle S′ such that f(a, b) = 0 for all (a, b) ∈ S′. By the above argument, we have \(d(S') \le 2^n\). Since d(S′) = d(S) we are done. Given P, c(x,y), f(x,y), and the index of (x,y) in S we can compute (x,y). Padding the index of (x,y) up to length n, while n is known by the protocol, we observe that the index of (x, y) and c(x, y) can be concatenated without delimiters. Consequently, C(x,y|P) ≤ l(c(x,y)) + n + O(1). ◻
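The bound \(d(S) \le 2^n\) on rectangles where the inner product is constant can be checked exhaustively for very small n. The sketch below is a brute-force sanity check, not part of the proof; the function names are hypothetical, and the search is exponential in \(2^n\), so it is only practical for n up to about 3.

```python
from itertools import combinations, product


def ip(a, b):
    """Inner product of two bit tuples modulo 2."""
    return sum(ai * bi for ai, bi in zip(a, b)) % 2


def largest_monochromatic_rectangle(n, value):
    """Largest d(X) * d(Y) over rectangles X x Y on which ip equals value.

    For each nonempty X, the largest admissible Y is the set of all b
    with ip(a, b) == value for every a in X, so it suffices to try
    every X once.
    """
    vecs = list(product((0, 1), repeat=n))
    best = 0
    for size in range(1, len(vecs) + 1):
        for X in combinations(vecs, size):
            Y = [b for b in vecs if all(ip(a, b) == value for a in X)]
            best = max(best, len(X) * len(Y))
    return best


# Compare the largest rectangle against 2**n for both output values.
results = {(n, v): largest_monochromatic_rectangle(n, v)
           for n in (1, 2, 3) for v in (0, 1)}
bound_holds = all(size <= 2 ** n for (n, _), size in results.items())
```

For output value 0 the bound is attained, for instance by X containing only the all-zero string and Y containing every string.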

Since there are \(2^{2n}\) pairs of n-length strings, we can choose x, y with C(x, y|P) ≥ 2n. Thus, the worst-case communication complexity for the function f is at least n − O(1). There are at least \(2^{2n} - 2^{2n-c} + 1\) pairs x, y with C(x, y|P) ≥ 2n − c. Hence, the average-case communication complexity for the function f is n − O(1).

6.21 Exercises

6.11.1. [24] Assume that a function f : {0,1}^n × {0,1}^n → {0,1} satisfies C(f|n) ≥ 2^{2n} − n: the truth table describing the outcomes of f for the 2^n possible inputs x (the rows) and the 2^n possible inputs y (the columns) has high Kolmogorov complexity. If we generate the truth table for a prospective f by flipping a fair coin for each entry, then with probability at least 1 − 2^{−n} it will satisfy this. Show that every deterministic protocol P computing such a function f requires at least CC(x, y|P) ≥ min{C(x|P), C(y|P)} − log n − O(1).

Comments. Source: H.M. Buhrman, H. Klauck, N.K. Vereshchagin, P.M.B. Vitányi, J. Comput. Syst. Sci. 73(2007), 973–985.

6.11.2. [24] Let f be the equality function, with f(x, y) = 1 if x = y and 0 otherwise. Show that for every deterministic protocol P computing f, we have CC(x, x|P) ≥ C(x|P) − O(1) for all x. On the other hand, there is a P of complexity O(1) such that there are x, y (x ≠ y) with C(x|P), C(y|P) ≥ n − 1 for which CC(x, y|P) = 2.

Comments. Source: H.M. Buhrman, H. Klauck, N.K. Vereshchagin, P.M.B. Vitányi, Ibid.

6.11.3. [35] Define the protocol-independent communication complexity TCC(x, y|C(P) ≤ i) of computing a function f(x, y) as the minimum CC(x, y|P) over all deterministic total protocols P with C(P) ≤ i that compute f(x, y) for all pairs (x, y) (l(x) = l(y) = n). For example, TCC(x, y|C(P) ≤ n + O(1)) = 0 for all computable functions f and all x, y.

  (a) Show that for every computable function f(x, y) its TCC(x, y|C(P) ≤ i) is always at most the TCC(x, y|C(P) ≤ i) of the identity function I(x, y) = (x, y), for every x, y, i.

  (b) Show that for every computable function f we have TCC(x, y|C(P) ≤ i) ≥ C(f(x, y)|x) − i − O(log i). For f = I this gives TCC(x, y|C(P) ≤ i) ≥ C(y|x) − i − O(log i).

  (c) Show that for the identity function I, restricting the protocols to one-way (Bob sends a single message to Alice only) does not significantly alter the protocol-independent communication complexity for total protocols: TCC(x, y|C(P) ≤ i + O(1), P is one-way) ≤ TCC(x, y|C(P) ≤ i), where on the right-hand side P is allowed to be two-way.


Comments. Source: H.M. Buhrman, H. Klauck, N.K. Vereshchagin, P.M.B. Vitányi, Ibid.

6.11.4. [37] We continue Exercise 6.11.3. Let h_x(i) be the structure function as in Definition 5.5.6 on page 405. Define, with P a protocol that computes the identity function I, the protocol-size function p_y(j) = min{i : TCC(x, y|C(P) ≤ i, P is one-way) ≤ j}. The function p_y(j) gives the minimal number of bits of a protocol of the total deterministic type considered that transmits y ∈ {0,1}* in at most j bits of communication. By Exercise 6.11.3, Item (c), total one-way protocols are as powerful as total two-way protocols of about the same complexity. Note that the one-way protocol does not depend on x.

  (a) Show that p_y(j) = h_y(j) + O(log n) for all y and j.

  (b) Use Item (a) to show the following: (i) For every string y of length n we have p_y(n) = O(1) and 0 ≤ p_y(j) − p_y(k) ≤ k − j + O(log n), for every j < k ≤ n. (ii) Conversely, if p is a function from {0, 1, …, n} to the natural numbers satisfying the conditions in (i), with the O(1), O(log n) replaced by 0, then there is a string y of length n such that p_y(j) = p(j) + O(log n + C(p)), where C(p) stands for the complexity of the set {(j, p(j)) : j ∈ {0, …, n}}.

  (c) Show that there exist noncommunicable strings in the following sense. Let k < n. Apply Item (b) to the function p defined as p(j) = k for j ≤ n − k and p(j) = n − j for j ≥ n − k. By Item (b) there exists a string y of length n such that p_y(0) = k + O(log n) (thus C(y) = k + O(log n)) and the protocol-independent communication complexity for the identity function I is TCC(x, y|C(P) ≤ i, P is one-way) > n − i − O(log n) for every i < k − O(log n).


Comments. Item (c) shows that Bob can hold a highly compressible string y, but cannot use that fact to reduce the communication complexity significantly below l(y). Unless all information about y is hardwired into the protocol, the communication between Bob and Alice requires sending y almost completely literally. Indeed, for such y with, say, C(y) = log^{o(1)} n, we have (irrespective of x) communication complexity that is exponential in the complexity of y for all protocols of complexity less than that of y. When the complexity of the protocol reaches the complexity of y, the communication complexity suddenly drops to 0. Source: H.M. Buhrman, H. Klauck, N.K. Vereshchagin, P.M.B. Vitányi, Ibid.

6.11.5. [33] Let the protocol-independent communication complexity PCC(x, y|C(P) ≤ i) stand for the minimum CC_P(x, y) over all partial deterministic protocols P of complexity at most i computing f correctly on input (x, y) (on other inputs P may output incorrect results or not halt). Trivially, PCC(x, y|C(P) ≤ i) ≤ TCC(x, y|C(P) ≤ i) for every computable function.

  (a) Show that for the identity function I we have C(y|x) − i − O(log i) ≤ PCC(x, y|C(P) ≤ i) ≤ PCC(x, y|C(P) ≤ i, P is one-way) ≤ C(y) for all x, y, i such that i is at least log C(y) + O(1). The qualifier ‘one-way’ means that Bob communicates with Alice but not vice versa.

  (b) Prove that for the identity function I we have PCC(x, y|C(P) = O(log n), P is one-way) ≤ C(y|x) + O(log n), for all x, y of length n.


Comments. Item (a) is obvious. Comparing Item (b) with Exercise 6.11.4, Item (c), we see that protocol-independent communication complexity for the identity function I of one-way partial deterministic protocols is strictly less than that of one-way total deterministic protocols (there are no noncommunicable objects for protocol-independent communication complexity of partial protocols). Moreover, by Exercise 6.11.3, Item (c), the protocol-independent communication complexity for the identity function I of one-way total deterministic protocols equals that of two-way ones. Hint for Item (b): use Muchnik's theorem, Theorem 8.3.7, on page 654. Source: H.M. Buhrman, H. Klauck, N.K. Vereshchagin, P.M.B. Vitányi, Ibid.

6.11.6. [34] In Theorem 6.11.1 it was shown that for a deterministic protocol of, say, complexity O(1), to compute the identity function Alice and Bob need to exchange about C(y) bits, even if the required information C(y|x) is much less than C(y). Show that for randomized protocols the communication complexity is close to C(y|x).

Comments. Source: H.M. Buhrman, M. Koucký, N.K. Vereshchagin, Randomized individual communication complexity, Manuscript, CWI, 2006.

6.22 Circuit Complexity

A key lemma in the study of circuit complexity is Håstad's switching lemma. It is used to separate depth-k and depth-(k + 1) circuit classes, and to construct oracles relative to which the polynomial hierarchy is infinite and properly contained in PSPACE. The traditional proof of this lemma uses sophisticated probabilistic arguments. We describe a simple elementary proof using the incompressibility method.

According to Definition 5.3.3 on page 376, a k-DNF formula is a disjunction of conjunctions with each conjunct (or term) containing at most k literals. A k-CNF is a conjunction of disjunctions with each disjunct (or clause) containing at most k literals.

6.22.1 Definition 6.12.1

A restriction ρ is a function from a set of variables to {0,1,✯}. Given a Boolean function f, f|ρ is the restriction of f in the natural way: x_i is free if ρ(x_i) = ✯ and x_i takes on the value ρ(x_i) otherwise. The domain of a restriction ρ, dom(ρ), is the set of variables mapped to 0 or 1 by ρ.

We can also naturally view a restriction ρ as a term of f if f|ρ = 1. A minterm is a restriction such that no proper subset of the variables set by the restriction forms a term. Let R_l be the set of restrictions on n variables that leave l variables free. Obviously, \(d\left(R_l\right) = \binom{n}{l} 2^{n - l}.\)
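As a small illustration of these definitions, the following Python sketch enumerates R_l for tiny n and applies a restriction to a Boolean function. The representation is an assumption made for the example: a restriction is a tuple indexed by variable position, the string "*" stands for ✯, and the helper names are hypothetical.

```python
from itertools import product
from math import comb

STAR = "*"  # stands for the symbol ✯ (variable left free)


def restrictions(n, l):
    """All restrictions on n variables that leave exactly l variables free."""
    return [r for r in product((0, 1, STAR), repeat=n) if r.count(STAR) == l]


def restrict(f, rho):
    """Return f|rho as a function of the free variables, in index order."""
    free = [i for i, v in enumerate(rho) if v == STAR]

    def restricted(*free_values):
        x = list(rho)
        for i, b in zip(free, free_values):
            x[i] = b
        return f(*x)

    return restricted


if __name__ == "__main__":
    n, l = 4, 2
    # Both numbers agree with d(R_l) = C(n, l) * 2^(n - l).
    print(len(restrictions(n, l)), comb(n, l) * 2 ** (n - l))
```

Enumerating all 3^n maps is only practical for very small n; the point is to confirm the counting formula, not to sample restrictions efficiently.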

6.22.2 Lemma 6.12.1

(Switching lemma) Let f be a t-CNF on n variables, ρ a random restriction ρ ∈ R_l, and α = 12tl/n ≤ 1. Then the probability that f|ρ is an s-DNF is at least 1 − α^s.

Proof. Fix a t-CNF f on n variables, and integers s and l < n. Note that f|ρ is an l-DNF; we can assume s ≤ l. In this proof, we will use conditional complexity C(·|x), where x denotes the list of fixed values of f, t, l, n, s and several (fixed) programs needed later.

6.22.3 Claim 6.12.1

For any ρ ∈ R_l such that f|ρ is not an s-DNF, ρ can be effectively described by some ρ′ ∈ R_{l−s}, a string σ ∈ {0,1,✯}^{st} such that σ has s non-✯ positions, and x. That is, C(ρ|ρ′, σ, x) = O(1).

Before proving Claim 6.12.1, we show that it implies the switching lemma. Fix a random restriction ρ ∈ R_l with
$$C\left(\rho \mid x\right) \ge \log \left(d\left(R_l\right)\alpha^s\right), \tag{6.22}$$

where α = 12tl/n ≤ 1. If we show that f|ρ is an s-DNF, then since there are at least d(R_l)(1 − α^s) ρ's in R_l satisfying Equation 6.22 by Theorem 2.2.1, this will imply the lemma.

Assume that f|ρ is not an s-DNF and let ρ′ ∈ R_{l−s} be as in Claim 6.12.1. Obviously C(ρ′|x) ≤ log d(R_{l−s}). Since l(σ) = st and σ has s non-✯ positions, we have
$$C\left(\sigma \mid x\right) \le \log \binom{st}{s} + s \le s\log et + s = s\log 2et,$$
by standard estimation (Stirling's approximation), where e = 2.718…. By Claim 6.12.1, we have
$$C\left(\rho \mid x\right) \le C\left(\rho' \mid x\right) + C\left(\sigma \mid x\right) \le \log d\left(R_{l - s}\right) + s\log 2et. \tag{6.23}$$
By Equations 6.22 and 6.23, we have
$$d\left(R_l\right)\alpha^s \le d\left(R_{l - s}\right)2^{s\log 2et}.$$
Substituting \(\binom{n}{l}2^{n - l}\) for \(d\left(R_l\right)\) and \(\binom{n}{l - s}2^{n - l + s}\) for \(d\left(R_{l - s}\right)\), and using the fact \(\binom{n}{l}/\binom{n}{l - s} \ge \left(\left(n - l + s\right)/l\right)^s\), we obtain
$$\frac{12tl}{n} \le \frac{4etl}{n - l + s}.$$

But the above formula cannot hold simultaneously with 12tl/n ≤ 1, a contradiction. This proves the switching lemma.
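The substitution step and the resulting clash can be written out as follows. This is a sketch of the algebra only; it assumes t ≥ 1 and 1 ≤ s ≤ l, and it uses α = 12tl/n ≤ 1, so that l ≤ n/(12t) ≤ n/12. Dividing the inequality between d(R_l) and d(R_{l−s}) by d(R_l) gives

$$\alpha^s \le \frac{d\left(R_{l - s}\right)}{d\left(R_l\right)}\left(2et\right)^s = \frac{\binom{n}{l - s}}{\binom{n}{l}}\,2^s\left(2et\right)^s \le \left(\frac{l}{n - l + s}\right)^s\left(4et\right)^s,$$

and taking sth roots yields

$$\frac{12tl}{n} \le \frac{4etl}{n - l + s} \quad\Longleftrightarrow\quad n - l + s \le \frac{e}{3}\,n.$$

On the other hand, l ≤ n/12 and s ≥ 1 give

$$n - l + s > n - \frac{n}{12} = \frac{11}{12}\,n > \frac{e}{3}\,n,$$

since 11/12 ≈ 0.917 while e/3 ≈ 0.906. The two displayed conditions on n − l + s are incompatible, which is the contradiction used in the proof.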

Proof. (For Claim 6.12.1) Let f be a t-CNF on n variables and ρ ∈ R_l a restriction such that f|ρ is not an s-DNF. Write
$$f = \bigwedge_{j = 1}^k D_j, \tag{6.24}$$

where each D_j is a disjunct of size at most t. We can also write f|ρ as a DNF: f|ρ = ∨_j C_j, where each C_j (the corresponding restriction) is a minterm of f|ρ. Since f|ρ is not an s-DNF, there must be a minterm π that contains at least s + 1 variables. We will extend ρ to ρ′ using s of the variables of π.

First we split π into subrestrictions. Assume that π_1, …, π_{i−1} have already been defined and dom(π) − dom(π_1 … π_{i−1}) ≠ ∅. Choose the first disjunct D_j in Equation 6.24 that is not already 1 under the restriction ρπ_1 … π_{i−1}. Let S be the set of variables that appear both in D_j and in dom(π) − dom(π_1 … π_{i−1}). Define π_i as
$$\pi_i\left(x\right) = \begin{cases} \pi\left(x\right) & \text{if } x \in S, \\ \star & \text{otherwise.} \end{cases}$$

Because π is a minterm, it must force each disjunct to 1, and no subrestriction of π (namely π_1 … π_{i−1}) will. Thus the above process is always possible. Let k be the least integer such that π_1 … π_k sets at least s variables. Trim π_k so that π_1 … π_k sets exactly s variables.

Change π_i to π̃_i: for each variable x ∈ dom(π_i), if it appears in the corresponding D_j as x then π̃_i(x) = 0; if it appears in D_j as x̄ then π̃_i(x) = 1. Thus π_i ≠ π̃_i, since ρπ_1 … π_{i−1}π_i forces D_j to be 1 but ρπ_1 … π_{i−1}π̃_i does not. If x is the mth variable in D_j, the mth digit of σ^{(i)} is
$$\sigma_m^{\left(i\right)} = \begin{cases} \pi_i\left(x\right) & \text{if } x \in {\rm dom}\left(\pi_i\right)\ \left(= {\rm dom}\left(\tilde\pi_i\right)\right), \\ \star & \text{otherwise.} \end{cases}$$
Since D_j is of size at most t, l(σ^{(i)}) = l(D_j) ≤ t. Let
$$\rho' = \rho\tilde\pi_1 \ldots \tilde\pi_k, \quad \sigma = \sigma^{\left(1\right)} \ldots \sigma^{\left(k\right)} \star^{st - kt}.$$

Pad σ with ✯'s so that l(σ) = st. Since π_1 … π_k sets exactly s variables, σ has s non-✯ positions.

Now we show how to recover π_1, …, π_k, hence ρ, from σ and ρ′ (given x). Assume that we have already recovered π_1, …, π_{i−1}, from which we can infer ρπ_1 … π_{i−1}π̃_i … π̃_k, using ρ′. Recall that π_i was defined by choosing the first clause D_j not already forced to 1 by ρπ_1 … π_{i−1}. Since π̃_i does not force D_j to be 1 and π̃_{i+1} … π̃_k are defined on variables not contained in D_j, we simply identify D_j from f|ρπ_1 … π_{i−1}π̃_i … π̃_k as the first non-1 clause. Given D_j, recover π_i using σ^{(i)}. With ρπ_1 … π_k and π_1, …, π_k, we can recover ρ. This proves Claim 6.12.1, and hence the theorem. ◻
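The encoding and its inverse can be traced on a toy instance. The following Python sketch follows the construction above under stated assumptions: a CNF is a list of clauses, each clause a list of (variable, is_positive) pairs with no variable repeated inside a clause; restrictions are dictionaries holding only the fixed variables; "*" stands for ✯; and the minterm π is supplied by hand rather than searched for. All names are hypothetical, and the example formula is invented for illustration.

```python
STAR = "*"  # stands for the symbol ✯


def first_unforced(cnf, assignment):
    """Index of the first clause not forced to 1 by the partial assignment."""
    for j, clause in enumerate(cnf):
        if not any(assignment.get(v) == int(pos) for v, pos in clause):
            return j
    raise ValueError("every clause is already forced to 1")


def encode(cnf, rho, pi, s, t):
    """Build (rho', sigma) from rho and a minterm pi with more than s variables."""
    forced = dict(rho)       # rho pi_1 ... pi_{i-1}
    rho_prime = dict(rho)    # rho pi~_1 ... pi~_i
    remaining = dict(pi)     # the part of pi not yet used
    sigma, used = [], 0
    while used < s:
        clause = cnf[first_unforced(cnf, forced)]
        for v, pos in clause:
            if v in remaining and used < s:      # "used < s" trims the last pi_i
                sigma.append(remaining[v])       # sigma records the value pi_i(v)
                rho_prime[v] = 0 if pos else 1   # pi~_i falsifies the literal
                forced[v] = remaining.pop(v)
                used += 1
            else:
                sigma.append(STAR)
    sigma += [STAR] * (s * t - len(sigma))       # pad sigma to length s*t
    return rho_prime, sigma


def decode(cnf, rho_prime, sigma, s):
    """Recover rho and the used part of pi from rho' and sigma."""
    current = dict(rho_prime)  # rho pi_1..pi_{i-1} pi~_i..pi~_k
    recovered, position = {}, 0
    while len(recovered) < s:
        clause = cnf[first_unforced(cnf, current)]
        block = sigma[position:position + len(clause)]
        position += len(clause)
        for (v, _), symbol in zip(clause, block):
            if symbol != STAR:
                current[v] = symbol              # replace pi~_i(v) by pi_i(v)
                recovered[v] = symbol
    rho = {v: b for v, b in rho_prime.items() if v not in recovered}
    return rho, recovered


# Toy 2-CNF: (x0 or x6)(x1 or x2)(not x2 or x3)(x4 or x5), with rho fixing x6 = 0.
cnf = [[(0, True), (6, True)], [(1, True), (2, True)],
       [(2, False), (3, True)], [(4, True), (5, True)]]
rho = {6: 0}
pi = {0: 1, 2: 1, 3: 1, 4: 1}   # a minterm of f|rho with 4 > s variables
rho_prime, sigma = encode(cnf, rho, pi, s=3, t=2)
print(rho_prime, sigma, decode(cnf, rho_prime, sigma, s=3))
```

In this instance the third clause is already satisfied under ρ′ because π̃ sets x2 = 0, yet decoding still succeeds: the decoder always looks at the first clause that is not forced to 1, which is exactly the ordering argument used in the proof.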

Let us summarize the central ideas in the above proof. When we choose a random ρ, the number of bits needed to specify each extra variable is roughly O(log n). However, the fact that f|ρ is not an s-DNF implies that it has a long minterm, and this allows us to construct a σ, together with ρ′, specifying s extra variables at the expense of roughly log 2et bits per variable. So a large term is a kind of regularity a random restriction ρ does not produce.
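The comparison in this summary can be made concrete with numbers. The Python sketch below is a rough illustration under assumed sample parameters (n, t, l, s are chosen arbitrarily with α = 12tl/n ≤ 1); the function names are hypothetical. It compares the average number of bits per extra variable implied by the sizes of R_l and R_{l−s} with the log 2et bits per variable that σ costs.

```python
from math import comb, e, log2


def log2_d(n, l):
    """log2 of d(R_l) = C(n, l) * 2^(n - l)."""
    return log2(comb(n, l)) + (n - l)


def direct_bits_per_variable(n, l, s):
    """(log d(R_l) - log d(R_{l-s})) / s, the gap between the two index lengths."""
    return (log2_d(n, l) - log2_d(n, l - s)) / s


def sigma_bits_per_variable(t):
    """log 2et, the per-variable cost of sigma."""
    return log2(2 * e * t)


def encoding_compresses(n, t, l, s):
    """True if describing rho via (rho', sigma) beats log(d(R_l) * alpha^s)."""
    alpha = 12 * t * l / n
    encoded = log2_d(n, l - s) + s * sigma_bits_per_variable(t)
    return encoded < log2_d(n, l) + s * log2(alpha)


if __name__ == "__main__":
    n, t, l, s = 1200, 2, 50, 5   # sample values with alpha = 1
    print(direct_bits_per_variable(n, l, s), sigma_bits_per_variable(t))
    print(encoding_compresses(n, t, l, s))
```

Whenever `encoding_compresses` returns True, a restriction ρ whose restricted formula is not an s-DNF would have a description shorter than the incompressibility assumption on ρ allows, which is the contradiction at the heart of the proof.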

6.22.4 Example 6.12.1

Lemma 6.12.1 is a powerful lemma in circuit complexity. Let us define a depth-k (unbounded fan-in Boolean) circuit as follows: The input to the circuit is I = {x_1, …, x_n, x̄_1, …, x̄_n}. The circuit has k alternating levels of AND and OR gates, each with unbounded fan-in. The kth (top) level contains just one gate, which gives the output of the circuit. Each gate in the ith level gets an arbitrary number of inputs from the outputs of the (i − 1)st level, assuming that I is at the zeroth level. A parity function f(x_1, …, x_n) equals 1 if and only if an odd number of x_i's are 1's. It is easy to show that a polynomial-size depth-2 circuit cannot compute parity. Assume that this is the case for k − 1. For a depth-k circuit, if we apply a random restriction to it, then by Lemma 6.12.1, with high probability, we can switch the bottom two levels, say, from AND-OR to OR-AND. Then the second-level OR can merge with the third-level OR, hence reducing the circuit depth to k − 1. Note that a restriction of a parity function remains a parity function (up to complementation). Making this kind of induction precise, one can prove the following: there is a constant c > 0 such that a depth-k circuit with at most \(2^{c^{k/(k - 1)} n^{1/(k - 1)}}\) gates cannot compute parity. ◻
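Two facts used in this example can be confirmed by exhaustive search for small n: every term of the parity function sets all n variables (so a DNF for parity needs 2^{n−1} terms), and restricting parity yields parity of the free variables, possibly negated. The Python sketch below is a brute-force check only, with hypothetical helper names and "*" standing for ✯.

```python
from itertools import product

STAR = "*"  # stands for the symbol ✯


def parity(bits):
    """1 if an odd number of the bits are 1, else 0."""
    return sum(bits) % 2


def completions(rho):
    """All total assignments that agree with the restriction rho."""
    free = [i for i, v in enumerate(rho) if v == STAR]
    for values in product((0, 1), repeat=len(free)):
        x = list(rho)
        for i, b in zip(free, values):
            x[i] = b
        yield x


def terms_of_parity(n):
    """All restrictions that force parity on n variables to 1."""
    return [rho for rho in product((0, 1, STAR), repeat=n)
            if all(parity(x) == 1 for x in completions(rho))]


def restricted_parity_is_parity(rho):
    """Check that parity|rho is parity of the free variables xor a constant."""
    free = [i for i, v in enumerate(rho) if v == STAR]
    constant = sum(v for v in rho if v != STAR) % 2
    return all(parity(x) == (sum(x[i] for i in free) + constant) % 2
               for x in completions(rho))


if __name__ == "__main__":
    n = 4
    terms = terms_of_parity(n)
    # Number of terms, and whether each of them fixes every variable.
    print(len(terms), all(STAR not in rho for rho in terms))
```

Since no term leaves a variable free, each term covers exactly one input, which is why a depth-2 circuit for parity cannot have polynomial size.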

6.23 History and References

Apparently, W.J. Paul [Proc. Int. Conf. Fund. Comput. Theory, L. Budach, ed., 1979, pp. 325–334] is the pioneer of using the incompressibility method, and he proved several lower bounds with it. R.V. Freivalds [“On the running time of deterministic and nondeterministic Turing machines,” Latv. Mat. Ezhegodnik, 23(1979) 158–165 (in Russian)] proved a lower bound on the time of Turing machine computations for a certain problem, implicitly using a Kolmogorov complexity argument in a veiled form of ‘optimal enumerations,’ justified by the invariance theorem, Theorem 2.1.1. He did not use the incompressibility method.

The initially most influential paper is probably the paper by W.J. Paul, J. Seiferas, and J. Simon, [J. Comput. System Sci., 23:2(1981), 108–126]. This was partly because the paper by W.J. Paul, which contains the example of Section 6.1.1, was not widely circulated.

The aim of the paper by Paul, Seiferas, and Simon was “to promote the approach” of applying Kolmogorov complexity to obtain lower bounds. In that paper, using incompressibility arguments, the authors greatly simplified the proof of a difficult theorem proved by S.O. Aanderaa [pp. 75–96 in: Complexity of Computation, R. Karp, ed., Amer. Math. Soc., 1974], which states that real-time simulation of k tapes by k − 1 tapes is impossible for deterministic Turing machines. Earlier, M.O. Rabin [Israel J. Math., 1(1963), 203–211] proved the particular case k = 2 of this result. In 1982, W.J. Paul [Inform. Contr., 53(1982), 1–8] further improved Aanderaa's result from real-time to nonlinear lower bounds by incompressibility arguments. In the same year, S. Reisch and G. Schnitger [Proc. 23rd IEEE Found. Comput. Sci., 1982, pp. 45–52] published a paper giving three applications of incompressibility in areas other than Turing machine computational complexity. (The authors later lost contact with each other and they have never written up a journal version of this paper.) Subsequently, incompressibility arguments started to be applied to an ever-increasing variety of problems.

Lemma 6.1.1 in Section 6.1.1 was first proved by F.C. Hennie using a counting argument in [Inform. Contr., 8:6(1965), 553–578]. The proof we give here is due to W.J. Paul. Section 6.1.2 is based on [R. Beigel, W. Gasarch, M. Li, and L. Zhang, Theoret. Comput. Sci., 191(1998), 245–248]. The original probabilistic analysis is in [A.W. Burks, H.H. Goldstine, and J. von Neumann, “Preliminary discussion of the logical design of an electronic computing instrument,” Institute for Advanced Studies, Report (1946). Reprinted in John von Neumann Collected Works, Vol. 5, 1961]. Improved probabilistic analysis can be found in [B.E. Briley, IEEE Trans. Computers, C-22:5(1973)] and [G. Schay, Amer. Math. Monthly, 102:8(1995), 725–730]. Background material on adder design can be found in [K. Hwang, Computer Arithmetic: Principles, Architecture, and Design, Wiley, New York, 1979]. Lemma 6.1.3 in Section 6.1.3 is due to J. Seiferas and Y. Yesha [Personal communication, 1986]. The idea of proving a lower time bound for palindrome recognition by a probabilistic Turing machine, as mentioned in the comment at the end of Section 6.1 and in Exercise 6.10.13, Item (c), is due to R. Paturi, J. Simon, R. Newman-Wolfe, and J. Seiferas, [Inform. Comput., 88(1990), 88–104].

The discussion in Section 6.2 on the quantitative relation between high-probability properties of finite objects and individual randomness of finite objects is taken from [H.M. Buhrman, M. Li, J.T. Tromp, and P.M.B. Vitányi, SIAM J. Comput., 29:2(1999), 590–599]. With respect to infinite binary sequences the distinction between laws of probability (that hold with probability one) and individual random sequences is discussed in [M. van Lambalgen, Random Sequences, PhD thesis, Universiteit van Amsterdam, Amsterdam, 1987] and [V.V. Vyugin, Theory Probab. Appl., 42:1(1996), 39–50].

Section 6.3, on combinatorics, follows [M. Li and P.M.B. Vitányi, J. Comb. Theory, Ser. A, 66:2(1994), 226–236]. The problems on tournaments in Section 6.3 are from [P. Erdős and J.H. Spencer, Probabilistic Methods in Combinatorics, Academic Press, 1974]. See also [N. Alon, J.H. Spencer, and P. Erdős, The Probabilistic Method, Wiley, 1992] for the probabilistic method. Section 6.3.3 was suggested by W. Gasarch; the lower bound on the Ramsey numbers was originally proved in [P. Erdős, Bull. Amer. Math. Soc., 53(1947), 292–294]; see [P. Erdős and J.H. Spencer, Ibid.]. The lower bound for the coin-weighing problem in Theorem 6.3.4 was established, using probabilistic or information-theoretic methods, by P. Erdős and A. Rényi [Publ. Hungar. Acad. Sci., 8(1963), 241–254], L. Moser [Combinatorial Structures and Their Applications, Gordon and Breach, 1970, pp. 283–384], and N. Pippenger [J. Comb. Theory, Ser. A, 23(1977), 105–115]. The last paper contains proofs, by entropy methods, of Theorem 6.3.4 on page 455 and Exercise 6.3.4 on page 457. Recently, entropy methods have also been used quite successfully in proving lower bounds on parallel sorting [J. Kahn and J. Kim, Proc. 24th ACM Symp. Theory Comput., 1992, pp. 178–187], perfect hashing [I. Newman, P. Ragde, and A. Wigderson, 5th IEEE Conf. Structure in Complexity Theory, 1990, pp. 78–87], and lower bounds on parallel computation [R. Boppana, Proc. 21st ACM Symp. Theory Comput., 1989, pp. 320–326]. On Exercise 6.3.11 on page 460 Ramsey-type results, that were earlier obtained using Lovász's local lemma, are obtained by incompressibility.

Section 6.4 on graphs is primarily based on [H.M. Buhrman, M. Li, J.T. Tromp, and P.M.B. Vitányi, SIAM J. Comput., 29:2(1999), 590–599]. For random graphs in a probabilistic sense see for example [B. Bollobás, Random Graphs, Academic Press, 1985]. The statistics of subgraphs of high-complexity graphs, Theorem 6.4.1, has a corresponding counterpart in quasirandom graphs, and a similar expression is satisfied almost surely by random graphs [N. Alon, J.H. Spencer, and P. Erdős, The Probabilistic Method, Wiley, 1992] pp. 125–140; see especially Property P1(s) on page 126. The latter property may be weaker in terms of quantification of ‘almost surely’ and the o(∙) and O(∙) estimates involved than the result we present here.

The results in Section 6.5 on routing tables are from [H.M. Buhrman, J.H. Hoepman, and P.M.B. Vitányi, SIAM J. Comput., 28:4(1999), 1414–1432]. Related research (see exercises) appears in [E. Kranakis and D. Krizanc, Proc. 3rd Int. Colloq. Structure Inform. Communication Complexity, Siena, Italy, 1996, pp. 119–124; Proc. 13th Symp. Theoret. Aspects Comput. Sci., 1996, pp. 529–540] and [E. Kranakis, D. Krizanc, and F. Luccio, Proc. 13th Symp. Math. Found. Comput. Sci., 1995, pp. 392–401].

Heapsort was originally discovered by J.W.J. Williams [Comm. Assoc. Comp. Mach., 7(1964), 347–348]. R.W. Floyd [Comm. Assoc. Comp. Mach., 7(1964), 701] subsequently improved the algorithm. Researchers had previously tried to analyze the precise average-case complexity of Heapsort with no success. For example, the analysis typically works only for the first step; after one step, the heap changes and certain properties such as that all heaps are equally likely no longer hold. Section 6.6.1 is based on an explanation by I. Munro on a summer evening in 1992. The solution to the average-case complexity of Heapsort was first obtained by R. Schaffer and R. Sedgewick [J. Algorithms, 15(1993), 76–100]. The proof in the form given in Section 6.6.1 is due to I. Munro. Claim 6.6.1, that the Heapify procedure produces a random heap from a random input, was observed by T. Jiang, at a 1993 Dagstuhl seminar, and I. Munro.

Shellsort was discovered by D.L. Shell [Comm. Assoc. Comp. Mach., 2:7(1959), 30–32]. Section 6.6.2 on Shellsort is based on [T. Jiang, M. Li, and P.M.B. Vitányi, Proc. Int. Colloq. Aut. Lang. Progr., Lect. Notes Comp. Sci., Vol 1644, Springer-Verlag, Berlin, 1999, 453–462; J. Assoc. Comp. Mach. 47:5(2000), 905–911], where the reader can also find papers on worst-case analysis of Shellsort not discussed here. Previously, D.E. Knuth [The Art of Computer Programming, Vol. 3: Sorting and Searching, Addison-Wesley, 1973, 1998] showed that the average running time for two-pass Shellsort is Θ(n^{5/3}) for the best choice of increments; A.C.C. Yao [J. Alg., 1(1980), 14–50] analyzed the three-pass case without giving a definite running time; Yao's analysis was extended by S. Janson and D.E. Knuth [Random Struct. Alg. 10(1997), 125–142] to an O(n^{23/15}) upper bound on the average running time for three-pass Shellsort.

Section 6.7 on longest common subsequences is from [T. Jiang and M. Li, SIAM J. Comput., 24:5(1995), 1122–1139]. Section 6.8 on formal language theory follows [M. Li and P.M.B. Vitányi, SIAM J. Comput., 24:2(1995), 398–410]. For the history of and an introduction to formal language theory, see [M.A. Harrison, Introduction to Formal Language Theory, Addison-Wesley, 1978; J.E. Hopcroft and J.D. Ullman, Introduction to Automata Theory, Languages, and Computation, Addison-Wesley, 1979].

The proof in Section 6.9 of the lower bound on the time required for linear context-free language recognition is due to J. Seiferas [Inform. Contr., 69(1986), 255–260], simplifying the original proof of H. Gallaire [Inform. Contr., 15(1969), 288–295]. The latter paper improved a result by F.C. Hennie that was weaker by a multiplicative factor of log n. Gallaire's proof uses a complicated counting argument and de Bruijn sequences. T. Kasami [Inform. Contr., 10(1967), 209–214] proved that linear context-free languages can be recognized online in O(n^2) time by a one-work-tape Turing machine.

It is fair to say that the solutions to the conjectures on k-PDA in Exercise 6.9.3, string-matching in Exercise 6.9.2, Theorem 6.10.1, and many exercises in Section 6.10 would not have been possible, at least not proved in such a short period of time, without the use of incompressibility arguments. Recently, T. Jurdziński and K. Loryś [Proc. 29th Int. Colloq. Aut., Lang., Prog., 2002, pp. 147–158] used the incompressibility method to prove a 1988 McNaughton-Narendran-Otto conjecture that the Church-Rosser languages do not contain the set of palindromes, hence not CFL ∩ coCFL. The results in Section 6.10 concerning whether an extra tape adds computational power in various Turing machine models, and especially the ‘two heads are better than two tapes’ result in Exercise 6.10.15, could probably not be proven without the incompressibility method. These results were open for decades before they were solved using the incompressibility method and some other Kolmogorov complexity-related techniques, and no other proofs are known.

The results and methods of Section 6.10 on lower bounds for time complexity of Turing machines were instrumental in initiating large-scale use of the incompressibility method. It is well known and easy to show that if a k-tape Turing machine runs in O(T(n)) time, then it can be simulated by a 1-tape Turing machine in O(T^2(n)) time [J. Hartmanis and R. Stearns, Trans. Amer. Math. Soc, 117(1969), 285–306] and by a 2-tape Turing machine in O(T(n) log T(n)) time [F.C. Hennie and R. Stearns, J. Assoc. Comp. Mach., 4(1966), 533–546]. For years, only several weak lower bounds were known with complicated proofs, such as M.O. Rabin's paper from 1963 and S.O. Aanderaa's paper of 1974 above. These papers consider the restricted online model with an extra output tape. For the more general model used in Theorem 6.10.1, P. Ďuriš, Z. Galil, W.J. Paul, and R. Reischuk [Inform. Contr., 60(1984), 1–11] proved that it requires Ω(n log n) time to simulate two tapes by one. Research advanced quickly only after the incompressibility argument was invented. W.J. Paul [Inform. Contr., 53(1982), 1–8] proved Exercise 6.10.13, Item (a), on page 512, improving Aanderaa's result. Around 1983/1984, independently and in chronological order, Wolfgang Maass at UC Berkeley, one of us [ML] at Cornell, and the other one [PV] at CWI Amsterdam, obtained an Ω(n^2) lower bound on the time to simulate two tapes by one tape (deterministically), and thereby closed the gap between 1 tape versus k ≥ 2 tapes (Exercise 6.10.2 on page 508). All three relied on Kolmogorov complexity, and actually proved more in various ways. (One of us [PV], at first not realizing how to use incompressibility, reported in [P.M.B. Vitányi, Theoret. Comput. Sci., 34(1984), 157–168] an Ω(n^{3/2}) lower bound on the time to simulate a single pushdown store online by one oblivious tape unit. However, after being enlightened by J. Seiferas about how to use incompressibility with respect to another result, he realized how to apply it to the 1-tape versus 2-tape problem without the oblivious restriction [P.M.B. Vitányi, Inform. Process. Lett., 21(1985), 87–91 and 147–152], and the optimal results cited below.) W. Maass also obtained a nearly optimal (almost square) lower bound for nondeterministic simulation (Exercise 6.10.4 on page 508) [W. Maass, Trans. Amer. Math. Soc, 292(1985), 675–693]. Maass's lower bound on nondeterministic simulation was improved by Z. Galil, R. Kannan, and E. Szemerédi [Proc. 18th ACM Symp. Theory Comput, 1986, pp. 39–49] to Ω(n^2/log^{(k)} n) by constructing a language whose computation graph does not have small separators (Exercise 6.10.5 on page 509). The exercises contain many more lower bounds which were proved in this direction. Section 6.10 is based on [M. Li and P.M.B. Vitányi, Inform. Comput, 78(1988), 56–85], which also contains results on tapes versus stacks and queues. Many lower bounds for various models of computation, such as machines with extra two-way input tapes, machines with queues, random access machines, machines with many heads on a tape, machines with tree tapes, machines with k-dimensional tapes, and probabilistic machines, have since been proved using Kolmogorov complexity. We have tried to cover these results in the exercises, where the references are also given.

Communication complexity was invented by A.C.C. Yao, [Proc. 11th ACM Symp. Theory Comput, 1979, 209–213]. The main reference is [E. Kushilevitz, N. Nisan, Communication Complexity, Cambridge Univ. Press, 1997]. These works consider the worst-case or average-case complexity. Section 6.11, on individual communication complexity, is based on [H.M. Buhrman, H. Klauck, N.K. Vereshchagin, P.M.B. Vitányi, J. Comput. Syst. Sci. 73(2007), 973–985].

The proof of Lemma 6.12.1 in Section 6.12 is based on a paper by L. Fortnow and S. Laplante [Inform. Comput, 123(1995), 121–126], which in turn was based on a proof by A. Razborov [pp. 344–386 in Feasible Mathematics II, P. Clote, J. Remmel, eds., 1995]. This lemma was originally proved by J. Håstad [pp. 143–170 in Randomness and Computation, S. Micali, ed, JAI Press, 1989] for the purpose of simplifying and improving Yao's lower bound on unbounded circuits [A.C.C. Yao, Proc. 26th IEEE Symp. Found. Comput. Sci., 1985, pp. 1–10]. Note that in Lemma 6.12.1, in order to simplify the proof, we have α = 12tl/n instead of Håstad's α = 5tl/n or Fortnow and Laplante's α = 5.44tl/n. See also [M. Agrawal, E. Allender, and S. Rudich, J. Comput. Syst. Sci., 57:2(1998), 127–143] for more circuit lower bounds by incompressibility.

There are applications of the incompressibility method and Kolmogorov complexity we do not cover. This is because there are by now simply too many of them. Also, some applications require lengthy discussions of computational models and preliminary facts; and some others are indirect applications. A.M. Ben-Amram and Z. Galil [J. Assoc. Comp. Mach., 39:3(1992), 617–648] use Kolmogorov complexity to formalize the concept of incompressibility for general data types and prove a general lower bound for incompressible data types. J. Shallit and his coauthors in a series of papers study a variation of descriptional complexity, ‘automaticity,’ where the description device is restricted to finite automata; see [J. Shallit and Y. Breitbart, Proc. 11th Symp. Theoret. Aspects Comput. Sci., 1994, pp. 619–630 and J. Comput. System Sci. 53:1(1996), 10–25]. U. Vazirani and V. Vazirani [Theoret. Comput. Sci., 24(1983), 291–300] studied probabilistic polynomial-time reductions. It is possible to do their reduction by Kolmogorov complexity. Kolmogorov complexity has also been studied in relation to the tradeoff of table size and number of probes in hashing by H.G. Mairson [Proc. 24th IEEE Found. Comput. Sci., 1983, pp. 40–47]. See also [K. Mehlhorn, Proc. 23rd IEEE Found. Comput. Sci., 1982, 170–175]. D. Hammer and A.K. Shen [A strange application of Kolmogorov complexity, Theor. Comput. Syst, 31:1(1998), 1–4] use complexity to derive a geometric relation, and a geometric relation to derive a property of complexity. Namely, from 2C(a, b, c) ≤ C(a, b) + C(b, c) + C(c, a) + O(log n) one can derive ‖V‖^2 ≤ ‖S_xy‖ · ‖S_yz‖ · ‖S_zx‖. Here V is a set in three-dimensional space, S_xy, S_yz, S_zx are its two-dimensional projections, and ‖·‖ denotes volume. Moreover, from the well-known Cauchy-Schwarz inequality one can derive 2K(a, b, c) ≤ K(a, b) + K(b, c) + K(c, a) + O(1). The incompressibility method has been applied to logical definability by M. Zimand [Inform. Process. Lett, 57(1996), 59–64] and to finite-model theory and database query languages by J. Tyszkiewicz [Inf. Comput, 135:2(1997), 113–135; Proc. 8th Int. Conf Database Theory, Lect Notes Comput Sci., Vol. 893, Springer-Verlag, 1995, pp. 97–110]. M. Zimand [Ibid.] studies a ‘high-low Kolmogorov complexity law’ equivalent to a 0–1 law in logic. See also [R. Book, SIAM J. Comput, 23(1994), 1275–1282]. K.W. Regan [Proc. 10th IEEE Conf Structure in Complexity Theory, 1995, pp. 50–64] uses Kolmogorov complexity to prove superlinear lower bounds for some problems in a type of hierarchical memory model that charges higher cost for nonlocal communication.

Copyright information

© Springer Science + Business Media, LLC 2008

Authors and Affiliations

  1. Cheriton School of Computer Science, University of Waterloo, Waterloo, Canada
  2. Centrum voor Wiskunde en Informatica, Amsterdam, The Netherlands
