Theorems of 2009

  • Bogdan Grechuk


This chapter contains descriptions of 14 great theorems published in the Annals of Mathematics in 2009.

9.1 On De Giorgi’s Conjecture in Dimension at Most 8

The Derivative and Differential Equations

One of the central concepts in the whole of mathematics is the derivative,  which for a function \(u:{\mathbb R}\rightarrow {\mathbb R}\) is defined as \(u'(x)=\lim \limits _{\varepsilon \rightarrow 0} (u(x+\varepsilon )-u(x))/\varepsilon \), provided that the limit exists. There are numerous useful rules which help us to calculate derivatives for various functions, for example \((u+v)'(x)=u'(x)+v'(x)\), \((uv)'(x)=u'(x)v(x)+u(x)v'(x)\), \(u'(x)=f'(g(x))g'(x)\) if \(u=f(g(x))\), etc. We also have tables of derivatives for various functions, e.g. \((x^n)'=nx^{n-1}\) for any \(n\ne 0\), \((\ln x)'=\frac{1}{x}, \, x>0\), etc.

If we know the derivative of a function, we can recover it (up to an additive constant) using integration: if \(u'(x)=f(x)\) then \(u(x)=\int f(x)dx + C\) for some \(C\in {\mathbb R}\). For example, \(u'(x)=2x\) implies that \(u(x)=\int 2x dx + C = x^2+C\). An equation of the form \(u'(x)=f(x)\) is the simplest example of a differential equation, that is, an equation involving derivatives. Differential equations arise in numerous applications of mathematics, for example in physics, biology, and economics, and they are extremely important in pure mathematics as well.

A Slightly More Complicated Differential Equation

A slightly more complicated differential equation is of the form \(u'(x)=f(u)\), for example
$$\begin{aligned} u'(x)=1-u^2. \end{aligned} \tag{9.1}$$
To solve such an equation, we use the notation \(\frac{du}{dx}\) for the derivative, rewrite equation \(\frac{du}{dx}=f(u)\) in the form \(\frac{du}{f(u)}=dx\), and take the integral of both sides. For example, Eq. (9.1) can be written as \(\frac{du}{1-u^2}=dx\), or \(\int \frac{du}{1-u^2}=\int dx\). A table of integrals tells us that \(\int \frac{du}{1-u^2} = -\frac{1}{2}\ln \frac{1-u}{1+u}+C_1\) for \(|u|<1\), and \(\int dx = x+C_2\), so the equation reduces to \(\ln \frac{1-u}{1+u}=-2(x+C)\), where \(C=C_2-C_1\), or \(\frac{1-u}{1+u}=e^{-2(x+C)}\), where e is the base of the natural logarithm. This results in \(u(x)= \tanh (x+C)\), \(C\in {\mathbb R}\), where \(\tanh \) is the function defined by \(\tanh (x):=\frac{e^x-e^{-x}}{e^x+e^{-x}}\).
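The closed-form answer can be cross-checked numerically. The following is a minimal sketch, not part of the original argument: it integrates \(u'=1-u^2\) with the classical fourth-order Runge-Kutta method and compares the result with \(\tanh (x+C)\). The helper name `rk4_solve`, the step count, and the value of C are illustrative assumptions.

```python
import math

def rk4_solve(f, u0, x0, x1, steps=1000):
    """Integrate u' = f(u) from x0 to x1 with the classical Runge-Kutta method."""
    h = (x1 - x0) / steps
    u = u0
    for _ in range(steps):
        k1 = f(u)
        k2 = f(u + 0.5 * h * k1)
        k3 = f(u + 0.5 * h * k2)
        k4 = f(u + h * k3)
        u += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return u

C = 0.3  # arbitrary choice of the integration constant
# Start on the curve u = tanh(x + C) at x = 0 and integrate up to x = 2.
u_numeric = rk4_solve(lambda u: 1 - u * u, math.tanh(0.0 + C), 0.0, 2.0)
u_exact = math.tanh(2.0 + C)
print(abs(u_numeric - u_exact))  # should be tiny
```

If the two values agree to many digits, this supports (but of course does not prove) the formula obtained by separating variables.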

Differential Equations Involving the Second Derivative

A differential equation may also include the second derivative \(u''(x)=(u'(x))'\). For example, if \(u(x)=x^3\) then \(u'(x)=3x^2\), and \(u''(x)=(3x^2)'=6x\). Another notation for the second derivative is \(\frac{d^2u}{dx^2}\), so the differential equation \(\frac{d^2u}{dx^2}=6x\) has a solution \(u(x)=x^3\) (and many other solutions as well).

How could we find at least one non-trivial solution (the trivial solution is \(u(x)=0,\, \forall x\)) of a more complicated equation involving the second derivative, like
$$\begin{aligned} \frac{d^2u}{dx^2}=u^3-u \,\,\, ? \end{aligned} \tag{9.2}$$
Because the second derivative is a polynomial in u, one could naturally guess that the first derivative \(u'\) might also be written as P(u) for some polynomial P. If it were a linear polynomial, that is, \(u'=Au+B\) for some \(A, B \in {\mathbb R}\), then \(u''=(Au+B)'=Au'=A(Au+B)\), again a linear polynomial, not a cubic one as in (9.2).

Next, let us try a quadratic polynomial P, that is, assume that \(u'=Au^2+Bu+C\), where \(A,B, C \in {\mathbb R}\) are some unknown coefficients to be found from (9.2); this is called the method of undetermined coefficients. The second derivative is \(u''=(Au^2+Bu+C)'=A(u^2)'+Bu'=A(2uu')+Bu'=(2Au+B)u'=(2Au+B)(Au^2+Bu+C)=(2A^2)u^3+(3AB)u^2+(2AC+B^2)u+BC\). By (9.2), this expression should be equal to \(u^3-u\), which leads us to the system of equations \(2A^2=1\), \(3AB=0\), \(2AC+B^2=-1\), \(BC=0\), which has two solutions, one of which is \(A=-1/\sqrt{2}\), \(B=0\), \(C=1/\sqrt{2}\), in which case \(u'=Au^2+Bu+C=(1/\sqrt{2})(1-u^2)\).
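The coefficient matching can be double-checked mechanically. The sketch below is an illustration only, with polynomials stored as coefficient lists (a representation chosen here for convenience): it multiplies \(P'(u)=2Au+B\) by \(P(u)=Au^2+Bu+C\) for the chosen A, B, C and prints the coefficients of the product, which should match those of \(u^3-u\).

```python
import math

def poly_mul(p, q):
    """Multiply polynomials given as coefficient lists [c0, c1, c2, ...]."""
    result = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            result[i + j] += a * b
    return result

A, B, C = -1 / math.sqrt(2), 0.0, 1 / math.sqrt(2)
P = [C, B, A]       # P(u) = C + B u + A u^2
dP = [B, 2 * A]     # P'(u) = B + 2 A u

# Chain rule: u'' = P'(u) * u' = P'(u) * P(u).
second_derivative = poly_mul(dP, P)
print(second_derivative)  # coefficients of 1, u, u^2, u^3
```

Up to rounding, the printed list should be close to 0, -1, 0, 1, that is, \(u''=u^3-u\).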

The last equation, up to the constant factor \(1/\sqrt{2}\), coincides with (9.1), and, using exactly the same argument as above, we get a family of solutions
$$\begin{aligned} u(x)= \tanh \left( \frac{x+C}{\sqrt{2}}\right) , \,\, \quad C\in {\mathbb R}. \end{aligned} \tag{9.3}$$
Note that each function from family (9.3) satisfies two additional conditions: (i) \(|u(x)|<1\) for every \(x\in {\mathbb R}\) (this follows from the definition of \(\tanh (x)\)) and (ii) \(u'(x)>0\) for every \(x\in {\mathbb R}\) (this follows from \(u'(x)=(1/\sqrt{2})(1-u^2)\) and (i)). This implies that the functions (9.3) are increasing, see Fig. 9.1.
Fig. 9.1

Some solutions (9.3) to Eq. (9.2)

Differential Equations with Functions of Two Variables

Derivatives and differential equations can also be studied for functions of several variables,  such as \(u=u(x, y)\). The partial derivative \(\frac{\partial u}{\partial x}\) for such a function is the derivative with respect to x if y is treated as a constant. The derivative \(\frac{\partial u}{\partial y}\) with respect to y is defined similarly, the second-order derivative \(\frac{\partial ^2 u}{\partial x^2}\) is \(\frac{\partial }{\partial x}\left( \frac{\partial u}{\partial x}\right) \), etc. For example, if \(u=x^3y\), then \(\frac{\partial u}{\partial y}=x^3\), \(\frac{\partial u}{\partial x}=3x^2y\), \(\frac{\partial ^2 u}{\partial x^2}=6xy\), and so on. Would you be able to find at least one non-trivial solution of an equation involving second partial derivatives, like
$$\begin{aligned} \frac{\partial ^2 u}{\partial x^2} + \frac{\partial ^2 u}{\partial y^2}=u^3-u \,\,\, ? \end{aligned} \tag{9.4}$$
One of the simplest ideas is to try to combine x and y in a linear way, so that \(u(x, y)=g(Ax+By+C)\) for some \(A,B, C \in {\mathbb R}\) and function \(g:{\mathbb R} \rightarrow {\mathbb R}\). In this case, \(\frac{\partial u}{\partial x} = g'(Ax+By+C)\cdot \frac{\partial (Ax+By+C)}{\partial x}=Ag'(Ax+By+C)\), and therefore \(\frac{\partial ^2 u}{\partial x^2} = A g''(Ax+By+C)\cdot \frac{\partial (Ax+By+C)}{\partial x}=A^2g''(Ax+By+C)\). Similarly, \(\frac{\partial ^2 u}{\partial y^2} =B^2g''(Ax+By+C)\). Substituting this back into the equation and defining \(z=Ax+By+C\), we get \((A^2+B^2)g''(z)=g^3(z)-g(z)\). If \(A^2+B^2=1\), this is exactly Eq. (9.2) above, which has the solution \(g(z)= \tanh \left( \frac{z}{\sqrt{2}}\right) \) (the additive constant in (9.3) can be absorbed into C). Hence, for every \(C\in {\mathbb R}\), and every A and B such that \(A^2+B^2=1\), the function \( u(x, y) = \tanh \left( \frac{Ax+By+C}{\sqrt{2}}\right) \) is a solution to (9.4). Note that if we choose \(B>0\), this solution satisfies the conditions (i) \(|u(x, y)|<1\) and (ii) \(\frac{\partial u}{\partial y}>0\) for every \(x, y\in {\mathbb R}\), and also has a special geometric structure: if we fix any \(\lambda \in (-1,1)\), the set of points (x, y) such that \(u(x, y)=\lambda \) (such sets are called the level sets) forms a line \(Ax+By+C=z\), where z is such that \(\tanh (z/\sqrt{2})=\lambda \).

Differential Equations with Functions of n Variables

We can write an equation similar to (9.4) for functions \(u=u(x_1, x_2, \dots , x_n)\) in any number of variables, that is,
$$\begin{aligned} \frac{\partial ^2 u}{\partial x_1^2} + \frac{\partial ^2 u}{\partial x_2^2} + \dots + \frac{\partial ^2 u}{\partial x_n^2}=u^3-u \,\,\, \end{aligned} \tag{9.5}$$
and ask for solutions satisfying the same conditions (i) \(|u|<1\) and (ii) \(\frac{\partial u}{\partial x_n}>0\) for every \(x=(x_1, \dots , x_n)\in {\mathbb R}^n\). In fact, this particular differential equation originates in the theory of phase transitions, and is very important and well-studied. By an argument similar to the one above, we may write a solution
$$\begin{aligned} u(x) = \tanh \left( \frac{(x-p)\cdot b}{\sqrt{2}}\right) , \end{aligned} \tag{9.6}$$
where \(p=(p_1, p_2, \dots , p_n)\in {\mathbb R}^n\), \(b=(b_1, b_2, \dots , b_n)\in {\mathbb R}^n\) is such that \(\sum _{i=1}^n b_i^2 = 1\) and \(b_n>0\), and \(\cdot \) denotes the scalar product in \({\mathbb R}^n\) (that is \((x-p)\cdot b = \sum \nolimits _{i=1}^n (x_i-p_i)b_i\)). Again, for any fixed \(\lambda \in (-1,1)\), the level set of points \(x\in {\mathbb R}^n\) such that \(u(x)=\lambda \) is described by a linear equation of the form \((x-p)\cdot b = z\); such a set is called a hyperplane. However, guessing one family of solutions to (9.5) does not mean finding all the solutions.
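As a sanity check of the guessed family, one can compare both sides of the equation numerically at a sample point. The sketch below is only an illustration: the dimension \(n=4\), the vectors `b` and `p`, the test point `x`, and the finite-difference step are arbitrary assumptions, and the sum of second partial derivatives is approximated by central differences rather than computed exactly.

```python
import math

def plane_wave(x, p, b):
    """u(x) = tanh(((x - p) . b) / sqrt(2)), the candidate solution."""
    z = sum((xi - pi) * bi for xi, pi, bi in zip(x, p, b))
    return math.tanh(z / math.sqrt(2))

def laplacian(f, x, h=1e-4):
    """Sum of second partial derivatives, each by a central difference."""
    fx = f(x)
    total = 0.0
    for i in range(len(x)):
        forward = list(x)
        backward = list(x)
        forward[i] += h
        backward[i] -= h
        total += (f(forward) - 2 * fx + f(backward)) / (h * h)
    return total

# Hypothetical data for n = 4: b has unit length and its last entry is positive.
b = [0.5, -0.5, 0.5, 0.5]
p = [1.0, 0.0, -2.0, 0.3]
x = [0.2, 0.4, -1.5, 0.9]

f = lambda y: plane_wave(y, p, b)
ux = f(x)
residual = laplacian(f, x) - (ux ** 3 - ux)
print(ux, residual)  # |ux| < 1, and the residual should be close to 0
```

A residual near zero at several sample points is consistent with the claim that every such function solves the equation, although only the analytic computation establishes it.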

The De Giorgi Conjecture

In 1978, De Giorgi [111] made a conjecture that, for \(n\le 8\), every solution to (9.5) satisfying (i) and (ii) on the whole of \({\mathbb R}^n\) has the property that all level sets are hyperplanes, and therefore is given by (9.6). By 2009, the conjecture had been proved only for \(n=2,3\). The following theorem, proved in [333], is a big advance for \(4 \le n \le 8\).

Theorem 9.1

Suppose u is a solution to (9.5) such that (i) \(|u|<1\); (ii) \(\frac{\partial u}{\partial x_n}>0\) for every \(x=(x_1, \dots , x_n)\in {\mathbb R}^n\); and (iii) \(\lim \limits _{x_n\rightarrow \pm \infty } u(x_1, \dots , x_{n-1}, x_n)=\pm 1\) for every fixed \(x_1, \dots , x_{n-1}\). If \(n \le 8\), then all level sets of u are hyperplanes.

In other words, Theorem 9.1 proves the De Giorgi conjecture for all \(n\le 8\), under the additional condition (iii). In a later work, Manuel del Pino, Michal Kowalczyk and Juncheng Wei  [113] found a counterexample to the De Giorgi conjecture in all dimensions \(n \ge 9\), hence the condition \(n \le 8\) in Theorem 9.1 cannot be removed.


O. Savin, Regularity of flat level sets in phase transitions. Annals of Mathematics 169-1, (2009), 41–78.

9.2 An Efficient Algorithm for Fitting a Smooth Function to Data

Looking for a “Nice” Function Which Fits the Given Data Set

In Sect.  5.2, we discussed the following question. Assume that we have performed some measurements, like the radiation level along a street, at a finite number of points. Can we then “guess” the result of the measurement at any other point?

Mathematically, let \(F(\cdot )\) be the (unknown) function such that F(x) represents the level of radiation at any point x. Let \(x_1, x_2, \dots , x_N\) be the points at which we have performed the measurements, and \(y_1, y_2, \dots , y_N\) be the corresponding results. Then we need to find a function F(x) such that \(F(x_i)=y_i, \, i=1,\dots , N\), see Fig. 9.2a.

Of course, there are infinitely many ways of doing this, for example, we can put \(F(x_i)=y_i, \, i=1,\dots , N\) and \(F(x)=0\) for all other x. However, it is unlikely that the true function F is anything like this. At the very least, we would expect it to be continuous and “smooth”. So, the true question is to find F such that (a) \(F(x_i)=y_i, \, i=1,\dots , N\), and (b) F is a function which is as “nice” as possible.

Which Functions are “Nice”?

However, how do we define which functions are “nice” and which are not, and moreover, measure this “niceness” numerically? One standard approach is to require that
  (i) The function F does not take too large or too small values;

  (ii) The function F is smooth and does not increase or decrease too fast. Mathematically, this means that the derivative \(F'\) exists and does not take too large or too small values;

  (iii) In turn, the rate of increase/decrease of F does not change too suddenly. This means that \(F'\) does not change its values too fast, or, equivalently, that the second derivative of F, denoted \(F^{(2)}\), does not take too large or too small values;

  (iv) And so on.

Based on this intuition, one natural way to measure the “niceness” of a function \(F:{\mathbb R}\rightarrow {\mathbb R}\) is its \(C^m\) norm,  defined as
$$ \Vert F\Vert _{C^m} := \max \{\,\sup \limits _{x\in \mathbb R} |F(x)|, \,\sup \limits _{x\in \mathbb R} |F'(x)|,\,\dots ,\,\sup \limits _{x\in \mathbb R} |F^{(m)}(x)|\}, $$
where \(F^{(m)}\) is the m-th derivative of F. Our problem can then be formalized as
$$ \min \limits _F \Vert F\Vert _{C^m}, \quad \text {s.t.} \quad F(x_i)=y_i, \quad i=1,\dots , N. $$
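To make the definition of the \(C^m\) norm concrete, here is a small sketch that estimates it for one explicit function. The example \(F(x)=\sin (2x)\), the helper name `cm_norm_estimate`, and the sampling grid are all illustrative assumptions; since the true norm is a supremum over the whole real line, sampling on a grid only gives a lower estimate, which happens to be accurate here because F is periodic.

```python
import math

def cm_norm_estimate(derivatives, grid):
    """Estimate the C^m norm from the list [F, F', ..., F^(m)] sampled on a grid."""
    return max(max(abs(d(x)) for x in grid) for d in derivatives)

# Illustrative example: F(x) = sin(2x), which has period pi.
F = [
    lambda x: math.sin(2 * x),       # F
    lambda x: 2 * math.cos(2 * x),   # F'
    lambda x: -4 * math.sin(2 * x),  # F''
]
grid = [math.pi * k / 1000 for k in range(1001)]  # one full period

# C^0, C^1 and C^2 norm estimates.
norms = [cm_norm_estimate(F[: m + 1], grid) for m in range(3)]
print(norms)  # approximately 1, 2 and 4
```

The estimates grow with m because each extra derivative of \(\sin (2x)\) doubles the amplitude: a rapidly oscillating function is penalised more heavily by higher-order norms.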
“Nice” Functions of Several Variables

If we measure the radiation in a city, not along a street, then \(x_i \in {\mathbb R}^2\), \(i=1,\dots , N\), are points in the plane, and the aim is to find the “nicest” function \(F:{\mathbb R}^2 \rightarrow {\mathbb R}\) such that \(F(x_i)=y_i\), \(i=1,\dots , N\). For measurements in space, \(x_i \in {\mathbb R}^3\), \(i=1,\dots , N\), and \(F:{\mathbb R}^3 \rightarrow {\mathbb R}\). More generally, the result of a measurement can depend on n parameters, and, in this case, each \(x_i, \, i=1,2,\dots , N\) is a point in \({\mathbb R}^n\), and the aim is to find the “nicest” function \(F:{\mathbb R}^n \rightarrow {\mathbb R}\) such that \(F(x_i)=y_i\), \(i=1,\dots , N\).

The “niceness” can be defined similarly as above, because the definition of the \(C^m\) norm can be extended to functions \(F:{\mathbb R}^n \rightarrow {\mathbb R}\). If \(F(z_1, z_2, \dots , z_n)\) is such a function, we can assume that the variables \(z_1, \dots , z_{i-1}, z_{i+1}, \dots , z_n\) are fixed, and treat F as a function g of one variable \(z_i\). The derivative of g is called the partial derivative  of F with respect to \(z_i\). This derivative can then be differentiated again with respect to some other variable, and so on. Let us assume that we are allowed to perform m such differentiations, and then evaluate the resulting derivative at any point we wish. For example, let \(n=m=3\), and the function \(F(x,y, z)=xy^2z^3\). We can first differentiate it with respect to, say, z, to get \(3xy^2z^2\), then with respect to x to get \(3y^2z^2\), and then with respect to z again to get \(6y^2z\). Finally, we can substitute any values, say, \(x=1\), \(y=2\), \(z=3\), to get a numerical value \(6\cdot 2^2 \cdot 3 = 72\). For any function \(F:{\mathbb R}^n \rightarrow {\mathbb R}\), its \(C^m\) norm \(\Vert F\Vert _{C^m({\mathbb R}^n)}\) is the maximal possible absolute value of the number which we can get in this way after up to m differentiations. We can then minimize such a norm subject to \(F(x_i)=y_i\), \(i=1,\dots , N\).
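The worked example with \(F(x,y,z)=xy^2z^3\) can be reproduced in a few lines. In this sketch a monomial is stored as a coefficient together with a tuple of exponents; this representation and the helper names are chosen here purely for illustration.

```python
def differentiate(coeff, exponents, var):
    """Differentiate coeff * prod(z_i ** exponents[i]) with respect to variable var."""
    e = exponents[var]
    if e == 0:
        return 0, exponents
    new_exponents = exponents[:var] + (e - 1,) + exponents[var + 1:]
    return coeff * e, new_exponents

def evaluate(coeff, exponents, point):
    """Value of the monomial at the given point."""
    value = coeff
    for z, e in zip(point, exponents):
        value *= z ** e
    return value

# F(x, y, z) = x * y^2 * z^3; variables are indexed 0 (x), 1 (y), 2 (z).
term = (1, (1, 2, 3))
for var in (2, 0, 2):  # differentiate in z, then in x, then in z again
    term = differentiate(*term, var)

value = evaluate(*term, (1, 2, 3))
print(term, value)  # (6, (0, 2, 1)) 72
```

The final term encodes \(6y^2z\), and its value at \(x=1\), \(y=2\), \(z=3\) is 72, in agreement with the computation in the text.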

Approximate Fitting to the Dataset

In fact, we can also assume that the measurements can be done with some error, and relax the requirement \(F(x_i)=y_i, \, i=1,\dots , N\) to \(|F(x_i)-y_i| \le M\sigma _i, \, i=1,\dots , N\), see Fig. 9.2b. Here, \(\sigma _i, \, i=1,\dots , N\) are some given non-negative numbers, and M is a variable which should be made as small as possible—the smaller M, the better the function F approximates our data \(y_i\).
Fig. 9.2

Exact and approximate fitting of a function to data

Finally, our problem becomes
$$\begin{aligned} \min \limits _F M, \quad \text {s.t.} \quad \Vert F\Vert _{C^m({\mathbb R}^n)} \le M \quad \text {and} \quad |F(x_i)-y_i| \le M\sigma _i, \quad i=1,\dots , N. \end{aligned} \tag{9.7}$$
Can We Solve Problem (9.7) Efficiently?

In applications, the dimension n and parameter m are fixed, but N can be very large. Can we solve the optimization problem (9.7) efficiently? A theorem of Fefferman,  which we discussed in Sect.  5.2, implies the existence of an algorithm solving (9.7) in time proportional to \(N^k\), where k is a large constant, which depends on n and m. Of course, even for \(k=10\) or \(k=15\), the running time \(N^k\) becomes impractical already for \(N=100\), while in applications N can be measured in millions.

It is very difficult, and most probably impossible, to solve problem (9.7) both efficiently and exactly. For practical purposes, an approximate solution is often sufficient. We say that an algorithm computes the order of magnitude of the solution to (9.7) if it returns output \(M'\) such that \(aM' \le M^* \le bM'\), where \(M^*\) is the optimal value in (9.7), and a and b are some positive constants, depending only on n and m.

An Efficient Algorithm for the Order of Magnitude

The following theorem of Fefferman and Klartag [146] states that the order of magnitude of the solution to (9.7) can be computed very efficiently.

Theorem 9.2

There exists an algorithm which computes the order of magnitude of the optimal value in (9.7) using at most \(CN \ln N\) operations and at most CN memory, where C is a constant which depends only on m and n.

In Theorem 9.2, “operations” means the usual operations with real numbers, such as addition, subtraction, multiplication, division, or comparison. It is assumed that all these operations are performed with perfect accuracy. “CN memory” means CN memory cells, each of which can store a real number, again with perfect accuracy. Of course, in reality any irrational real number has infinitely many digits, hence we need to round such numbers to store them, and all operations are subject to rounding errors. However, these issues are minor and were addressed in subsequent publications by the authors.
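A rough back-of-the-envelope comparison shows how large the gap between \(N^k\) and \(CN\ln N\) is. The constant C depends on m and n and is not given here, so the sketch below sets \(C=1\) purely for illustration; the figure of \(10^9\) operations per second is likewise an assumed machine speed.

```python
import math

OPS_PER_SECOND = 10 ** 9             # assumed machine speed
SECONDS_PER_YEAR = 365 * 24 * 3600

# Polynomial-time bound N^k with k = 10 on a modest data set of N = 100 points.
brute_force_ops = 100 ** 10
brute_force_years = brute_force_ops / OPS_PER_SECOND / SECONDS_PER_YEAR

# Bound C * N * ln(N) with the unknown constant C set to 1, for a million points.
N = 10 ** 6
fast_ops = N * math.log(N)
fast_seconds = fast_ops / OPS_PER_SECOND

print(brute_force_ops, brute_force_years)
print(fast_ops, fast_seconds)
```

Under these assumptions the first bound corresponds to thousands of years of computation for a hundred data points, while the second corresponds to a fraction of a second for a million points (times the unknown constant C).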

Also, Theorem 9.2 just computes (the order of magnitude of) the optimal value in the optimization problem (9.7). Of course, what we really need is to construct a function F for which this optimal value is achieved. In subsequent work [147], the authors developed an algorithm for this task as well.

At first, it is not clear how an algorithm can return a function. After all, a function is defined by its values at every point, and there are infinitely many points. In fact, Fefferman and Klartag’s algorithm works in two stages. At stage one, it takes input data (\(m,n,x_i, y_i,\sigma _i\)) and does some preprocessing. At stage two, it takes any point \(x_0 \in {\mathbb R}^n\) as an input, and returns a polynomial which approximates F in a neighbourhood of the point \(x_0\).


C. Fefferman and B. Klartag, Fitting a \(C^m\)-smooth function to data I, Annals of Mathematics 169-1, (2009), 315–346.

9.3 A Helicoid-Like Surface with a Hole

How “Curved” is a Curve?

Given a curve, how do we measure how “curved” it is? For this, the concept of curvature  is used. Intuitively, the curvature of any curve at any point is just the “speed of rotation” at this point, while you are travelling along the curve at unit speed. As a simple example, imagine you are travelling along a circle of radius R with unit speed. Then it is clear that your “speed of rotation” is the same throughout your journey. Because it takes you time \(2\pi R\) to rotate around the full angle of \(360^{\circ }\), or \(2\pi \) radians, your “speed of rotation” per unit of time is \(\frac{2\pi }{2\pi R}=\frac{1}{R}\). In other words, the curvature of a circle is the same at every point and is equal to \(\frac{1}{R}\). As another simple example, when you are travelling along a straight line, there is no rotation at all, hence the curvature is 0.

The Curvature of a Parabola

In general, of course, a curve may be straight, or almost straight, at some places, but be “curved a lot” elsewhere, so its curvature may vary from point to point. For example, let us estimate the curvature of the parabola \(y=x^2\) near some point \(x=x_0\). After some short time, you would travel from point \((x_0,x_0^2)\) to point \((x, x^2)\), where \(x=x_0+\varepsilon \) for some small \(\varepsilon \). Then \(x^2=(x_0+\varepsilon )^2 =x_0^2+2x_0\varepsilon +\varepsilon ^2 \approx x_0^2+2x_0\varepsilon = 2x_0 x - x_0^2\). Hence, the direction of your movement is along the line \(y=2x_0 x - x_0^2\), which is parallel to \(y=2x_0 x\). In other words, you move at an angle \(\alpha \) with respect to the x-axis, where \(\tan \alpha = 2x_0\).

By the same logic, at the final point \((x_0+\varepsilon ,(x_0+\varepsilon )^2)\) your angle of movement \(\beta \) is such that \(\tan \beta = 2(x_0+\varepsilon )\). By the trigonometric formula, \(\tan (\beta - \alpha ) = \frac{\tan \beta - \tan \alpha }{1 + \tan \beta \tan \alpha } = \frac{2\varepsilon }{1+2(x_0+\varepsilon )\cdot 2x_0} \approx \frac{2\varepsilon }{1+4x_0^2}\). Because \(\beta -\alpha \) is small, \(\beta - \alpha \approx \tan (\beta - \alpha ) \approx \frac{2\varepsilon }{1+4x_0^2}\). This is how much you have rotated yourself.

How much time do you need for this? The distance you have travelled is
$$ \sqrt{(x-x_0)^2+(x^2-x_0^2)^2}=\sqrt{(x-x_0)^2+(x-x_0)^2(x+x_0)^2} \approx \varepsilon \sqrt{1+4x_0^2}. $$
Because your speed was 1, you needed time \(\varepsilon \sqrt{1+4x_0^2}\). Hence, your rotation speed is \(\frac{2\varepsilon }{1+4x_0^2} \Big/ \left( \varepsilon \sqrt{1+4x_0^2}\right) = \frac{2}{(1+4x_0^2)^{3/2}}\). Note that the curvature is maximal at \(x_0=0\) and is almost 0 for large \(|x_0|\), which agrees well with our visual impression that the parabola is most “curved” at 0 and looks almost like a straight line “further away” from 0, see Fig. 9.3a.

If we travel along the curve \(y=x^2\) from left to right, the direction of travel rotates counter-clockwise. If we travel along the curve \(y=-x^2\), the rotation at \(x=x_0\) has the same magnitude \(\left( \frac{2}{(1+4x_0^2)^{3/2}}\right) \) but opposite direction, clockwise, and, to emphasize this fact, we can say that in this case the curvature is negative and is equal to \(-\frac{2}{(1+4x_0^2)^{3/2}}\).
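The semi-formal “speed of rotation” argument for the parabola can be imitated numerically. The sketch below is an illustration with an arbitrarily chosen small step `eps`: it divides the change in the direction angle by the straight-line distance travelled and compares the result with \(\frac{2}{(1+4x_0^2)^{3/2}}\).

```python
import math

def rotation_speed(x0, eps=1e-5):
    """Angle turned divided by distance travelled along y = x^2 from x0 to x0 + eps."""
    alpha = math.atan(2 * x0)          # direction of travel at the start
    beta = math.atan(2 * (x0 + eps))   # direction of travel at the end
    distance = math.hypot(eps, (x0 + eps) ** 2 - x0 ** 2)
    return (beta - alpha) / distance

def parabola_curvature(x0):
    """Closed-form curvature of y = x^2 at x0."""
    return 2 / (1 + 4 * x0 ** 2) ** 1.5

for x0 in (0.0, 0.5, 2.0):
    print(x0, rotation_speed(x0), parabola_curvature(x0))
```

The two columns should agree to several digits, and the agreement improves as `eps` decreases (until rounding errors take over).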

The Curvature of Any Curve

In general, the direction of movement at the point \(x=x_0\) along a curve \(y=f(x)\) is parallel to the line \(y=bx\), where \(b=f'(x_0)\) is the derivative  of f at \(x_0\), see Sect.  3.1. A calculation very similar to the one above suggests that the “speed of rotation” is given by the formula
$$\begin{aligned} k = \frac{f''(x_0)}{(1+(f'(x_0))^2)^{3/2}}, \end{aligned} \tag{9.8}$$
where \(f''(x_0)\) denotes the second derivative of f (that is, the derivative of the function \(f'(x)\)). In particular, for \(f(x)=x^2\) we have \(f'(x)=2x\) and \(f''(x)=2\), so that \(k=\frac{2}{(1+4x_0^2)^{3/2}}\), confirming the calculation above. In fact, we can now forget about the initial semi-formal discussion and just use Eq. (9.8) as the definition of the curvature of any curve which is the graph of a twice differentiable function f. For example, the catenary curve  is defined by the equation
$$ y = a \cosh \left( \frac{x}{a}\right) = \frac{a}{2}(e^{x/a}+e^{-x/a}), $$
where \(\cosh (t) = \frac{1}{2}(e^{t}+e^{-t})\) is called the hyperbolic cosine of t, and \(a>0\) is the parameter, see Fig. 9.3b for some examples of graphs of catenary curves. The catenary has a physical interpretation as “the curve that an idealized cable assumes under its own weight when supported only at its ends”. For \(f(x)=a \cosh \left( \frac{x}{a}\right) \) we have \(f'(x)=\sinh (x/a)\) and \(f''(x)=\frac{1}{a}\cosh (x/a)\), where \(\sinh (t) = \frac{1}{2}(e^{t}-e^{-t})\) is the hyperbolic sine. Because \(1+\sinh ^2 t=\cosh ^2 t\), substitution into (9.8) yields the curvature \(k=\frac{1}{a \cosh ^2(x/a)} = \frac{a}{f(x)^2}\).
Fig. 9.3

a Curvature of a parabola; b Catenary curves; c Principal curvatures of a cylinder; d Catenoid; e Helicoid; f Helicoid with a hole
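Formula (9.8) is easy to evaluate numerically for any smooth graph. In the sketch below, which is an illustration rather than a robust implementation, the derivatives are approximated by central differences with an assumed step `h`; it is tested on the lower half of a circle of radius R, where the curvature should be 1 / R, and on the parabola \(y=x^2\).

```python
import math

def curvature(f, x0, h=1e-4):
    """Signed curvature of the graph y = f(x) at x0 via formula (9.8)."""
    d1 = (f(x0 + h) - f(x0 - h)) / (2 * h)               # approximates f'(x0)
    d2 = (f(x0 + h) - 2 * f(x0) + f(x0 - h)) / (h * h)   # approximates f''(x0)
    return d2 / (1 + d1 * d1) ** 1.5

R = 3.0
lower_semicircle = lambda x: -math.sqrt(R * R - x * x)

k_circle = curvature(lower_semicircle, 1.0)     # expected to be close to 1 / R
k_parabola = curvature(lambda x: x * x, 0.5)    # expected to be close to 2 / 2 ** 1.5
print(k_circle, k_parabola)
```

Travelling from left to right along the lower half of the circle means rotating counter-clockwise, so the sign convention of the text gives a positive curvature there.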

How “Curved” Is a Surface?

How do we measure “how curved” a two-dimensional surface S is in \({\mathbb R}^3\)? Well, at any point X of the surface S, we can build a vector perpendicular to it, choose a plane containing this vector (called a “normal plane”),  and measure the curvature at X of the curve which is the intersection of the surface and the plane. For example, if S is a sphere with radius R, then the intersection of any normal plane with S is just a circle of radius R, and its curvature is 1 / R.

In general, however, the answer depends on the choice of normal plane. For example, if S is an infinite cylinder with base circle having radius R, and X is any point on S, then one normal plane intersects S in a circle with radius R and curvature 1 / R, while another one intersects S in two parallel lines with curvature 0. We can also construct “intermediate” normal planes intersecting the cylinder in an ellipse, see Fig. 9.3c, and the curvature at X would be between 0 and 1 / R.

Principal, Mean, Gaussian, and Total Curvatures

In general, the minimal and maximal curvatures at X over all choices of normal planes are denoted \(k_1\) and \(k_2\) and called the principal curvatures of S at X. Their mean \(H=(k_1+k_2)/2\) is called the mean curvature, while their product \(K=k_1k_2\) is called the Gaussian curvature of S at X. The integral of the Gaussian curvature over the whole surface is called the total curvature of S. For example, if S is a sphere of radius R, the principal curvatures are \(k_1=k_2=1/R\), the mean curvature is also 1 / R, the Gaussian curvature is \(1/R^2\), and the total curvature is (Gaussian curvature)\(\cdot \)(Surface area)\(\,=(1/R^2)(4\pi R^2)=4\pi \). If S is a cylinder, the principal curvatures are 0 and 1 / R, the mean curvature is 1 / (2R), the Gaussian curvature is 0, and hence the total curvature is 0 as well.

To obtain a cylinder, we can take a line \(y=R\) in the x-y coordinate plane, and rotate it in three-dimensional space around the x-axis. If, instead of a line, we rotate a catenary curve \(y = a \cosh \left( \frac{x}{a}\right) \), the corresponding surface is called a catenoid, see Fig. 9.3d. For a point \(X=(x,a \cosh (x/a), 0)\) on the catenoid S, one normal plane is the x-y coordinate plane, which intersects S at a catenary curve, whose curvature at X is, by (9.8), equal to \(\frac{1}{a \cosh ^2(x/a)}\). One can show that this is the maximal possible, and the minimal possible is \(-\frac{1}{a \cosh ^2(x/a)}\). Hence, in this case, the principal curvatures are \(\pm \frac{1}{a \cosh ^2(x/a)}\), and the mean curvature is identically 0. A surface with mean curvature identically 0 is called a minimal surface, see Sect. 5.3 for an alternative definition of this concept and a detailed discussion. A trivial example of a minimal surface is the plane, while the catenoid is the first non-trivial example, found by Euler in 1744. The Gaussian curvature of the catenoid is \(-\frac{1}{a^2 \cosh ^4(x/a)}\), and its total curvature turns out to be \(-4\pi \).

Properly Embedded Curves and Surfaces

A plane curve is a set of points (x, y) in the Euclidean plane \({\mathbb R}^2\) such that \(x=x(t), y=y(t), t\in I\), where x(t) and y(t) are continuous functions, and I is some (finite or infinite) interval of the real numbers. An example of a curve is the parabola \(x=t,\, y=t^2, \, t \in {\mathbb R}\). A curve is called simple if it has no self-intersection, that is, for all \(t_1,t_2 \in I\), if \(x(t_1)=x(t_2)\) and \(y(t_1)=y(t_2)\) then \(t_1=t_2\). For example, the parabola is a simple curve, while the curve \(x(t)=t^3-t,\, y(t)=t^2, \, t \in {\mathbb R}\) is not simple, because \((x(-1), y(-1))=(x(1), y(1))=(0,1)\). Another example of a simple curve is \(x(t)=\frac{\sin t}{t}, \, y(t)=\frac{\cos t}{t}, \, 0<t<+\infty \), known as a hyperbolic spiral. If \(t\rightarrow +\infty \), this curve winds around (0, 0), approaches it, but never reaches it. The part of the curve corresponding to the infinite interval \(1\le t < +\infty \) is contained in the bounded closed unit disk \(\{(x, y):x^2+y^2\le 1\}\).

A curve \(C \subset {\mathbb R}^2\) (or a surface \(S \subset {\mathbb R}^3\)) is called properly embedded in \({\mathbb R}^2\) (respectively, \({\mathbb R}^3\)), if it has no self-intersections, and its intersection with any compact subset of \({\mathbb R}^2\) (respectively, \({\mathbb R}^3\)) is compact. Intuitively, this means that no “infinite” part of the curve or surface is contained in any finite region. For example, the parabola is properly embedded in \({\mathbb R}^2\), while the hyperbolic spiral is not, because the “infinitely long” part of the spiral is contained in a small region around (0, 0). Planes and catenoids are examples of properly embedded surfaces in \({\mathbb R}^3\).

The Helicoid and Its “Generalizations”

After the catenoid, the next discovered example of a properly embedded minimal surface in \({\mathbb R}^3\) was the helicoid.  This is the surface given by
$$ x = s \cos (\alpha t), \quad y = s \sin (\alpha t), \quad z=t, $$
where \(\alpha \) is a constant, and s, t are real parameters, ranging from \(-\infty \) to \(\infty \), see Fig. 9.3e, and also Sect. 5.3. Unlike planes and catenoids, the helicoid has infinite total curvature.
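That the helicoid has mean curvature 0 can be checked numerically at sample points. The sketch below does not follow the normal-plane definition used in this section; instead it assumes the standard formula from differential geometry, \(H=\frac{EN-2FM+GL}{2(EG-F^2)}\), where E, F, G and L, M, N are the coefficients of the first and second fundamental forms of a parametric surface r(s, t). All derivatives are approximated by central differences, and the sample points, the value of \(\alpha \), and the radius R are arbitrary.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def combine(terms):
    """Linear combination sum(c * v) of 3-vectors, given as (c, v) pairs."""
    return tuple(sum(c * v[i] for c, v in terms) for i in range(3))

def mean_curvature(r, s, t, h=1e-3):
    """Mean curvature of the parametric surface r(s, t), by central differences."""
    a, b, c = 1 / (2 * h), 1 / (h * h), 1 / (4 * h * h)
    r_s = combine([(a, r(s + h, t)), (-a, r(s - h, t))])
    r_t = combine([(a, r(s, t + h)), (-a, r(s, t - h))])
    r_ss = combine([(b, r(s + h, t)), (-2 * b, r(s, t)), (b, r(s - h, t))])
    r_tt = combine([(b, r(s, t + h)), (-2 * b, r(s, t)), (b, r(s, t - h))])
    r_st = combine([(c, r(s + h, t + h)), (-c, r(s + h, t - h)),
                    (-c, r(s - h, t + h)), (c, r(s - h, t - h))])
    normal = cross(r_s, r_t)
    length = math.sqrt(dot(normal, normal))
    n = tuple(x / length for x in normal)                   # unit normal vector
    E, F, G = dot(r_s, r_s), dot(r_s, r_t), dot(r_t, r_t)   # first fundamental form
    L, M, N = dot(r_ss, n), dot(r_st, n), dot(r_tt, n)      # second fundamental form
    return (E * N - 2 * F * M + G * L) / (2 * (E * G - F * F))

alpha, R = 2.0, 2.0
helicoid = lambda s, t: (s * math.cos(alpha * t), s * math.sin(alpha * t), t)
cylinder = lambda s, t: (s, R * math.cos(t), R * math.sin(t))

H_helicoid = mean_curvature(helicoid, 1.5, 0.7)
H_cylinder = mean_curvature(cylinder, 0.3, 1.1)
print(H_helicoid, H_cylinder)  # close to 0 and to 1 / (2R), respectively
```

The cylinder serves as a control case: its mean curvature should come out as 1 / (2R), up to a sign that depends on the chosen direction of the normal vector.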

A surface S is said to have finite topology if it is homeomorphic to a compact surface with a finite number of points removed (that is, it can be obtained from such a surface via a continuous transformation, see Sect. 8.1 for more details). Since the proof in 1776 that the helicoid is a minimal surface, many properly embedded minimal surfaces have been discovered, but none of them had both finite topology and infinite total curvature, and it was an important open question whether such a surface exists, besides the helicoid. This question was resolved positively in 2009.

Theorem 9.3

([398]) There exists a properly embedded minimal surface in \(\mathbb {R}^3\) with finite topology and infinite total curvature, which is not a helicoid.

An example of a surface satisfying the conditions of Theorem 9.3 has a name: the “embedded genus-one helicoid”. It looks like a helicoid with a hole, see Fig. 9.3f.


M. Weber, D. Hoffman and M. Wolf, An embedded genus-one helicoid, Annals of Mathematics 169-2, (2009), 347–448.

9.4 Bounding the Condition Number of Random Discrete Matrices

Linear Functions of One and Two Variables

A function \(f:{\mathbb R} \rightarrow {\mathbb R}\) is called linear if \(f(x+y)=f(x)+f(y)\) for all \(x, y \in \mathbb R\). With \(y=0\), this implies \(f(x+0)=f(x)+f(0)\), hence \(f(0)=0\). With \(y=-x\), we get \(0=f(0)=f(x+(-x))=f(x)+f(-x)\), hence \(f(-x)=-f(x)\) for all x. Such functions are called odd functions.

With \(y=x\), we get \(f(2x)=f(x)+f(x)=2f(x)\). Then \(f(3x)=f(2x)+f(x)=3f(x)\), and, by induction, \(f(nx)=nf(x)\) for all x and all non-negative integers n. Because f is an odd function, this implies that in fact \(f(nx)=nf(x), \, \forall x \in {\mathbb R}\), for all integers n.

If \(f(1)=a\), and \(m, n\ne 0\) are any integers, then \(f(m)=f(m \cdot 1)=m \cdot f(1)=ma\), hence \(ma = f(m) = f\left( n\cdot \frac{m}{n}\right) = n f\left( \frac{m}{n}\right) \), hence \(f\left( \frac{m}{n}\right) = \frac{m}{n} a\). In other words, \(f(x)=ax\) for all rational numbers x. If we also assume that f is continuous, this implies that \(f(x)=ax\) for all \(x \in {\mathbb R}\). For example, \(f(x)=2x\) and \(f(x)=x/2\) are linear functions.

Similarly, a function \(f:{\mathbb R}^2 \rightarrow {\mathbb R}^2\), transforming a pair of real numbers (x, y) into another pair (u, v), is called linear if \(f(x_1+x_2, y_1+y_2)=f(x_1,y_1)+f(x_2,y_2)\). By an argument similar to the one above, one can prove that any linear continuous f has the form \(f(x, y)=(ax+by, cx+dy)\) for some real coefficients a, b, c, d. For example, \(f(x, y)=(x+y, -x-y)\) and \(f(x, y)=(x+y, x-y)\) are linear functions.
Fig. 9.4

Geometry of transformations a \(f(x, y)=(x+y, -x-y)\), b \(f(x, y)=(x+y, x-y)\)

Stretching and Contraction

A linear function f is called a stretching if \(|f(x)|>|x|\) for all \(x \ne 0\), and a contraction if \(|f(x)|<|x|\) for all \(x \ne 0\). In the one-variable case, \(f(x)=ax\) is a stretching if \(|a|>1\) and a contraction if \(|a|<1\). For functions of two variables, the situation may be more involved. For example, the function \(f(x, y)=(x+y, -x-y)\) is, geometrically, a composition of a projection, clockwise rotation, and a homothetic transformation with coefficient 2, see Fig. 9.4a, and it can stretch some vectors and contract others. For example, it sends the vector (1, 1) to \((1+1,-1-1)=(2,-2)\). The length of (1, 1) is \(|(1,1)|=\sqrt{1^2+1^2}=\sqrt{2}\), while the length of f(1, 1) is \(|f(1,1)|=\sqrt{2^2+(-2)^2}=2\sqrt{2}\), hence the vector (1, 1) has been stretched by a factor of 2. On the other hand, \(f(1,-1)=(1-1, -1-(-1))=(0,0)\), that is, the non-zero vector \((1,-1)\) has been contracted to 0.

Let \(\sigma (f)\) denote the minimal possible ratio of \(\frac{|f(z)|}{|z|}\) over all \(z=(x, y)\) with \(|z|=\sqrt{x^2+y^2} \ne 0\).

As we have seen above, \(\sigma (f)=0\) for \(f(x, y)=(x+y, -x-y)\). On the other hand, for the function \(f(x, y)=(x+y, x-y)\) we have
$$ \frac{|f(z)|}{|z|} = \frac{\sqrt{(x+y)^2+(x-y)^2}}{\sqrt{x^2+y^2}}=\frac{\sqrt{2x^2+2y^2}}{\sqrt{x^2+y^2}}=\sqrt{2}, $$
hence \(\sigma (f)=\sqrt{2}\). Geometrically, this function is a composition of a homothetic transformation  (with \(k=\sqrt{2}\)) and a rotation, see Fig. 9.4b, and therefore stretches every vector in the same way.
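Because f is linear, the ratio \(\frac{|f(z)|}{|z|}\) depends only on the direction of z, so \(\sigma (f)\) can be approximated by sampling unit vectors. The sketch below does this for the two example functions; the helper name `sigma` and the number of sampled directions are illustrative assumptions, and sampling gives only an approximation of the true minimum.

```python
import math

def sigma(f, samples=3600):
    """Approximate min |f(z)| / |z| by sampling unit vectors z = (cos a, sin a)."""
    best = math.inf
    for k in range(samples):
        a = 2 * math.pi * k / samples
        u, v = f(math.cos(a), math.sin(a))
        best = min(best, math.hypot(u, v))   # |z| = 1, so the ratio is just |f(z)|
    return best

sigma_a = sigma(lambda x, y: (x + y, -x - y))   # expected to be close to 0
sigma_b = sigma(lambda x, y: (x + y, x - y))    # expected to be close to sqrt(2)
print(sigma_a, sigma_b)
```

For the first function the minimum is attained in the direction of \((1,-1)\), which is exactly the vector that f sends to (0, 0).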

The Role of \(\sigma (f)\) in the Task of Inverting f

Given a function f and the value f(x, y), can we uniquely determine x and y? For the function \(f(x, y)=(x+y, x-y)\), this is always possible. For example, if \(f(x, y)=(3,-1)\), then \(x+y=3\) and \(x-y=-1\), which can be easily solved to get \(x=1\), \(y=2\). In general, it is easy to prove that uniquely “restoring” (x, y) given f(x, y) is always possible if \(\sigma (f)>0\). However, for a function with \(\sigma (f)=0\) this procedure does not work. For example, for the function \(f(x, y)=(x+y, -x-y)\) assume that \(f(x, y)=(3,-1)\). Then \(x+y=3\) and \(-x-y=-1\). However, \(x+y=3\) implies \(-x-y=-3 \ne -1\), a contradiction. On the other hand, if \(f(x, y)=(1,-1)\), then \(x+y=1\) and \(-x-y=-1\), which is possible for many different pairs x, y, for example, \(x=0\) and \(y=1\) or \(x=2\) and \(y=-1\), etc.

For this reason, functions f with \(\sigma (f)=0\) are “unpleasant” in applications. Also, if \(\sigma (f)>0\) but is very small, then the “restoring” procedure above is possible, but may be difficult to compute. Hence, the ideal situation is when we can prove that \(\sigma (f)\) is not 0, and moreover is “reasonably far away” from 0.

Estimating the Proportion of “Bad” Functions

How many “good” and “bad” functions are there? For simplicity, assume that the coefficients a, b, c, d in the formula \(f(x, y)=(ax+by, cx+dy)\) can be either \(+1\) or \(-1\). Because there are two options for each coefficient, there are \(2^4=16\) such functions in total. An easy (but boring) computation shows that exactly 8 of them (including \(f(x, y)=(x+y, -x-y)\) discussed above) have \(\sigma (f)=0\), and another 8 (including \(f(x, y)=(x+y, x-y)\)) have \(\sigma (f)=\sqrt{2}\). In other words, if we select such a function f at random, we get \(\sigma (f)=0\) with probability \(\frac{8}{16}=0.5\), and \(\sigma (f)=\sqrt{2}\) with probability \(\frac{8}{16}=0.5\) as well.
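The “easy (but boring) computation” can be delegated to a few lines of code. This is a minimal sketch with hypothetical names; it uses the fact that \(\sigma (f)=0\) exactly when some non-zero pair is sent to (0, 0), which for \(f(x, y)=(ax+by, cx+dy)\) happens exactly when \(ad-bc=0\).

```python
from itertools import product

def is_bad(a, b, c, d):
    # sigma(f) = 0 exactly when the determinant a*d - b*c vanishes
    return a * d - b * c == 0

signs = (+1, -1)
functions = list(product(signs, repeat=4))           # all (a, b, c, d) with entries +1 or -1
bad = [coeffs for coeffs in functions if is_bad(*coeffs)]

print(len(functions), len(bad))                      # total count and number of "bad" functions
print(len(bad) / len(functions))                     # proportion of "bad" functions
```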

A proportion of \(50\%\) of bad functions looks discouraging, but the situation improves if we consider more general functions \(A_n:{\mathbb R}^n \rightarrow {\mathbb R}^n\), transforming n-tuples \((x_1, x_2, \dots , x_n)\) into \((y_1, y_2, \dots , y_n)\). If such \(A_n\) is linear and continuous, then
$$ y_j = a_{1j}x_1 + a_{2j}x_2 + \dots + a_{nj}x_n, \quad j=1,2,\dots , n, $$
where \(a_{ij}, \, i=1,2,\dots , n, \, j=1,2,\dots , n\) are real coefficients. It is standard to write the coefficients in an \(n \times n\) table, with \(a_{ij}\) being at the intersection of the i-th row and j-th column, and then \(A_n\) is called an \(n \times n\) matrix. Once again, assume that each \(a_{ij}\) is either \(+1\) or \(-1\). Because there are \(n^2\) coefficients, we have \(2^{n^2}\) such functions/matrices \(A_n\) in total. A famous result of Kahn et al. [213] states that the proportion of those matrices having \(\sigma (A_n)=0\) is at most \(0.999^n\). This result is not useful for small n (for \(n=2\), it gives the estimate \(0.999^2 \approx 0.998\) while we know that the true proportion is 0.5), but, for large n, the expression \(0.999^n\) decreases rapidly. For example, while the bound \(0.999^{1{,}000} \approx 0.37\) is still not very useful, the bound \(0.999^{10{,}000} \approx 0.00005\) for \(n=10{,}000\) is already good, while the bound \(0.999^{100{,}000} \approx 3.5 \cdot 10^{-44}\) for \(n=100{,}000\) is much better than needed for any practical purposes. In fact, in later work [377] the bound \(0.999^n\) has been improved to approximately \(0.5^n\), which already gives an excellent estimate \(0.5^{30} \approx 9.3 \cdot 10^{-10}\) for \(n=30\).
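The numerical values of the bounds quoted above are easy to reproduce. The snippet below is plain arithmetic; the function names are illustrative only, and the second bound is the approximate \(0.5^n\) form mentioned in the text.

```python
def bound_0999(n):
    # the bound 0.999^n on the proportion of matrices with sigma(A_n) = 0
    return 0.999 ** n

def bound_05(n):
    # the improved bound, taken here as approximately 0.5^n
    return 0.5 ** n

for n in (2, 1_000, 10_000, 100_000):
    print(n, f"{bound_0999(n):.2g}")

print(30, f"{bound_05(30):.2g}")
```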

How Often Is \(\sigma (f)\) Small?

However, as mentioned above, a function/matrix \(A_n\) is practically unpleasant for the inversion procedure even if \(\sigma (A_n)\) is positive but small. For concreteness, let us agree that by “small” we mean smaller than \(\frac{1}{n^B}=n^{-B}\) for some constant \(B>0\). So, the problem is to find a good upper bound for the proportion of matrices \(A_n\) with \(\sigma (A_n) < n^{-B}\), or, equivalently, for the probability \(P(\sigma (A_n) < n^{-B})\) that a randomly selected \(A_n\) has small \(\sigma (A_n)\).

A theorem of Rudelson [329], discussed in Sect. 8.11, implies that \(P(\sigma (A_n) < C \varepsilon n^{-3/2}) \le \varepsilon \) holds for sufficiently large n and for any \(\varepsilon >c/\sqrt{n}\), where C and c are some constants. This is significant progress, but the condition \(\varepsilon >c/\sqrt{n}\) is restrictive in some important applications. Even for large n like \(n=10{,}000\), \(1/\sqrt{n}\) is 0.01, hence (assuming for simplicity that \(c=1\)) the above theorem works only for \(\varepsilon >0.01\), and provides no more than a \(99\%\) guarantee that \(\sigma (A_n)\) is not small.

This was the state of the art before the following theorem was proved by Terence Tao and Van Vu  [372].

Theorem 9.4

For any positive constant A, there is a positive constant B such that for any sufficiently large n
$$ P(\sigma (A_n) < n^{-B}) \le n^{-A}. $$

The importance of Theorem 9.4 is that it works for any \(A>0\). For example, selecting \(A=10\) we get an estimate \(n^{-10}\) for the proportion of “unpleasant” matrices \(A_n\). For \(n=10{,}000\), this gives a chance of just \(10{,}000^{-10}=10^{-40}\) for \(\sigma (A_n)\) to be “small”, which is a much better probability guarantee than in Rudelson’s result.


T. Tao and V. Vu, Inverse Littlewood-Offord theorems and the condition number of random discrete matrices, Annals of Mathematics 169-2, (2009), 595–632. 

9.5 Characterizing the Legendre Transform of Convex Analysis

Convex Sets and Functions

A region S in the plane is called convex  if it contains the straight line segment AB whenever points A and B belong to S, see Fig. 9.5a. For example, any disk or the area bounded by a triangle is a convex region. On the other hand, if ABC is a triangle, and X is any point strictly inside it, then the area bounded by the quadrilateral ABXC is non-convex, because it contains points B and C but not the line segment BC, see Fig. 9.5b.

A more complicated example of a convex region is the set of all points (x, y) in the coordinate plane such that \(y \ge x^2\). In general, for any function \(f:{\mathbb R}\rightarrow {\mathbb R}\), the set of all points (x, y) such that \(y \ge f(x)\) is called the epigraph of f, and a function f is called convex if its epigraph is a convex set. Equivalently, a function f is convex if
$$\begin{aligned} f(\lambda x + (1-\lambda )y) \le \lambda f(x) + (1-\lambda ) f(y) \end{aligned}$$
for all \(x, y \in {\mathbb R}\) and all \(\lambda \in [0,1]\). For \(f(x)=x^2\) this reduces to \((\lambda x + (1-\lambda )y)^2 \le \lambda x^2 + (1-\lambda ) y^2\), which simplifies to \(\lambda (1-\lambda )(x^2-2xy+y^2) \ge 0\), or equivalently \(\lambda (1-\lambda )(x-y)^2 \ge 0\). The last inequality clearly holds, because \(\lambda \ge 0\), \(1-\lambda \ge 0\), and \((x-y)^2 \ge 0\).
Fig. 9.5

a A convex set, b A non-convex set, c and d Convex sets as intersections of half-planes

For readers familiar with the concept of ‘derivative’  there is a much simpler proof that \(f(x)=x^2\) is a convex function. The derivative of f is \(f'(x)=2x\), and the second derivative is \(f''(x)=2\). There is a theorem that if \(f''(x)\) exists and is positive for all \(x \in {\mathbb R}\), then f is a convex function.

Convex Sets as Intersections of Half-Planes

Another example of a convex region is the set of all points (x, y) in the coordinate plane such that \(y \ge |x|\), where \(|\cdot |\) denotes the absolute value. This is the epigraph of the convex function \(f(x)=|x|\). This function is not differentiable at 0, but its convexity easily follows directly from (9.9).

The inequality \(y\ge |x|\) can be equivalently written as “\(y\ge x\) and \(y \ge -x\)”. Geometrically, the set of points (x, y) satisfying an inequality of the form \(y \ge ax+b\) for some constants a, b is a half-plane. Hence, the epigraph of the function \(f(x)=|x|\) is the intersection of the half-planes \(y\ge x\) and \(y \ge -x\), see Fig. 9.5c. Representing a set as an intersection of half-planes is extremely convenient in optimization, where linear inequalities are the easiest to deal with.

Can the epigraph \(S=\{(x,y)\,|\, y\ge x^2\}\) of the function \(f(x)=x^2\), see Fig. 9.5d, be written as an intersection of half-planes? This looks unlikely, because the intersection of any finite number of half-planes has a piecewise linear boundary, while in our case the boundary is smooth. However, what if we allow an infinite number of half-planes? A half-plane \(H(a,b)=\{(x,y)\,|\, y\ge ax+b\}\) contains S if \(x^2\ge ax+b\) for all x. For example, the half-plane \(H(1,0)=\{(x,y)\,|\, y\ge x\}\) does not contain S because the inequality \(f(x)=x^2 \ge x\) does not hold for, say, \(x=0.5\). On the other hand, \(H(1,-1/4)=\{(x,y)\,|\, y\ge x-1/4\}\) contains S, because the inequality \(x^2\ge x-1/4\) is equivalent to \((x-1/2)^2 \ge 0\) and is valid for all x. In fact, H(1, b) contains S if and only if \(b \le -1/4\). More generally, H(a, b) contains S if and only if \(b \le -a^2/4\). Indeed, if \(b \le -a^2/4\), or \(0 \le -b-a^2/4\), then the inequality \(x^2\ge ax+b\) can be written as \(x^2-ax+a^2/4-a^2/4-b \ge 0\), or \((x-a/2)^2+(-b-a^2/4) \ge 0\), which clearly holds for all x. On the other hand, if \(b > -a^2/4\), the inequality \(x^2\ge ax+b\) fails for \(x=a/2\). In summary, the set \(S=\{(x,y)\,|\, y\ge x^2\}\) is the intersection of half-planes \(H(a,-a^2/4)\), and the inequality \(y \ge x^2\) is equivalent to an infinite number of linear (in x) inequalities \(y \ge ax+(-a^2/4)\), \(a \in {\mathbb R}\).
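The representation of \(y \ge x^2\) by the lines \(y = ax - a^2/4\) can be illustrated numerically. The sketch below is only an illustration under stated assumptions: it uses a finite grid of slopes (the name `slopes` and the grid itself are arbitrary choices), so the resulting envelope is a lower bound for \(x^2\) that becomes exact whenever \(a=2x\) happens to lie on the grid.

```python
def envelope(x, slopes):
    # Largest value at x among the lines y = a*x - a^2/4.
    # Every such line lies below the parabola y = x^2.
    return max(a * x - a * a / 4 for a in slopes)

slopes = [k / 10 for k in range(-40, 41)]   # a = -4.0, -3.9, ..., 4.0 (illustrative grid)

for x in (-1.5, 0.0, 0.5, 1.25):
    print(x, x * x, envelope(x, slopes))
```

For each of the sample points the slope \(a=2x\) belongs to the grid, so the envelope coincides with \(x^2\); for other points it would be slightly smaller.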

The Legendre Transform and Its Properties

In general, the half-plane \(H(a,b)=\{(x,y)\,|\, y\ge ax+b\}\) contains an epigraph \(S=\{(x,y)\,|\, y\ge f(x)\}\) of a function f(x) if and only if
$$ f(x) \ge ax+b, \quad \forall x. $$
This holds if and only if \(b \le -(ax-f(x))\) for all x, or, equivalently, if and only if \(b \le -\phi (a)\), where the function \(\phi :{\mathbb R}\rightarrow {\mathbb R}\) is defined by
$$ \phi (a) := \max \limits _{x \in {\mathbb R}} (ax-f(x)). $$
If f is a convex function, S is the intersection of half-planes \(H(a,-\phi (a))\), \(a \in {\mathbb R}\), and the non-linear inequality \(y \ge f(x)\) is equivalent to an infinite number of linear (in x) inequalities \(y \ge ax - \phi (a)\), \(a \in {\mathbb R}\).
The function \(\phi \) is called the Legendre transform  of the function f, and we write \(\phi =Lf\). Unfortunately, the Legendre transform is not always well-defined as a finite-valued function. For example, if \(f(x)=x\), and \(a=3\), then the expression \(ax-f(x)=3x-x=2x\) can be arbitrarily large for large x. In this case, we put \(\phi (3)=+\infty \). In general, we may allow our functions f and \(\phi \) to take infinite values, and then the Legendre transform is always well-defined. Moreover, the Legendre transform Lf of any convex function f is always a convex function itself. In addition, the Legendre transform has some other useful properties, for example
(P1) \(LLf=f\) for any convex function f;

(P2) \(f \le g\) implies \(Lf \ge Lg\).


In (P1), by LLf we mean “the Legendre transform of the Legendre transform of f”, while in (P2) by \(f \le g\) we mean \(f(x) \le g(x)\) for all x. For example, we have proved above that the Legendre transform of the function \(f(x)=x^2\) is the function \(\phi (a)=a^2/4\). By exactly the same argument, we can prove that the Legendre transform of \(f(x)=Cx^2\) is \(\phi (a)=a^2/(4C)\) for any constant \(C>0\). With \(C=1/4\), this implies that the Legendre transform of \(f(x)=x^2/4\) is \(\phi (a)=a^2\), in agreement with (P1). With \(C=2\), this implies that the Legendre transform of \(g(x)=2x^2\) is \(\psi (a)=a^2/8\). Note that we have \(x^2 \le 2x^2\) for all x, but \(a^2/4 \ge a^2/8\) for all a, in agreement with (P2).
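The transforms of the quadratic functions above can be approximated by brute force. This is a rough sketch, not an exact method: the maximum over all real x is replaced by a maximum over a finite grid `xs` (an arbitrary illustrative choice), which is adequate here because the maximizer \(x=a/(2C)\) lies well inside the grid for the values used.

```python
def legendre(f, a, xs):
    # Grid approximation of (Lf)(a) = max over x of (a*x - f(x))
    return max(a * x - f(x) for x in xs)

def quadratic(C):
    # f(x) = C * x^2
    return lambda x: C * x * x

xs = [k / 1000 for k in range(-10_000, 10_001)]    # grid on [-10, 10] with step 0.001

a = 3.0
for C in (1.0, 0.25, 2.0):
    approx = legendre(quadratic(C), a, xs)
    exact = a * a / (4 * C)
    print(C, approx, exact)
```

The printed pairs agree, and the values for \(C=1\) and \(C=2\) also illustrate (P2): the larger function has the smaller transform.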

The Legendre Transform of Multivariate Convex Functions

The definition of convexity (9.9) works equally well for functions of several variables. For example, \(f(x, y)=x^2+y^2\), and, more generally, \(f(x_1,x_2,\dots , x_n)=x_1^2+x_2^2+\dots +x_n^2\) are convex functions. A function \(f:{\mathbb R}^n\rightarrow {\mathbb R}\) is convex if and only if its epigraph \(S=\{(x_1,\dots , x_n, y)\,|\, y\ge f(x_1,x_2,\dots , x_n)\}\) is a convex subset of \({\mathbb R}^{n+1}\). The inequality \(y\ge f(x_1,x_2,\dots , x_n)\) can again be represented as an infinite number of linear inequalities \(y \ge \langle a, x \rangle - \phi (a)\), \(a=(a_1, \dots , a_n) \in {\mathbb R}^n\), where \(\langle a, x \rangle \) is the inner product defined as \(\langle a, x \rangle = a_1x_1+a_2x_2+\dots +a_nx_n\), and the function \(\phi :{\mathbb R}^n\rightarrow {\mathbb R}\) is defined by
$$ \phi (a) := \max \limits _{x \in {\mathbb R}^n} (\langle a, x \rangle -f(x)) $$
and is called the Legendre transform of f. This trick is extremely useful in convex  optimization.

The Characterization of the Legendre Transform

The theorem below, proved in [20], shows that the Legendre transform is, up to linear terms, the only transformation which has the useful properties (P1) and (P2). To formulate it, we need a few more definitions. A set \(S \subset {\mathbb R}^n\) is called closed if \(x_n \in S, \, \forall n\) and \(\lim \limits _{n \rightarrow \infty } x_n = x\) implies that \(x \in S\). For example, [0, 1] is a closed set, while (0, 1] is not, because it contains a sequence \(x_n=1/n, \, n=1,2,\dots \), but not its limit point 0. A function \(f:{\mathbb R}^n\rightarrow {\mathbb R}\cup \{\pm \infty \}\) is called lower-semicontinuous if its epigraph is a closed set in \({\mathbb R}^{n+1}\). Let \({\mathscr {C}}({\mathbb R}^n)\) be the set of all lower-semicontinuous convex functions \(f:{\mathbb R}^n\rightarrow {\mathbb R}\cup \{\pm \infty \}\). A transformation \(B:{\mathbb R}^n \rightarrow {\mathbb R}^n\), sending \((x_1, x_2, \dots , x_n)\) to \((y_1, y_2, \dots , y_n)\), is called linear if \( y_j = b_{1j}x_1 + b_{2j}x_2 + \dots + b_{nj}x_n\), \(j=1,2,\dots , n, \) for some real coefficients \(b_{ij}, \, i=1,2,\dots , n, \, j=1,2,\dots , n\), symmetric if \(b_{ij}=b_{ji}, \, \forall i, j\), and invertible if \(B(x)\ne 0\) whenever \(x \ne 0\).

Theorem 9.5

Assume that a transform \(T :{\mathscr {C}}({\mathbb R}^n) \rightarrow {\mathscr {C}}({\mathbb R}^n)\) (defined on the whole domain \({\mathscr {C}}({\mathbb R}^n)\)) satisfies (P1) and (P2), that is, \(TTf=f\) and \(f \le g\) implies \(Tf \ge Tg\). Then T is essentially the Legendre transform L. Namely, there exists a constant \(C_0 \in {\mathbb R}\), a vector \(v_0\in {\mathbb R}^n\), and an invertible symmetric linear transformation B such that
$$ (T f)(x) = (L f)(Bx+v_0) + \langle x, v_0 \rangle + C_0, \, \quad \forall x \in {\mathbb R}^n. $$


S. Artstein-Avidan and V. Milman, The concept of duality in convex analysis, and the characterization of the Legendre transform, Annals of Mathematics 169-2, (2009), 661–674.

9.6 The Solution of the Ten Martini Problem

Operators and Operations Between Them

Familiar functions like \(f(x)=x\), \(f(x)=2x\), or \(f(x)=x^2\), map real numbers to real numbers. In geometry, we study motions of the plane, like rotations or reflections, which can be viewed as functions which map points of the plane to other points. If each point is given by two real coordinates, such functions map pairs of real numbers into pairs. For example, the function \(f(x, y)=(-x,-y)\) represents reflection with respect to the point (0, 0).

Here, we consider functions that map infinite sequences of numbers to infinite sequences. Such functions are called operators.  Every operator T takes an infinite sequence \(x=(x_1, x_2, x_3, \dots )\) as an input, and transforms it into another infinite sequence \(y=(y_1, y_2, y_3, \dots )\), which we will also denote by T(x). The simplest operator, usually denoted by I, is the identity operator,  which sends every sequence to itself, that is, \(I(x)=x\) for all sequences x. A little less trivial is the multiplication by constant operator,  which just multiplies every term of the sequence by the same constant \(\lambda \), that is, transforms every infinite sequence \(x=(x_1, x_2, x_3, \dots )\) into the sequence \(\lambda x = (\lambda x_1, \lambda x_2, \lambda x_3, \dots )\). Another example is the shift operator,  which we denote by S, which transforms every sequence \(x=(x_1, x_2, x_3, \dots )\) into the sequence \(S(x)=(0, x_1, x_2, x_3, \dots )\).

Operators can be added together and multiplied by constants. The product of any operator T and constant \(\lambda \in {\mathbb R}\) is an operator, denoted \(\lambda T\), which maps every sequence x into the sequence \(\lambda T(x)\). For example, \(\lambda I\) is just the “multiplication by \(\lambda \)” operator, while \(\lambda S\) is the operator transforming every sequence \((x_1, x_2, x_3, \dots )\) into the sequence \((0, \lambda x_1, \lambda x_2, \lambda x_3, \dots )\). The sum of two operators \(T_1\) and \(T_2\), denoted \(T_1+T_2\), is the operator which maps every sequence x to the sequence \(T_1(x)+T_2(x)\), where the sequences are added element-wise. The difference \(T_1-T_2\) is just \(T_1+(-1)\cdot T_2\). For example, the operator \(\lambda I - S\) maps every sequence \((x_1, x_2, x_3, \dots )\) to the sequence \((\lambda x_1, \lambda x_2 - x_1, \lambda x_3 - x_2, \dots )\).

The Norm of an Infinite Sequence

For any vector in the plane with coordinates (x, y) its length is \(\sqrt{x^2+y^2}\); for a vector (x, y, z) in three-dimensional space the length is \(\sqrt{x^2+y^2+z^2}\). Can we define the “length” of an infinite sequence in a similar way? For some sequences, like \((1,2,3,\dots )\), this seems to be difficult, but for others, like \((1, 1/2, 1/4, 1/8, \dots )\), a similar formula works. If we square all the “coordinates” and add them together, we get the expression \(1^2+(1/2)^2+(1/4)^2+(1/8)^2+\dots \), which is the same as \(1+1/4+(1/4)^2+(1/4)^3+\dots \). In general, for any \(q \ne 1\), the sum \(1+q+q^2+\dots +q^n\) is equal to \(\frac{1}{1-q}-\frac{q^{n+1}}{1-q}\). If \(q\in (0,1)\), the term \(\frac{q^{n+1}}{1-q}\) becomes smaller and smaller as n grows, hence the whole sum becomes closer and closer to \(\frac{1}{1-q}\). In this case, we say that the infinite sum \(1+q+q^2+\dots \) converges to \(\frac{1}{1-q}\) and write \(1+q+q^2+\dots = \frac{1}{1-q}\). In our case, \(q=1/4\), and \(1+1/4+(1/4)^2+(1/4)^3+\dots = \frac{1}{1-1/4}=\frac{4}{3}\), hence the infinite sequence \((1, 1/2, 1/4, 1/8, \dots )\) has finite “length” \(\sqrt{\frac{4}{3}}\). In general, the set of all sequences \(x=(x_1, x_2, x_3, \dots )\) with finite sum \(x_1^2+x_2^2+x_3^2+\dots \) is denoted \(l^2\), and the square root of this sum is denoted \(\Vert x\Vert \) and is called the norm of x. For example, the sequence \((1, 1/2, 1/4, 1/8, \dots )\) belongs to the set \(l^2\), while the sequence \((1,2,3,\dots )\) does not.
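The convergence of the “length” of \((1, 1/2, 1/4, 1/8, \dots )\) can be watched numerically. The sketch below works with finite prefixes only (an infinite sequence cannot be stored), and the helper name `prefix_norm` is illustrative.

```python
import math

def prefix_norm(terms):
    # Square root of the sum of squares of finitely many terms
    return math.sqrt(sum(t * t for t in terms))

halving = [1 / 2 ** k for k in range(60)]      # first 60 terms of (1, 1/2, 1/4, ...)
counting = list(range(1, 61))                  # first 60 terms of (1, 2, 3, ...)

for n in (1, 2, 5, 10, 60):
    print(n, prefix_norm(halving[:n]), prefix_norm(counting[:n]))

print(math.sqrt(4 / 3))                        # the limit claimed in the text
```

The first column of norms settles near \(\sqrt{4/3}\), while the second keeps growing, in line with the claim that \((1,2,3,\dots )\) does not belong to \(l^2\).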

Bounded Linear Operators on \(l^2\)

All the operators T considered above have the property that if the input x belongs to \(l^2\), then so does the output T(x). For example, if the sum \(x_1^2+x_2^2+x_3^2+\dots \) converges to some finite number A, then the sum \((\lambda x_1)^2+(\lambda x_2)^2+(\lambda x_3)^2+\dots \) converges to a finite number \(\lambda ^2 A\), hence \(x \in l^2\) implies that \(\lambda x \in l^2\). Also, if \(x_1^2+x_2^2+x_3^2+\dots \) converges to A, then \(0^2+x_1^2+x_2^2+x_3^2+\dots \) converges to A as well. In other words, \(x \in l^2\) implies that \(S(x) \in l^2\). From now on, we consider only operators T such that \(T(x) \in l^2\) whenever \(x \in l^2\).

An operator T is called linear  if \(T(x+y)=T(x)+T(y)\) for all sequences \(x, y\in l^2\). It is easy to check that operators I, \(\lambda I\), and S are linear. A linear operator is called bounded  if there is a constant \(M>0\) such that \(\Vert T(x)\Vert \le M \Vert x\Vert \) for all sequences \(x \in l^2\). For example, operators \(\lambda I\) and S satisfy this property with \(M=|\lambda |\) and \(M=1\), respectively.

The Spectrum of a Bounded Operator

An operator T is called invertible if for any sequence \(y \in l^2\) there exists a unique sequence \(x \in l^2\) such that \(T(x)=y\). For example, the operator I is trivially invertible, with \(x=y\). More generally, the operator \(\lambda I\) is invertible for every \(\lambda \ne 0\), with \(x=(1/\lambda )y\). On the other hand, the operator S is not invertible, because for any sequence \(y=(y_1, y_2, y_3, \dots )\) with \(y_1 \ne 0\) there is no x with \(S(x)=y\).

What about the operator \(\lambda I - S\), mapping \((x_1, x_2, x_3, \dots )\) to \((\lambda x_1, \lambda x_2 - x_1, \lambda x_3 - x_2, \dots )\)? For \(\lambda = 0\), it reduces to \(-S\) and is not invertible. For \(\lambda \ne 0\) and any \(y=(y_1, y_2, y_3, \dots )\), the equation \((\lambda I - S)(x)=y\) implies that \(\lambda x_1 = y_1\), \(\lambda x_2 - x_1 = y_2\), \(\lambda x_3 - x_2 = y_3\), and so on. From the first equation, \(x_1 = y_1 / \lambda \); from the second one, \(x_2=(y_2+x_1)/\lambda = y_2/\lambda + y_1/\lambda ^2\); from the third one, \(x_3=y_3/\lambda + y_2/\lambda ^2 + y_1/\lambda ^3\), and so on. Continuing in this way, we can restore \(x=(x_1, x_2, x_3, \dots )\) uniquely, but we also need x to belong to \(l^2\). If \(|\lambda |>1\), the coefficients \(1/\lambda , 1/\lambda ^2, 1/\lambda ^3, \dots \) decay geometrically, and one can check that \(x \in l^2\) whenever \(y \in l^2\), hence \(\lambda I - S\) is invertible. If \(0<|\lambda | \le 1\), however, take \(y=(1,0,0,\dots )\): then \(x=(1/\lambda , 1/\lambda ^2, 1/\lambda ^3, \dots )\), whose terms do not tend to 0, so \(x \notin l^2\), and \(\lambda I - S\) is not invertible.

In general, the set of all real numbers \(\lambda \) such that the operator \(\lambda I - T\) is not invertible is called the spectrum of a bounded operator T. For example, we have just shown that the spectrum of the shift operator S is the interval \([-1,1]\). It is also easy to see that the spectrum of I is one number \(\lambda = 1\). In general, however, the spectrum can have a much more complicated structure, and the study of the spectral properties of linear operators is an important area of mathematical research.
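The term-by-term recovery of x from y for the operator \(\lambda I - S\) can be sketched on finite prefixes of the sequences. The function names are hypothetical, and the value \(\lambda =2\) and the sample y are arbitrary illustrative choices; the code only checks the algebra of the recursion \(x_1=y_1/\lambda \), \(x_k=(y_k+x_{k-1})/\lambda \), not membership in \(l^2\).

```python
def apply_lambda_i_minus_s(lam, x):
    # First len(x) terms of (lam*I - S)(x) = (lam*x1, lam*x2 - x1, lam*x3 - x2, ...)
    return [lam * x[0]] + [lam * x[k] - x[k - 1] for k in range(1, len(x))]

def solve_prefix(lam, y):
    # Recover x term by term: x1 = y1/lam, then x_k = (y_k + x_{k-1})/lam
    x = []
    previous = 0.0
    for y_k in y:
        previous = (y_k + previous) / lam
        x.append(previous)
    return x

lam = 2.0
y = [1.0, 0.5, 0.25, 0.125]
x = solve_prefix(lam, y)

print(x)
print(apply_lambda_i_minus_s(lam, x))   # reproduces y
```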

The Almost Mathieu Operator and Its Spectrum

One important operator arising from applications in physics is the so-called almost Mathieu operator, which depends on three real parameters \(\lambda \ne 0\), \(\alpha \), and \(\theta \), and transforms every sequence \(x=(x_1, x_2, x_3, \dots )\) into the sequence \(y=(y_1, y_2, y_3, \dots )\) according to the formula
$$ y_n = x_{n+1} + x_{n-1} + 2 \lambda \cos (2\pi (\theta + n \alpha )) x_n. $$
Fig. 9.6

Spectra of the almost Mathieu operator for various \(\alpha \)

The study of the spectrum of this operator, motivated by physical applications, has kept mathematicians busy for several decades. Experiments show that it has a complicated structure. For example, if one fixes \(\lambda \) and \(\theta \), and depicts the spectrum of the almost Mathieu operator for various \(\alpha \), one usually gets a fractal-like picture as in Fig. 9.6.

If \(\alpha \) is a rational number (that is, \(\alpha =p/q\) for some integers p, q), then the spectrum consists of the union of q intervals. For irrational \(\alpha \), it was conjectured that the spectrum is a so-called Cantor set. The most well-known example of a Cantor set is when you start with the interval [0, 1], remove the middle third (1/3, 2/3), then remove the middle thirds (1/9, 2/9) and (7/9, 8/9) of the remaining intervals [0, 1/3] and [2/3, 1], then the middle thirds of the remaining four intervals, and so on, up to infinity. The set C of all points which survive is a Cantor set, see Sect. 1.4 for a more detailed discussion. Of course, we have some flexibility in this construction, e.g. we can remove some other fixed proportion of every interval at every step, or even different proportions at every step, etc. However, all the sets constructed in this way are homeomorphic, that is, for every pair of them, there is a continuous invertible function which maps one set onto the other. In general, a Cantor set is any set homeomorphic to the one described above.
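The first few steps of the middle-thirds construction can be generated mechanically. This is a small sketch with hypothetical names; exact fractions are used so that the endpoints match those in the text.

```python
from fractions import Fraction

def remove_middle_thirds(intervals):
    # Replace every closed interval [lo, hi] by its left and right thirds
    result = []
    for lo, hi in intervals:
        third = (hi - lo) / 3
        result.append((lo, lo + third))
        result.append((hi - third, hi))
    return result

intervals = [(Fraction(0), Fraction(1))]
for step in range(1, 4):
    intervals = remove_middle_thirds(intervals)
    total_length = sum(hi - lo for lo, hi in intervals)
    print(step, [f"[{lo}, {hi}]" for lo, hi in intervals], total_length)
```

After each step the number of intervals doubles while the total length is multiplied by 2/3, so the surviving set has total length 0 in the limit.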

The conjecture that, for every irrational \(\alpha \), the spectrum of the almost Mathieu operator is a Cantor set,  was proposed by Azbel [26] in 1964. In 1981, Mark Kac offered ten martinis for anyone who could prove or disprove it, and since then the problem has been known as “the Ten Martini Problem”. Despite many partial results for some special irrational \(\alpha \), the general case was open until 2009, when the final positive resolution by Artur Avila and Svetlana Jitomirskaya appeared [24].

Theorem 9.6

The spectrum of the almost Mathieu operator is a Cantor set for all irrational \(\alpha \) and for all \(\theta \) and all \(\lambda \ne 0\).


A. Avila and S. Jitomirskaya, The Ten Martini Problem, Annals of Mathematics 170-1, (2009), 303–342. 

9.7 A Linear Time Algorithm for Edge-Deletion Problems

The Party Organization Problem

In Sect.  5.11 we discussed the “party organization problem”: if some of your guests do not like each other, what is the minimal number of tables you need to be able to guarantee that no pair of enemies share a table? The correct mathematical language in which to study this problem is graph theory:  we can represent the guests as points in the plane (vertices),  and join any two vertices by a line (edge)  if and only if the corresponding guests are enemies. We can also represent the tables as colours, and ask what is the minimal number of colours we need to colour the vertices of the graph in such a way that no two vertices connected by an edge have the same colour. A set of vertices, some of which are connected by edges, is called a graph,  and the minimal number of colours in the problem described above is called the chromatic number  of the graph. A graph with chromatic number at most k is called k-colourable.

In most restaurants, however, we have no control over the number of available tables. The restaurant may inform you that they have k big tables, where k is a small fixed number such as \(k=2\), and, in this case, it may be impossible to seat every pair of enemies at different tables. For example, if you have \(k=2\) tables and \(n=4\) guests Anna, Bob, Claire, and David, such that Anna and Bob dislike everyone else, including each other (but Claire and David are friends), you can start by putting enemies Anna and Bob at different tables, but then you need to put Claire either near Anna or near Bob, creating an unhappy pair of enemies at the same table. Moreover, you then need to put David either near Anna or near Bob as well, creating another unhappy pair.

While a perfect solution in this case is impossible, you can still do better than the way described above. Namely, you can put Anna and Bob at table 1, and Claire and David at table 2. In this case you still have a pair of enemies (Anna and Bob) seated at the same table, but at least you have only one such pair, not two!

Edge Removal and Graph Colouring

In general, our problem is to seat n guests at k tables such that the number of pairs of enemies at the same table is as small as possible. In the language of graph theory, we have a graph G with n vertices, and the problem is to colour the vertices in k colours such that the number of edges which join vertices of the same colour is minimal. The same question can be formulated slightly differently: what is the minimal number of edges we should remove from G to make it k-colourable? In the example above, we had a graph with vertices A, B, C, and D (A for Anna, B for Bob, C for Claire, D for David) and edges AB, AC, AD, BC, BD, see Fig. 9.7a. This graph is not 2-colourable, but, after removing just one edge AB, it becomes 2-colourable with vertices A, B coloured white, while C and D are coloured black. This colouring represents the way the guests should sit to create just one unhappy pair, Anna and Bob, the pair whose edge was removed. In general, if the graph is k-colourable after removing m edges, then this colouring represents a guest distribution with at most m unhappy pairs (and exactly m if m is the minimal possible number of removed edges).
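For a graph as small as the Anna-Bob-Claire-David example, the best seating can be found by trying every colouring. The sketch below is a naive brute force with hypothetical names; it is meant only to illustrate the problem statement and would be far too slow for large graphs.

```python
from itertools import product

def min_monochromatic_edges(vertices, edges, k):
    # Try all k^n colourings; return the smallest number of edges whose
    # endpoints share a colour, together with one colouring achieving it.
    best_count, best_colouring = None, None
    for colours in product(range(k), repeat=len(vertices)):
        colouring = dict(zip(vertices, colours))
        count = sum(1 for u, v in edges if colouring[u] == colouring[v])
        if best_count is None or count < best_count:
            best_count, best_colouring = count, colouring
    return best_count, best_colouring

vertices = ["A", "B", "C", "D"]
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "D")]

print(min_monochromatic_edges(vertices, edges, 2))   # two tables
print(min_monochromatic_edges(vertices, edges, 3))   # three tables
```

With two tables the minimum is one unhappy pair, as in the text; with three tables no unhappy pair is needed.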
Fig. 9.7

Making the Anna-Bob-Claire-David graph 2-colourable and triangle-free

Removing Edges to “Kill” Triangles or Squares

Similar problems in the form “What is the minimal number of edges we should remove from a graph to make it (something)?” arise in many subareas of graph theory and its applications. For example, a “triangle” is a triple of vertices A, B, C, such that all of them are connected by edges (that is, the graph contains edges AB, BC, and CA). A graph G is called triangle-free  if it contains no triangles. For any graph G with triangles, we may ask what is the minimal number of edges we should remove from G to make it triangle-free. If a graph G contains m triangles, then removing m edges (one in each triangle) would surely work, but sometimes we can do better. For example, the Anna-Bob-Claire-David graph in Fig. 9.7b contains two triangles, ABC and ABD, but removing a single edge AB destroys them both, and makes it triangle-free after just one edge deletion.
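The triangle example can be verified by exhaustive search over sets of deleted edges. This is again a naive illustrative sketch with hypothetical names, practical only for very small graphs.

```python
from itertools import combinations

def has_triangle(vertices, edges):
    edge_set = {frozenset(edge) for edge in edges}
    return any(
        frozenset((a, b)) in edge_set
        and frozenset((b, c)) in edge_set
        and frozenset((a, c)) in edge_set
        for a, b, c in combinations(vertices, 3)
    )

def min_deletions_triangle_free(vertices, edges):
    # Try deleting 0 edges, then 1, then 2, ... until no triangle remains
    for m in range(len(edges) + 1):
        for removed in combinations(edges, m):
            kept = [edge for edge in edges if edge not in removed]
            if not has_triangle(vertices, kept):
                return m, removed

vertices = ["A", "B", "C", "D"]
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "D")]

print(min_deletions_triangle_free(vertices, edges))
```

The search stops at a single deletion, the edge AB, which destroys both triangles ABC and ABD.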

In a similar way we may ask how many edges we should remove to make the graph G square-free, that is, containing no four vertices A, B, C, D such that AB, BC, CD and DA are edges in G. More generally, we may aim to avoid any fixed configuration like this.

The General Edge Removal Problem for Monotone Properties

In general, a property P of a graph is called monotone if it is preserved after removal of vertices and edges of G. For example, if a graph G is k-colourable, then, after removing any vertex or any edge from it, it obviously remains k-colourable, because the same colouring works. Similarly, if G is triangle-free, then after removing any vertex or edge from G it obviously remains triangle-free. Hence, the properties of being k-colourable or being triangle-free are examples of monotone properties. The same is true for the property of being square-free, and for many other graph properties of theoretical and practical interest.

In this general setting, our problem is formulated as follows:
(*) Given a monotone property P and arbitrary graph G, what is the minimal number of edge deletions needed to turn G into a graph satisfying P?


Problem (*) is very difficult to solve exactly. A naïve approach would be to just try all possible edge deletions, but this works only for small graphs. In a graph with m edges, there are m ways to delete the first edge, \(m-1\) ways to delete the second one, and so on, so there are not much fewer than \(m^k\) ways to delete k edges one after another. For a large graph with \(m=1000\) edges, and \(k=10\), \(m^k=10^{30}\). Even for a supercomputer performing \(10^{16}\) operations per second, it would take more than three million years to perform \(10^{30}\) operations. Moreover, for some graph properties P it is not easy even to check if the initial graph G satisfies P. For example, this is the case if P is the k-colourability property for \(k \ge 3\), because there is a huge number of possible colourings, and it could take ages to check if there is one that works.
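The running-time estimate above is simple arithmetic and can be reproduced as follows. The variable names are illustrative, and a year is taken as 365.25 days, which is an assumption of this sketch.

```python
m, k = 1000, 10
operations = m ** k                          # 10^30 candidate deletion sequences
speed = 10 ** 16                             # operations per second
seconds = operations / speed
seconds_per_year = 365.25 * 24 * 3600        # assumed length of a year
years = seconds / seconds_per_year

print(f"{operations:.0e} operations, about {years:,.0f} years")
```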

An Approximate Solution

Because problem (*) is difficult to solve exactly, an important question is whether it is possible to efficiently find at least an approximate solution to it. This is what the following theorem of Noga Alon, Asaf Shapira, and Benny Sudakov  [11] is about.

Theorem 9.7

For any fixed \(\varepsilon > 0\) and any monotone property P, there is a constant C (depending on \(\varepsilon \) and P) and an algorithm which, given a graph G with n vertices and m edges, finds an approximate solution to (*) to within an additive error \(\varepsilon n^2\) after performing at most \(C(n+m)\) operations.

In other words, if the exact optimal answer to (*) is k(P, G), the algorithm will return an answer \(k'(P, G)\) such that \(|k'(P,G) - k(P, G)| \le \varepsilon n^2\). In many cases, this is a reasonable approximation. Indeed, the number of pairs of vertices of G is a bit less than \(n^2/2\) (the exact formula is \(n(n-1)/2\)). If about half of all pairs are connected by edges, then there are about \(m \approx n^2/4\) edges. If \(\varepsilon =0.001\) or so, then the error \(\varepsilon n^2\) is much less than the total number of edges. For example, if \(n=1000\) and \(m=n^2/4=250{,}000\), then the solution to (*) may be anything between 0 and 250,000, while the algorithm outputs the number \(k'\), and guarantees that the answer is between \(k'-1000\) and \(k'+1000\).

Can we develop an algorithm with an even better approximation guarantee, e.g. with additive error proportional to \(n^{1.99}\) instead of \(n^2\), or at least to \(n^{2-\delta }\) for some \(\delta >0\)? The authors prove that this is possible if there is a 2-colourable graph that does not satisfy P. For example, this is the case for the property of being square-free, because the square itself (a graph with four vertices A, B, C, D such that AB, BC, CD and DA are edges) is clearly not square-free but is 2-colourable (to see this, colour A and C green and B and D blue).

On the other hand, if P is a property such that all 2-colourable graphs satisfy P (this is the case if P is the property of being k-colourable for \(k\ge 2\), or triangle-free), then the authors provide very strong evidence that, for any \(\delta >0\), no efficient algorithm with approximation guarantee \(n^{2-\delta }\) exists.


N. Alon, A. Shapira, and B. Sudakov, Additive approximation for edge-deletion problems, Annals of Mathematics 170-1, (2009), 371–411. 

9.8 A Characterization of Stability-Preserving Linear Operators

Polynomials and Their Roots

A (real) polynomial is any function of the form
$$ P(x) = a_n x^n + a_{n-1} x^{n-1} + \dots + a_2 x^2 + a_1 x + a_0, $$
where \(a_0, a_1, \dots , a_n\) are some real coefficients. If \(a_n \ne 0\), we say that the polynomial P(x) has degree n. For example, polynomials of degree 0 are just constant functions \(P(x)=a_0\), \(a_0\ne 0\), polynomials of degree 1 are linear functions \(P(x)=a_1 x + a_0\), \(a_1 \ne 0\), polynomials of degree 2 are quadratic functions, \(P(x)=a_2 x^2 + a_1 x + a_0\), \(a_2 \ne 0\), and so on.

A (real) root of a polynomial P(x) is a real solution to the equation \(P(x)=0\). If the polynomial P(x) can be written as \(P(x)=(x-a)Q(x)\), where Q(x) is another real polynomial, then \(P(a)=(a-a)Q(a)=0\), hence a is a root of P(x). If, moreover, P(x) can be written as \(P(x)=(x-a)^kQ(x)\), then we say that a is a root of P(x) of multiplicity  k, and then the convention is that the root a should be counted k times while counting the roots of P(x). For example, we say that the polynomial \(P(x)=(x-1)(x-3)^2\) has three roots: 1, 3, and 3 again.

Stable Polynomials

Polynomials of degree 0, 1, and 2 have at most 0, 1, and 2 real roots, respectively. This is not a coincidence. There is an (easy) mathematical theorem stating that any polynomial of degree n has at most n real roots. Polynomials of degree n which have n real roots (that is, the maximal possible number of roots), are called stable,  or hyperbolic. By convention, we consider the special polynomial \(P(x)=0\) to be stable as well. For example, the polynomial \(P(x)=2x-3\) is stable, because its degree is \(n=1\), and it has one root \(x=3/2\). The polynomial \(P(x)=x^2-3x+2\) has degree \(n=2\) and two roots \(x=1\) and \(x=2\), hence it is stable. The polynomial \(P(x)=x^2-2x+1\) has degree \(n=2\) and root \(x=1\) of multiplicity 2, which is counted as two roots, hence it is stable as well. However, the polynomial \(P(x)=x^2+1\) has degree \(n=2\) but no real roots at all, hence it is not stable.
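For polynomials with integer or rational coefficients, stability can be tested exactly on a computer. The sketch below is one possible way to do it and is not taken from the text: it uses Sturm's theorem (a standard tool swapped in here) to count the distinct real roots, and compares this with the number of distinct complex roots, obtained from the greatest common divisor of P and its derivative. Polynomials are represented as coefficient lists, lowest degree first; all function names are hypothetical.

```python
from fractions import Fraction

def trim(p):
    """Drop zero coefficients of highest degree (lists are lowest degree first)."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def derivative(p):
    return trim([k * c for k, c in enumerate(p)][1:])

def remainder(a, b):
    """Remainder of the polynomial division of a by b (b must be non-zero)."""
    a = trim(a)
    while len(a) >= len(b):
        factor = a[-1] / b[-1]
        shift = len(a) - len(b)
        for i, c in enumerate(b):
            a[shift + i] -= factor * c
        a = trim(a)
    return a

def sign_changes(values):
    signs = [v > 0 for v in values if v != 0]
    return sum(1 for s, t in zip(signs, signs[1:]) if s != t)

def is_stable(coeffs):
    """True if all roots are real, i.e. the polynomial is stable."""
    p = trim([Fraction(c) for c in coeffs])
    if len(p) <= 1:
        return True  # zero polynomial (by convention) or a non-zero constant
    # Sturm chain: P, P', then negated remainders; the last entry is gcd(P, P').
    chain = [p, derivative(p)]
    while True:
        r = remainder(chain[-2], chain[-1])
        if not r:
            break
        chain.append([-c for c in r])
    at_plus_inf = [q[-1] for q in chain]
    at_minus_inf = [q[-1] * (-1) ** (len(q) - 1) for q in chain]
    distinct_real = sign_changes(at_minus_inf) - sign_changes(at_plus_inf)
    distinct_all = (len(p) - 1) - (len(chain[-1]) - 1)
    return distinct_real == distinct_all

print(is_stable([-3, 2]))     # 2x - 3
print(is_stable([2, -3, 1]))  # x^2 - 3x + 2
print(is_stable([1, -2, 1]))  # x^2 - 2x + 1
print(is_stable([1, 0, 1]))   # x^2 + 1
```

The first three calls correspond to the stable examples above, and the last one to \(x^2+1\), which has no real roots.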

Stability Under Differentiation

The derivative  of a polynomial P(x) is a polynomial, denoted \(P'(x)\), which can be characterized using the following rules
  (i) \((P+Q)'(x)=P'(x)+Q'(x)\) for all polynomials P, Q,

  (ii) \((aP)'(x) = a P'(x)\) for every polynomial P and constant \(a \in {\mathbb R}\),

  (iii) \((x^k)'=kx^{k-1}\) for all \(k \ge 0\).


For example, let us calculate the derivative of the polynomial \(P(x)=x^2-3x+2\). Rules (i) and (ii) imply that \(P'(x)=(x^2-3x+2)'=(x^2)'+(-3x)'+(2)'=(x^2)'-3(x)'+2(1)'\). By (iii), \((x^2)'=2x\), \((x)'=1\), and \((1)'=(x^0)'=0\), hence \(P'(x)=2x-3\cdot 1+2\cdot 0=2x-3\). One easy but useful theorem is that the derivative of any stable polynomial is stable. In other words, we say that stability is preserved under differentiation. For example, \(P(x)=x^2-3x+2\) is a stable polynomial (degree 2, and 2 roots), and its derivative \(P'(x)=2x-3\) is stable as well (degree 1, and 1 root).

Stability After Multiplication

Any individual polynomial, say \(P(x)=x^2-3x+2\), is a function transforming real numbers into real numbers, e.g. the number \(x=4\) is transformed into \(P(4)=4^2-3\cdot 4 +2 = 6\). In contrast, differentiation is an example of an “operation” transforming polynomials into polynomials, e.g. \(x^2-3x+2\) is transformed into \(2x-3\). Another example of such an “operation” is multiplication by any fixed polynomial Q(x), say, \(Q(x)=x-5\). In this case, the polynomial \(P(x)=x^2-3x+2\) is transformed into \((x^2-3x+2)(x-5)=x^3-8x^2+17x-10\). The roots of the “transformed” polynomial are the same as the roots of the original one plus the roots of Q. In particular, this implies that \(Q(x)\cdot P(x)\) is a stable polynomial whenever P(x) and Q(x) are stable. In other words, stability is preserved after multiplication by a stable polynomial.

Linear Operators Transforming Polynomials

In general, a linear operator  is any “operation” T transforming polynomials into polynomials which satisfies properties (i) and (ii) above, that is, (i) \(T(P+Q)=T(P)+T(Q)\) for all polynomials P, Q, and (ii) \(T(aP)=aT(P)\) for every polynomial P and constant \(a \in {\mathbb R}\). This implies that \(T(x^2-3x+2)=T(x^2)-3T(x)+2T(1)\), and, more generally,
$$ T(a_n x^n + \dots + a_1 x + a_0) = a_n T(x^n) + \dots + a_1 T(x) + a_0 T(1), $$
that is, to define T, it suffices to define \(T(x^k)\) for all \(k \ge 0\). For example, if \(T(x^k)=kx^{k-1}\), \(k\ge 0\), then T is differentiation, while \(T(x^k)=x^k\cdot Q(x)\), \(k\ge 0\), implies that T is just multiplication by a fixed polynomial Q(x). In general, however, T can be arbitrarily complicated: for example, it may be that \(T(x^k)=x^3-4x\) for even k while \(T(x^k)=x^2-1\) for odd k. In this case, \(T(x^2-3x+2)=(x^3-4x)-3(x^2-1)+2(x^3-4x)=3x^3-3x^2-12x+3\), and, in general, for any polynomial P, T(P) has the form \(a(x^3-4x)+b(x^2-1)\) for some constants a, b. It is not difficult to verify that the polynomial \(a(x^3-4x)+b(x^2-1)\) is stable for all a, b, hence, in this case, T(P) is stable for all P.
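The fact that a linear operator is determined by the images \(T(x^k)\) translates directly into code. In the minimal sketch below, polynomials are coefficient lists with the lowest degree first, and an operator is represented by a function returning \(T(x^k)\) for a given k; the names `apply_operator`, `differentiation`, `multiply_by_x_minus_5` and `even_odd_example` are hypothetical and chosen only for this illustration.

```python
def add_scaled(target, poly, scale):
    """Return target + scale * poly for coefficient lists (lowest degree first)."""
    result = [0] * max(len(target), len(poly))
    for i, c in enumerate(target):
        result[i] += c
    for i, c in enumerate(poly):
        result[i] += scale * c
    return result

def apply_operator(image_of_monomial, coeffs):
    """Compute T(P) = a_0 T(1) + a_1 T(x) + ... + a_n T(x^n) by linearity."""
    result = []
    for k, a_k in enumerate(coeffs):
        result = add_scaled(result, image_of_monomial(k), a_k)
    while result and result[-1] == 0:
        result.pop()
    return result

def differentiation(k):
    # T(x^k) = k x^(k-1); the constant 1 is sent to 0.
    return ([0] * (k - 1) + [k]) if k > 0 else [0]

def multiply_by_x_minus_5(k):
    # T(x^k) = x^k (x - 5) = -5 x^k + x^(k+1)
    return [0] * k + [-5, 1]

def even_odd_example(k):
    # T(x^k) = x^3 - 4x for even k and x^2 - 1 for odd k.
    return [0, -4, 0, 1] if k % 2 == 0 else [-1, 0, 1]

P = [2, -3, 1]  # x^2 - 3x + 2
print(apply_operator(differentiation, P))        # [-3, 2], i.e. 2x - 3
print(apply_operator(multiply_by_x_minus_5, P))  # [-10, 17, -8, 1]
print(apply_operator(even_odd_example, P))       # [3, -12, -3, 3]
```

The three printed lists reproduce \(2x-3\), \(x^3-8x^2+17x-10\) and \(3x^3-3x^2-12x+3\) from the text.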

Which Linear Operators Preserve Stability?

One of the long-standing fundamental problems in the theory of stable polynomials was to “characterize” all linear operators T which preserve stability, that is, such that T(P) is always a stable polynomial whenever P is stable. Here, by “characterization” we mean simple-to-check necessary and sufficient conditions. In 1914, such conditions were derived by Pólya and Schur  [308] for operators of the form \(T(x^k)=\lambda _k x^k\), \(k\ge 0\), where \(\lambda _0, \lambda _1, \lambda _2, \dots \) is a given sequence of numbers. Since then, there have been many similar results covering very special transformations T, but almost no progress for general T, until the question was fully resolved [67] in 2009!

To formulate the result, we need some more definitions. We say that stable polynomials P(x) and Q(x) are interlacing  if either \(\alpha _1 \le \beta _1 \le \alpha _2 \le \beta _2 \le \dots \) or \(\beta _1 \le \alpha _1 \le \beta _2 \le \alpha _2 \le \dots \), where \(\alpha _1 \le \alpha _2 \le \dots \le \alpha _n\) and \(\beta _1 \le \beta _2 \le \dots \le \beta _m\) are roots of P(x) and Q(x), respectively. Note that this condition may be satisfied only if n and m differ by at most one. For example, polynomials \(P(x)=x^3-4x\) and \(Q(x)=x^2-1\) are interlacing, because their roots are \(-2,0,2\) and \(-1,1\), respectively. In contrast, polynomials \(x^2-4\) and \(x^2-1\) are not interlacing, see Fig. 9.8. Stable interlacing polynomials R and Q have the property that \(aR(x)+bQ(x)\) is stable for all a, b. In particular, if T is a linear operator such that T(P) has the form \(aR(x)+bQ(x)\) for all P, then T(P) is stable for all P.
Fig. 9.8

Interlacing and non-interlacing polynomials
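Given the lists of roots, the interlacing condition is easy to test. The following sketch (the function name `interlace` is hypothetical) sorts both lists, merges them alternately, and checks that the merged sequence is non-decreasing; it assumes that the list starting the alternation has either the same number of roots as the other list or exactly one more.

```python
def interlace(alpha, beta):
    """Check whether two lists of real roots interlace."""
    alpha, beta = sorted(alpha), sorted(beta)

    def alternates(first, second):
        # 'first' starts the chain, so it needs as many roots as 'second' or one more.
        if len(first) - len(second) not in (0, 1):
            return False
        merged = []
        for i in range(len(first)):
            merged.append(first[i])
            if i < len(second):
                merged.append(second[i])
        return all(x <= y for x, y in zip(merged, merged[1:]))

    return alternates(alpha, beta) or alternates(beta, alpha)

print(interlace([-2, 0, 2], [-1, 1]))  # roots of x^3 - 4x and x^2 - 1
print(interlace([-2, 2], [-1, 1]))     # roots of x^2 - 4 and x^2 - 1
```

The first call corresponds to the interlacing pair from the text and the second to the non-interlacing one.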

A polynomial P(x, y) in two variables x, y is the sum of any finite number of terms of the form \(ax^ky^m\), where \(a \in {\mathbb R}\) and k, m are non-negative integers. P(x, y) is called stable  if \(Q(t)=P(a+bt, c+dt)\) is a stable polynomial in one variable t for any real a, b, c, d such that \(b>0\) and \(d>0\). For example, \(P(x, y)=x+y\) is stable because in this case \(Q(t)=(a+c)+(b+d)t\) has degree 1 and one root \(t=-(a+c)/(b+d)\).

For every linear operator T, let \(S_T\) be the linear operator transforming polynomials in two variables into other polynomials in two variables according to the rule \(S_T(x^ky^l) = T(x^k)y^l\). For example, if T is differentiation, then \(S_T\) is known as the partial derivative with respect to x, and is calculated by the rule \(S_T(x^ky^l) = k x^{k-1}y^l\), for example, \(S_T(x^3y+x^2y^2)=3x^2y+2xy^2\).

Theorem 9.8

A linear operator T, transforming polynomials into polynomials, preserves stability if and only if
  (a) \(T(x^k)=a_k P(x)+ b_k Q(x)\), where \(a_k, b_k\), \(k=0,1,2,\dots \) are real numbers, and P(x) and Q(x) are some fixed (independent of k) stable interlacing polynomials; or

  (b) \(S_T[(x+y)^k]\) is a stable polynomial (in 2 variables) for all \(k=0,1,2,\dots \); or

  (c) \(S_T[(x-y)^k]\) is a stable polynomial (in 2 variables) for all \(k=0,1,2,\dots \).



J. Borcea and P. Brändén, Pólya–Schur master theorems for circular domains and their boundaries, Annals of Mathematics 170-1, (2009), 465–492.

9.9 On the Gaps Between Primes

The Average Distance Between Consecutive Primes

Primes are natural numbers with exactly two divisors, like 2,3,5,7,11,13,17,19,23,\(\dots \). Because all even numbers n greater than 2 have at least three divisors (1, 2, and n), 2 is the only even prime number. This implies that 2 and 3 form the only pair of consecutive integers that are both prime.

The pairs \(p=3,q=5\), or \(p=5,q=7\), or \(p=11,q=13\), and so on, are examples of pairs of primes p and q such that \(q-p=2\). The famous twin primes conjecture states that there are infinitely many such pairs, and it is one of the oldest unsolved problems in mathematics. A “naïve” reason why this conjecture may be hard to prove is that, if we study the sequence of primes further and further, the average distance between consecutive primes becomes larger and larger. The famous prime number theorem states that, for any large N, there are about \(\frac{N}{\ln N}\) primes less than N. Hence, the average distance between consecutive primes is approximately \(\ln N\). For \(N=10^{100}\), this implies that the average distance between 100-digit primes is about 230. Of course, this does not mean that this distance is exactly 230 in all cases: for some pairs of consecutive primes it is larger, while for some pairs it is smaller.
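The claim about the average distance is easy to test numerically for moderate N. The sketch below uses a standard sieve of Eratosthenes (the helper name `primes_below` is an assumption of this illustration) with \(N=10^6\), and compares the number of primes with \(N/\ln N\) and the average gap with \(\ln N\). The agreement is only approximate, as the prime number theorem is an asymptotic statement.

```python
from math import log

def primes_below(n):
    """Sieve of Eratosthenes: all primes p < n (for n >= 2)."""
    sieve = bytearray([1]) * n
    sieve[0:2] = b"\x00\x00"
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, n, i)))
    return [i for i in range(n) if sieve[i]]

N = 10**6
primes = primes_below(N)
average_gap = (primes[-1] - primes[0]) / (len(primes) - 1)

print(len(primes), N / log(N))  # number of primes versus N / ln N
print(average_gap, log(N))      # average gap versus ln N
print(100 * log(10))            # ln(10^100), about 230
```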

Pairs of Primes at Distance Much Lower Than the Average

Let \(p_n\) denote the n-th prime, so that \(p_1=2\), \(p_2=3\), \(p_3=5\), \(p_4=7\), and so on. The prime number theorem states that \(p_{n+1}-p_n\) is, on average, about \(\ln p_n\). The twin primes conjecture states that \(p_{n+1}-p_n = 2\) for infinitely many values of n. To make progress towards it, can we at least prove that \(p_{n+1}-p_n\) is less than average infinitely often? That is, given some \(\varepsilon \in (0,1)\), can we prove that
  (*) \(p_{n+1}-p_n \le \varepsilon \ln p_n\) for infinitely many values of n?


In 1926, Hardy and Littlewood  proved (*) for \(\varepsilon = \frac{2}{3}\), assuming an unproven conjecture called the Generalized Riemann Hypothesis.  Unconditionally, Erdős [140] proved in 1940 that (*) holds for some \(\varepsilon \in (0,1)\), but he did not provide an explicit value. In 1954, Ricci [320] proved (*) for \(\varepsilon = \frac{15}{16}\), and then there was a long chain of improvements, with the best result before 2009 being a 1988 theorem of Maier,  [260] proving that (*) holds for \(\varepsilon \approx 0.2484\).

In 2009, Goldston, Pintz, and Yildirim [165] proved the following theorem.

Theorem 9.9

For any \(\varepsilon >0\), there exist infinitely many values of n such that
$$ p_{n+1}-p_n \le \varepsilon \ln p_n. $$

Theorem 9.9 states that (*) holds for any \(\varepsilon >0\), no matter how small. In the authors’ words, “there exist consecutive primes which are closer than any arbitrarily small multiple of the average spacing”.

While Theorem 9.9 is huge progress compared to the previous results, it is still far from confirming the twin primes conjecture.

Primes in Arithmetic Progressions: Dirichlet’s Theorem

In addition to proving Theorem 9.9, Goldston, Pintz, and Yildirim  provided an excellent idea for further progress, which is based on the distribution of primes in arithmetic progressions. An arithmetic progression  with first term a and difference q is a sequence of the form
$$\begin{aligned} a, a+q, a+2q, a+3q, \dots , a+kq, \dots . \end{aligned}$$
For example, \(4,10,16,22,28, \dots \) is an arithmetic progression with \(a=4\) and \(q=6\). This particular arithmetic progression contains no primes at all, because all terms in it are divisible by 2. In general, if there is a number \(r>1\) such that both a and q are divisible by r, then all terms in (9.10) are divisible by r, hence it contains no primes at all, or possibly one prime which is equal to r. If there are no such r, the numbers a and q are called relatively prime.  For example, 4 and 6 are not relatively prime, because they are both divisible by \(r=2\), while \(a=3\) and \(q=4\) are relatively prime. Dirichlet’s famous theorem  states that an arithmetic progression (9.10) contains infinitely many primes whenever a and q are relatively prime. For example, with \(a=3\) and \(q=4\), this implies that the sequence \(S_1=3,7,11,15,19,23,27,31,35,\dots \) contains infinitely many primes, while with \(a=1\) and \(q=4\), we conclude that the sequence \(S_2=1,5,9,13,17,21,25,29,33,\dots \) contains infinitely many primes as well.
Fig. 9.9

Primes of the form \(4k+1\) and \(4k+3\)
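The examples above can be explored with a short script. This is only an illustration with a hypothetical helper `primes_in_progression`; a finite computation can show primes appearing in a progression, but of course cannot prove that there are infinitely many of them.

```python
from math import gcd

def is_prime(n):
    """Trial division, adequate for the small numbers used here."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def primes_in_progression(a, q, limit):
    """Primes among a, a+q, a+2q, ... that are below limit."""
    return [n for n in range(a, limit, q) if is_prime(n)]

print(gcd(4, 6), primes_in_progression(4, 6, 1000))  # not relatively prime: no primes
print(gcd(3, 4), primes_in_progression(3, 4, 40))    # primes in S_1 below 40
print(gcd(1, 4), primes_in_progression(1, 4, 40))    # primes in S_2 below 40
```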

Primes in Arithmetic Progressions: The Elliott–Halberstam Conjecture

In fact, all primes except for 2 belong either to \(S_1\) or to \(S_2\), see Fig. 9.9, and we would expect that about half of them belong to each one. By the prime number theorem,  for any large N, there are about \(\frac{N}{\ln N}\) primes less than N, and we expect that about \(\frac{N}{2\ln N}\) of them belong to \(S_1\), and another \(\frac{N}{2\ln N}\) of them to \(S_2\). Also, by the prime number theorem,  the product \(\Pi (N)\) of all primes less than N (for example, \(\Pi (12)=2 \cdot 3 \cdot 5 \cdot 7 \cdot 11 = 2310\)) is approximately equal to \(e^N\), where \(e\approx 2.71828...\) is the base of the natural logarithm, and we expect that primes from \(S_1\) and \(S_2\) contribute approximately equally to this product, that is, \(p_1p_2 \dots p_k \approx p'_1p'_2 \dots p'_m \approx \sqrt{e^N}\), where \(p_1,p_2, \dots , p_k\) and \(p'_1,p'_2, \dots , p'_m\) are primes less than N from \(S_1\) and \(S_2\), respectively. Equivalently, \(\ln (p_1p_2 \dots p_k) \approx \ln (p'_1p'_2 \dots p'_m) \approx \ln (\sqrt{e^N}) = N/2\). Or \(g(N, 4,3) \approx g(N, 4,1) \approx N/2\), where g(N, q, a) is the logarithm of the product of all primes less than N in the arithmetic progression (9.10). Similarly, for \(q=12\), we expect that primes are approximately uniformly distributed across four arithmetic progressions (9.10) with \(a=1,5,7,11\) (these are all values of a less than 12 which are relatively prime with 12), and \(g(N, 12,1) \approx g(N, 12,5) \approx g(N, 12,7) \approx g(N, 12,11) \approx N/4\). For general q, we expect that
$$\begin{aligned} g(N,q, a_1) \approx g(N,q, a_2) \approx \dots \approx g(N,q, a_{\phi (q)}) \approx \frac{N}{\phi (q)}, \end{aligned}$$
where \(a_1, a_2, \dots , a_{\phi (q)}\) are all values of a less than q which are relatively prime with q, and \(\phi (q)\) denotes the number of such a, for example, \(\phi (12)=4\). Equation (9.11) is equivalent to saying that \(|g(N,q, a_k)-\frac{N}{\phi (q)}|\) is “small” for \(k=1,2,\dots , \phi (q)\), or, equivalently, that the function
$$ h(N, q) := \max _{1\le k \le \phi (q)} \left| g(N,q, a_k) - \frac{N}{\phi (q)}\right| $$
is “small”. To guarantee that this happens for all q not exceeding some value Q, we need to have a good upper bound for the function \(H(N,Q):=\sum _{q \le Q} h(N, q)\).
Elliott and Halberstam [127] conjectured that for any \(v \le 1\), any \(A > 0\) and any \(\varepsilon > 0\) there is a constant C such that
$$\begin{aligned} H(N, N^{v-\varepsilon }) \le C\frac{N}{(\ln N)^A}, \quad \forall N. \end{aligned}$$
Because H(N, Q) is an increasing function in Q, (9.12) becomes harder to prove as v increases. The best theorem in this direction is the famous Bombieri–Vinogradov theorem  proving (9.12) for \(v \le 0.5\).
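The functions g and h defined above can be computed directly for small N. The sketch below follows the definitions in the text, with g computed as a sum of logarithms of primes (the logarithm of their product); the helper names and the choice \(N=10^5\) are assumptions made for this illustration only.

```python
from math import exp, gcd, log

def primes_below(n):
    """Sieve of Eratosthenes: all primes p < n (for n >= 2)."""
    sieve = bytearray([1]) * n
    sieve[0:2] = b"\x00\x00"
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, n, i)))
    return [i for i in range(n) if sieve[i]]

def g(primes, q, a):
    """Logarithm of the product of the given primes p with p = a (mod q)."""
    return sum(log(p) for p in primes if p % q == a)

def h(primes, N, q):
    """Largest deviation of g(N, q, a) from N / phi(q) over admissible a (q >= 2)."""
    residues = [a for a in range(1, q) if gcd(a, q) == 1]
    target = N / len(residues)
    return max(abs(g(primes, q, a) - target) for a in residues)

print(round(exp(g(primes_below(12), 1, 0))))  # product of all primes below 12

N = 10**5
primes = primes_below(N)
print(g(primes, 4, 1), g(primes, 4, 3), N / 2)
print(h(primes, N, 4), h(primes, N, 12))
```

For this N the deviations h(N, 4) and h(N, 12) come out small compared with N, in line with the heuristic (9.11).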

Bounded Gaps Between Primes

Goldston, Pintz, and Yildirim  proved that if (9.12) holds for some \(v>0.5\), even for \(v=0.50000001\), then there exist infinitely many values of n such that
$$\begin{aligned} p_{n+1} - p_n \le B \end{aligned}$$
where B is a constant depending only on v. In particular, if one can prove (9.12) for \(v=0.971\), then one can choose \(B=16\). This result already looks close to the twin primes conjecture,  stating that the same statement holds with \(B=2\). However, (9.12) is currently known to hold only for \(v \le 0.5\), which is just a little bit less than needed!

In a later work, Zhang [408] observed that in fact an even weaker version of (9.12) implies (9.13), and was able to prove this weaker version, establishing (9.13) with \(B=70{,}000{,}000\). This was later improved by Maynard and others, and (9.13) is now known to hold with \(B=246\), see [309].


D. Goldston, J. Pintz, and C. Yildirim, Primes in tuples I, Annals of Mathematics 170-2, (2009), 819–862.

9.10 A Proof of the B. and M. Shapiro Conjecture in Real Algebraic Geometry

Bases and Linear Independence in \({\mathbb R}^2\)

Any point A in the coordinate plane \({\mathbb R}^2\) can be described by two coordinates, \(x_A\) and \(y_A\). Any two points B and A define a vector  \(\mathbf {BA}\), which is, geometrically, just an arrow connecting B with A. Algebraically, we say that the vector \(\mathbf {BA}\) has coordinates \((x_A-x_B, y_A-y_B)\), where \((x_B, y_B)\) and \((x_A, y_A)\) are coordinates of B and A, respectively. In particular, if \(O=(0,0)\) is the center of the coordinate plane, then the vector \(\mathbf {OA}\) has the same coordinates as A.

Vectors may be multiplied by constants using the rule \(\alpha (x, y)=(\alpha x, \alpha y)\). If \(A \ne O\), the set of all points M such that \(\mathbf {OM} = \alpha \mathbf {OA},\, \alpha \in {\mathbb R}\), is just a line passing through points O and A. If B is any point not on this line, then any vector \(\mathbf {OM}\) in the plane can be uniquely represented as a linear combination \(\alpha \mathbf {OA} + \beta \mathbf {OB}\) of \(\mathbf {OA}\) and \(\mathbf {OB}\), where addition is coordinate-wise. In this case, we say that the vectors \(\mathbf {OA}\) and \(\mathbf {OB}\) form a basis of the coordinate plane. For example, if A and B have coordinates (2, 0) and (1, 2), respectively, then any vector \(\mathbf {OM}\) with coordinates (x, y) can be uniquely represented as \((x, y)=\alpha (2,0)+\beta (1,2)=(2\alpha +\beta , 2\beta )\), see Fig. 9.10a, and the coefficients \(\alpha \) and \(\beta \) in this representation are given by \(\alpha =x/2-y/4\) and \(\beta =y/2\).

The condition “B is not on the line OA” is equivalent to “\(\mathbf {OB} \ne \alpha \mathbf {OA}\) for any \(\alpha \in {\mathbb R}\)”, or, equivalently, to \(x_Ay_B - x_By_A \ne 0\). For example, for \((x_A, y_A)=(2,0)\) and \((x_B, y_B)=(1,2)\) this reduces to \(2\cdot 2 - 1\cdot 0 \ne 0\). In this case, vectors with coordinates \((x_A, y_A)\) and \((x_B, y_B)\) are called linearly independent.  In fact, two vectors in the plane form a basis if and only if they are linearly independent.
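The coefficients \(\alpha \) and \(\beta \) can be computed for any basis of the plane by solving a system of two linear equations. A minimal sketch using Cramer's rule is given below (the function name `coordinates_in_basis` is hypothetical); the quantity `det` is exactly \(x_Ay_B - x_By_A\) from the condition above.

```python
from fractions import Fraction

def coordinates_in_basis(a, b, m):
    """Solve m = alpha * a + beta * b for plane vectors a, b, m with integer entries."""
    det = a[0] * b[1] - b[0] * a[1]
    if det == 0:
        raise ValueError("the vectors are linearly dependent and do not form a basis")
    alpha = Fraction(m[0] * b[1] - b[0] * m[1], det)
    beta = Fraction(a[0] * m[1] - m[0] * a[1], det)
    return alpha, beta

# Basis (2, 0), (1, 2) from the text and the vector (3, 4).
alpha, beta = coordinates_in_basis((2, 0), (1, 2), (3, 4))
print(alpha, beta)  # 1/2 and 2, matching x/2 - y/4 and y/2
```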

Bases and Linear Independence in \({\mathbb R}^3\)

Similarly, a vector in 3-dimensional space \({\mathbb R}^3\) is described by three coordinates (x, y, z). More generally, a (real) n-dimensional vector is just a set of n real coordinates \((x_1,x_2,\dots , x_n)\). A collection of k such vectors \(\mathbf {a_1}, \dots , \mathbf {a_k}\) is called linearly independent  if there are no real numbers \(\lambda _1, \dots , \lambda _k\), not all 0, such that \(\lambda _1 \mathbf {a_1} + \dots + \lambda _k \mathbf {a_k}=0\). For example, vectors \(\mathbf {a_1}=(2,1,0)\), \(\mathbf {a_2}=(-1,2,0)\) and \(\mathbf {a_3}=(-1,-1,2)\) are linearly independent, and form a basis of \({\mathbb R}^3\), see Fig. 9.10b, while vectors \(\mathbf {a_1}=(0,4,2)\), \(\mathbf {a_2}=(-2,1,2)\) and \(\mathbf {a_3}=(-2,3,3)\) are not linearly independent, because \(0.5\mathbf {a_1}+\mathbf {a_2}-\mathbf {a_3}=0\). In fact, all linear combinations of these vectors form a plane, see Fig. 9.10c, and these vectors do not form a basis of \({\mathbb R}^3\).
Fig. 9.10

Bases and linear independence in \({\mathbb R}^2\) and \({\mathbb R}^3\)
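For three vectors in \({\mathbb R}^3\), linear independence can be checked with a \(3\times 3\) determinant, which is non-zero exactly when the vectors are linearly independent. This standard criterion is not spelled out in the text; the sketch below applies it to the two examples above, with a hypothetical helper `det3`.

```python
def det3(a, b, c):
    """Determinant of the 3x3 matrix with rows a, b, c."""
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            - a[1] * (b[0] * c[2] - b[2] * c[0])
            + a[2] * (b[0] * c[1] - b[1] * c[0]))

independent = det3((2, 1, 0), (-1, 2, 0), (-1, -1, 2))
dependent = det3((0, 4, 2), (-2, 1, 2), (-2, 3, 3))
print(independent, dependent)  # 10 and 0

# The dependence 0.5*a1 + a2 - a3 = 0 from the text, checked coordinate by coordinate.
a1, a2, a3 = (0, 4, 2), (-2, 1, 2), (-2, 3, 3)
print([0.5 * x + y - z for x, y, z in zip(a1, a2, a3)])
```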

Polynomials in Real and Complex Variables

The notion of linear independence can be studied not only for vectors, but for any mathematical “objects” which can be added and multiplied by constants, for example, polynomials. A real polynomial is any function of the form \( P(x) = a_n x^n + a_{n-1} x^{n-1} + \dots + a_2 x^2 + a_1 x + a_0, \) where \(a_0, a_1, \dots , a_n\) are some real coefficients. A root  of a polynomial is a solution to the equation \(P(x)=0\). For example, if \(n=2\), \(a_2 \ne 0\), the equation \(P(x)=0\) is a quadratic equation \(a_2 x^2 + a_1 x + a_0 = 0\), whose solutions are given by the formula \( x_{1,2} = \frac{-a_1 \pm \sqrt{a_1^2-4a_0a_2}}{2a_2}. \) In particular, real solution(s) exist if and only if \(a_1^2-4a_0a_2 \ge 0\). If \(a_1^2-4a_0a_2 = - D\) for some \(D>0\), then \( x_{1,2} = \frac{-a_1 \pm \sqrt{D}\sqrt{-1}}{2a_2} = \frac{-a_1 \pm \sqrt{D}i}{2a_2}, \) where i is just a notation for the square root of \(-1\) (which is not a real number). Numbers of the form \(z=a+bi\) for some real a, b are called complex numbers,  see e.g. Sect.  1.7 for details. The set of all complex numbers is usually denoted by \({\mathbb C}\).

As we have seen above, any quadratic equation with real coefficients always has complex roots. The fundamental theorem of algebra states that this remains correct for any equation of the form \(P(z)=0\), where P(z) is a complex polynomial, that is, an expression of the form
$$ P(z) = a_n z^n + a_{n-1} z^{n-1} + \dots + a_2 z^2 + a_1 z + a_0, $$
where z is a complex variable and \(a_0, a_1, \dots , a_n\) are complex coefficients.

Linear Independence and Bases for Polynomials

Two polynomials P(z) and Q(z) are called linearly independent if \(P(z) \ne \alpha Q(z)\) and \(Q(z) \ne \alpha P(z)\) for any complex number \(\alpha \). For example, \(P(z)=iz^2+(1-i)\) and \(Q(z)=-z^2+(1+i)\) are not linearly independent because \(Q(z)=iP(z)\). In contrast, \(P(z)=iz^2+z\) and \(Q(z)=z^2+iz\) are linearly independent, because \(\alpha (iz^2+z)=z^2+iz\), or \((\alpha i - 1)z^2 + (\alpha - i)z=0\) implies that \(\alpha i - 1=0\) and \(\alpha - i=0\), hence \(\alpha =-i\) and \(\alpha =i\), a contradiction.

More generally, k polynomials \(P_1(z), P_2(z), \dots , P_k(z)\) are called linearly independent if there are no complex numbers \(\lambda _1, \dots , \lambda _k\), not all 0, such that \(\lambda _1 P_1(z) + \dots + \lambda _k P_k(z)=0\). Let S be the set of polynomials which can be written as a linear combination of \(P_1(z), P_2(z), \dots , P_k(z)\), that is,
$$\begin{aligned} S=\{P(z)\,|\, P(z)=\lambda _1 P_1(z) + \dots + \lambda _k P_k(z), \,\, \lambda _i \in {\mathbb C}, \, i=1,\dots , k\}. \end{aligned}$$
If \(Q_1(z), Q_2(z), \dots , Q_k(z)\) are any other k linearly independent polynomials belonging to S, then the set S can be equivalently written as
$$\begin{aligned} S=\{P(z)\,|\, P(z)=\lambda _1 Q_1(z) + \dots + \lambda _k Q_k(z), \,\, \lambda _i \in {\mathbb C}, \, i=1,\dots , k\}. \end{aligned}$$
Any such set \(Q_1(z), Q_2(z), \dots , Q_k(z)\) is called a basis  for S.

Looking for a Simpler Basis

The representation (9.15) can sometimes be much simpler than the original representation (9.14). For example, if \(k=2\), \(P_1(z)=(1+i)z^2+(1-i)z+2\), \(P_2(z)=(1-i)z^2+(1+i)z+2\), then the set S in (9.14) consists of polynomials of the form
$$\begin{aligned} \begin{aligned} P(z)&= \lambda _1 P_1(z) + \lambda _2 P_2(z) \\&= [\lambda _1(1+i)+\lambda _2(1-i)]z^2 + [\lambda _1(1-i)+\lambda _2(1+i)]z+[2\lambda _1+2\lambda _2]. \end{aligned} \end{aligned}$$
Defining \(\lambda '_1 := \lambda _1(1+i)+\lambda _2(1-i)\) and \(\lambda '_2 := \lambda _1(1-i)+\lambda _2(1+i)\), we see that \(2\lambda _1+2\lambda _2 = \lambda '_1 + \lambda '_2\), and \(P(z)=\lambda '_1 z^2 + \lambda '_2 z + \lambda '_1 + \lambda '_2\), hence \(S=\{P(z)\,|\, P(z)=\lambda '_1 (z^2+1) + \lambda '_2 (z+1), \,\, \lambda '_1, \lambda '_2 \in {\mathbb C}\}\). Such a simplified representation is possible because S has a simple basis: \(Q_1(z)=z^2+1\) and \(Q_2(z)=z+1\).
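The change of basis above can be checked numerically with Python's built-in complex numbers. The sketch below picks arbitrary values of \(\lambda _1\) and \(\lambda _2\) (an assumption made purely for illustration), forms \(\lambda _1 P_1 + \lambda _2 P_2\), and verifies that the constant term equals the sum of the coefficients of \(z^2\) and z, which is exactly what the representation \(\lambda '_1 (z^2+1) + \lambda '_2 (z+1)\) requires.

```python
# Coefficient lists, lowest degree first: [constant, z, z^2].
P1 = [2, 1 - 1j, 1 + 1j]  # (1+i) z^2 + (1-i) z + 2
P2 = [2, 1 + 1j, 1 - 1j]  # (1-i) z^2 + (1+i) z + 2

def combine(lambda1, lambda2):
    """Coefficients of lambda1 * P1 + lambda2 * P2."""
    return [lambda1 * c1 + lambda2 * c2 for c1, c2 in zip(P1, P2)]

constant, linear, quadratic = combine(2 - 1j, 3j)
print(quadratic, linear, constant)

# In the basis z^2 + 1, z + 1 the coefficients are lambda'_1 = quadratic, lambda'_2 = linear.
print(constant == quadratic + linear)
```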

In general, it is convenient to represent S in (9.14) using as simple a basis as possible. In particular, what are sufficient conditions which guarantee the existence of a basis \(Q_1(z), Q_2(z), \dots , Q_k(z)\) such that all polynomials \(Q_i(z), \, i=1,\dots , k\) have only real coefficients?

Sufficient Conditions for the Existence of a Basis with Real Coefficients

To formulate the answer to this question, established in [285], we need more definitions. For any polynomial P(z), its derivative  is a polynomial \(P'(z)\), uniquely determined by the rules (a) \((P+Q)'(z)=P'(z)+Q'(z)\) for all polynomials P, Q, (b) \((aP)'(z) = a P'(z)\) for every constant \(a \in {\mathbb C}\), and (c) \((z^k)'=kz^{k-1}\) for \(k=0,1,2,\dots \). The second derivative of P, denoted \(P^{(2)}(z)\), is the derivative of \(P'(z)\), and so on. For example, for \(P(z)=z^3+iz^2+2z-i\), \(P'(z)=3z^2+2iz+2\), \(P^{(2)}(z)=6z+2i\), \(P^{(3)}(z)=6\), and \(P^{(j)}(z)=0\) for all \(j \ge 4\).

For an arbitrary set of k polynomials \(P_1(z), P_2(z), \dots , P_k(z)\), a complex number \(z^*\) is called a root of its Wronskian if the vectors \(\mathbf {a_1}=(P_1(z^*), P_2(z^*), \dots , P_k(z^*))\), \(\mathbf {a_2}=(P'_1(z^*), P'_2(z^*), \dots , P'_k(z^*))\), \(\dots \), \(\mathbf {a_k}=(P^{(k-1)}_1(z^*), P^{(k-1)}_2(z^*), \dots , P^{(k-1)}_k(z^*))\) are linearly dependent, that is, \(\lambda _1 \mathbf {a_1} + \dots + \lambda _k \mathbf {a_k}=0\) for some complex numbers \(\lambda _1, \dots , \lambda _k\), not all 0.

Theorem 9.10

If all roots of the Wronskian of a set of polynomials \(P_1(z), P_2(z), \dots , P_k(z)\) are real, then the set S defined in (9.14) has a basis consisting of polynomials with real coefficients.

In the example above with \(k=2\), \(P_1(z)=(1+i)z^2+(1-i)z+2\), \(P_2(z)= (1-i)z^2+(1+i)z+2\), \(z^* \in {\mathbb C}\) is a root of the Wronskian if vectors \(\mathbf {a_1}=(P_1(z^*), P_2(z^*))\) and \(\mathbf {a_2}=(P'_1(z^*), P'_2(z^*))\) are linearly dependent, which is the case if \(P_1(z^*)P'_2(z^*) - P_2(z^*)P'_1(z^*) = 0\), where \(P'_1(z^*)=2(1+i)z^*+(1-i)\) and \(P'_2(z^*)=2(1-i)z^*+ (1+i)\). This simplifies to \(-4i(z^*)^2-8iz^*+4i=0\), or \((z^*)^2+2z^*-1=0\). Because this equation has only real roots, Theorem 9.10 guarantees that S in (9.14) has a basis consisting of polynomials with real coefficients. As we have seen above, this is indeed the case, and the basis is \(Q_1(z)=z^2+1\) and \(Q_2(z)=z+1\).
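The simplification in this example can be reproduced mechanically. The sketch below computes \(P_1P'_2 - P_2P'_1\) for coefficient lists with complex entries (lowest degree first), normalises the result, and checks the sign of the discriminant of the resulting quadratic; the helper names are hypothetical.

```python
def multiply(p, q):
    """Product of two polynomials given as coefficient lists."""
    result = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            result[i + j] += a * b
    return result

def derivative(p):
    return [k * c for k, c in enumerate(p)][1:]

def wronskian2(p1, p2):
    """Coefficients of P1 * P2' - P2 * P1'."""
    left = multiply(p1, derivative(p2))
    right = multiply(p2, derivative(p1))
    return [a - b for a, b in zip(left, right)]

P1 = [2, 1 - 1j, 1 + 1j]  # (1+i) z^2 + (1-i) z + 2
P2 = [2, 1 + 1j, 1 - 1j]  # (1-i) z^2 + (1+i) z + 2

w = wronskian2(P1, P2)
print(w)  # constant, z, z^2, z^3 coefficients: 4i, -8i, -4i, 0

# Divide by the z^2 coefficient to get z^2 + 2z - 1, then test its discriminant.
c0, c1, c2 = (c / w[2] for c in w[:3])
discriminant = c1 * c1 - 4 * c2 * c0
print(c2, c1, c0, discriminant)
```

A positive real discriminant means that both roots of the Wronskian are real, so the hypothesis of the theorem is satisfied in this example.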

In fact, the \(k=2\) case of Theorem 9.10 was resolved in 2002, see Sect.  2.1, but the general case remained open until 2009. In its general form, Theorem 9.10 confirms a conjecture known as the “B. and M. Shapiro conjecture”,  which has a number of equivalent formulations, and many important consequences, especially in the field of mathematics called “real algebraic geometry”.


E. Mukhin, V. Tarasov, and A. Varchenko, The B. and M. Shapiro conjecture in real algebraic geometry and the Bethe ansatz, Annals of Mathematics 170-2, (2009), 863–881.

9.11 Bounding Diagonal Ramsey Numbers

Looking for a Monochromatic Triangle

Assuming that there are six people in a room, can we always find either three people who all know each other or three people who all do not know each other? To analyse questions like this, it is convenient to represent people as points in the plane, and then connect the points by a blue line for any pair of people who know each other, and by a red line for any pair who do not know each other. Then we have 6 points, each pair connected by either a red or a blue line, and the question is can we always find either a red or a blue triangle?

Let us prove that the answer is “Yes, we always can”. From any point A we draw 5 lines, hence at least 3 of them should have the same colour, say, blue. Let A be connected by blue lines to points B, C, and D. If any of the lines BC, CD, or DB are blue, then we have a blue triangle (for example, if BC is blue, then the blue triangle is ABC, and so on). Otherwise all lines BC, CD, and DB are red, hence we have a red triangle BCD. In Fig. 9.11a you can see that you will get a monochromatic triangle after any colouring of BD.
Fig. 9.11

Illustrations for (a) \(R(3,3)\le 6\), (b) \(R(3,3)>5\), (c) and (d) \(R(4,3) \le 9\), (e) \(R(4,4)>17\)

What if we have just 5 people instead of 6? Then the answer to the same question is “No”. Let us label the people (and the corresponding points) A, B, C, D, and E, and let the lines AB, BC, CD, DE, and EA be blue, and the lines AC, CE, EB, BD and DA red, see Fig. 9.11b. It is easy to check that, in this case, neither a red nor a blue triangle exists. In fact, this colouring is “unique up to relabelling”, that is, in any set of 5 points connected by red or blue lines without red or blue triangles, we can always give the points names A, B, C, D, and E in such a way that the colouring becomes exactly as described above.
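Both statements are small enough to be confirmed by exhaustive search: 6 points have 15 connecting lines and hence \(2^{15}=32{,}768\) colourings. The following sketch (with hypothetical function names) checks all of them, and also checks the pentagon colouring of 5 points described above, with the points A, ..., E numbered 0, ..., 4.

```python
from itertools import combinations, product

def has_mono_triangle(n, colour):
    """colour maps each pair (a, b) with a < b to 'R' or 'B'."""
    return any(colour[(a, b)] == colour[(a, c)] == colour[(b, c)]
               for a, b, c in combinations(range(n), 3))

def every_colouring_has_mono_triangle(n):
    pairs = list(combinations(range(n), 2))
    for colours in product("RB", repeat=len(pairs)):
        if not has_mono_triangle(n, dict(zip(pairs, colours))):
            return False
    return True

# Pentagon colouring: sides AB, BC, CD, DE, EA are blue, diagonals are red.
pentagon = {(a, b): "B" if (b - a) % 5 in (1, 4) else "R"
            for a, b in combinations(range(5), 2)}

print(every_colouring_has_mono_triangle(6))  # True
print(every_colouring_has_mono_triangle(5))  # False
print(has_mono_triangle(5, pentagon))        # False
```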

Looking for a Blue Triangle or Red Quadruple

A slightly more difficult problem is to prove that, in a group of 9 people, we can always find either three who all know each other, or four who all do not know each other. In other words, if 9 points are connected by red or blue lines, then either there exists a blue triangle, or there are 4 points all connected by red lines, which we will call a red quadruple. Indeed, if every point is adjacent to exactly 3 blue lines, then the total number of blue lines is \(9\cdot 3/2\), which is not an integer, a contradiction. Hence, there is a point A adjacent either to at least 4 blue lines or to at most 2. In the first case, let A be connected by blue lines to points B, C, D and E, see Fig. 9.11c. If any pair of them (say, B and C) is connected by a blue line as well, then we have a blue triangle (in this case, ABC). Otherwise points B, C, D and E form a red quadruple. In the second case, A is connected by blue lines to at most 2 points, hence there are at least 6 points to which it is connected by red lines, see Fig. 9.11d. As we have proved above, out of any 6 of these points we can always select a triangle, call it BCD, which is either red or blue. If it is blue, we have found a blue triangle. If it is red, then ABCD is a red quadruple.

Looking for a Monochromatic Quadruple

Similarly, we can prove that out of 18 points, connected by red or blue lines, we can always find either a red quadruple or a blue quadruple. Indeed, any point A is connected to 17 others, hence it is connected to at least 9 of them by lines of the same colour, say, blue. But we have just proved that in any set of 9 points we can always find either a blue triangle BCD (in which case ABCD is a blue quadruple), or a red quadruple.

What if we have just 17 points, can we always find either a red or a blue quadruple? It turns out, we cannot. Let us label the points \(A_1,A_2,\dots , A_{17}\), and position them in this order as the vertices of a regular 17-gon with unit side length. For any two points A, B, let d(A, B) be the distance of the “shortest path” between them while travelling along the 17-gon: for example, \(d(A_1,A_5)=4\) with shortest path \(A_1 \rightarrow A_2 \rightarrow A_3 \rightarrow A_4 \rightarrow A_5\), while \(d(A_2,A_{16})=3\) with shortest path \(A_2 \rightarrow A_1 \rightarrow A_{17} \rightarrow A_{16}\). Let us colour the line AB blue if d(A, B) is either 1, or 2, or 4, or 8, and red otherwise. In Fig. 9.11e only blue lines are depicted. Let us prove that there are no blue quadruples. Imagine we have one, with vertices (counter-clockwise) being A, B, C, D, and with \(d(A, B)=a\), \(d(B, C)=b\), \(d(C, D)=c\), and \(d(D, A)=d\). We can assume that \(\max \{a,b,c,d\}=d\). Then either \(a+b+c=d\) or \(a+b+c+d=17\). Because this is a blue quadruple, each of a, b, c, d is either 1, 2, 4, or 8, hence \(a+b+c+d=17\) is possible only if \(d=8\) and a, b, c are (in some order) 1, 4, 4. But then either \(d(A, C)=a+b=5\) or \(d(B, D)=b+c=5\), contradicting the fact that AC and BD are blue. Similarly, \(a+b+c=d\) is possible if (i) \(d=4\) and a, b, c are (in some order) 1, 1, 2 or (ii) \(d=8\) and a, b, c are (in some order) 2, 2, 4. In case (i), either \(d(A, C)=3\) or \(d(B, D)=3\), while in (ii), either \(d(A, C)=6\) or \(d(B, D)=6\), each case leading to a contradiction. The proof that there are no red quadruples is similar.
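The case analysis above, including the omitted case of red quadruples, can be double-checked by a direct search over all 2380 quadruples of points. In the sketch below the points \(A_1,\dots ,A_{17}\) are numbered 0 to 16, and the function names are hypothetical.

```python
from itertools import combinations

def cyclic_distance(i, j, n=17):
    """Length of the shortest path between vertices i and j along the n-gon."""
    diff = abs(i - j) % n
    return min(diff, n - diff)

def colour(i, j):
    return "blue" if cyclic_distance(i, j) in {1, 2, 4, 8} else "red"

def has_mono_quadruple():
    """Search all 4-point subsets for one whose 6 lines share a colour."""
    for quad in combinations(range(17), 4):
        colours = {colour(a, b) for a, b in combinations(quad, 2)}
        if len(colours) == 1:
            return True
    return False

print(cyclic_distance(0, 4), cyclic_distance(1, 15))  # d(A_1, A_5) and d(A_2, A_16)
print(has_mono_quadruple())                           # False
```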

In fact, Evans, Pulham and Sheehan [143] proved in 1981 that the colouring described above (which is called the Paley graph of order 17)  is “unique up to relabelling”. In any other red-blue colouring of lines between 17 points (and there are about \(2.46 \times 10^{26}\) such colourings) we can always find either a red or a blue quadruple.

Diagonal Ramsey Numbers and Alien Invasions

In general, Ramsey [314] proved in 1930 that, for every n, there exists an N such that, if N points are connected by red or blue lines, then there exist either n points all connected by red lines, or n points all connected by blue lines. The minimal number N with this property is called the diagonal Ramsey number R(n, n). It is trivial that \(R(2,2)=2\), and we have just proved that \(R(3,3)=6\) and \(R(4,4)=18\). One might guess that we can find R(5, 5) by a similar not-so-complicated argument, but in fact determining R(5, 5) remains an open problem despite all efforts, including an extensive computer search. The famous mathematician Paul Erdős said that, if an alien force, vastly more powerful than human civilization, contacted us and said that they would destroy the planet unless we told them R(5, 5), then we could unite all mathematicians and all computer power in the world to solve the problem. However, if they asked us to determine R(6, 6), we would have a better chance of destroying the aliens...

A Superpolynomial Improvement

Given that the exact computation of R(n, n) is so difficult, can we at least have some estimates? Erdős and Szekeres [137] proved in 1935 that
$$ R(n+1,n+1) \le \frac{(2n)!}{n!\cdot n!} $$
For R(3, 3), this bound gives \(R(3,3)\le \frac{4!}{2! \cdot 2!}=\frac{1\cdot 2\cdot 3\cdot 4}{1\cdot 2\cdot 1\cdot 2}=6\), which is the exact value, while for R(4, 4), it gives \(R(4,4)\le \frac{6!}{3! \cdot 3!}=\frac{1\cdot 2\cdot 3\cdot 4\cdot 5\cdot 6}{1\cdot 2\cdot 3\cdot 1\cdot 2\cdot 3}=20\), which is close to the correct value \(R(4,4)=18\). However, as n grows, the gap between the bound and the exact value seems to grow, hence a better bound is desirable.
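The two evaluations above are easy to reproduce for further values. This is a small sketch only; the function name `erdos_szekeres_bound` is an illustrative choice, and the argument m stands for \(n+1\) in the displayed inequality.

```python
from math import comb

def erdos_szekeres_bound(m):
    """Upper bound for R(m, m): with m = n + 1 this is (2n)! / (n! * n!)."""
    n = m - 1
    return comb(2 * n, n)

# Print the bound for the first few diagonal Ramsey numbers.
for m in range(2, 8):
    print(m, erdos_szekeres_bound(m))
```

The bound agrees with the exact values quoted in the text for m = 2 and m = 3, and overshoots by 2 for m = 4.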
In 1987, Thomason [375] proved that
$$ R(n+1,n+1) \le n^{-1/2+A/\sqrt{\ln n}}\frac{(2n)!}{n!\cdot n!} $$
for some constant A. For large n, this bound is better than that of Erdős and Szekeres by a factor of about \(\sqrt{n}\). After 1987, there were no further improvements for more than 20 years, until the following theorem [100] was proved in 2009.

Theorem 9.11

There exists a constant C such that
$$ R(n+1,n+1) \le n^{-C \ln n/\ln \ln n}\frac{(2n)!}{n!\cdot n!} $$

No matter what the value of the constant C is, we can find n large enough so that \(C \ln n/\ln \ln n\) is larger than, say, 100.5, and, for such values of n, the bound in Theorem 9.11 is better than Thomason’s bound by a factor of about \(n^{100}\), and the same is true if 100 is replaced by any other constant. As mathematicians say, the bound in Theorem 9.11 gives a superpolynomial improvement compared to the previous ones.

Because it is known that \(\frac{(2n)!}{n!\cdot n!} \le C\frac{4^n}{\sqrt{n}}\) for some constant C, Erdős’ estimate can be rewritten as
$$ R(n+1,n+1) \le C\frac{4^n}{\sqrt{n}}, $$
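Thomason’s theorem implies that
$$ R(n+1,n+1) \le C'_\varepsilon \frac{4^n}{n^{1-\varepsilon }} $$
for every \(\varepsilon >0\) and some constant \(C'_\varepsilon \) depending on \(\varepsilon \) (the factor \(n^{A/\sqrt{\ln n}}\) grows more slowly than any positive power of n), while Theorem 9.11 implies that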
$$ R(n+1,n+1) \le C_k\frac{4^n}{n^k} $$
for all k, where \(C_k\) is a constant depending on k.


D. Conlon, A new upper bound for diagonal Ramsey numbers, Annals of Mathematics 170-2, (2009), 941–960.

9.12 An Almost Optimal Upper Bound for Moments of the Riemann Zeta Function

A Short Paper of Riemann

In mathematics, seemingly unrelated areas can sometimes become interconnected in an unexpected way. This happened, for example, with number theory and the theory of functions of a complex variable.

In the middle of the 19th century, the mathematician Bernhard Riemann tried to understand a purely number theoretic question: how many prime numbers are there? Prime numbers,  those positive integers which have exactly two divisors, 1 and themselves, lie at the heart of number theory. If \(\pi (n)\) denotes the number of primes less than or equal to n, there was a conjecture that \(\pi (n) \approx \frac{n}{\ln n}\), or, more formally, that
$$\begin{aligned} \lim \limits _{n \rightarrow \infty } \frac{\pi (n)}{(n/\ln n)} = 1. \end{aligned} \tag{9.16}$$
In 1859, Riemann wrote a short paper [322] on this topic, in which he... did not prove the result. Despite this, the paper had a tremendous influence on the history of mathematics, and, in particular, suggested one of the most famous and important open problems we have ever had.
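To get a feeling for the conjecture \(\pi (n) \approx \frac{n}{\ln n}\), one can simply count primes. The sketch below uses the sieve of Eratosthenes, which is a standard technique and not taken from Riemann's paper; the names `prime_counts` and `PI_TABLE` are illustrative.

```python
from math import log

def prime_counts(limit):
    """Return a list pi with pi[n] = number of primes <= n (sieve of Eratosthenes)."""
    is_prime = bytearray([1]) * (limit + 1)
    is_prime[0:2] = b"\x00\x00"                 # 0 and 1 are not prime
    for p in range(2, int(limit ** 0.5) + 1):
        if is_prime[p]:
            # Cross out the multiples p*p, p*p + p, ... of the prime p.
            is_prime[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
    pi, count = [0] * (limit + 1), 0
    for n in range(limit + 1):
        count += is_prime[n]
        pi[n] = count
    return pi

PI_TABLE = prime_counts(10 ** 6)
for n in (10 ** 2, 10 ** 3, 10 ** 4, 10 ** 5, 10 ** 6):
    # Columns: n, pi(n), and the ratio pi(n) / (n / ln n).
    print(n, PI_TABLE[n], PI_TABLE[n] / (n / log(n)))
```

The ratio in the last column approaches 1 only slowly, which is consistent with the limit being a statement about very large n.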

Functions of a Complex Variable

In this paper, Riemann suggested attacking conjecture (9.16) using methods from a completely different field, the theory of functions of a complex variable. Complex numbers are those of the form \(z=a+ib\), with a and b real, where i is an (imaginary) number such that \(i^2=-1\), see e.g. Sect. 1.7 for more details. These numbers were initially invented to solve equations like \(x^2+1=0\), which have no real solutions, but quickly arose in many other applications. Geometrically, a complex number \(z=a+ib\) can be represented as a point (a, b) in the coordinate plane. The distance \(\sqrt{a^2+b^2}\) from this point to the coordinate center is called the absolute value of z and denoted by |z|.

The function \(f(z)=z^2\) is an example of a function with a complex argument and complex output, in this case sending the number \(z=a+ib\) to the number \((a+ib)^2=a^2+2abi+(ib)^2=(a^2-b^2)+(2ab)i\). While \(f(z)=z^2\) is defined for all complex numbers, the function \(f(z)=1/z\) is an example of a function defined for all complex numbers except for \(z=0\). The set \(D_f\) of all points where f is defined is called the domain  of f. For any \(z_0 \in D_f\), the derivative  of f at \(z_0\), denoted by \(f'(z_0)\), is defined as
$$ f'(z_0) = \lim _{z \rightarrow z_0} \frac{f(z)-f(z_0)}{z-z_0}. $$
This definition is very similar to the definition of the derivative of “usual” functions on the real line, and similar formulas work, e.g. \((z^2)'=2z\) and \((1/z)'=-1/z^2, \, z \ne 0\). If a function f has a derivative at every point of its domain, it is called holomorphic  on \(D_f\). For example, \(f(z)=z^2\) and \(f(z)=1/z\) are holomorphic functions, while, e.g. the function \(f(z)=|z|\) is not, because it has no derivative at \(z=0\).
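The difference between a holomorphic and a non-holomorphic function can be seen numerically by letting z approach \(z_0\) from several directions. The following sketch is only an illustration: the step size and the number of directions are arbitrary choices, and the function names are not from the text.

```python
import cmath

def difference_quotient(f, z0, h):
    """(f(z0 + h) - f(z0)) / h for a small complex step h."""
    return (f(z0 + h) - f(z0)) / h

def directional_quotients(f, z0, size=1e-6, directions=8):
    """Difference quotients of f at z0, approaching z0 from evenly spaced directions."""
    return [
        difference_quotient(f, z0, size * cmath.exp(2j * cmath.pi * k / directions))
        for k in range(directions)
    ]

# f(z) = z^2 at z0 = 1 + 2i: every direction gives roughly 2*z0 = 2 + 4i.
print(directional_quotients(lambda z: z * z, 1 + 2j))

# f(z) = |z| at z0 = 0: the quotient depends on the direction, so no limit exists.
print(directional_quotients(abs, 0))
```

For \(f(z)=|z|\) the quotient equals \(|h|/h\), which is 1 when h is a positive real number and \(-1\) when h is a negative real number.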

The Riemann Zeta Function

Before Riemann,  it was known that the distribution of the primes depends on the properties of the function
$$ \xi (s) = \sum _{n=1}^\infty \frac{1}{n^s}, $$
where \(s>1\) is a real number, and \(\sum _{n=1}^\infty \frac{1}{n^s}\) is understood as \(\lim \limits _{N\rightarrow \infty }\sum _{n=1}^N \frac{1}{n^s}\). For example, Euler proved in 1734 that \(\sum _{n=1}^\infty \frac{1}{n^2} = \frac{\pi ^2}{6}\), hence \(\xi (2)=\frac{\pi ^2}{6}\). However, the sum \(\sum _{n=1}^\infty \frac{1}{n^s}\) diverges for all real numbers \(s\le 1\): for example, for \(s=0\) it reduces to \(1+1+1+\dots \), while for \(s=-1\) it reduces to \(1+2+3+4+\dots \).

Riemann noticed that there exists a unique function \(\zeta \) of a complex variable which (i) is defined for all complex numbers z except \(z=1\), (ii) is holomorphic, and (iii) satisfies \(\zeta (z)=\sum _{n=1}^\infty \frac{1}{n^z}\) for all z for which the infinite sum is well-defined. This function is called the Riemann zeta function. In particular, \(\zeta (s)\) is well-defined for real \(s<1\), for example, \(\zeta (0)=-\frac{1}{2}\), \(\zeta (-1)=-\frac{1}{12}\), which allows mathematicians to write various funny formulas like \(1+1+1+\dots =-\frac{1}{2}\), or \(1+2+3+4+\dots = -\frac{1}{12}\). Also, \(\zeta (-2)=\zeta (-4)=\zeta (-6)=\dots = 0\). Numbers of the form \(-2k\), \(k=1,2,3,\dots \), are called trivial zeros of \(\zeta \). All other complex numbers z such that \(\zeta (z)=0\) are called non-trivial zeros.
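The values quoted above can be approximated on a computer. The sketch below does not follow Riemann's construction; it assumes a known numerical scheme (an accelerated alternating series due to P. Borwein), in which \(\zeta (s)\) is recovered from the alternating sum \(1-2^{-s}+3^{-s}-\dots \) divided by \(1-2^{1-s}\). The function names and the default number of terms are illustrative choices, and the approximation is only intended for moderate values of |s|.

```python
from math import factorial, pi

def borwein_coefficients(n):
    """Integers d_0..d_n with d_k = n * sum_{i<=k} (n+i-1)! 4^i / ((n-i)! (2i)!)."""
    d, total = [], 0
    for i in range(n + 1):
        # Each summand is an integer, so the integer division is exact.
        total += n * factorial(n + i - 1) * 4 ** i // (factorial(n - i) * factorial(2 * i))
        d.append(total)
    return d

def zeta(s, n=40):
    """Approximate zeta(s) for real or complex s != 1 (a sketch, moderate |s| only)."""
    d = borwein_coefficients(n)
    # Accelerated version of the alternating sum 1 - 2^(-s) + 3^(-s) - ...
    eta = -sum((-1) ** k * (d[k] - d[n]) * (k + 1) ** (-s) for k in range(n)) / d[n]
    return eta / (1 - 2 ** (1 - s))

print(zeta(2), pi ** 2 / 6)        # compare with Euler's value
print(zeta(0), zeta(-1), zeta(-2))  # compare with -1/2, -1/12 and 0
print(abs(zeta(0.5 + 14.134725j)))  # near the first non-trivial zero
```

The last line evaluates \(|\zeta |\) at a point of the form \(1/2+ib\) with \(b \approx 14.134725\), which is where the non-trivial zero closest to the real axis is located.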

The Riemann Hypothesis

Riemann noticed that (9.16) would follow from the following statement:

(*) If \(z=a+ib\) is a non-trivial zero of \(\zeta \), then \(a=1/2\).

The set of all complex numbers \(z=1/2+ib\), \(b\in {\mathbb R}\), is called the critical line.  Hence, (*) can be reformulated as the conjecture that all non-trivial zeros of \(\zeta \) lie on the critical line. Figure 9.12a depicts the first few zeros of \(\zeta \), and the critical line is drawn as a dotted line. Figure 9.12b depicts the real and imaginary parts of \(\zeta \) on the critical line, while Fig. 9.12c depicts its absolute value.
Fig. 9.12

(a) The first few zeros of the Riemann zeta function \(\zeta \); (b) and (c) the function \(\zeta \) on the critical line

At first, (*) looked like a not-very-difficult-to-prove lemma, but Riemann was not able to find a rigorous justification. In 1896, Jacques Hadamard [182] proved a weaker statement “if \(z=a+ib\) is a non-trivial zero of \(\zeta \), then \(0<a<1\)”, and was able to deduce (9.16) from it. However, it was clear that even better estimates for the distribution of primes would follow from (*). Statement (*) received the name Riemann hypothesis and gained the status of an important open problem. In 1900, Hilbert included it in his list [201] of 23 problems for 20th century mathematics. In 2000, the Clay Mathematics Institute included it in its list of 7 problems, offering a million-dollar prize for its solution. However, the problem is still open, and there is no sign that it will be solved in the near future.

How Large is the Riemann Zeta Function on the Critical Line?

Many other mathematical theorems have been proved in the form “if the Riemann hypothesis holds, then the desired result follows”. However, some other applications require a further understanding of the behaviour of \(\zeta \) on the critical line \(1/2+it\). In particular, how large is \(\zeta (1/2+it)\)? If “large” is understood in terms of absolute value, then the “average size” of \(|\zeta (1/2+it)|\) on the interval \(t\in [0,T]\) is given by \( \frac{1}{T}\int _0^T |\zeta (1/2+it)| dt. \) One may also be interested in the average size of the squared absolute value \(|\zeta (1/2+it)|^2\), and, more generally, in estimating
$$ M_k(T) := \int _0^T |\zeta (1/2+it)|^{2k} dt, $$
for all \(k>0\). \(M_k(T)\) is also called a k-th moment of \(\zeta \).
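For small T, the moment \(M_k(T)\) can be estimated by direct numerical integration. The sketch below is an illustration under stated assumptions: it approximates \(\zeta \) by the same kind of accelerated alternating series (Borwein's scheme) that is commonly used for such computations, which is adequate only for moderate t, and it integrates with a simple midpoint rule. The names `make_zeta` and `moment`, the number of terms and the number of steps are all choices made here, not taken from the text.

```python
from math import factorial, log

def make_zeta(n=60):
    """Return a function approximating zeta(s) for s != 1 and moderate |s| (sketch)."""
    d, total = [], 0
    for i in range(n + 1):
        total += n * factorial(n + i - 1) * 4 ** i // (factorial(n - i) * factorial(2 * i))
        d.append(total)
    # Weights of the accelerated alternating series 1 - 2^(-s) + 3^(-s) - ...
    weights = [(-1) ** k * (d[n] - d[k]) / d[n] for k in range(n)]

    def zeta(s):
        eta = sum(w * (k + 1) ** (-s) for k, w in enumerate(weights))
        return eta / (1 - 2 ** (1 - s))

    return zeta

def moment(k, T, steps=1500, zeta=None):
    """Midpoint-rule estimate of the integral of |zeta(1/2 + it)|^(2k) over [0, T]."""
    zeta = zeta or make_zeta()
    h = T / steps
    return h * sum(abs(zeta(complex(0.5, (j + 0.5) * h))) ** (2 * k) for j in range(steps))

ZETA = make_zeta()
for T in (10.0, 20.0, 30.0):
    # Columns: T, estimate of M_1(T), and T * ln T for a rough comparison.
    print(T, moment(1, T, zeta=ZETA), T * log(T))
```

Such small values of T say little about the asymptotic behaviour, so the output should be read as a sanity check of the definition rather than as evidence for any bound.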

Lower and Upper Bounds for the k -th Moment

Estimating \(M_k(T)\) turned out to be a difficult problem, especially if we aim for unconditional results. The progress is better if we assume that the Riemann hypothesis (*) holds. In this case, Ramachandra [313] proved in 1978 that, for every \(k>0\), there is a constant \(C_k\) such that
$$\begin{aligned} C_kT(\ln T)^{k^2} \le M_k(T), \quad \forall T. \end{aligned} \tag{9.17}$$
For an upper bound, the best result (assuming the Riemann hypothesis)  before 2009 was
$$ M_k(T) \le C'_k T e^{2kC \ln T/\ln \ln T}\quad \forall T, $$
where \(C'_k\) is a constant which depends on k, and C is an absolute constant.

The following theorem [354], also assuming the Riemann hypothesis, provides a much better upper bound for all values of k. In fact, the established bound is “\(\varepsilon \)-close” to the lower bound of Ramachandra.

Theorem 9.12

Assume that the Riemann hypothesis (*) holds. Then for every \(k>0\) and every \(\varepsilon >0\), there is a constant \(C_{k,\varepsilon }\) such that
$$ M_k(T) \le C_{k,\varepsilon } T(\ln T)^{k^2+\varepsilon }\quad \forall T. $$

In a later work, Adam Harper [189] improved the bound in Theorem 9.12 and proved that \(M_k(T) \le D_k T(\ln T)^{k^2}\) for some constant \(D_k\). Together with (9.17), this resolves the question of how large \(|\zeta |\) is on the critical line, up to a constant factor. The “only” problem is that its resolution is, like hundreds of other important theorems in the field, subject to the correctness of the Riemann hypothesis. If it turns out to be false, the conclusions of all such theorems could be false as well.


K. Soundararajan, Moments of the Riemann zeta function, Annals of Mathematics 170-2, (2009), 981–993.

9.13 Optimal Lattice Sphere Packing in Dimension 24

Vectors and Lattices

Vectors in the plane are, geometrically, directed line segments, connecting an initial point A with a terminal point B, and usually denoted \(\mathbf {AB}\). In the coordinate plane, we say that \(\mathbf {AB}\) has coordinates \((x_B-x_A, y_B-y_A)\), where \((x_A, y_A)\) and \((x_B, y_B)\) are coordinates of A and B, respectively. In particular, if \(O=(0,0)\) is the coordinate center, then \(\mathbf {OA}\) has the same coordinates as A. Vectors can be added and multiplied by constants using rules \((x_1,y_1)+(x_2,y_2)=(x_1+x_2, y_1+y_2)\) and \(\lambda (x, y)=(\lambda x, \lambda y)\).

A lattice \({\mathscr {L}}={\mathscr {L}}_{A, B}\) in the plane is the set of points X such that \(\mathbf {OX} = k \mathbf {OA} + m \mathbf {OB}\), where A and B are some fixed points such that O, A and B are not on the same line, and k, m are integers. For example, if A and B have coordinates (1, 0) and (0, 1), respectively, then \(k (1,0) + m (0,1) = (k, m)\), hence the lattice \({\mathscr {L}}_{A, B}\) consists of all points with integer coordinates.

Counting Lattice Points per Unit Area

If we draw a circle with center \(O=(0,0)\) and large radius R, how many points of \({\mathscr {L}}_{A, B}\) does it contain? To estimate this number, which we denote by \(N({\mathscr {L}}_{A,B}, R)\), let us associate to every lattice point \(X=(k, m)\) the unit square \(U_X\) for which X is the left bottom vertex, that is, the square with vertex coordinates \((k, m), (k+1,m), (k+1,m+1), (k, m+1)\), see Fig. 9.13a. In most cases, a point X is inside the circle if and only if \(U_X\) is inside it as well. This is not true if X is near the boundary of the circle, but, for large R, the number of lattice points near the boundary is much less than \(N({\mathscr {L}}_{A,B}, R)\), and this boundary effect can be ignored. Now, the number of unit squares inside the circle is, again up to boundary effects, equal to the ratio \(S(\text {Circle})/S(U_X)\), where \( S(\text {Circle})=\pi R^2\) and \(S(U_X)\) are the areas bounded by the circle and \(U_X\), respectively. Hence, the circle contains roughly \(\pi R^2/S(U_X)\) lattice points, or, in other words, about \(1/S(U_X)\) lattice points on average per unit area. The quantity \(1/S(U_X)\) is called the density of the lattice \({\mathscr {L}}\) in the plane. In our case, \(U_X\) is a unit square, \(S(U_X)=1\), hence its density \(1/S(U_X)\) is also equal to 1: our lattice contains on average 1 point per unit area.
Fig. 9.13

Lattices, the fundamental parallelogram, and sphere packing

The Fundamental Parallelogram, and the Density of a Lattice

If \(A=(1/2,0)\), \(B=(0,1/2)\), the lattice \({\mathscr {L}}_{A, B}\) consists of the points whose coordinates are either integers or half integers. In this case, every point \(X\in {\mathscr {L}}_{A, B}\) is a left bottom vertex of a square \(U_X\) with side length 0.5. Hence, \(S(U_X)=0.25\), and the density \(1/S(U_X)\) is equal to 4, that is, there are four points of the lattice per unit area. A slightly more complicated example is \(A=(3,0)\), \(B=(-2,1)\). Then \({\mathscr {L}}_{A, B}\) consists of points with coordinates \(k(3,0)+m(-2,1)=(3k-2m, m)=(3[k-m]+m, m)\), that is, of all points X with integer coordinates (x, y), such that \(x-y\) is a multiple of 3. In this case, X is a left bottom vertex of the parallelogram \(U_X\) with vertex coordinates \((x, y), (x+3,y), (x+1,y+1), (x-2,y+1)\), see Fig. 9.13b. The area \(S(U_X)\) and density \(1/S(U_X)\) are then equal to 3 and 1/3, respectively.

In general, the fundamental parallelogram U of a lattice \({\mathscr {L}}_{A, B}\) in the plane is the set of points X such that \(\mathbf {OX} = \alpha \mathbf {OA} + \beta \mathbf {OB}\), where \(\alpha \in [0,1]\) and \(\beta \in [0,1]\). In other words, U is the parallelogram with vertices O, A, C, B, where C has coordinates \((x_A+x_B, y_A+y_B)\). The whole lattice can be considered as the vertices of a tiling of the plane by copies of this parallelogram. The real number \(\rho ({\mathscr {L}}_{A, B}):=1/S(U)\) is called the density of \({\mathscr {L}}_{A, B}\).
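The claim that a large circle contains about \(\pi R^2/S(U)\) lattice points can be tested by direct counting. In the sketch below, the area of the fundamental parallelogram is computed with the standard formula \(S(U)=|x_Ay_B-x_By_A|\), which is not stated in the text; the function names and the radius \(R=60\) are illustrative choices.

```python
from math import hypot, pi

def parallelogram_area(A, B):
    """Area S(U) of the parallelogram spanned by OA and OB."""
    return abs(A[0] * B[1] - A[1] * B[0])

def count_points_in_circle(A, B, R):
    """Number of lattice points k*A + m*B at distance at most R from O."""
    area = parallelogram_area(A, B)
    # Any lattice point inside the circle has |k| <= R|OB|/S(U) and |m| <= R|OA|/S(U).
    k_max = int(R * hypot(*B) / area) + 1
    m_max = int(R * hypot(*A) / area) + 1
    count = 0
    for k in range(-k_max, k_max + 1):
        for m in range(-m_max, m_max + 1):
            x = k * A[0] + m * B[0]
            y = k * A[1] + m * B[1]
            if x * x + y * y <= R * R:
                count += 1
    return count

R = 60
for A, B in [((1, 0), (0, 1)), ((0.5, 0), (0, 0.5)), ((3, 0), (-2, 1))]:
    observed = count_points_in_circle(A, B, R) / (pi * R * R)
    # Columns: the lattice, points counted per unit area, and 1 / S(U).
    print(A, B, observed, 1 / parallelogram_area(A, B))
```

For the three lattices discussed above, the observed number of points per unit area should be close to 1, 4 and 1/3, respectively, with a small discrepancy caused by the boundary effect.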

The Length of the Shortest Vector and Circle Packing

Another important parameter of any lattice \({\mathscr {L}}\) is the length of the shortest vector in it
$$ h({\mathscr {L}}) := \min _{X \in {\mathscr {L}}} |\mathbf {OX}| = \min _{k, m \in {\mathbb Z}} |k \mathbf {OA} + m \mathbf {OB}|, $$
where \(|\cdot |\) denotes the length of a vector, \({\mathbb Z}\) denotes the set of all integers, and the minimum is with respect to all possible integer values of k, m except for \(k=m=0\). For example, for the lattice \({\mathscr {L}}_{A, B}\) defined by \(A=(1,0)\) and \(B=(0,1)\), we have \(h({\mathscr {L}}_{A,B})=\min \limits _{k, m}\sqrt{k^2+m^2}=1\). Similarly, if \(A=(1/2,0)\), \(B=(0,1/2)\), then \(h({\mathscr {L}}_{A, B})=1/2\), while in the lattice with \(A=(3,0)\), \(B=(-2,1)\), \(h({\mathscr {L}}_{A,B})=\min \limits _{k, m}\sqrt{(3k-2m)^2+m^2}=\sqrt{2}\), with the minimum achieved, for example, for \(k=m=1\).
Geometrically, \(h({\mathscr {L}})\) is the minimal distance from the coordinate center \(O=(0,0)\) to any other point of the lattice. In fact, \(h({\mathscr {L}})\) is also the minimal distance between any two points of the lattice, see Fig. 9.13b. Indeed, if \(\mathbf {OX} = k_1 \mathbf {OA} + m_1 \mathbf {OB}\) and \(\mathbf {OY} = k_2 \mathbf {OA} + m_2 \mathbf {OB}\), then
$$|\mathbf {XY}|=|\mathbf {OY}-\mathbf {OX}|=|(k_2-k_1)\mathbf {OA} + (m_2-m_1)\mathbf {OB}| \ge h({\mathscr {L}})$$
provided that \(X \ne Y\). In particular, this implies that if we draw circles of radii \(h({\mathscr {L}})/2\) with centres at every point of \({\mathscr {L}}\), then these circles would not intersect, see Fig. 9.13c. In other words, we can draw \(\rho ({\mathscr {L}})\) non-intersecting circles (per unit area) of radii \(h({\mathscr {L}})/2\), or, equivalently, \(\rho ({\mathscr {L}})(h({\mathscr {L}})/2)^2=\frac{h^2({\mathscr {L}})}{4S(U)}\) non-intersecting circles (per unit area) of radii 1.
This motivates the search for lattices \({\mathscr {L}}\) with ratio
$$ r({\mathscr {L}}):=\frac{h^2({\mathscr {L}})}{4S(U)} $$
as large as possible. For example, \(r({\mathscr {L}}_{A, B})=1^2/(4 \cdot 1)=1/4\) if \(A=(1,0)\) and \(B=(0,1)\), while \(r({\mathscr {L}}_{A, B})=(\sqrt{2})^2/(4 \cdot 3)=1/6\) if \(A=(3,0)\) and \(B=(-2,1)\). It turns out that \(r({\mathscr {L}})\) is maximal if the points O, A, B form an equilateral triangle, that is, \(A=(1,0)\), \(B=(1/2,\sqrt{3}/2)\). Then \(S(U)=\sqrt{3}/2\), \(h({\mathscr {L}})=1\), and \(r({\mathscr {L}})=1^2/(4 \cdot \sqrt{3}/2) = 1/(2\sqrt{3}) \approx 0.29\).
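The three values of \(r({\mathscr {L}})\) above can be recomputed with a few lines of code. This is a sketch under a simplifying assumption: the shortest vector is found by trying only coefficients k, m with \(|k|,|m|\le 10\), which is enough for these examples but is not a general algorithm. The function names are illustrative.

```python
from math import hypot, sqrt

def parallelogram_area(A, B):
    """Area S(U) of the fundamental parallelogram spanned by OA and OB."""
    return abs(A[0] * B[1] - A[1] * B[0])

def shortest_vector_length(A, B, search=10):
    """h(L): shortest non-zero k*OA + m*OB, searching |k|, |m| <= search only."""
    best = None
    for k in range(-search, search + 1):
        for m in range(-search, search + 1):
            if (k, m) != (0, 0):
                length = hypot(k * A[0] + m * B[0], k * A[1] + m * B[1])
                if best is None or length < best:
                    best = length
    return best

def packing_ratio(A, B):
    """r(L) = h(L)^2 / (4 * S(U))."""
    return shortest_vector_length(A, B) ** 2 / (4 * parallelogram_area(A, B))

print(packing_ratio((1, 0), (0, 1)))              # square lattice
print(packing_ratio((3, 0), (-2, 1)))             # the lattice of Fig. 9.13b
print(packing_ratio((1, 0), (0.5, sqrt(3) / 2)))  # equilateral-triangle lattice
```

The equilateral-triangle lattice gives the largest of the three values, in line with the statement above.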

Lattice-Based Sphere Packings in Higher Dimensions

The same question can be asked in any dimension. In dimension n, points and vectors are described by n coordinates, \((x_1, x_2, \dots , x_n)\), e.g. the coordinate center O is \((0,0,\dots , 0)\). The length \(|\mathbf {a}|\) of a vector \(\mathbf {a}=(x_1, x_2, \dots , x_n)\) is \(|\mathbf {a}|=\sqrt{x_1^2+x_2^2+\dots +x_n^2}\). Any n vectors \(\mathbf {a}_1, \dots , \mathbf {a}_n\) define a lattice \({\mathscr {L}}={\mathscr {L}}(\mathbf {a}_1, \dots , \mathbf {a}_n)\), which is the set of all points X such that \(\mathbf {OX} = k_1\mathbf {a}_1 + k_2\mathbf {a}_2 + \dots + k_n\mathbf {a}_n\) for some integers \(k_1, k_2, \dots , k_n\). The fundamental parallelepiped \(U_{\mathscr {L}}\) of a lattice \({\mathscr {L}}\) is the set of points X such that \(\mathbf {OX} = \beta _1\mathbf {a}_1 + \beta _2\mathbf {a}_2+ \dots +\beta _n\mathbf {a}_n\), where each \(\beta _i\) is a real number such that \(0 \le \beta _i \le 1\), \(i=1,2,\dots , n\). If the volume \(V(U_{\mathscr {L}})\) of \(U_{\mathscr {L}}\) in n-dimensional space is non-zero, \(1/V(U_{\mathscr {L}})\) has the meaning of the average number of points in \({\mathscr {L}}\) per unit volume.

The length \(h({\mathscr {L}})\) of the shortest vector in \({\mathscr {L}}\) is
$$ h({\mathscr {L}}) := \min _{X \in {\mathscr {L}}} |\mathbf {OX}| = \min _{(k_1, k_2, \dots , k_n) \in {\mathbb Z}^n/0} |k_1\mathbf {a}_1 + k_2\mathbf {a}_2 + \dots + k_n\mathbf {a}_n|, $$
where the notation \({\mathbb Z}^n/0\) means that the minimum is with respect to all possible integer values of \(k_1, k_2, \dots , k_n\) except for \(k_1=k_2=\dots =k_n=0\). Using points in \({\mathscr {L}}\) as centres, we can locate \(1/V(U_{\mathscr {L}})\) n-dimensional spheres per unit volume, each of radius \(h({\mathscr {L}})/2\), such that the interiors of the spheres do not intersect. After scaling, this allows us to locate \((h({\mathscr {L}})/2)^n/V(U_{\mathscr {L}})\) non-intersecting spheres (per unit volume) of radius 1 each. This motivates the question of finding, for each dimension n, a lattice \({\mathscr {L}}\) with ratio
$$ r({\mathscr {L}}):=\frac{h^n({\mathscr {L}})}{2^n V(U_{\mathscr {L}})} $$
as large as possible. For \(n=3\) (the “usual” three-dimensional space we live in) this question was resolved by Gauss in 1831. Korkine and Zolotareff [228, 229] resolved the \(n=4\) case in 1873, and the \(n=5\) case in 1877, while Blichfeldt [59] resolved the cases \(6 \le n \le 8\) in 1935. However, no further case had been solved for 74 years, until the following theorem was proved in [95].

Theorem 9.13

In dimension \(n=24\), the maximal possible value of \(r({\mathscr {L}})\) is equal to 1.

It is interesting that, after dimensions \(1\le n \le 8\), the next resolved case is \(n=24\). The 24-dimensional lattice \({\mathscr {L}}\) with \(r({\mathscr {L}})=1\) was found by Leech in 1964, and has the name Leech lattice.  The contribution of Theorem 9.13 is the proof that no 24-dimensional lattice \({\mathscr {L}}\) has \(r({\mathscr {L}}) > 1\). Moreover, the authors also proved that the Leech lattice  is the only one with \(r({\mathscr {L}})=1\), up to scaling and isometries.

Theorem 9.13 implies that sphere packing using the Leech lattice is the densest possible one among all lattice-based sphere packings in dimension 24. In a later work [96], Cohn, Kumar, Miller, Radchenko, and Viazovska proved that this sphere packing is in fact the densest possible one among all packings in dimension 24, not necessarily lattice-based ones.
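The ratio \(r({\mathscr {L}})=h^n({\mathscr {L}})/(2^n V(U_{\mathscr {L}}))\) can be evaluated directly for small examples. The sketch below makes two assumptions that are not in the text: the volume \(V(U_{\mathscr {L}})\) is computed as the absolute value of the determinant of the matrix of basis vectors (a standard fact), and the shortest vector is searched only among coefficients \(|k_i|\le 2\), which suffices for the two three-dimensional examples chosen here for illustration. It is far too slow to say anything about dimension 24.

```python
from itertools import product
from math import sqrt

def volume(basis):
    """|det| of the basis matrix, by Gaussian elimination with partial pivoting."""
    a = [list(map(float, row)) for row in basis]
    n, det = len(a), 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if a[p][c] == 0.0:
            return 0.0
        a[c], a[p] = a[p], a[c]          # row swaps only change the sign of det
        det *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for j in range(c, n):
                a[r][j] -= f * a[c][j]
    return abs(det)

def shortest_vector_length(basis, search=2):
    """h(L), trying only integer coefficients with |k_i| <= search."""
    n = len(basis)
    best = None
    for ks in product(range(-search, search + 1), repeat=n):
        if any(ks):
            point = [sum(k * vec[j] for k, vec in zip(ks, basis)) for j in range(n)]
            length = sqrt(sum(x * x for x in point))
            if best is None or length < best:
                best = length
    return best

def packing_ratio(basis):
    """r(L) = h(L)^n / (2^n * V(U_L))."""
    n = len(basis)
    return shortest_vector_length(basis) ** n / (2 ** n * volume(basis))

CUBIC = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
FACE_CENTRED = [(1, 1, 0), (1, 0, 1), (0, 1, 1)]   # illustrative second example
print(packing_ratio(CUBIC), packing_ratio(FACE_CENTRED))
```

The second lattice gives a noticeably larger ratio than the cubic one, which illustrates why the choice of lattice matters already in three dimensions.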


H. Cohn and A. Kumar, Optimality and uniqueness of the Leech lattice among lattices, Annals of Mathematics 170-3, (2009), 1003–1050. 

9.14 A Waring-Type Theorem for Large Finite Simple Groups

Representing Integers as Sums of Perfect Powers

One of the oldest classical topics in mathematics is representing integers as a sum of some “special” integers. For example, the Greek mathematician Diophantus, who lived in the 3rd century, was interested in representing integers as a sum of perfect squares, e.g. \(1=1^2\), \(2=1^2+1^2\), \(3=1^2+1^2+1^2\), \(4=2^2\), \(5=2^2+1^2\), \(6=2^2+1^2+1^2\), \(7=2^2+1^2+1^2+1^2\), and so on. You can see that we need at least 4 squares to represent 7. Diophantus was interested in the question of whether there exists a positive integer which requires at least 5 squares for such a representation, or if 4 squares always suffice. This question was answered by Lagrange in 1770. His celebrated four squares theorem states that four squares suffice: every positive integer n can be written as \(n=a^2+b^2+c^2+d^2\) for some integers a, b, c, d.
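Lagrange's theorem is easy to test for small n. The sketch below is a brute-force check and has nothing to do with Lagrange's proof; the function names are illustrative. The first function searches for a representation \(n=a^2+b^2+c^2+d^2\), and the second computes, for every n up to a limit, the least number of positive squares needed.

```python
from math import isqrt

def four_squares(n):
    """Return (a, b, c, d) with a >= b >= c >= d >= 0 and a^2+b^2+c^2+d^2 = n, or None."""
    for a in range(isqrt(n), -1, -1):
        for b in range(min(a, isqrt(n - a * a)), -1, -1):
            for c in range(min(b, isqrt(n - a * a - b * b)), -1, -1):
                rest = n - a * a - b * b - c * c
                d = isqrt(rest)
                if d <= c and d * d == rest:
                    return (a, b, c, d)
    return None

def min_squares(limit):
    """best[n] = least number of positive squares summing to n, for 0 <= n <= limit."""
    best = [0] * (limit + 1)
    for n in range(1, limit + 1):
        best[n] = 1 + min(best[n - q * q] for q in range(1, isqrt(n) + 1))
    return best

print(four_squares(7))              # the representation 2^2 + 1^2 + 1^2 + 1^2
print(max(min_squares(1000)[1:]))   # the largest number of squares ever needed
```

Zeros are allowed in `four_squares`, so numbers that need fewer than four squares are padded with \(0^2\).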

In the same year as Lagrange proved his theorem, Edward Waring asked if similar results can be proved for cubes, fourth powers, and so on. That is, does there exist a positive integer \(N_3\) such that every positive integer n can be written as a sum of at most \(N_3\) cubes? More generally, for every k, does there exist an \(N_k\) such that every positive integer n can be written as a sum of at most \(N_k\) k-th powers? This question was answered positively by Hilbert [202] in 1909, and is known as the Hilbert–Waring theorem.

What is the Minimal Number of k -th Powers We Will Need?

It follows from Lagrange’s theorem that the statement “every positive integer n can be written as a sum of at most \(N_2\) squares” holds with \(N_2=4\), and the example of \(n=7\) shows that it does not hold for \(N_2=3\). In other words, 4 is the minimal number of squares sufficient to represent every integer. One may then ask for the minimal number of cubes, 4th powers, and so on. In general, let g(k) be the minimal number of k-th powers sufficient to represent every positive integer.

By 1912, Wieferich and Kempner [217, 403] had shown that every positive integer is a sum of at most 9 cubes. Because 23 cannot be represented as a sum of 8 or fewer cubes, this proves that \(g(3)=9\). Later, mathematicians proved that \(g(4)=19\), \(g(5)=37\), and so on. In fact, it is now known that \(g(k)=2^k+[(3/2)^k]-2\) for all values of k, with at most finitely many possible exceptions. Here, \([(3/2)^k]\) denotes the largest integer not exceeding \((3/2)^k\).
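The formula for g(k) and the special role of 23 can be checked for small cases by dynamic programming. This is a sketch for illustration only: the function names are chosen here, and a finite search up to some limit can of course only show that at least the stated number of powers is needed, not that it always suffices.

```python
def g_formula(k):
    """The value 2^k + [(3/2)^k] - 2, with [.] computed by integer division."""
    return 2 ** k + 3 ** k // 2 ** k - 2

def min_powers(limit, k):
    """best[n] = least number of positive k-th powers summing to n, for 0 <= n <= limit."""
    powers = []
    q = 1
    while q ** k <= limit:
        powers.append(q ** k)
        q += 1
    best = [0] * (limit + 1)
    for n in range(1, limit + 1):
        best[n] = 1 + min(best[n - p] for p in powers if p <= n)
    return best

print([g_formula(k) for k in range(2, 6)])   # compare with 4, 9, 19, 37
cubes = min_powers(1000, 3)
print(cubes[23], max(cubes[1:]))             # cubes needed for 23, and the worst case up to 1000
fourth = min_powers(100, 4)
print(fourth[79])                            # fourth powers needed for 79
```

The number 79 plays the same role for fourth powers that 23 plays for cubes: below 81 only the powers 1 and 16 are available.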

While the representation of 23 requires 9 cubes, Linnik [246] proved in 1943 that all \(n>454\) can be represented as a sum of at most 7 cubes. Also, while \(g(4)=19\), it is known that all \(n>13792\) are the sums of only 16 4th powers. The question “for given k, what is the minimal number of k-th powers required to represent any sufficiently large n” remains an active area of research today.

Representing Rotations as a Composition of Some “Special” Rotations

Similar questions of the form “Can we represent an object using some “special” objects?” can be asked not only about integers, but in many areas of mathematics. In geometry, one may study rotations of the plane around some fixed center O by some arbitrary angle \(\alpha \). If we perform any such rotation \(R'\), and then another rotation \(R''\), the result is again a rotation, which we denote by \(R'' \circ R'\), and call the composition of \(R'\) and \(R''\). Let S be the set of rotations for which \(\alpha \) is a whole number n of degrees. Let us call “perfect squares” some special rotations from S, which can be written as \(R \circ R\) for some \(R \in S\). For example, if \(R_1\) is a rotation clockwise by angle \(1^{\circ }\), then \(R_2 = R_1 \circ R_1\) is a rotation clockwise by angle \(2^{\circ }\), and, by definition, \(R_2\) is a perfect square. Now, by analogy with Lagrange’s theorem, one may ask if any rotation \(R \in S\) can be represented as a composition of such “perfect squares”. In fact, the answer is “no”. One can easily check that all “perfect squares” rotate the plane by an even number of degrees, and so do their compositions, hence any rotation by an odd number of degrees, such as \(R_1\), cannot be represented as such a composition.

Representing Permutations as a Composition of Some “Special” Permutations

As another example, let us consider functions from some finite set S to itself. If S has n elements, we can enumerate them, and write S as \(\{1,2,\dots , n\}\). Then any function \(f:S\rightarrow S\) can be described by listing its values: \(f=(f(1), f(2), \dots , f(n))\). For example, if \(n=3\), \(S=\{1,2,3\}\), then the function \(f(x)=1\) (constant function) is written as (1, 1, 1), while the function \(f(x)=x\) is written as (1, 2, 3). If all f(i), \(i=1,2,\dots , n\), are different, then f is called a permutation.  For example, (1, 1, 1) is not a permutation, while (1, 2, 3) is. In general, let \(S_n\) be the set of all permutations \(g:\{1,2,\dots , n\} \rightarrow \{1,2,\dots , n\}\).

The composition \(g \circ f\) of any functions f and g is the function h such that \(h(x)=g(f(x))\) for all x, and one can easily prove that the composition of any two permutations is again a permutation. Let us call a function \(g \in S_n\) a “perfect square” if \(g = f \circ f\) for some \(f \in S_n\). Can any \(h \in S_n\) be written as a composition of perfect squares? It turns out that the answer is no. For \(n=3\), there are exactly 6 permutations: \(a=(2,1,3), b=(1,3,2), c=(2,3,1), d=(3,1,2), e=(1,2,3)\), and \(f=(3,2,1)\). One can check that \(a \circ a = b \circ b = e \circ e = f \circ f = e\) while \(c \circ c = d\) and \(d \circ d = c\), see Fig. 9.14, hence the perfect squares are e, c and d. Next, \(e \circ c = c \circ e = c\), \(e \circ d = d \circ e = d\), and \(c \circ d = d \circ c = e\), hence the composition of any two perfect squares is again a perfect square, and any permutation outside the set \(\{e,c, d\}\) cannot be represented in this way.
Fig. 9.14

Illustration for \(c \circ c = d\) and \(d \circ d = c\) in \(S_3\)
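The computation in \(S_3\) can be repeated mechanically. In the sketch below a permutation is stored, as in the text, as the tuple of its values \((f(1), f(2), f(3))\); the function name `compose` and the variable names are illustrative.

```python
from itertools import permutations

def compose(g, f):
    """The composition g o f, i.e. x -> g(f(x)), for permutations stored as value tuples."""
    return tuple(g[fx - 1] for fx in f)

S3 = list(permutations((1, 2, 3)))

# Perfect squares: all permutations of the form f o f.
squares = {compose(f, f) for f in S3}
print(sorted(squares))

# Compositions of two perfect squares.
products = {compose(g, f) for g in squares for f in squares}
print(products == squares)
```

Since the set of compositions of two squares coincides with the set of squares, composing more squares cannot produce anything new.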

Even Permutations

In general, a permutation \(f \in S_n\) is called even if the number of pairs (i, j) such that \(i<j\) but \(f(i)>f(j)\) is even. In other words, a permutation is even if it exchanges the order of an even number of pairs (i, j). For example, the permutation d in Fig. 9.14, sending (1, 2, 3) to (3, 1, 2), exchanges the order in the pair (2, 3) (2 was on the left of 3 before permutation, but on the right of 3 after permutation), and in the pair (1, 3), but does not change the order in the pair (1, 2) (1 was on the left of 2 before permutation, and stays on the left of 2 after permutation). Hence, the total number of pairs with exchanged order is 2, an even number, and this permutation is an even permutation.

The set of all even permutations is usually denoted by \(A_n\). One can prove that all perfect squares always belong to \(A_n\), and so do all their compositions. Hence, there is no hope of representing every permutation \(f \in S_n\) as a composition of perfect squares. However, one may ask if at least every even permutation \(f \in A_n\) is representable in this way, and if so, how many perfect squares we would need for such a representation. The same question can be asked for cubes, and, more generally, for k-th powers for arbitrary k.
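The definition of an even permutation and the claim that all perfect squares are even can both be checked by enumeration for small n. The following sketch stores a permutation as the tuple of its values, as above; the function names are illustrative, and the comparison is run only for n = 3, 4, 5, so it says nothing about larger n.

```python
from itertools import permutations

def inversions(f):
    """Number of pairs (i, j) with i < j but f(i) > f(j)."""
    return sum(1 for i in range(len(f)) for j in range(i + 1, len(f)) if f[i] > f[j])

def is_even(f):
    return inversions(f) % 2 == 0

def compose(g, f):
    """The composition g o f, i.e. x -> g(f(x))."""
    return tuple(g[fx - 1] for fx in f)

def squares_and_even(n):
    """Return (set of perfect squares in S_n, set of even permutations A_n)."""
    perms = list(permutations(range(1, n + 1)))
    squares = {compose(f, f) for f in perms}
    even = {p for p in perms if is_even(p)}
    return squares, even

print(inversions((3, 1, 2)))
for n in (3, 4, 5):
    squares, even = squares_and_even(n)
    # Columns: n, number of squares, number of even permutations, and whether squares lie in A_n.
    print(n, len(squares), len(even), squares <= even)
```

For these three values of n the two sets turn out to coincide, so every even permutation is itself a perfect square there; this coincidence should not be expected for every n.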

Groups, Subgroups, and Simple Groups

In the above examples, we considered integers, rotations of the plane, and permutations. All these are examples of groups. A group is an arbitrary set G together with an operation \(\cdot \) such that (i) \(a\cdot b \in G\) for all \(a, b \in G\); (ii) \((a\cdot b)\cdot c = a\cdot (b\cdot c)\) for all \(a,b, c \in G\); (iii) there exists an \(e \in G\) (called the identity element of G) such that \(a\cdot e = e\cdot a = a\) for all \(a\in G\); and (iv) for every \(a\in G\), there exists an element \(a^{-1}\in G\) (called the inverse of a), such that \(a\cdot a^{-1} = a^{-1}\cdot a = e\). The set of integers forms a group (usually denoted by \({\mathbb Z}\)) with the addition operation \(+\), while rotations and permutations form groups with the composition operation \(\circ \).

A subset H of a group G is called a subgroup  of G if (i) \(a\cdot b \in H\) for all \(a, b \in H\); (ii) \(e \in H\), where e is the identity element of G; and (iii) \(a^{-1}\in H\) for every \(a\in H\). For example, the set of all even integers is a subgroup of \({\mathbb Z}\), while \(A_n\) is a subgroup of \(S_n\). A subgroup H of a group G is called trivial  if either \(H=G\), or \(H=\{e\}\), and non-trivial otherwise. A subgroup H of a group G is called normal  if \(g\cdot a\cdot g^{-1}\in H\) for any \(a \in H\) and \(g \in G\). One can check that \(A_n\) is a normal subgroup of \(S_n\).

A group G is called simple  if it does not have any non-trivial normal subgroups. For example, the group \(S_n\) is not simple, because it has a non-trivial normal subgroup \(A_n\). However, it turns out that, for \(n \ge 5\), \(A_n\) has no non-trivial normal subgroups, hence it is a simple group.

Representing Group Elements as a Composition of Some “Special” Elements

A square in a group G is any element \(a \in G\) which can be written as \(a=b \cdot b\) for some \(b \in G\). Similarly, \(a \in G\) is called a k-th power, if \(a=b \cdot b \cdot \dots \cdot b\) (k times) for some \(b \in G\). One may ask if every element \(a \in G\) can be written as a composition of squares, or, more generally, k-th powers. In general, the answer is “no”, because all squares (or k-th powers) may belong to some normal subgroup H of G different from G, and then so do all their compositions. This motivates us to study the same question for the case when G is a simple group. It turns out that in this case the answer is “yes”, and one may then look for a minimal number of squares (or k-th powers) needed for such a representation.

In fact, squares and k-th powers are just special cases of the general notion of group words. A word w is any finite string of symbols, possibly with repetitions and with inverse symbols, like aaa or \(aabbbbc^{-1}a^{-1}a^{-1}c\). Let w be a word with d different symbols \(s_1, s_2, \dots , s_d\), let G be a group, and let \(g_1, g_2, \dots , g_d \in G\) be any d elements of G. Then we write \(w(g_1, g_2, \dots , g_d)\) for the result of (i) substituting \(g_1, g_2, \dots , g_d\) for \(s_1, s_2, \dots , s_d\) in w, respectively, and (ii) performing the group operation. Let w(G) denote the set of all elements \(g \in G\) representable in the form \(g = w(g_1, g_2, \dots , g_d)\) for some \(g_1, g_2, \dots , g_d \in G\). For example, the set of all squares in G is just w(G) for \(w=aa\), while the set of all k-th powers is w(G) for \(w=aa\dots a\) (k times).
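The two-step definition of \(w(g_1, g_2, \dots , g_d)\) translates directly into code. In the sketch below, a word is encoded as a list of (symbol, exponent) pairs with exponent \(+1\) or \(-1\); this encoding, the helper names, and the choice of \(S_3\) and of the word \(aba^{-1}b^{-1}\) as a second example are all assumptions made for illustration.

```python
from itertools import permutations, product

def compose(p, q):
    """Composition p∘q of permutations stored as tuples (apply q first, then p)."""
    return tuple(p[q[i]] for i in range(len(p)))

def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def evaluate(word, assignment, identity):
    """Substitute group elements for symbols, then multiply from left to right."""
    result = identity
    for symbol, exponent in word:
        g = assignment[symbol]
        result = compose(result, g if exponent == 1 else inverse(g))
    return result

def word_image(word, G, identity):
    """The set w(G): values of the word over all substitutions of elements of G."""
    symbols = sorted({symbol for symbol, _ in word})
    return {evaluate(word, dict(zip(symbols, values)), identity)
            for values in product(G, repeat=len(symbols))}

S3 = list(permutations(range(3)))
identity = (0, 1, 2)

square_word = [("a", 1), ("a", 1)]                           # w = aa
second_word = [("a", 1), ("b", 1), ("a", -1), ("b", -1)]     # w = a b a^{-1} b^{-1}

print(len(word_image(square_word, S3, identity)))
print(len(word_image(second_word, S3, identity)))
```

Both sets have only 3 of the 6 elements of \(S_3\), which shows that w(G) can be much smaller than G even for very short words.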

One can then ask if every element of a simple group G can be represented as a composition of elements from w(G), and if so, how many elements from w(G) we need for this. The following theorem, proved in [345], states that, for sufficiently large (but finite) G, every \(g \in G\) is in fact a composition of just three elements from w(G)!

Theorem 9.14

Let w be any non-empty word. Then there exists a positive integer N, depending only on w, such that for every finite simple group G with \(|G| \ge N\), every element \(g \in G\) can be represented as \(g=g_1 \cdot g_2 \cdot g_3\), where \(g_i \in w(G)\), \(i=1,2,3\).
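The conclusion of the theorem can be tested by direct enumeration in a small case. The sketch below takes \(G=A_5\) and \(w=aa\), both chosen here only for convenience: the theorem guarantees the conclusion only when \(|G| \ge N\), the value of N is not specified, and \(A_5\) has just 60 elements, so this is an empirical check and not a consequence of the theorem.

```python
from itertools import permutations

def compose(p, q):
    """Composition p∘q of permutations stored as tuples (apply q first, then p)."""
    return tuple(p[q[i]] for i in range(len(p)))

def is_even(p):
    n = len(p)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j])
    return inversions % 2 == 0

def product_set(X, Y):
    """All compositions x·y with x in X and y in Y."""
    return {compose(x, y) for x in X for y in Y}

A5 = [p for p in permutations(range(5)) if is_even(p)]

# w(G) for w = aa: the set of squares in A_5
squares = {compose(b, b) for b in A5}

# all elements of the form g1·g2·g3 with each gi in w(G)
triple_products = product_set(product_set(squares, squares), squares)

print(len(A5), len(squares), len(triple_products))
print(triple_products == set(A5))
```

In this example not every element of \(A_5\) is a square, yet the compositions of three squares already cover the whole group.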


A. Shalev, Word maps, conjugacy classes, and a noncommutative Waring-type theorem, Annals of Mathematics 170-3, (2009), 1383–1416. 


  1. In fact, Theorem 9.4 works only for “sufficiently large” n, and there is no guarantee that it works for \(n=10{,}000\). However, we think that this calculation is still useful for the purpose of illustration.

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Department of Mathematics, University of Leicester, Leicester, UK
