The First Mystery: Interference and Superpositions

Abstract

Using the example of the spin, we explain the superposition principle and how the quantum formalism predicts its effects. We will see that the formalism introduces “measurements” as a basic notion. We then show that it is not easy to understand the formalism as describing what is going on outside of laboratories.

Notes

1. The first four sections of this chapter draw heavily on David Albert’s book Quantum Mechanics and Experience. We emphasize that the “experiments” here are meant to illustrate the theory rather than to describe real experiments. The latter are generally carried out with photons, whose polarization plays a role similar to that of the spin here. But all the experiments described below correspond to what quantum mechanics predicts.

2. In Fig. 2.2 left, we put one hole to the right instead of downwards, because the particles exiting through that hole go into the box on the right of the figure.

3. This is discussed mathematically in more detail in Appendix 2.C. Bohr explained the “complementary, or reciprocal, mode of description” by emphasizing [71] “the relative meaning of every concept, or rather of every word, the meaning depending upon our arbitrary choice of view point, but also that we must, in general, be prepared to accept the fact that a complete elucidation of one and the same object may require diverse points of view which defy a unique description. Indeed, strictly speaking, the conscious analysis of any concept stands in a relation of exclusion to its immediate application.” The reader may be forgiven for not understanding exactly what this means. We try here to give a plausible interpretation of that idea. See [181] for a discussion of different interpretations of Bohr’s thinking. We will return to Bohr’s views, in relation to his debate with Einstein, in Sect. 7.1.

4. Bohr did not use spin as an example, but rather the descriptions in terms of wave and particle, which we will discuss in Appendix 2.E.

5. See Greenberger [239] for a more detailed discussion.

6. See [507, 509] for the theoretical proposal of such experiments by Wheeler, and [280] for experimental realizations.

7. This will be clarified in Sect. 5.1.2. See also [38], where Bell discusses the delayed-choice experiment from the viewpoint of the de Broglie–Bohm theory.

8. See [173] for the theory and [300] for experiments.

9. If one can modify the experiment so that a fraction p of particles follow the path \(2\downarrow \) and a fraction \(1-p\) follow the path \(2\uparrow \), then one can “save” a fraction \((1-p)/2\) of the bombs in one operation and, repeating this many times, one can eventually identify a fraction \((1-p)/(1+p)= \sum _{n=1}^\infty [(1-p)/2]^n\) of the active bombs, which is as close to 1 as one wants, for p small.

10. For an elementary introduction to the quantum formalism, see also Susskind and Friedman [467].

11. Or, to be precise, the Dirac or the Pauli equations in order to deal with spin.

12. Since \(\Psi (x,t)\) is in general a complex number, \(|\Psi (x,t)|^2 = \Psi (x,t)^*\Psi (x,t)\), where \(z^*\) is the complex conjugate of \(z \in {{\mathbb {C}}}\).

13. The support of a function is the closure of the set on which it is nonzero.

14. The same is true if one puts an active bomb (as opposed to a dud, which is the same as not putting anything) along the path \(2\downarrow \). Depending on whether the bomb explodes or not, one knows which path the particle took.

15. This section is in part an extension of [81].

16. The latter are discussed in Sect. 3.4.3.

17. In [43], Bell considers six “possible worlds of quantum mechanics”, i.e., six possible reactions with respect to the problems discussed here: the pragmatic attitude, the Bohr approach, introducing the mind into physics, the many-worlds interpretation, spontaneous collapse theories, and the de Broglie–Bohm theory. The last three theories pertain to what we call the fourth reaction and will be discussed in Chaps. 5 and 6. The pragmatic attitude and the Bohr approach both exemplify what we call the first reaction, while introducing the mind into physics could be considered as part of the fourth reaction, but we will not discuss it beyond a few words in Chap. 3.

18. As we saw in Sect. 2.2, we get 100 % \(1 \downarrow \) if we measure the spin after the black arrow in direction 1 when both paths are open, and 25 % \(1 \uparrow \), 25 % \(1 \downarrow \), when path \(2\downarrow \) is blocked.

19. This would make the collapse rule somewhat analogous to the phenomenon of entropy increase in statistical mechanics. We will discuss this analogy further in Sect. 5.1.7.

20. We will see in Sect. 5.1.6 that this is, in a sense, what happens in the de Broglie–Bohm theory; but it only works because, in that theory, one has a more complete description of the quantum system than the one in ordinary quantum mechanics.

21. See Maudlin [325] for a detailed discussion.

22. Of course, there are macroscopic phenomena like superconductivity or superfluidity whose explanations appeal directly to quantum mechanics, but these are different from the examples of the cat or the pointer.

23. This should not be confused with the familiar “mind–body” problem: how can the material body produce mental states, and in particular, conscious ones (see [345] for a good explanation of the problem)? Even those, like Colin McGinn [329], who regard the link between the body or the brain and the qualitative aspects of consciousness (e.g., pain) as being an unsolvable mystery, given the limitations of the human mind, admit that they can be caused by, or at least correlated with, physical events in the brain. But here, when we consider the possibility that the mind collapses the quantum state, we are envisaging a direct action of the mind on matter, and this entails a radical form of dualism, since the mind would then act totally independently of the brain!

24. See the double-slit experiment described in Appendix 2.E for another example of interference between states of one particle.

25. This notion of decoherence will be explained in more detail in Sect. 5.1.6 and Appendix 5.E.

26. These quantities are called observables and are represented mathematically by matrices or operators acting on the quantum states. The eigenvalues of these operators are the possible results of the measurement of these observables, but we do not need these notions here. They are explained in Appendix 2.B.

27. We refer to Appendices 2.B, 2.C and 2.F for a more precise formulation of this theorem in terms of “observables” and “operators”, and for a definition of the sets \(\mathcal {O}\). The original version of this theorem is due to Bell [36] and to Kochen and Specker [291] for the first part (the proofs were based on a theorem of Gleason [215]), and to Clifton [98] for the second part. The version given here is simpler than the original ones and is due to Mermin (see [335] and references therein) and Peres [392, 393] for the first part, and to Myrvold for the second [344]. The proofs are given in Appendix 2.F. We will discuss another no hidden variables theorem in connection with nonlocality in Chap. 4. We will discuss other variants of this theorem in Sects. 5.3.4 and 6.3 and in Appendix 6.C. We will also discuss the famous but misleading no hidden variables theorem due to von Neumann in Sect. 7.4.

28. For an introduction to differential equations, see, e.g., [18, 267].

29. See, e.g., [267, Chap. 7] for more details.

30. For a discussion close to our point of view, see [152, Chap. 7].

31. Operators are linear functions that map “ordinary” functions into other functions. The space of functions on which they act is infinite dimensional. We will not give a rigorous or detailed treatment of these operators; see, e.g., [152, Chaps. 13–15] or [412, Chaps. 7 and 8] for such a treatment.

32. See, for example, [412, 413] and [152, Chap. 14].

33. See, for example, Dym and McKean [155] for the properties of Fourier series and integrals used here.

34. This space, denoted \(L^2([0,2\pi ],dx)\), is a Hilbert space and the family \((e^{inx}/\sqrt{2\pi })^{+\infty }_{n=-\infty }\) is a Hilbert basis, but we will not need any detailed property of such spaces in this book. The basis here is orthonormal, which will be implicit when we use the word basis.

35. See the definition of variance in Appendix 2.C, (2.C.1.1)–(2.C.1.3).

36. Ignoring here the issue of symmetry or antisymmetry, for bosons and fermions.

37. We follow here [268, Sect. 8.6] and proceed informally; for a more rigorous treatment, see [152, pp. 306–310].

38. That last formula comes from:

$$ \Vert \text {state (t)}\rangle \Vert ^2= \langle \text {state (t)} |\text {state (t)}\rangle = \sum _{n, m=1}^N c^*_n(t) c_m(t)\langle e_n |e_m\rangle = \sum _{n=1}^N |c_n (t)|^2, $$

since, by orthonormality of the basis vectors, \(\langle e_n |e_m\rangle = 0\) if \(n\ne m\) and equals 1 if \(n=m\).

39. This is automatic if we assume that A is self-adjoint. For matrices, this means that its matrix elements satisfy \(A_{ij} = A^*_{ji}\).

40. There is a more general notion associated with measurements, namely, the positive operator-valued measure (POVM), discussed further in [147] and [152, Chap. 12].

41. These are the usual Pauli matrices: \(\sigma _1 = \sigma _z\), \(\sigma _2 =\sigma _x\), while \(\sigma _y= \left( \begin{array}{cc}0 & -i \\ i & 0\end{array}\right) .\)

42. This fact is intuitively understandable, since a set of functions defined on \(\mathbb {R}\) cannot be characterized by a finite set of parameters, as would be the case if the space were finite dimensional (the parameters would be the coefficients of the expansion of a function in a basis of the space).

43. The extension to more dimensions or more particles is straightforward: for M particles in a physical space of k dimensions, the wave functions are functions \(\Psi : \mathbb {R}^N \rightarrow {{\mathbb {C}}}\), where \(N=kM\), and the integrals are over \(\mathbb {R}^N\).

44. We proceed formally here; see, e.g., [152, Chap. 15] or [412] for more details on the definition of operators.

45. In the concrete example (2.A.2.15), (2.A.2.16), we get the result \(k^2/2m\) with probability \(|c_k|^2\) and, after the measurement, the wave function becomes \(e^{ikx}/\sqrt{2\pi }\) (the factor of \(1/\sqrt{2\pi }\) coming from the requirement that \(\int ^{2\pi }_0 |\Psi (x,t)|^2 dx = 1\) at all times).

46. These are not real functions but can be thought of as limits of functions whose integrals are always equal to one and that tend to 0 for all \(x\ne q\), for example the sequence \(f_n(x)= {\sqrt{\frac{n}{2\pi }}\exp (-\frac{n(x-q)^2}{2})}\), as \(n \rightarrow \infty \). In that limit, the function becomes more and more concentrated on \(x=q\), and tends to 0 elsewhere. This explains Eq. (2.B.7) below.

47. The factor \( 1/(2\pi )^{1/2}\) plays the role of a normalization factor.

48. The set \(\mathbb {R}\) is called the spectrum of the operators Q and P, and such a spectrum is called “continuous”. See, e.g., [152, Chap. 15] or [412] for more details on the spectra of operators.

49. The first real derivation is due to Kennard [287]; see, e.g., [266] for the history of the uncertainty principle.

50. Typically, some of these eigenvalues will be degenerate for A or B (or both).

51. For a proof, see, e.g., [236, Sect. 24] or [447, Chap. 9].

52. Note that, for states that are eigenstates of A or of B, both sides of (2.C.2.3) vanish [in contrast to (2.C.1.4)], but the impossibility of a simultaneous measurement of A and B holds nevertheless.

53. We follow here Bell [49, p. 130]. See also Bohm and Hiley [70, Chap. 6].

54. As Mermin suggests [335, p. 811], if the eigenvalues of the matrices were all 0 or 1 (unlike the situation here, but one could easily adapt the argument), then measuring the “observable” \(A+2B+4AB\) alone would give the values of all three quantities, A, B, and AB, and they would have to satisfy \(v(AB)=v(A)v(B)\).

55. This is discussed by Bell [49, pp. 8–9] and Mermin [335, pp. 811–812].

56. To avoid creating some later confusion in the reader’s mind, we should already mention here that the de Broglie–Bohm theory, discussed in Chap. 5, is, in some sense, a “contextual” hidden variables theory. This is explained in Sects. 5.1.4, 5.1.5 and 5.3.4. But in that theory, one does not introduce the hidden variables ruled out by the no hidden variables theorems (otherwise the theory would be inconsistent!).

57. This proof is the only place in this book where we use operators that act on an infinite-dimensional space of functions and that are not simply reduced to matrices. See, e.g., [152, Chaps. 13–15] or [412, Chaps. 7 and 8] for a rigorous treatment of operators.

Appendices

2.A The Wave Function and the Schrödinger Equation

In this appendix, we will describe some of the mathematical properties of Schrödinger’s equation, without discussing in detail its physical meaning, something already done in the main text of this chapter and in Chaps. 4 and 5.

2.A.1 Linear Differential Equations

Let us start with the simplest differential equation (Footnote 28):

$$\begin{aligned} \frac{dz(t)}{dt} = a z(t)\;, \end{aligned}$$
(2.A.1.1)

where \(t\in \mathbb {R}\), \(z : \mathbb {R}\rightarrow \mathbb {R}\), and \(a\in \mathbb {R}\). By definition, a solution of this equation is a function satisfying it for all t. It is easy to see that all solutions are of the form

$$\begin{aligned} z(t) = A e^{at}\;, \end{aligned}$$
(2.A.1.2)

for some constant A.

We obtain a unique solution if we fix some initial condition, that is, if we fix the value of z(t) at a given time t. To simplify the notation, let \(t=0\) and let us look for a solution such that \(z(0) = z_0\). Then we obtain a unique solution:

$$\begin{aligned} z(t) = z_0 e^{at}\;. \end{aligned}$$
(2.A.1.3)

In this simple example, we see that (2.A.1.1) has a class of solutions (2.A.1.2) and a unique solution (2.A.1.3) once an initial condition is fixed. This is true for more general equations of the type

$$\begin{aligned} \frac{dz(t)}{dt} = f( z(t))\;, \end{aligned}$$
(2.A.1.4)

for fairly general functions \(f : \mathbb {R}\rightarrow \mathbb {R}\), at least for short intervals of time (Footnote 29; but we will not use those more general equations here).

Equation (2.A.1.1) is said to be linear because, if \(z_1(t)\) and \(z_2(t)\) are solutions of (2.A.1.1), then the function \(z(t)=c_1 z_1 (t) + c_2 z_2(t)\), with \(c_1, c_2 \in \mathbb {R}\), is also a solution.

We now generalize this simple example. First, we could replace z(t) by a complex-valued function: \(z : \mathbb {R}\rightarrow \mathbb {C}\), with \(a \in \mathbb {C}\) in (2.A.1.1). Nothing changes except that A and \(z_0\) in (2.A.1.2) and (2.A.1.3) are also complex.

Next, we replace z(t) by an n-component complex vector:

$$ {\mathbf{z}}: \mathbb {R}\rightarrow \mathbb {C}^n\;,\quad {\mathbf{z}}(t) = \left( \begin{array}{c} z_1(t) \\ \vdots \\ z_n(t)\end{array}\right) \;,\quad z_i(t)\in \mathbb {C}\;,\quad i =1, \ldots , n\,. $$

Equation (2.A.1.1) is replaced by

$$\begin{aligned} \frac{d{\mathbf{z}}(t)}{dt} = \mathcal {A} {\mathbf{z}}(t)\;, \end{aligned}$$
(2.A.1.5)

where \(\mathcal {A}\) is an \(n \times n\) complex matrix. The general solution is of the form

$$\begin{aligned} {\mathbf{z}}(t) = e^{\mathcal {A} t} \mathbf A \;, \end{aligned}$$
(2.A.1.6)

where \(\mathbf A \in \mathbb {C}^n\),

$$\begin{aligned} e^{\mathcal {A} t} = \sum ^\infty _{k=0} \frac{\mathcal {A}^k t^k }{k!}\;, \end{aligned}$$
(2.A.1.7)

and \(\mathcal {A}^k\) denotes the \(k\,\)th product of \(\mathcal {A}\) with itself (we write the summation index as k to avoid confusion with the dimension n). Equation (2.A.1.5) is again linear.

If we fix an initial condition \({\mathbf{z}}(0) = {\mathbf{z}}_0 \in \mathbb {C}^n\), we get a unique solution:

$$\begin{aligned} {\mathbf{z}}(t) = e^{\mathcal {A}t} {\mathbf{z}}_0\;. \end{aligned}$$
(2.A.1.8)

When \(\mathcal {A}\) possesses a basis of eigenvectors, i.e.,

$$\begin{aligned} \mathcal {A} {\mathbf{e}}_i = \lambda _i {\mathbf{e}}_i\;, \end{aligned}$$
(2.A.1.9)

where \(({\mathbf{e}}_i)^n_{i=1}\) is an orthonormal basis of \(\mathbb {C}^n\), the solution (2.A.1.8) can be written more explicitly. Indeed, (2.A.1.9) and (2.A.1.7) imply that

$$\begin{aligned} e^{\mathcal {A}t} {\mathbf{e}}_i = e^{\lambda _i t} {\mathbf{e}}_i\;, \end{aligned}$$
(2.A.1.10)

and if we expand \({\mathbf{z}}_0\) in the basis \(({\mathbf{e}}_i)^n_{i=1}\), i.e.,

$$\begin{aligned} {\mathbf{z}}_0 = \sum ^n_{i=1} c_i {\mathbf{e}}_i\;, \end{aligned}$$
(2.A.1.11)

where

$$\begin{aligned} c_i = \langle {\mathbf{e}}_i | {\mathbf{z}}_0 \rangle \;, \end{aligned}$$
(2.A.1.12)

and \(\langle \cdot |\cdot \rangle \) is the scalar product in \(\mathbb {C}^n\) \((\langle z_1\vert z_2\rangle \equiv \sum ^n_{i=1} z^*_{1i} z_{2i})\), we find by linearity that (2.A.1.8) can be written as

$$\begin{aligned} {\mathbf{z}}(t) = \sum ^n_{i=1} c_i e^{\lambda _i t} {\mathbf{e}}_i\;. \end{aligned}$$
(2.A.1.13)

So the “recipe” for solving (2.A.1.5) is to solve the eigenvalue/eigenvector problem for \(\mathcal {A}\) (assuming that \(\mathcal {A}\) has a basis of eigenvectors), compute the coefficients using (2.A.1.12), and insert the result in (2.A.1.13).
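
To make this recipe concrete, here is a minimal numerical sketch (in Python with NumPy); the matrix and the initial condition are arbitrary choices of ours, not taken from the text:

import numpy as np
from math import factorial

# An arbitrary 2 x 2 complex matrix playing the role of A in (2.A.1.5).
A = np.array([[0.0, 1.0], [-1.0, 0.0]], dtype=complex)
z0 = np.array([1.0, 0.5], dtype=complex)   # initial condition z(0)
t = 0.5

lam, E = np.linalg.eig(A)        # eigenvalues lambda_i; eigenvectors are the columns of E
c = np.linalg.solve(E, z0)       # coefficients c_i of z0 in the eigenbasis, cf. (2.A.1.11)
z_t = E @ (c * np.exp(lam * t))  # z(t) = sum_i c_i exp(lambda_i t) e_i, cf. (2.A.1.13)

# Cross-check against the (truncated) power series definition (2.A.1.7).
expAt = sum(np.linalg.matrix_power(A, k) * t**k / factorial(k) for k in range(25))
print(np.allclose(z_t, expAt @ z0))   # True

For a general (not necessarily orthonormal) eigenbasis, the coefficients are obtained here by solving a linear system rather than by the scalar products (2.A.1.12).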

2.A.2 The Schrödinger Equation

Let us start with the equation for one particle in three-dimensional space:

$$\begin{aligned} i \hbar \frac{d}{dt} \Psi (x,t)=H\Psi (x,t)\;, \end{aligned}$$
(2.A.2.1)

where \(t \in \mathbb {R}\), \(x\in \mathbb {R}^3\), and \(\hbar = h/2\pi \), with h the Planck constant. The unknown here is \(\Psi \), which is a complex-valued function of x and t.

One can think of \(\Psi \) as playing the role of \(\mathbf z \) in (2.A.1.5), with the index \(i=1,\ldots ,n\) being replaced by a continuous variable x. The factor \(i=\sqrt{-1}\), while essential for the physics of (2.A.2.1), does not make much difference at this stage with respect to (2.A.1.5), since \(\Psi \), like \(\mathbf z \), is complex anyway. H plays the role of \(\mathcal {A}\) in (2.A.1.5) and is a linear operator: it transforms a given function \(\Psi (x,t)\) into a new function \((H\Psi )(x,t)\) and does it in a linear way:

$$\begin{aligned} H(\alpha \psi _1 + \beta \psi _2) = \alpha H \psi _1 + \beta H \psi _2\;, \end{aligned}$$
(2.A.2.2)

which implies that a linear combination of solutions of (2.A.2.1) is again a solution of (2.A.2.1).

The detailed form of H and the justification of (2.A.2.1) will not matter very much here; they can be found in any textbook on quantum mechanics (Footnote 30). But for one particle of mass m moving in \( \mathbb {R}^3\), the operator H has the form

$$\begin{aligned} H = \frac{\hbar ^2}{2m} \left( -\frac{d^2}{dx^2_1} - \frac{d^2}{dx^2_2}-\frac{d^2}{dx^2_3}\right) + V(x)\;, \end{aligned}$$
(2.A.2.3)

where \(x = (x_1,x_2,x_3)\) and V(x) is simply the classical potential [so that the force F(x) in classical mechanics equals \(F(x)=- \nabla V(x)\), \( \nabla \) denoting the gradient]. The first term, viz.,

$$ \frac{\hbar ^2}{2m} \left( -\frac{d^2}{dx^2_1} - \frac{d^2}{dx^2_2}-\frac{d^2}{dx^2_3}\right) \;, $$

is the kinetic energy term. To simplify notation, we will often consider the situation in one spatial dimension, where H is given by:

$$\begin{aligned} H = -\frac{\hbar ^2}{2m} \frac{d^2}{dx^2} + V(x)\;, \end{aligned}$$
(2.A.2.4)

with \(x\in \mathbb {R}\). Classically, the Hamiltonian is (again, in one dimension)

$$ H=\frac{p^2}{2m}+ V(x)\;, $$

and corresponds to the energy of an isolated system. The quantum version (2.A.2.4) is obtained by replacing the momentum variable p (equal to the mass times the velocity) in the classical Hamiltonian by the operator (Footnote 31) \(P=-i\hbar d/dx\) and the variable x by the operator Q of multiplication by x [and hence V(x) by the operator of multiplication by V(x)]. We will not justify this replacement now, but we will explain why the statistical distribution of results of measurements of momenta is related to the operator P in Appendices 2.A.3 and 2.B. For the variable x, we have already said that \(|\Psi (x,t)|^2\) is the probability density of the results of position measurements.

In the rest of these appendices, we will choose units in which \(\hbar =1\). Then, given an initial condition \(\Psi (x,0)=\Psi _0(x)\), the solution of (2.A.2.1) is (remember that \(1/i=-i\)):

$$\begin{aligned} \Psi (x,t)=(e^{-iHt} \Psi _0) (x)\;, \end{aligned}$$
(2.A.2.5)

where the operator \(e^{-iHt}\) can be defined through a power series as in (2.A.1.7) when the series converges, and in more subtle ways otherwise (Footnote 32). We will give concrete examples of what this solution looks like below, and also in Appendix 2.D and Chap. 5.

An important property of (2.A.2.5) is

$$\begin{aligned} \int _{\mathbb {R}^3} |\Psi (x,t)|^2 dx = \int _{\mathbb {R}^3} |\Psi _0 (x) |^2 dx\;, \end{aligned}$$
(2.A.2.6)

for all t, which allows us to consider \(|\Psi (x,t)|^2\) as the probability density of finding the particle at x if one measures its position at time t, provided one normalizes \( \int _{\mathbb {R}^3} |\Psi _0 (x) |^2 dx=1\).

What about H having a basis of eigenvectors? For that to make sense, we have to define a space of functions \(\Psi \) and explain what a basis in that space means, but a simple example is provided by Fourier series (Footnote 33). Let f(x), \(x\in \mathbb {R}\), be a complex-valued integrable periodic function of period \(2\pi \):

$$\begin{aligned} f(x+2\pi ) = f(x)\;, \quad \forall x \in \mathbb {R}\;. \end{aligned}$$
(2.A.2.7)

Then f(x) can be written as

$$\begin{aligned} f(x) = \sum ^{+\infty }_{n=-\infty } c_n \frac{e^{inx}}{\sqrt{2\pi }}\;, \end{aligned}$$
(2.A.2.8)

with

$$\begin{aligned} c_n = \frac{1}{\sqrt{2\pi }} \int ^{2\pi }_0 f(x) e^{-inx}dx\;, \end{aligned}$$
(2.A.2.9)

at least when the series converges, in one of several senses, depending on the properties of f(x). If f(x) is square integrable over \([0,2\pi ]\), i.e., \(\int ^{2\pi }_0 |f(x)|^2 dx < \infty \), then

$$\begin{aligned} \lim _{N\rightarrow \infty } \int ^{2\pi }_0 \left| f(x)-\sum ^{n= N}_{n=-N} c_n \frac{e^{inx}}{\sqrt{2\pi }}\right| ^2 dx = 0 \end{aligned}$$
(2.A.2.10)

and

$$\begin{aligned} \sum ^{+\infty }_{n=-\infty } |c_n|^2 < \infty \;, \end{aligned}$$
(2.A.2.11)

which means, by definition, that the family of functions \((e^{inx}/\sqrt{2\pi })^{+\infty }_{n=-\infty }\) is a basis of the space (Footnote 34) of square integrable functions over \([0,2\pi ]\).

These relations are similar to those in spaces of N dimensions (with \(N<\infty \)), the main difference being that in (2.A.2.8), (2.A.2.10), and (2.A.2.11) one has to take a limit \(N\rightarrow \infty \) and not simply write algebraic identities.
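
As a quick numerical sketch of (2.A.2.8)–(2.A.2.11) (Python/NumPy; the periodic test function below is an arbitrary choice of ours):

import numpy as np

# Sample an arbitrary smooth 2*pi-periodic test function on a fine grid.
M = 2048
x = np.linspace(0.0, 2*np.pi, M, endpoint=False)
f = np.exp(np.cos(x))

# Coefficients c_n from (2.A.2.9), approximated by Riemann sums.
N = 20
n = np.arange(-N, N + 1)
c = np.array([np.sum(f * np.exp(-1j*k*x)) * (2*np.pi/M) for k in n]) / np.sqrt(2*np.pi)

# Partial sum of (2.A.2.8); its L^2 distance to f, cf. (2.A.2.10), shrinks as N grows.
f_N = (np.exp(1j * np.outer(x, n)) @ c) / np.sqrt(2*np.pi)
err = np.sum(np.abs(f - f_N)**2) * (2*np.pi/M)
print(err)   # small, and tends to 0 as N grows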

Now if H has a basis of eigenvectors, viz.,

$$\begin{aligned} H |e_n(x)\rangle \ = \lambda _n |e_n(x)\rangle \;, \end{aligned}$$
(2.A.2.12)

with \(n\in \mathbb {N}\) (in general, the family of eigenvectors will be infinite but countable, as in the example of the Fourier series, so that it can be indexed by \(\mathbb {N}\)), then one can apply the same recipe that led to (2.A.1.13). We thus write

$$\begin{aligned} \Psi (x, 0)= \Psi _0 (x) = \sum ^\infty _{n=0} c_n |e_n(x)\rangle \;, \end{aligned}$$
(2.A.2.13)

and the solution of (2.A.2.1) is

$$\begin{aligned} \Psi (x,t)=\sum ^\infty _{n=0} c_n \exp (-i\lambda _n t) |e_n(x)\rangle \;. \end{aligned}$$
(2.A.2.14)

Since \(|e^{-i\lambda _n t}|=1\), one can show that the solution converges for all times, provided that we have \(\int _{\mathbb {R}^3} |\Psi _0 (x)|^2 dx < \infty \) (which implies, as for the Fourier series, \(\sum ^\infty _{n=0} |c_n|^2 < \infty \)).

To illustrate what precedes with a simple example, consider a free particle, i.e., with \(V(x)=0\) in (2.A.2.4), on a circle of radius 1. This means that we take \(\Psi (x,t)\) to be periodic of period \(2\pi \) in \(x \in \mathbb {R}\) [see (2.A.2.7)], for all t. The operator H being given by \(H=-(1/2m)d^2/dx^2\), the eigenvalue/eigenvector problem is easy to solve. We have the following periodic eigenvectors:

$$\begin{aligned} H \frac{e^{inx}}{\sqrt{2\pi }} = - \frac{1}{2m} \frac{d^2}{dx^2} \frac{e^{inx}}{\sqrt{2\pi }} = \frac{1}{2m} n^2 \frac{e^{inx}}{\sqrt{2\pi }}\;, \end{aligned}$$
(2.A.2.15)

and, applying what we just said about Fourier series (2.A.2.7) and using (2.A.2.13) and (2.A.2.14), we get

$$\begin{aligned} \Psi (x,t) = \sum ^{+\infty }_{n=-\infty } c_n \exp \left( -\frac{in^2t}{2m}\right) \frac{e^{inx}}{\sqrt{2\pi }}\;, \end{aligned}$$
(2.A.2.16)

where the coefficients \(c_n\) come from (2.A.2.8) for \(f(x)=\Psi (x,0)\).
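
Here is a numerical sketch of the solution (2.A.2.16) (Python/NumPy; the initial wave function is an arbitrary normalized example of ours):

import numpy as np

M, m = 1024, 1.0
x = np.linspace(0.0, 2*np.pi, M, endpoint=False)
dx = 2*np.pi / M

# Arbitrary smooth periodic initial wave function, normalized on [0, 2*pi].
psi0 = np.exp(np.sin(x)) * np.exp(1j*x)
psi0 = psi0 / np.sqrt(np.sum(np.abs(psi0)**2) * dx)

# Coefficients c_n as in (2.A.2.8), then evolution by the phases exp(-i n^2 t / 2m).
N = 64
n = np.arange(-N, N + 1)
c = np.array([np.sum(psi0 * np.exp(-1j*k*x)) * dx for k in n]) / np.sqrt(2*np.pi)
t = 3.0
psi_t = (np.exp(1j * np.outer(x, n)) @ (c * np.exp(-1j * n**2 * t / (2*m)))) / np.sqrt(2*np.pi)

# Norm conservation, cf. (2.A.2.6): the total probability stays (approximately) 1.
print(np.sum(np.abs(psi_t)**2) * dx)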

Sometimes the operator H does not have a basis of eigenvectors but the Schrödinger equation nevertheless has a more or less explicit solution. One example that we will refer to is given by a free particle [\(V(x)=0\) in (2.A.2.3)] in the whole d-dimensional space \(\mathbb {R}^d\), instead of the circle (but we will set \(d=1\) for simplicity). We want to solve the Schrödinger equation (2.A.2.1), with \(H=-(1/2m)d^2/dx^2\), so we want to solve:

$$\begin{aligned} i\frac{d}{dt}\Psi (x,t) = -\frac{1}{2m}\frac{d^2}{dx^2}\Psi (x,t)\;. \end{aligned}$$
(2.A.2.17)

It is convenient to introduce the Fourier transform of \(\Psi (x,t)\):

$$\begin{aligned} \hat{\Psi }(p, t) = \frac{1}{(2\pi )^{1/2}} \int _{\mathbb {R}} \Psi (x, t) e^{-ip x} dx\;. \end{aligned}$$
(2.A.2.18)

This is an invertible operation (for suitable functions \(\Psi (x,t)\), for example those satisfying \(\int _\mathbb {R}|\Psi (x,t)|^2 dx <\infty )\):

$$\begin{aligned} \Psi (x,t) = \frac{1}{(2\pi )^{1/2}} \int _{\mathbb {R}} \hat{\Psi }(p, t) e^{ip x} dp\;. \end{aligned}$$
(2.A.2.19)

This last formula defines the inverse Fourier transform. Inserting (2.A.2.19) into (2.A.2.17), we see that \(\hat{\Psi }(p, t)\) satisfies the equation

$$\begin{aligned} i\frac{d}{dt} \hat{\Psi }(p, t) = \frac{p^2}{2m} \hat{\Psi }(p, t)\;, \end{aligned}$$
(2.A.2.20)

whose solution is \( \hat{\Psi }(p, t)= \exp (- itp^2/2m) \hat{\Psi }(p, 0)\). So the solution of (2.A.2.17) is

$$\begin{aligned} \Psi (x,t) = \frac{1}{(2\pi )^{1/2}} \int _{\mathbb {R}} \exp \left( - \frac{itp^2}{2m}\right) \hat{\Psi }(p, 0) e^{ip x} dp\;, \end{aligned}$$
(2.A.2.21)

where \(\hat{\Psi }(p, 0)\) is given in terms of the initial wave function by

$$ \hat{\Psi }(p, 0) = \frac{1}{(2\pi )^{1/2}} \int _{\mathbb {R}} \Psi (x, 0) e^{-ip x} dx\;. $$

To see what happens in a concrete example, let us start with a Gaussian wave function, in \(d=1\):

$$ \Psi _0 (x)=\Psi (x, 0)=\frac{1}{\pi ^{1/4}}\exp \left( - \frac{x^2}{2}\right) \;, $$

which is normalized so that \(\int _{\mathbb {R}} |\Psi _0 (x) |^2 dx= 1\). Then one easily computes that

$$ \hat{\Psi }(p, 0)= \frac{1}{\pi ^{1/4}}\exp \left( - \frac{p^2}{2}\right) \;. $$

Inserting this in (2.A.2.21), one gets

$$\begin{aligned} \Psi (x,t) = \frac{1}{(2\pi )^{1/2}} \frac{1}{\pi ^{1/4}} \int _{\mathbb {R}} \exp \left( - \frac{itp^2}{2m}\right) \exp \left( - \frac{p^2}{2}\right) e^{ip x} dp\;, \end{aligned}$$
(2.A.2.22)

and the integral can again be computed to yield

$$\begin{aligned} \Psi (x,t) = \frac{1}{(1+it/m)^{1/2}} \frac{1}{\pi ^{1/4}}\exp \left[ -\frac{x^2}{2 (1+it/m)}\right] \;. \end{aligned}$$
(2.A.2.23)

The important property of \( \Psi (x,t)\) is its spreading:

$$\begin{aligned} | \Psi (x,t)|^2=\frac{1}{\sqrt{\pi \big [1+(t/m)^2\big ]} }\exp \left[ - \frac{x^2}{ 1+(t/m)^2}\right] \;. \end{aligned}$$
(2.A.2.24)

Note that we have \(\int _{\mathbb {R}} | \Psi (x,t)|^2 dx=1\), in conformity with (2.A.2.6).

This means that the variance (Footnote 35) of the Gaussian \(| \Psi (x,t)|^2\), which was equal to 1/2 at \(t=0\), becomes equal to \([1+(t/m)^2]/2\) as time goes by. So the Gaussian becomes more and more “flat”, which means that, if \(| \Psi (x,t)|^2\) represents the probability density of finding the particle in some region of space, then that probability becomes less and less localized as time increases, and in a sense more and more “uncertain”.
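
A numerical check of this spreading from (2.A.2.24) (Python/NumPy; the values t = 2 and m = 1 are arbitrary choices of ours):

import numpy as np

t, m = 2.0, 1.0
x = np.linspace(-40.0, 40.0, 200001)
dx = x[1] - x[0]

# |Psi(x,t)|^2 from (2.A.2.24).
rho = np.exp(-x**2 / (1 + (t/m)**2)) / np.sqrt(np.pi * (1 + (t/m)**2))

norm = np.sum(rho) * dx        # should be 1, cf. (2.A.2.6)
var = np.sum(x**2 * rho) * dx  # the mean is 0, so the variance is just <x^2>
print(norm, var, (1 + (t/m)**2) / 2)   # var matches [1 + (t/m)^2]/2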

We will end this appendix with a remark which, although well known, is at the root of the most revolutionary aspect of quantum mechanics, as we will see in Chap. 4.

Suppose we have a system of N particles, each of them in \(\mathbb {R}^3\). Then the wave function (Footnote 36) is a function \(\Psi (x_1,\ldots ,x_{3N},t)\) on \(\mathbb {R}^{3N} \times \mathbb {R}\) with values in \(\mathbb {C}\). It still satisfies the Schrödinger equation (2.A.2.1), but H now has the form

$$\begin{aligned} H = -\frac{1}{2} \sum ^N_{i=1} \frac{1}{m_i} \Delta _i + V (x_1,\ldots ,x_N)\;, \end{aligned}$$
(2.A.2.25)

where

$$\begin{aligned} \Delta _i = \frac{d^2}{dx^2_{i_1}} + \frac{d^2}{dx^2_{i_2}} + \frac{d^2}{dx^2_{i_3}}\;, \end{aligned}$$
(2.A.2.26)

\(m_i\) is the mass of the \(i\,\)th particle, and V is again the classical potential. What is “revolutionary”, or at least has revolutionary consequences, is that \(\Psi \) is defined on what is called the configuration space of the system, i.e., the set of all possible positions of all the N particles, where N is arbitrary and could in principle include all the particles in the universe.

So there is a sense (although not very precise at this stage) in which all the particles of the universe are linked with one another. What this implies will be clarified in Chaps. 4 and 5.

2.A.3 The Probability Distribution for Results of Momentum Measurements

We want to show here that the results of measurements of the momentum p (which classically is just the mass times the velocity of the particle) are distributed with a probability density given by \(|\hat{\Psi }(p)|^2\), where \(\hat{\Psi }\) is the Fourier transform of \(\Psi \), defined by (2.A.2.18) (without the time variable).

More precisely,

$$\begin{aligned} \int _A |\hat{\Psi }(p)|^2 dp \end{aligned}$$
(2.A.3.1)

is the probability that the value obtained by a measurement of momentum will belong to \(A \subset {\mathbb {R}}\).

In order to prove (2.A.3.1), we will measure p by measuring x(t) at time t, using \(p=mx(t)/t\), since p is the mass times the velocity. Since we want the result to be independent of t, we will consider the asymptotic position, which means letting \(t\rightarrow \infty \). We will set \(m=1\) here and consider one dimension for simplicity.

We already know that the probability density of finding \(x(t)=x\), when one measures the position at time t, is given by \(|\Psi (x,t)|^2 \). Then, the probability of the momentum being observed to belong to a subset \(A\subset {\mathbb {R}}\) is \(\int _{At} |\Psi (x,t)|^2 dx\), where \(At = \{pt \,:\, p \in A\}\). Now, by a change of variable \(x=pt\), we get

$$\begin{aligned} \int _{At} |\Psi (x,t)|^2 dx= t \int _{A} |\Psi (pt,t)|^2 dp\;. \end{aligned}$$
(2.A.3.2)

Suppose that we have an initial wave function \(\Psi _0(x) = \Psi (x,0)\) supported in a bounded region \(B\subset {\mathbb {R}}\). We will prove that, \(\forall A \subset \mathbb {R}\),

$$\begin{aligned} \lim _{t\rightarrow \infty } t \int _A |\Psi (pt,t)|^2 dp = \int _A |\widehat{\Psi }(p,0)|^2 dp\;. \end{aligned}$$
(2.A.3.3)

Combining with (2.A.3.2), this means that, if we measure the asymptotic position x as \(t \rightarrow \infty \), we will obtain the quantum mechanical predictions (2.A.3.1).

To prove (2.A.3.3) (Footnote 37), we consider the free evolution, which should hold for t large, and use (2.A.2.21). Since the inverse Fourier transform of a product of functions is the convolution of their inverse Fourier transforms, divided by \(\sqrt{2\pi }\), we get

$$\begin{aligned} \Psi (x,t) = \left( \frac{1}{2\pi it}\right) ^{1/2} \int _{\mathbb {R}} \exp \left[ \frac{i(x-y)^2}{ 2t}\right] \Psi (y,0) dy\;, \end{aligned}$$
(2.A.3.4)

using the fact that \(\sqrt{1/ it}\exp (ix^2/ 2t)\) is the inverse Fourier transform of \(\exp (-itp^2/ 2)\). Set \(x=pt\) in (2.A.3.4) and write it as

$$\begin{aligned} \Psi (pt,t) = \left( \frac{1}{2\pi it}\right) ^{1/2} \exp \left( \frac{ip^2t}{2}\right) \int _{\mathbb {R}} \exp \left( -ipy+i\frac{y^2}{2t}\right) \Psi (y,0) dy\;. \end{aligned}$$
(2.A.3.5)

Since \(\Psi (y,0)\) vanishes outside a bounded region B, we have, \(\forall y \in B\), \(\displaystyle \lim _{t\rightarrow \infty } \exp (iy^2/ 2t) =1\), which implies

$$\begin{aligned} \lim _{t\rightarrow \infty } \left( \frac{1}{2\pi }\right) ^{1/2} \int _{B} \exp \Bigl (-ipy+ \frac{iy^2}{2t}\Bigr ) \Psi (y,0)dy&= \left( \frac{1}{2\pi }\right) ^{1/2} \int _{\mathbb {R}} \exp \bigl (-ipy\bigr ) \Psi (y,0)dy \\&= \widehat{\Psi }(p,0), \end{aligned}$$
(2.A.3.6)

where, in the last equality, we use the fact that \( \Psi (y,0)\) is supported in B. Obviously,

$$\begin{aligned} \left| \left( \frac{1}{ it}\right) ^{1/2} \exp \left( \frac{ip^2t}{ 2}\right) \right| ^2 = t^{-1}\;. \end{aligned}$$
(2.A.3.7)

Inserting (2.A.3.5)–(2.A.3.7) in the left-hand side of (2.A.3.3) proves (2.A.3.3).
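
For the Gaussian example of Appendix 2.A.2 (whose support is unbounded, though rapidly decaying, so it lies slightly outside the hypothesis used above), one can check (2.A.3.3) numerically from the closed form (2.A.2.24), with \(\hat{\Psi }(p,0)= \pi ^{-1/4} e^{-p^2/2}\) (a Python/NumPy sketch of ours, with \(m=1\)):

import numpy as np

m = 1.0
p = np.linspace(-4.0, 4.0, 9)

def t_density(p, t):
    # t * |Psi(p t, t)|^2, from (2.A.2.24) with x = p*t.
    s2 = 1 + (t/m)**2
    return t * np.exp(-(p*t)**2 / s2) / np.sqrt(np.pi * s2)

target = np.exp(-p**2) / np.sqrt(np.pi)   # |Psi_hat(p, 0)|^2 for this Gaussian

for t in (10.0, 100.0, 1000.0):
    print(t, np.max(np.abs(t_density(p, t) - target)))   # tends to 0 as t grows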

2.B Quantum States, “Observables” and the “Collapse” Rule

We have already encountered in Sect. 2.3 the special role of measurements within the quantum formalism. As we saw, we can have two different bases in \(\mathbb {C}^2\), \((|1\uparrow \rangle , |1\downarrow \rangle )\) or \((|2\uparrow \rangle , |2\downarrow \rangle )\), and using (2.3.5)–(2.3.8), we can write any given state in terms of those different bases. A measurement of the spin in direction 1 or 2 is associated with a given basis, and after a measurement, the state collapses onto one vector of the basis, depending on the result. Let us explain now the general quantum formalism.

In quantum mechanics, the space of states is a complex vector space, of finite dimension, \({{\mathbb {C}}}^N\), or of infinite dimension (we will discuss that situation below). The finite dimensional case generalizes the states associated with the spin in Sect. 2.3. The space is endowed with a scalar product \(\langle z_1 |z_2 \rangle \equiv \sum _{n=1}^N z^*_{1n} z_{2n}\), where, for \(z\in \mathbb {C}\), \( z^*\) denotes its complex conjugate.

The space is also endowed with a norm associated with that scalar product: \(\Vert z\Vert ^2=\langle z |z \rangle \). The quantum state \(|\text {state (t)}\rangle \) is a vector in that space and evolves in time, when no measurements are made, according to a deterministic equation: a given state at time 0, \(|\text {state (0)}\rangle \), determines a unique state at time t, \(|\text {state (t)}\rangle \), for all times. This evolution is continuous in time and linear; see (2.3.10), (2.3.11). The norm of that vector, \(\Vert \text {state (t)}\rangle \Vert \), is constant in time.

In classical physics, one introduces various physical quantities such as angular momentum, energy, etc. (all of which are functions of the positions and the velocities). In quantum mechanics, one associates with each such physical quantity a basis of vectors \((|e_n\rangle )\) of the state space and a set of numbers \((\lambda _n)\), where n runs over \(\{1,\ldots ,N\}\). The choice of these numbers \(\lambda _n\) is conventional. When there is a measurement of the quantity associated with those vectors and numbers at a certain time t, one writes the state as a linear combination of the basis vectors:

$$\begin{aligned} |\text {state (t)}\rangle = \sum _{n=1}^N c_n (t) |e_n\rangle \;, \end{aligned}$$
(2.B.1)

where \(c_n (t)=\langle e_n |\text {state (t)}\rangle \).

The recipe for computing probabilities of results of measurements, which generalizes what we discussed in Sect. 2.3, is that a measurement at time t yields a value \(\lambda _k\) with probability \(|c_k(t)|^2\). Since \(\Vert \text {state (t)}\rangle \Vert ^2= \sum _{n=1}^N |c_n (t)|^2\) is constant in time (Footnote 38), if we normalize \(\Vert \text {state (0)}\rangle \Vert =1\), we have \(\sum _{n=1}^N |c_n (t)|^2=1\) for all times, so that the sum of the probabilities of all the results equals 1.

This assignment of probabilities to results of measurements is called Born’s rule. Moreover, after the measurement, the quantum state collapses to \(|e_k\rangle \). As we explained in Sect. 2.3, that collapse is neither continuous in time, nor deterministic, nor linear, contrary to the time evolution when no measurements are made.

To simplify matters, we assume here that each eigenvalue is non-degenerate, i.e., that it corresponds to a unique eigenvector. In general, if there are several eigenvectors with the same eigenvalue \(\lambda _k\), the collapsed state is the (normalized) projection of the original state onto the subspace spanned by those eigenvectors, and the probability of occurrence of \(\lambda _k\) is the squared norm of that projected vector.

A correspondence can be made with the example of the spin measurement by associating \(\lambda =+1\) with the up result and \(\lambda =-1\) with the down result, but other conventions could be chosen.
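
A minimal numerical sketch of this recipe for the spin example (Python/NumPy; the initial state and the sampling loop are our own illustration, not taken from the text):

import numpy as np

rng = np.random.default_rng(0)

# Measuring the spin in direction 2: the basis of eigenvectors of sigma_2, cf. (2.B.4).
sigma2 = np.array([[0.0, 1.0], [1.0, 0.0]])
lam, E = np.linalg.eigh(sigma2)   # eigenvalues (-1, +1); orthonormal eigenvectors (columns)

state = np.array([1.0, 0.0], dtype=complex)   # the state |1 up>, normalized

c = E.conj().T @ state            # coefficients c_n = <e_n | state>, as in (2.B.1)
probs = np.abs(c)**2              # Born's rule: lambda_n occurs with probability |c_n|^2

outcomes = rng.choice(lam, size=10000, p=probs)
print(probs, np.mean(outcomes > 0))   # probs = [0.5, 0.5]; frequency of +1 close to 0.5

# After a measurement giving lambda_k, the state collapses to the eigenvector |e_k>.
k = int(np.argmax(lam))
collapsed = E[:, k]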

The more advanced reader may find the above presentation somewhat unusual. Indeed, the standard approach is to associate a matrix with any physical quantity when N is finite, these matrices having a basis of eigenvectors, viz.,

$$\begin{aligned} A |e_n\rangle \ = \lambda _n |e_n\rangle \;, \end{aligned}$$
(2.B.2)

where the \(\lambda _n\) are real (Footnote 39). But this is just a way to repeat what we said above: what matters is the basis of vectors \((|e_n\rangle )\), while the choice of the numbers \(\lambda _n\) as the real eigenvalues of a self-adjoint matrix A is a matter of convenience (Footnote 40).

In the spin example, for direction 1, we could introduce the matrix (Footnote 41)

$$\begin{aligned} \sigma _1 = \left( \begin{array}{cc}1 & 0 \\ 0 & -1\end{array}\right) \end{aligned}$$
(2.B.3)

and for direction 2 the matrix

$$\begin{aligned} \sigma _2 = \left( \begin{array}{cc}0 & 1 \\ 1 & 0\end{array}\right) \;. \end{aligned}$$
(2.B.4)

It is easy to check, using the definitions (2.3.1)–(2.3.4), that

$$\begin{aligned} \sigma _1 |1\uparrow \rangle = |1\uparrow \rangle \;,\quad \sigma _1 |1\downarrow \rangle = -|1\downarrow \rangle \;,\quad \sigma _2 |2\uparrow \rangle = |2 \uparrow \rangle \;,\quad \sigma _2 |2\downarrow \rangle = -|2\downarrow \rangle \;, \end{aligned}$$
(2.B.5)

so that our basis vectors are indeed eigenvectors of the corresponding matrices with eigenvalues \(+1\) and \(-1\). But all that we really need conceptually are the eigenvectors and the associated numbers, even though the language of operators is very useful in practice.

Now we must also consider the spaces of wave functions, which are infinite dimensional (Footnote 42). One introduces (say, for a physical system consisting of one particle in one dimension; Footnote 43) the space of complex-valued functions \(\Psi : \mathbb {R}\rightarrow {{\mathbb {C}}}\) that are square-integrable: \(\int _{\mathbb {R}} |\Psi (x)|^2 dx <\infty \).

One can define a scalar product on that space, \(\langle \Psi | \Phi \rangle = \int _{\mathbb {R}} \Psi ^* (x ) \Phi (x) dx\), and, therefore, one can also define the notion of orthonormal sets of vectors and a norm associated with the scalar product: \(\Vert \Psi \Vert ^2= \langle \Psi | \Psi \rangle = \int _{\mathbb {R}} |\Psi (x)|^2 dx\).

The wave function is a vector in that space that depends on time, \(\Psi (x, t)\). When no measurements are made, that vector evolves according to a deterministic equation, like Schrödinger’s equation; the evolution is continuous in time and linear. Moreover, \(\Vert \Psi (t) \Vert ^2=\int _{\mathbb {R}} |\Psi (x, t)|^2 dx\) is constant in time, as in (2.A.2.6).

Again, one associates with physical quantities linear operators [see (2.A.2.2)] that act on functions as matrices act on vectors (Footnote 44). If a quantity is associated with an operator A satisfying (2.B.2), with n now running over \({{\mathbb {N}}}\), we have the same rule as above when one measures A, except that the sum (2.B.1) has to be replaced by a limit, as in (2.A.2.10). We have again \(\int _{\mathbb {R}} |\Psi (x, t)|^2 dx= \sum _{n\in {{\mathbb {N}}}} |c_n(t)|^2\), and if we normalize \(\Vert \Psi (0) \Vert ^2=\int _{\mathbb {R}} |\Psi (x, 0)|^2 dx=1\), we have \(\sum _{n\in {{\mathbb {N}}}} |c_n(t)|^2=1\) for all times.

For example, suppose that we measure the quantity associated with H, defined in (2.A.2.3), (2.A.2.4) (this quantity corresponds classically to the energy). Suppose also that H has a basis of eigenvectors, see (2.A.2.12), and that the state is of the form (2.A.2.14). Then we get the result \(\lambda _k\) with probability \(|c_k|^2\), and after the measurement the wave function becomes \( |e_k(x)\rangle \) (Footnote 45).

We will also need, but only in Appendix 2.F, operators that do not have a basis of eigenvectors. We introduced two such operators, Q and P, in Appendix 2.A. The operator Q is called the position operator, and acts as

$$\begin{aligned} Q \Psi (x)=x \Psi (x)\;, \end{aligned}$$
(2.B.6)

and its eigenvectors are formally Dirac delta functions \(\delta ( q-x)\) (Footnote 46). We have

$$\begin{aligned} Q \delta ( q-x)= q\delta ( q-x)\;, \end{aligned}$$
(2.B.7)

with eigenvalue q. If we write \(\Psi (x,t)= \int \delta ( q-x) \Psi (q, t) dq\), we can see this as a sort of continuous version of (2.B.1), and the interpretation of \(|c_k|^2\) as the probability of finding the eigenvalue \(\lambda _k\) upon measurement of A translates here into considering \(|\Psi (q,t)|^2\) as the probability density of finding the particle at q, upon measurement of its position.

The momentum operator P is defined by:

$$\begin{aligned} P \Psi (x)=-i \frac{d}{dx} \Psi (x)\;, \end{aligned}$$
(2.B.8)

and we have the eigenvectors (Footnote 47)

$$ \frac{1}{(2\pi )^{1/2}} \exp (ipx)\;, $$

with eigenvalue p. Indeed, one checks that

$$\begin{aligned} P \frac{1}{(2\pi )^{1/2}} \exp (ipx) = p \frac{1}{(2\pi )^{1/2}} \exp (ipx)\;. \end{aligned}$$
(2.B.9)

If we consider the inverse Fourier transform formula (2.A.2.19), we can see it as the continuous version of (2.B.1), with eigenvectors \([1/(2\pi )^{1/2}] \exp (ipx)\), and the interpretation of \(|c_k|^2\) as the probability of finding the eigenvalue \(\lambda _k\) upon measurement of A translates here into considering \(|\hat{\Psi }(p,t)|^2\) as the probability density of finding the value p upon measurement of the momentum; see (2.A.3.1), derived in Appendix 2.A.3.

We see that, for both Q and P, the set of possible results of measurements is the set \(\mathbb {R}\) of real numbers. This set plays the same role here as the one played by the eigenvalues for matrices (Footnote 48).

The “collapse rule” in the case of measurements of Q and P works as follows. Since a measurement whose result can be any real number is never infinitely precise, the result is really an interval of real numbers; the collapsed wave function will then be the original wave function restricted to that interval and normalized so that \(\int _\mathbb {R}|\Psi (x,t)|^2 dx = 1\) holds after the collapse.

All this may sound terribly abstract and “unphysical”, but the goal of this presentation is precisely to emphasize how much the quantum algorithm is an unambiguous method for accurately predicting results of measurements, and nothing else. In particular, it should not be associated with any mental picture of what is “really” going on. The main issue of course is whether one should consider this algorithm as satisfactory or as being, in some sense, the “end of physics”, or whether one should try to go beyond it.

2.C “Uncertainty” Relations and “Complementarity”

An easy remark about the uncertainty relations is that there is a great deal of uncertainty about what exactly they mean: indeed, are they uncertainty relations or indeterminacy relations, and what are the differences between these two terms?

The first derivation of these relations by Heisenberg [256], which was more a heuristic argument than a real derivation (Footnote 49), was entirely compatible with a disturbance view of measurement, as expressed, for example, in the statement by Heisenberg [256] quoted in Sect. 2.5.2. This way of speaking assumes that electrons have a position and a velocity, even when they are not measured. It only shows that there are limits to how much we can know about one of these quantities without disturbing the other.

However, more radical conclusions are sometimes drawn, namely, that those uncertainty relations are really indeterminacy relations, i.e., that the positions and the velocities are indeterminate or do not exist before we measure them, or even that it does not make sense to speak of quantities that we cannot measure simultaneously. Here, we will leave aside these issues, which ultimately depend on our views about the meaning of the quantum state, discussed in Sect. 2.5, and simply give some precise versions of those relations.

2.C.1 A Statistical Relation

Consider a random variable x that can take values \(a_1,\ldots , a_n\) with respective probabilities \(p_i\), \(i=1,\ldots ,n\). The variance of x, \(\text{ Var }(x)\), is a way to measure how much the distribution of x is spread around its mean. For \(f: \{ a_1,\ldots , a_n\} \rightarrow \mathbb {R}\), we define the mean or the average of f(x) by

$$\begin{aligned} \langle f(x) \rangle = \sum ^n_{i=1} f(a_i) p_i\;. \end{aligned}$$
(2.C.1.1)

Then \(\text{ Var }(x)\) is defined as

$$\begin{aligned} \text{ Var }(x) = \langle x^2 \rangle - \langle x \rangle ^2 = \langle (x-\langle x\rangle )^2\rangle \;, \end{aligned}$$
(2.C.1.2)

where the second equality is checked by expanding the binomial. The quantity \(|x-\langle x\rangle |\) expresses the deviation of the variable x from its mean, so (2.C.1.2) gives a measure of the size of that deviation.

If x is a continuous random variable on \(\mathbb {R}\) (we work in one dimension for simplicity), with probability density p(x), then the definition (2.C.1.2) is still valid, with (2.C.1.1) replaced by

$$\begin{aligned} \langle f(x)\rangle = \int _{\mathbb {R}} f(x) p(x) dx\;. \end{aligned}$$
(2.C.1.3)

A precise statement of the uncertainty relations is as follows. Given a wave function \(\Psi (x)\), we know that the probability distribution density of results of measurements of the position x is \(|\Psi (x)|^2\), meaning that \(\int _A | \Psi (x)|^2 dx\) is the probability that, when the position of the particle is measured, the result belongs to \(A \subset {\mathbb {R}}\). We also showed in Appendix 2.A.3 that the results of measurements of the momentum p (which classically is just the mass times the velocity of the particle) are distributed with a probability density given by \(|\hat{\Psi }(p)|^2\), where \(\hat{\Psi }\) is the Fourier transform of \(\Psi \), defined by (2.A.2.18) (without the time variable), see (2.A.3.1).

We note that, since \(\int _{\mathbb {R}} | \Psi (x)|^2 dx= 1\), Plancherel’s theorem gives \(\int _{\mathbb {R}} |\hat{\Psi }(p)|^2 dp= 1\).

Given this, we have a variance \(\text{ Var }(x)\) for the distribution of x and a variance \(\text{ Var }(p)\) for the distribution of p. Their product satisfies a lower bound:

$$\begin{aligned} \text{ Var }(x) \text{ Var }(p) \ge \frac{1}{4}\;, \end{aligned}$$
(2.C.1.4)

bearing in mind that we choose units where \(\hbar = 1\). The bound (2.C.1.4) is a rather simple mathematical relation between a function and its Fourier transform and its proof can be found in many textbooks on Fourier transforms (see, e.g., [155]), as well as those on quantum mechanics.

One can give a concrete example of Heisenberg’s inequality (2.C.1.4) by considering Gaussian wave functions. For \(d=1\), let \(\Psi (x)= (a/\pi )^{1/4} \exp (-a x^2/2)\), which is normalized so that \(\int _{\mathbb {R}} |\Psi (x) |^2 dx= 1\). Then, using (2.A.2.18), it is easy to show that

$$ \hat{\Psi }(p)= \frac{1}{(\pi a)^{1/4}}\exp \left( - \frac{p^2}{2a}\right) \;. $$

If one computes the respective variances, one obtains \(\text{ Var }(x)=1/(2a)\) and \(\text{ Var } (p)= a/2\), whose product is 1/4, namely the lower bound in (2.C.1.4).
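
A numerical check of these variances (Python/NumPy; the width a = 3 is an arbitrary choice of ours):

import numpy as np

a = 3.0
x = np.linspace(-30.0, 30.0, 100001)
dx = x[1] - x[0]

psi = (a/np.pi)**0.25 * np.exp(-a * x**2 / 2)
rho_x = np.abs(psi)**2                          # position density |Psi(x)|^2

p = x                                           # reuse the same grid for momenta
rho_p = np.exp(-p**2 / a) / np.sqrt(np.pi * a)  # |Psi_hat(p)|^2 for this Gaussian

var_x = np.sum(x**2 * rho_x) * dx               # the means vanish by symmetry
var_p = np.sum(p**2 * rho_p) * dx
print(var_x, var_p, var_x * var_p)              # 1/(2a), a/2, and the product 1/4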

This illustrates the impossibility of “measuring both the position and the momentum” simultaneously with arbitrary precision. Indeed, assume that, after a position measurement, the “collapsed” wave function is a “narrow” one (assumed to be Gaussian for simplicity), \(\Psi (x)= (a/\pi )^{1/4} \exp (-a x^2/2)\), with a large, which means that the position measurement is precise, since \(\text{ Var } (x)=1/(2a)\) is small. Then, the variance of the distribution of future measurements of momenta, \(\text{ Var } (p)= a/2\), will necessarily be large.

Since (2.C.1.4) is a lower bound on variances of results of measurement, it implies nothing whatsoever about the intrinsic properties of quantum particles. One could perfectly well think, in accordance with the statistical interpretation, that each individual particle has a well-defined position and momentum, but that, when we prepare a large number of particles having the same quantum state, the positions and momenta of those particles vary and have certain statistical distributions whose variances satisfy (2.C.1.4).

This statistical view is untenable, but not because of the uncertainty relations. The problem for that view comes, as we saw in Sect. 2.5.2, from the no hidden variables theorems.

However, there is another, more qualitative, version of “uncertainty” in quantum mechanics.

2.C.2 A Qualitative Argument and Its Relation to “Complementarity”

Let us consider finite-dimensional systems for simplicity. As we saw in Appendix 2.B, a physical quantity (such as the spin) is associated with a self-adjoint matrix. Consider two such matrices A and B. Let us define their commutator:

$$\begin{aligned} [A,B] = AB-BA\;, \end{aligned}$$
(2.C.2.1)

where AB is the matrix product. Suppose \([A,B]=0\). If \(| e \rangle \ \) is an eigenvector of A, i.e.,

$$ A|e \rangle \ = \lambda |e \rangle \;, $$

then it is easy to see that \(B | e \rangle \) is also an eigenvector of A, with the same eigenvalue:

$$\begin{aligned} AB |e \rangle \ = BA |e \rangle \ = \lambda B |e \rangle \;. \end{aligned}$$
(2.C.2.2)

This holds also if we exchange A and B. Using this remark, one shows that, if \([A,B]=0\), then A and B have a common basis of eigenvectors (though in general with different eigenvalues for A and B) (Footnote 50).

Conversely, if A and B have a common basis of eigenvectors, then \([A,B]=0\). Since the only physically meaningful data are the basis vectors (and the associated numbers) corresponding to a physical quantity, if \([A,B]=0\), then A and B just associate different numbers with the same basis.

Measuring A will reduce the quantum state to one of the eigenvectors of A. But if we then measure B, we will reduce the state to one of the eigenvectors of B, which is also an eigenvector of A if A and B commute. Hence, if we remeasure A after having measured B, the result will be with certainty the same eigenvalue of A as before and the state will not change, unlike when one tries to measure the spin in two different directions (see Fig. 2.2). It is in this sense that, if \([A,B]=0\), one can measure A and B simultaneously (and also the products AB or BA).

But if \([A,B]\ne 0\), there will be some eigenvector of A that is not an eigenvector of B. Suppose that one measures A when the state is an eigenvector of B with eigenvalue b. If, after the measurement of A, the state is an eigenvector of A that is not an eigenvector of B, then the result of a later measurement of B will not give back the original value b, since the state produced by the measurement of A is no longer an eigenvector of B.

This is what happened with the spin in directions 1 and 2, as was observed phenomenologically in Sect. 2.1 and described by the quantum formalism in Sect. 2.3. If we start with an eigenstate of the spin in direction 1 and then measure it in direction 2, we “lose” the memory of what value the spin had in direction 1, since the result of the spin measurement in direction 2 is an eigenvector of the matrix \(\sigma _2\) (2.B.4) and hence a superposition of states in direction 1 [see (2.3.5) and (2.3.6)].

This is one possible meaning of the word “complementarity”, which was so fundamental to Niels Bohr. The measurement of A or B gives us a “classical” description of reality, where “classical” does not refer to classical physics but means “expressible in ordinary language” or “representable” or “macroscopic”. But since the two quantities cannot be measured simultaneously (i.e., without the measurement of one quantity disturbing the measurement of the other), one cannot “combine” the picture coming from the measurement of A and the one coming from the measurement of B into a coherent picture.

One can check that the operators Q and P, introduced in Appendices 2.A.2 and 2.B, do not commute:

$$\begin{aligned} (PQ\Psi )(x)&= -i \hbar \frac{d}{dx}\big [x\Psi (x)\big ]= -i \hbar \left[ \Psi (x) + x\frac{d}{dx}\Psi (x)\right] \ne -i \hbar x\frac{d}{dx}\Psi (x) \\&= (QP\Psi )(x). \end{aligned}$$

This can then be interpreted in terms of “complementarity” between a “picture” based on positions and one based on momenta. But what this means depends on how we understand (2.C.1.4), and therefore how we understand the quantum formalism. The non-commutation of Q and P does not have an obvious meaning.
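
This non-commutation can be made concrete numerically (a Python/NumPy sketch of ours, with \(\hbar =1\)): discretizing P and Q on a grid and applying \(PQ-QP\) to a test function returns approximately \(-i\) times that function, in agreement with the computation above.

import numpy as np

M = 4000
x = np.linspace(-10.0, 10.0, M)
dx = x[1] - x[0]

psi = np.pi**-0.25 * np.exp(-x**2 / 2)   # arbitrary smooth test function

def P(f):
    return -1j * np.gradient(f, dx)      # P = -i d/dx, by centered differences

def Q(f):
    return x * f                         # Q = multiplication by x

comm = P(Q(psi)) - Q(P(psi))             # (PQ - QP) psi, expected to equal -i psi
err = np.max(np.abs(comm[5:-5] + 1j * psi[5:-5]))   # skip the edges of the grid
print(err)                               # small: the commutator indeed acts as -i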

Let us remark finally that there is also a generalization of (2.C.1.4) that expresses quantitatively this incompatibility between A and B. Given a quantum state \(\Psi \), one obtains a probability distribution for the results of the measurements of A and of B (described in Appendix 2.B: an eigenvalue \(\lambda _k\) occurs with probability \(|c_k|^2\)). Thus we can define the variances \(\text{ Var }_\Psi (A)\), \(\text{ Var }_\Psi (B)\), associated with those probability distributions. The generalization of (2.C.1.4) is (Footnote 51)

$$\begin{aligned} \text{ Var }_\Psi (A) \text{ Var }_\Psi (B) \ge \frac{1}{4} \Big |\big \langle \Psi \vert [A,B] \Psi \big \rangle \Big |^2\;, \end{aligned}$$
(2.C.2.3)

which is similar to (2.C.1.4) (Footnote 52).

2.D The Quantum Mechanical Description of Measurements

Let us consider a very simple measurement of the spin (Footnote 53). We start with a quantum state for the combined system composed of the particle and the measuring device:

$$\begin{aligned} \displaystyle \Psi _0 = \varphi _0 (z) \left[ c_1 \left( \begin{array}{c} 1 \\ 0 \end{array}\right) + c_2 \left( \begin{array}{c} 0 \\ 1 \end{array}\right) \right] \;, \end{aligned}$$
(2.D.1)

where z denotes a macroscopic variable, namely the position of the center of mass of the measuring device, and \(\varphi _0 (z)\) is centered at \(z=0\), meaning that the pointer is as in the first picture of Fig. 2.6. We leave aside here the spatial part of the quantum state of the particle, since we are only interested in what happens to the measuring device.

Let the Hamiltonian be

$$ H= - i\sigma \frac{\partial }{\partial z}\;,\quad \text{ where } \;\sigma = \left( \begin{array}{cc} 1 & 0 \\ 0 & -1 \end{array}\right) \;, $$

which corresponds to the introduction of an inhomogeneous magnetic field . One neglects here the kinetic energy term (corresponding to the free evolution) \(- (1/2m)\partial ^2 \Psi (z,t)/\partial z^2\). With these simplifications, the Schrödinger equation is

$$ i \frac{\partial }{\partial t} \Psi = -i\sigma \frac{\partial }{\partial z} \Psi \;, $$

and one can easily check that its solution, with initial condition (2.D.1), is

$$\begin{aligned} c_1 \left( \begin{array}{c} 1 \\ 0 \end{array}\right) \varphi _0 (z-t) + c_2 \left( \begin{array}{c} 0 \\ 1 \end{array}\right) \varphi _0 (z+t)\;. \end{aligned}$$
(2.D.2)

Since \(\varphi _0(z)\) is centered at \(z=0\), \(\varphi _0 (z\pm t)\) is centered at \(z=\mp t\), corresponding to the last two pictures in Fig. 2.6 (for a suitable t), which is the result mentioned in Sect. 2.5.1, where we wrote \(\varphi ^\uparrow (z)\) for \( \varphi _0 (z-t)\) and \(\varphi ^\downarrow (z)\) for \( \varphi _0 (z+t)\).
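
To see concretely how the two branches of (2.D.2) separate, here is a small numerical sketch (ours; the Gaussian pointer profile and the grid are illustrative assumptions):

```python
import numpy as np

# Illustrative sketch: the pointer wave function phi_0, initially centered at
# z = 0, splits into two branches phi_0(z - t) and phi_0(z + t), as in (2.D.2).
z = np.linspace(-10, 10, 2001)
dz = z[1] - z[0]
phi0 = lambda u: np.pi**-0.25 * np.exp(-u**2 / 2)   # a normalized Gaussian (our choice)

c1 = c2 = 1 / np.sqrt(2)                            # spin part of the initial state (2.D.1)
for t in [0.0, 2.0, 5.0]:
    up, down = c1 * phi0(z - t), c2 * phi0(z + t)   # the two branches of (2.D.2)
    overlap = np.sum(np.abs(up * down)) * dz        # -> 0 as the branches separate
    print(f"t = {t}: overlap = {overlap:.2e}")
```

Once the overlap is essentially zero, the two branches correspond to macroscopically distinct pointer positions, as in the last two pictures of Fig. 2.6.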

We can discuss the Mach–Zehnder interferometer in the presence of a wall in a similar way. Once the wall is inserted as in Fig. 2.4 and the state of the particle is (2.4.2), we get, for the combined system particle plus wall, the state

$$\begin{aligned} \frac{1}{\sqrt{2}} \Big [| 2\uparrow \rangle \ | \text{ path } 2\uparrow \rangle \varphi _0 (z) - | 2\downarrow \rangle \ | \text{ path } 2\downarrow \rangle \varphi _1 (z)\Big ]\;, \end{aligned}$$
(2.D.3)

where \(\varphi _0 (z)\) denotes the wave function of the wall not having absorbed the particle and \(\varphi _1 (z)\) that of the wall having absorbed the particle. If we replace the wall by an active bomb, as in the Elitzur–Vaidman bomb testing mechanism, \(\varphi _0 (z)\) will be the wave function of the unexploded bomb and \(\varphi _1 (z)\) that of the bomb having exploded. In both cases, we have a macroscopic object (the wall or the bomb) that plays the same role as the pointer in (2.D.2).

Consider now the more general situation described in Appendix 2.B, where the operator A associated with a given physical quantity has a basis of eigenvectors:

$$\begin{aligned} A|e_n\rangle \ = \lambda _n |e_n\rangle \;, \end{aligned}$$
(2.D.4)

and the state of the system to be measured is

$$\begin{aligned} |\text {state}\rangle = \sum _n c_n |e_n\rangle \;, \end{aligned}$$
(2.D.5)

where n runs over a finite set or over \({{\mathbb {N}}}\). Consider a quantum state for the combined system (the system to be measured plus the measuring device):

$$\begin{aligned} \Psi _0 = \varphi _0 (z) \sum _n c_n |e_n\rangle \;, \end{aligned}$$
(2.D.6)

where z and \(\varphi _0 (z)\) are as above, i.e., \(\varphi _0 (z)\) is localized around 0.

Introducing a coupling between the system and the measuring device of the form \(H = -iA\, \partial /\partial z\) (A being the matrix \(\sigma \) in the example of the spin measurement above), one gets, neglecting again the kinetic energy term, the Schrödinger equation

$$\begin{aligned} i \frac{\partial }{\partial t} \Psi = -iA\frac{\partial }{\partial z} \Psi \;, \end{aligned}$$
(2.D.7)

whose solution is

$$\begin{aligned} \sum _n c_n \varphi _0 (z-\lambda _n t) |e_n\rangle \;, \end{aligned}$$
(2.D.8)

which generalizes (2.D.2), with \(\varphi _0 (z-\lambda _n t)\) having macroscopically disjoint supports for different \(\lambda _n\) when t is not too small, since \(\varphi _0 (z)\) is localized around 0. One obtains a situation similar to the pointer in the last two pictures in Fig. 2.6, but now with more possible positions [one for each n in the sum (2.D.8)].
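
One can verify branch by branch that (2.D.8) solves (2.D.7): on the eigenvector \(|e_n\rangle \), the operator A acts as multiplication by \(\lambda _n\), so each branch must satisfy a simple transport equation. A symbolic sketch of this check (ours):

```python
import sympy as sp

# Sketch (ours): on |e_n>, A acts as the number lambda_n, so the n-th branch
# of (2.D.8) must satisfy  i d/dt phi = -i lambda_n d/dz phi, which is (2.D.7)
# restricted to that branch.
z, t, lam = sp.symbols('z t lambda_n', real=True)
phi0 = sp.Function('phi_0')

branch = phi0(z - lam * t)                 # the pointer part of the n-th term of (2.D.8)
lhs = sp.I * sp.diff(branch, t)            # i d/dt
rhs = -sp.I * lam * sp.diff(branch, z)     # -i lambda_n d/dz
print(sp.simplify(lhs - rhs))              # -> 0
```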

Fig. 2.8

The double-slit experiment. In all three figures, there is a source of particles going towards a screen in which one or two slits are open. There is a second screen behind the first one on which the particles are detected. In a the curve represents the density of particles detected on the second screen when one slit is open, and in b likewise, when the other slit is open. c The result when both slits are open. This is clearly not the sum of the first two results

2.E The Double-Slit Experiment

A standard way to introduce interference effects in quantum mechanics, such as the ones we saw in the Mach–Zehnder interferometer, is via the double-slit experiment [184]: particles are sent (one by one) through slits in a wall, and the pictures in Fig. 2.8 show how the particles are distributed when they are detected on another wall somewhere behind the slits. If only one slit is open, one gets the curves (a) and (b) of Fig. 2.8, representing the densities of particles detected behind the slits (which is not surprising), while if both slits are open, one gets the interference effects shown in the last picture (c). One might expect that, with both slits open, the distribution of the particles would be the sum of those detected when only one slit is open. But instead we get the wavy line of Fig. 2.8c, with fewer particles at some places than there would be with only one slit open. So opening or closing one slit seems to influence the particles going through the other slit. And this remains true, qualitatively, even if the open slit is very far from the closed one.
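
The non-additivity of the two densities can be illustrated with a toy far-field computation (a sketch of ours; the wave number, the geometry, and the names are illustrative assumptions, and this is not the full Schrödinger evolution mentioned below): the amplitudes coming from the two slits add, but the densities do not.

```python
import numpy as np

# Toy illustration: with both slits open the detection density is
# |psi1 + psi2|^2, not |psi1|^2 + |psi2|^2; the cross term 2 Re(psi1* psi2)
# produces the fringes of Fig. 2.8c.
xdet = np.linspace(-1.0, 1.0, 9)            # points on the detection screen
k, d, D = 60.0, 0.3, 1.0                    # wave number, slit separation, screen distance
r1 = np.sqrt(D**2 + (xdet - d / 2)**2)      # path length from slit 1
r2 = np.sqrt(D**2 + (xdet + d / 2)**2)      # path length from slit 2
psi1, psi2 = np.exp(1j * k * r1), np.exp(1j * k * r2)

both_open = np.abs(psi1 + psi2)**2          # oscillates between ~0 and ~4
no_interference = np.abs(psi1)**2 + np.abs(psi2)**2   # flat, equal to 2
print(np.round(both_open, 2))
print(np.round(no_interference, 2))
```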

This experiment illustrates once again the role of measurements in quantum mechanics: it is often described by saying that, if we close one slit, then we know which slit the particle went through, hence its behavior will be affected by our measurement. The same phenomenon (suppression of interference) would occur if we put a small light behind one of the slits that would allow us to detect the slit the particle went through.

This double-slit experiment is similar to the experiment with the Mach–Zehnder interferometer described in Sect. 2.2, but in the latter we dealt with sharper statistics (100 % vs. 50 %) rather than with interference patterns. The calculation with spin (i.e., with vectors in two dimensions) is also easier than it would be for the double-slit experiment, where one would have to solve the Schrödinger equation with initial conditions located around each of the slits in order to deduce the interference pattern of Fig. 2.8c. Figure 5.1 in Chap. 5 shows a numerical solution yielding the interference pattern (within the de Broglie–Bohm theory).

This experiment is often considered to be the essence of the quantum mechanical mystery. On the basis of this experiment, one often denies that it makes sense to speak of particles going through one slit or the other. One also sometimes says that, if both slits are open, quantum objects behave as waves, and if only one slit is open, they behave as particles, which is another instance of Bohr’s “complementarity”: one can have a “wave picture” or a “particle picture”, but not both simultaneously.

After describing the double-slit phenomenon, Feynman wrote:

Nobody knows any machinery. Nobody can give you a deeper explanation of this phenomenon than I have given; that is, a description of it.

Richard Feynman [185, p. 145]

And in a well-known classic textbook on quantum mechanics, Landau and Lifshitz said:

It is clear that [the results of the double-slit experiment] can in no way be reconciled with the idea that electrons move in paths. [...] In quantum mechanics there is no such concept as the path of a particle.

Lev Landau and Evgeny Lifshitz [302, p. 2]

We will discuss these statements in Chap. 5 in the light of the de Broglie–Bohm theory.

2.F Proof of the No Hidden Variables Theorem

We will now state more precisely and prove the theorem given at the end of Sect. 2.5. The theorem is divided into two parts, and so are the proofs, which are similar but use different background notions. We first state each part of the theorem precisely and then give its proof.

Precise Statement of Part 1

Let \({\mathcal O}\) be the set of self-adjoint matrices on a complex vector space of dimension four. Then, there does not exist a function \(v\,\):

$$\begin{aligned} v: { {\mathcal O}} \rightarrow {\mathbb {R}} \end{aligned}$$
(2.F.1)

such that:

  1. (1)
    $$\begin{aligned} \forall A \in { {\mathcal O}}\;,\quad v(A) \in \{ \text {eigenvalues of } A \}\;, \end{aligned}$$
    (2.F.2)
  2. (2)
    $$\begin{aligned} \forall A, B \in { {\mathcal O}}\;, \;\, \text {with} \ [A, B]= AB-BA=0\;,\;\, v(AB)=v(A)v(B)\;. \end{aligned}$$
    (2.F.3)

Remarks

We use here the formulation of quantum mechanical “measurements” in terms of matrices and eigenvalues; see Appendix 2.B. The first condition is natural if a measurement is supposed to reveal a pre-existing value corresponding to the quantity A. However, it should be stressed that we do not use the first condition very much in the proof. In fact, we only use it for \(A= \mathbf{-1}\), \(\mathbf{1}\) being the unit matrix, in the form \(v(\mathbf{-1})=-1\).

The second condition is necessary if the values v(A) are supposed to be in agreement with the quantum predictions, since, when A and B commute (i.e., when \(AB-BA=0\)), it is in principle possible to measure A, B, and AB simultaneously, and the product of the results of the first two measurements must be equal to the result of the last one, i.e., they must satisfy (2.F.3) (see Appendix 2.C.2).Footnote 54 This condition, unlike the first one, will be used repeatedly in the proof. Indeed, by choosing suitable pairs of commuting matrices, and applying (2.F.3) to each pair, we will derive a contradiction.

There are similar no hidden variables theorems in any space of dimension at least 3; see Bell [36], Kochen and Specker [291], and Mermin [335]. But the proof given here works only in a four-dimensional space (or in any space whose dimension is a multiple of four, by considering matrices that are direct sums of copies of the matrices used here).

It should be emphasized that, even though the set \( {\mathcal O}\) contains matrices that do not commute with each other, we use relation (2.F.3) only for commuting matrices, so that the only assumptions of the theorem are the quantum mechanical predictions for the results of possible measurements.

Sometimes people think that this theorem rules out only “non-contextual” hidden variables. What this means is that, if we consider three matrices A, B, and C, where A commutes with B and with C, but B and C do not commute, then we are assuming that the result of measuring A does not depend on whether we choose to measure B or C simultaneously with A.Footnote 55 To be precise, we could write \(v(AB)=v(A) v(B)\) or \(v(AC)=v(A) v(C)\), since A commutes with both B and C, and we assume here that one has the same value v(A) in both equations.

Hidden variables would be called contextual if they depended on that choice (so, here, the hidden variables are non-contextual). But this is not a way to “save” the possibility of hidden variables, at least those considered here: if measuring A is supposed to reveal an intrinsic property of the particle, pre-existing to the measurement (and this is what is meant here by hidden variables), then the result cannot possibly depend on whether I choose to measure B or C simultaneously with A, since I could measure A and nothing else. If someone has an age, a height, and a weight (those being intrinsic properties of that person), how could the result of measuring one of those properties depend on whether or not I measure another property at the same time, or on which other property I choose to measure?

The second condition (2.F.3) is necessary in order to prove the theorem, but it does not affect the meaning of the theorem.Footnote 56

Proof

We use the standard Pauli matrices \(\sigma _x\) [equal to \(\sigma _2\) in (2.B.4)], \(\sigma _y\), and \(\sigma _z\) [equal to \(\sigma _1\) in (2.B.4)]:

$$ \sigma _x = \left( \begin{array}{cc} 0 &{} 1 \\ 1 &{} 0 \end{array}\right) \;,\qquad \sigma _y = \left( \begin{array}{cc} 0 &{} -i \\ i &{} 0 \end{array}\right) \;,\qquad \sigma _z = \left( \begin{array}{cc} 1 &{} 0 \\ 0 &{} -1 \end{array}\right) \;. $$

We consider two copies of each of these matrices, \(\sigma _\alpha ^i\), \(\alpha = x, y, z\), \(i=1,2\), where tensor products are implicit: \(\sigma _x^1 \equiv \sigma _x \otimes \mathbf{1} \), \(\sigma _x^2 \equiv \mathbf{1} \otimes \sigma _x\), etc., with \( \mathbf{1} \) the unit matrix. These operators act on \({{\mathbb {C}}}^4\). The following identities are well known and easy to check:

  1. (i)
    $$\begin{aligned} (\sigma _x^i)^2=(\sigma _y^i)^2 =(\sigma _z^i)^2=\mathbf{1}\;, \end{aligned}$$
    (2.F.4)

    for \(i=1,2\).

  2. (ii)

    Different Pauli matrices anticommute:

    $$\begin{aligned} \sigma _\alpha ^i \sigma _\beta ^i=-\sigma _\beta ^i\sigma _\alpha ^i\;, \end{aligned}$$
    (2.F.5)

    for \(i=1,2\), and \(\alpha , \beta = x, y, z\), \(\alpha \ne \beta \). And they have the following commutation relations:

    $$\begin{aligned}{}[\sigma _\alpha ^i, \sigma _\beta ^i]=2i \sigma _\gamma ^i\;, \end{aligned}$$
    (2.F.6)

    for \(i=1,2\), and \(\alpha , \beta , \gamma \) a cyclic permutation of xyz.

  3. (iii)

    Finally,

    $$\begin{aligned}{}[\sigma _\alpha ^1,\sigma _\beta ^2]= \sigma _\alpha ^1\sigma _\beta ^2-\sigma _\beta ^2 \sigma _\alpha ^1= \mathbf{0}\;, \end{aligned}$$
    (2.F.7)

    where \(\alpha , \beta = x, y, z\) and \(\mathbf{0}\) is the matrix with all entries equal to zero.

Consider now the identity

$$\begin{aligned} \sigma _x^1\sigma _y^2\sigma _y^1\sigma _x^2\sigma _x^1\sigma _x^2 \sigma _y^1\sigma _y^2 = -\mathbf{1}\;, \end{aligned}$$
(2.F.8)

which follows by first using (ii) and (iii) above to move \(\sigma _x^1\) in the product from the first place (starting from the left) to the fourth place, a move that involves one anticommutation (2.F.5) and two commutations (2.F.7), viz.,

$$\begin{aligned} \sigma _x^1\sigma _y^2\sigma _y^1\sigma _x^2\sigma _x^1\sigma _x^2\sigma _y^1\sigma _y^2 = -\sigma _y^2\sigma _y^1\sigma _x^2\sigma _x^1 \sigma _x^1\sigma _x^2\sigma _y^1\sigma _y^2\;, \end{aligned}$$
(2.F.9)

and then using (i) repeatedly, to see that the right-hand side of (2.F.9) equals \(-\mathbf{1}\).

We now define the operators

$$ A=\sigma _x^1 \sigma _y^2\;,\quad B=\sigma _y^1 \sigma _x^2\;,\quad C=\sigma _x^1 \sigma _x^2\;,\quad D=\sigma _y^1 \sigma _y^2\;,\quad X=AB\;,\quad Y=CD\;. $$

Using (ii) and (iii), we observe:

(\(\alpha \)): \([A,B]=0\)

(\(\beta \)): \([C,D]=0\)

(\(\gamma \)): \([X,Y]=0\)

The identity (2.F.8) can be rewritten as

$$\begin{aligned} XY=-\mathbf{1}\;. \end{aligned}$$
(2.F.10)

But, using (2.F.3), \((\alpha )\), \((\beta )\), \((\gamma )\), and (2.F.7), we get:

  1. (a) \(v(XY)=v(X)v(Y)=v(AB)v(CD)\)
  2. (b) \(v(AB)=v(A)v(B)\)
  3. (c) \(v(CD)=v(C)v(D)\)
  4. (d) \(v(A)=v(\sigma _x^1)v(\sigma _y^2)\)
  5. (e) \(v(B)=v(\sigma _y^1)v(\sigma _x^2)\)
  6. (f) \(v(C)=v(\sigma _x^1)v(\sigma _x^2)\)
  7. (g) \(v(D)=v(\sigma _y^1)v(\sigma _y^2)\)

Since the only eigenvalue of the matrix \(-\mathbf{1}\) is \(-1\), by combining (2.F.10) with (2.F.2) in the theorem and (a)–(g), we get

$$\begin{aligned} v(XY)= -1 = v(\sigma _x^1)v(\sigma _y^2)v(\sigma _y^1) v(\sigma _x^2)v(\sigma _x^1) v(\sigma _x^2)v(\sigma _y^1)v(\sigma _y^2)\;, \end{aligned}$$
(2.F.11)

where the right-hand side equals \(v(\sigma _x^1)^2 v(\sigma _y^2)^2 v(\sigma _y^1)^2 v(\sigma _x^2)^2\), since all the factors in the product appear twice. But this last expression, being a product of squares of real numbers, is non-negative, and so cannot equal \(-1\). \(\blacksquare \)
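
The algebraic identities on which this proof rests are easy to check numerically. Here is a short sketch (ours, for illustration):

```python
import numpy as np

# Numerical check (an illustrative sketch) of the relations used in the proof.
sx = np.array([[0, 1], [1, 0]], dtype=complex)     # sigma_x
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)  # sigma_y
I2 = np.eye(2, dtype=complex)

# Tensor products: sigma_alpha^1 = sigma_alpha x 1, sigma_alpha^2 = 1 x sigma_alpha.
sx1, sx2 = np.kron(sx, I2), np.kron(I2, sx)
sy1, sy2 = np.kron(sy, I2), np.kron(I2, sy)

A, B = sx1 @ sy2, sy1 @ sx2
C, D = sx1 @ sx2, sy1 @ sy2
X, Y = A @ B, C @ D

comm = lambda M, N: M @ N - N @ M
assert np.allclose(comm(sx1, sy2), 0)              # (2.F.7): different copies commute
assert np.allclose(sx1 @ sy1, -sy1 @ sx1)          # (2.F.5): same copy anticommutes
assert np.allclose(comm(A, B), 0)                  # (alpha)
assert np.allclose(comm(C, D), 0)                  # (beta)
assert np.allclose(comm(X, Y), 0)                  # (gamma)
assert np.allclose(X @ Y, -np.eye(4))              # (2.F.10): XY = -1
print("all identities verified")
```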

Part (2) of the Theorem

The proof of part (2) of the theorem is very similar to the proof of part (1) and is taken from a paper by Wayne Myrvold [344], which is a simplified version of a result due to Robert Clifton [98]. We need to introduce here operators \(Q_1\), \(Q_2\) that act as multiplication on functionsFootnote 57:

$$\begin{aligned} Q_j \Psi ( x_1, x_2)= x_j \Psi ( x_1, x_2)\;,\quad j=1,2\;, \end{aligned}$$
(2.F.12)

and operators \(P_1\), \(P_2\) that act by differentiation:

$$\begin{aligned} P_j \Psi ( x_1, x_2)= -i\frac{\partial }{\partial x_j} \Psi ( x_1, x_2)\;,\quad j=1,2\;. \end{aligned}$$
(2.F.13)

We already mentioned these operators, for one variable, in our discussion of Schrödinger’s equation in Appendices 2.A.2 and 2.B.

We will also need the operators \(U_j (b)= \exp (-i b Q_j)\), \(V_j (c)= \exp (-i c P_j)\), with \(Q_j\), \(P_j\) defined by (2.F.12), (2.F.13), and \(b, c \in \mathbb {R}\). They act as

$$\begin{aligned} U_j (b)\Psi ( x_1, x_2)=\exp (-i b x_j) \Psi ( x_1, x_2)\;,\quad j=1,2\;, \end{aligned}$$
(2.F.14)

which follows trivially from (2.F.12), and

$$\begin{aligned} V_1 (c) \Psi ( x_1, x_2)= \Psi ( x_1-c, x_2)\;, \end{aligned}$$
(2.F.15)

and similarly for \(V_2 (c)\). Equation (2.F.15) follows from (2.F.13) by expanding both sides in a Taylor series, for functions \(\Psi \) such that the series converge, and by extending the unitary operator \(V_1 (c)\) to more general functions \(\Psi \) (see, e.g., [412, Chap. 8] for an explanation of that extension).

Precise Statement of Part 2

Let \({\mathcal O}\) be the set of functions of the operators \(Q_1\), \(Q_2\), \(P_1\), or \(P_2\). Then, there does not exist a function

$$\begin{aligned} v: { {\mathcal O}} \rightarrow {\mathbb {R}} \end{aligned}$$
(2.F.16)

such that

  1. (1)
    $$\begin{aligned} \forall A \in { {\mathcal O}}\;,\quad v(A) \in \{ \text {eigenvalues of } A\}\;, \end{aligned}$$
    (2.F.17)
  2. (2)
    $$\begin{aligned} \forall A, B \in { {\mathcal O}}\;, \;\, \text {with} \;\, [A, B]= AB-BA=0\;,\;\, v(AB)=v(A)v(B)\;, \end{aligned}$$
    (2.F.18)

    where AB is the operator product.

Remark

Since, for \(A \in { {\mathcal O}}\), the set of possible results of measurements of A is all of \(\mathbb {R}\), and since the function v maps \({ {\mathcal O}}\) to \({\mathbb {R}}\), we do not need a condition like (2.F.2) of part 1 of the theorem for all \(A \in { {\mathcal O}}\) (that is why the condition refers only to eigenvalues). And, as in the proof of the first part of the theorem, the first condition (2.F.17) is only used for \(A= \mathbf{-1}\), \(\mathbf{1}\) being the unit operator, in the form \(v(\mathbf{-1})=-1\).

Proof

We choose the following functions of the operators \(Q_i\), \(P_i\,\):

$$ A_1 = \cos (a Q_1)\;,\quad A_2 = \cos (a Q_2) \;,\quad B_1 = \cos \frac{\pi P_1}{a}\;,\quad B_2 = \cos \frac{\pi P_2}{a}\;, $$

where a is an arbitrary nonzero constant, and the functions are defined by (2.F.14), (2.F.15), and the Euler relations:

$$\begin{aligned} \begin{array}{l} \displaystyle \cos (a Q_j) =\frac{ \exp (i a Q_j)+ \exp (-i a Q_j)}{2}\;,\\ \displaystyle \cos \frac{\pi P_j}{ a} =\frac{ \exp ( i\pi P_j/ a)+ \exp (- i\pi P_j/ a)}{2}\;, \end{array} \end{aligned}$$
(2.F.19)

for \(j=1, 2\). By applying (2.F.18) several times to pairs of commuting operators, we will derive a contradiction.

We have the relations

$$\begin{aligned}{}[A_1,A_2]= [B_1,B_2]=[A_1,B_2]=[A_2,B_1]=0\;, \end{aligned}$$
(2.F.20)

since these operators act on different variables, and

$$\begin{aligned} A_1B_1 = -B_1A_1\;,\qquad A_2B_2 = -B_2A_2\;. \end{aligned}$$
(2.F.21)

To prove (2.F.21), note that, from (2.F.14) and (2.F.15), one gets

$$\begin{aligned} U_j (b) V_j (c)= \exp (-ibc) V_j (c)U_j (b)\;, \end{aligned}$$
(2.F.22)

for \(j=1,2\), which, for \(bc=\pm \pi \), means

$$\begin{aligned} U_j (b) V_j (c)= - V_j (c)U_j (b)\;. \end{aligned}$$
(2.F.23)

Now use (2.F.19) to expand the product \(\cos (a Q_j) \cos (\pi P_j/ a)\) into a sum of four terms; each term will have the form of the left-hand side of (2.F.22) with \(b=\pm a\), \(c= \pm \pi / a\), whence \(bc=\pm \pi \). Then applying (2.F.23) to each term proves (2.F.21).
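
The Weyl relation (2.F.22), its special case (2.F.23), and the resulting anticommutation (2.F.21) can be checked numerically on a periodic grid (a sketch under our own assumptions: the grid, the test function, and the choices of a, b, c are ours, picked so that the shifts are exact and the wrap-around phases are consistent):

```python
import numpy as np

# Illustrative sketch (ours) of (2.F.22)-(2.F.23) and (2.F.21) on a periodic grid.
N, L = 512, 32.0
x = np.linspace(-L / 2, L / 2, N, endpoint=False)
dx = x[1] - x[0]

# --- (2.F.22) with bc = pi, i.e., (2.F.23) ---
c = 4 * dx                                 # a shift by 4 grid points
b = np.pi / c                              # chosen so that bc = pi
U = lambda g: np.exp(-1j * b * x) * g      # U(b): multiplication, as in (2.F.14)
V = lambda g: np.roll(g, 4)                # V(c): (V g)(x) = g(x - c), as in (2.F.15)
f = np.exp(-x**2)                          # a test function
assert np.allclose(U(V(f)), -V(U(f)))      # U(b) V(c) = -V(c) U(b)

# --- (2.F.21): A1 B1 = -B1 A1 ---
a = np.pi / (8 * dx)                       # pi/a is exactly 8 grid points
A1 = lambda g: np.cos(a * x) * g           # cos(a Q1)
B1 = lambda g: 0.5 * (np.roll(g, 8) + np.roll(g, -8))  # cos(pi P1 / a), by (2.F.19)
assert np.allclose(A1(B1(f)), -B1(A1(f)))
print("(2.F.23) and (2.F.21) verified on the grid")
```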

The relations (2.F.20) and (2.F.21) imply

$$\begin{aligned} A_1A_2B_1B_2 = B_1B_2 A_1A_2\;,\qquad A_1B_2 A_2B_1 = A_2B_1A_1B_2\;. \end{aligned}$$
(2.F.24)

Let \(v(Q_1)=q_1\), \(v(Q_2)=q_2\), \(v(P_1)=p_1\), and \(v(P_2)=p_2\). Since the functions \(A_1\), \(A_2\), \(B_1\), and \(B_2\) can be defined by their Taylor series and we have \(v(Q_1^n)= v(Q_1)^n= q_1^n\) by (2.F.18) ( \(Q_1\) commutes with itself), and similarly for \(Q_2\), \(P_1\), \(P_2\), it follows that

$$\begin{aligned} v(A_1) = \cos (aq_1)\;,\quad v(A_2) = \cos (a q_2)\;,\quad v(B_1) = \cos \frac{\pi p_1}{ a}\;,\quad v(B_2) = \cos \frac{\pi p_2}{a}\;. \end{aligned}$$
(2.F.25)

Since \(A_1\) and \(A_2\) commute, we get from (2.F.18),

$$ v(A_1A_2) = v(A_1) v(A_2)\;, $$

and similarly,

$$\begin{aligned} v(B_1B_2) = v(B_1)v(B_2)\;,\quad v(A_1B_2) = v(A_1) v(B_2)\;,\quad v(A_2B_1) = v(A_2) v(B_1)\;. \end{aligned}$$
(2.F.26)

Consider now the operators \(X=A_1A_2B_1B_2\) and \(Y=A_1B_2A_2B_1\). Using \(B_2A_2=-A_2B_2\) from (2.F.21), together with \([B_1,B_2]=0\) from (2.F.20), we get

$$\begin{aligned} X = -Y\;. \end{aligned}$$
(2.F.27)

On the other hand, since by (2.F.24) \(A_1A_2\) commutes with \(B_1B_2\), we have from (2.F.18),

$$\begin{aligned} v(X) = v(A_1A_2B_1B_2) = v(A_1A_2) v(B_1B_2) = v(A_1)v(A_2)v(B_1)v(B_2)\;, \end{aligned}$$
(2.F.28)

where, in the last equality, we use (2.F.26). Similarly, since by (2.F.24) \(A_1B_2\) commutes with \(A_2B_1\),

$$\begin{aligned} v(Y)=v(A_1B_2) v(A_2B_1) = v(A_1) v(B_2) v(A_2) v(B_1)\;. \end{aligned}$$
(2.F.29)

Comparing (2.F.28) and (2.F.29), we see that

$$ v(X)=v(Y)\;, $$

while (2.F.27) implies \(v(X)=v(-Y)= v(\mathbf{-1}\, Y)=v(\mathbf{-1}) v(Y)=-v(Y)\), using (2.F.18) for the commuting pair \(\mathbf{-1}\) and Y. This means that \(v(X)=v(Y)=0\), and hence that one of the four quantities \(v(A_1)\), \(v(A_2)\), \(v(B_1)\), or \(v(B_2)\) vanishes. But, by (2.F.25), this cannot hold for all values of a, given the values of \(q_1, q_2, p_1, p_2\): for instance, \(\cos (aq_1)\) vanishes only for special values of a. \(\blacksquare \)
