Nearly three decades have passed since Aharonov et al. [1] introduced weak measurements and weak values. Nevertheless, they remain a subject of debate. Recently, Vaidman [2, 3] analyzed the nested Mach-Zehnder interferometer experiment with the two-state vector formalism and argued that the past of a quantum particle can be described by its weak trace. Li et al. [4, 5] challenged Vaidman’s claim and argued that the weak trace can be understood without any unusual probability theory if the disturbances caused by the weak measurements are taken into account. However, they agreed with Vaidman on the physical meaning of the weak values.

Moreover, Ferrie and Combes [6, 7] argued that weak values are classical statistical quantities, which gave rise to a number of rebuttals [8,9,10,11,12]. In particular, Pusey [13] showed that anomalous (imaginary, negative, and unbounded) weak values are non-classical and constitute proofs of contextuality. However, he did not show how contextuality is responsible for the anomalous weak values.

As confirmed by many experiments, the measured value of the weak measurement agrees with the corresponding weak value. In this paper, therefore, we carefully examine the process of the weak measurement to determine what the weak value is. We show that the physical meanings of weak measurements and weak values can be completely understood within the framework of conventional quantum mechanics, that is, with the Born rule and general probability theory. Much confusion concerning the weak value has been caused by the following hypothesis: the weak value of \(\hat {A}\) is a conditional expectation value, or some other kind of expectation value, of \(\hat {A}\). We demonstrate that

$$ \langle \hat{A}\rangle^{w}\equiv\frac{\langle F|\hat{A}|I\rangle}{\langle F|I\rangle} $$
(1)

is not the expectation value of \(\hat {A}\) with the pre-state \(|I\rangle\) and post-state \(\langle F|\); its real and imaginary parts are, up to constant factors, the expectation values of \((1/2)(|F\rangle \langle F|\hat {A}+\hat {A}|F\rangle \langle F|)\) and \((i/2)(|F\rangle \langle F|\hat {A}-\hat {A}|F\rangle \langle F|)\) for \(|I\rangle\), respectively, boosted by \(1/|\langle I|F\rangle|^{2}\) via the post-selection. If \(\hat {A}\) and \(|F\rangle\langle F|\) do not commute, these values are completely different from the real and imaginary parts of the expectation value of \(\hat {A}\) for \(|I\rangle\) with the post-selection. Moreover, even if \(\hat {A}\) is a projection operator, \(\hat {A}|F\rangle \langle F|\) is not. Therefore, we have no reason to expect the weak value of \(\hat {A}\) to lie within the eigenvalue range of \(\hat {A}\).
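The decomposition stated above can be checked numerically. The following is a minimal sketch, assuming a randomly generated 3-dimensional Hermitian observable and random pre- and post-states; all names and values are illustrative assumptions, not taken from the text. With the sign convention of the operator \((i/2)(|F\rangle \langle F|\hat {A}-\hat {A}|F\rangle \langle F|)\), the constant factor for the imaginary part turns out to be \(-1\).

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 3  # hypothetical Hilbert-space dimension

def random_state(d):
    """Return a random normalized complex vector."""
    v = rng.normal(size=d) + 1j * rng.normal(size=d)
    return v / np.linalg.norm(v)

# Hypothetical inputs: random Hermitian A, random pre-state |I> and post-state |F>.
M = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
A = (M + M.conj().T) / 2
I_state = random_state(dim)
F_state = random_state(dim)
F_proj = np.outer(F_state, F_state.conj())  # |F><F|

def expect(op, psi):
    """<psi|op|psi>; np.vdot conjugates its first argument."""
    return np.vdot(psi, op @ psi)

# Weak value as defined in (1).
weak_value = np.vdot(F_state, A @ I_state) / np.vdot(F_state, I_state)

# Boost factor 1/|<I|F>|^2 from the post-selection.
boost = 1.0 / abs(np.vdot(I_state, F_state)) ** 2

sym = (F_proj @ A + A @ F_proj) / 2        # (1/2)(FA + AF), Hermitian
antisym = 1j * (F_proj @ A - A @ F_proj) / 2  # (i/2)(FA - AF), Hermitian

real_part = boost * expect(sym, I_state).real
imag_part = -boost * expect(antisym, I_state).real  # constant factor -1

print(weak_value, real_part, imag_part)
```

Both operators are Hermitian, so their expectation values are real, whereas \(|F\rangle \langle F|\hat {A}\) itself is not Hermitian unless it commutes with \(\hat {A}\).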

First, we examine the process of the weak measurement by means of von Neumann-type measurement [14] according to [1]. The interaction Hamiltonian \(\hat {H}_{A}\) between an observable \(\hat {A}\) of the observed system and the momentum \(\hat {\pi }_{A}\) of the pointer of the measuring device is

$$ \hat{H}_{A}\equiv g_{A}\hat{A}\hat{\pi}_{A}, $$
(2)

where \(g_{A}\) is the coupling constant. \(\hat {H}_{A}\) is assumed to be constant and roughly equivalent to the total Hamiltonian \(\hat {H}\) over some interaction time \(t_{A}\). The initial wavefunction \(\phi_{A}(x)\) of the measuring apparatus is assumed to be

$$ \phi_{A} (x)=\langle x_{A}|\phi_{A}\rangle =\left( {1\over \sqrt{2\pi}\sigma_{A}}\right)^{1/2}\exp\left( -{{x_{A}^{2}}\over 4{\sigma_{A}^{2}}}\right), $$
(3)

where \(x_{A}\) is the position of the pointer of the measuring device. The initial state of the unified system of the observed system and the measuring device is \(|{\Phi}_{A}(0)\rangle =|I\rangle |\phi_{A}\rangle\), where \(|I\rangle\) is the initial state of the observed system. It evolves unitarily according to the Schrödinger equation:

$$ i\hbar{d\over dt}|{\Phi}_{A}(t_{A})\rangle =\hat{H}|{\Phi}_{A} (t_{A})\rangle \sim\hat{H}_{A}|{\Phi}_{A} (t_{A})\rangle , $$
(4)

and becomes

$$ |{\Phi}_{A}(t_{A})\rangle =\exp\left( -\frac{i\hat{H}_{A}t_{A}}{\hbar}\right)|{\Phi}_{A}(0)\rangle. $$
(5)

To first order in \(g_{A}t_{A}\),

$$ |{\Phi}_{A} (t_{A})\rangle =|I\rangle |\phi_{A}\rangle -{ig_{A}t_{A}\over\hbar}\hat{A}|I\rangle\hat\pi_{A} |\phi_{A}\rangle . $$
(6)

Equivalently, we can describe the unified system by means of the density matrix

$$ \hat\rho_{A}(t_{A})=|{\Phi}_{A}(t_{A})\rangle\langle{\Phi}_{A}(t_{A})|. $$
(7)

Without any post-selection, the expectation value \(\overline x_{A}\) of the pointer’s position \(\hat {x}_{A}\) for this state is

$$\begin{array}{@{}rcl@{}} \overline{x}_{A}&=&\text{Tr}\left[\hat{\rho}_{A}(t_{A})\hat{x}_{A}\right]\\ &=&g_{A}t_{A}\langle I|\hat{A}|I\rangle . \end{array} $$
(8)

In [1], it was claimed that the state of the measuring device right after the unitary interaction with the measured system, with post-selection \(\langle F|\) on the measured system, is

$$ \frac{\langle F|{\Phi}_{A}(t)\rangle}{\langle F|I\rangle}=|\phi_{A}\rangle -\frac{ig_{A}t_{A}}{\hbar}\frac{\langle F|\hat{A}|I\rangle}{\langle F|I\rangle}\hat{\pi}_{A}|\phi_{A}\rangle. $$
(9)

Here, we show that this claim is not exact because of the non-separability of the measured system and the measuring device [15, 16]. To this end, suppose that the ensemble \(S\) of the observed system and the ensemble \(M\) of the measuring device after their unitary interaction are each obtained by combining the elements of sub-ensembles, each of which is described by its own ket. Then, each element of \(S\) belongs to one of the sub-ensembles \(E_{i}\), \(i=1,2,\cdots\), described by \(|s_{i}\rangle\), and each element of \(M\) belongs to one of the sub-ensembles \(E_{\alpha}\), \(\alpha =1,2,\cdots\), described by \(|m_{\alpha}\rangle\), such that the sub-ensemble \(\varepsilon_{i,\alpha}\) of the unified system, whose elements belong to both \(E_{i}\) and \(E_{\alpha}\), is described by the density matrix

$$ \hat\rho_{i,\alpha}=|s_{i}\rangle |m_{\alpha}\rangle\langle m_{\alpha}|\langle s_{i}|. $$
(10)

Because the unified system’s ensemble \(\varepsilon\) is the union of all the \(\varepsilon_{i,\alpha}\), the density matrix \(\hat \rho ^{\prime }\) describing \(\varepsilon\) should be written as the weighted sum of all the \(\hat \rho _{i,\alpha }\):

$$ \hat\rho^{\prime}=\sum\limits_{i,\alpha}P_{i,\alpha}\hat\rho_{i,\alpha} , $$
(11)

where \(P_{i,\alpha}\) are suitable weights. However, \(\varepsilon\) is by definition described by (6), so it must be described by the density matrix (7). \(\hat \rho _{A}(t)\) and \(\hat \rho ^{\prime }\) are necessarily different, except in the case that \(|{\Phi}_{A}(t)\rangle\) is a product of a vector \(|S\rangle\) in the Hilbert space of the observed system and a vector \(|M\rangle\) in the Hilbert space of the measuring apparatus, i.e.,

$$ |{\Phi}_{A}(t)\rangle =|S\rangle |M\rangle. $$
(12)

Equation (6) does not have this form. Therefore, the previous assumption has been shown to be false.
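That (6) is generally not of the product form (12) can be illustrated by counting its Schmidt coefficients. The sketch below assumes a two-level observed system with \(\hat {A}=\sigma_{z}\), a pointer discretized on a grid, and an illustrative value of \(g_{A}t_{A}\); these choices are assumptions made for the example only. In the position representation \(\hat \pi_{A}=-i\hbar\, d/dx_{A}\), so (6) reads \(|I\rangle \phi_{A}(x_{A})-g_{A}t_{A}\hat {A}|I\rangle \phi_{A}^{\prime}(x_{A})\).

```python
import numpy as np

sigma = 1.0   # pointer width sigma_A (illustrative)
gt = 0.05     # g_A * t_A, assumed small compared with sigma
x = np.linspace(-8.0, 8.0, 801)

# Gaussian pointer wavefunction (3) and its analytic derivative.
phi = (2 * np.pi * sigma**2) ** -0.25 * np.exp(-x**2 / (4 * sigma**2))
dphi = -x / (2 * sigma**2) * phi

A = np.array([[1.0, 0.0], [0.0, -1.0]])  # hypothetical observable: sigma_z

def schmidt_rank(I_state, tol=1e-10):
    """Number of non-negligible Schmidt coefficients of the state (6)."""
    # Coefficient matrix C[s, x]: system index s, pointer position x.
    C = np.outer(I_state, phi) - gt * np.outer(A @ I_state, dphi)
    sv = np.linalg.svd(C, compute_uv=False)
    return int(np.sum(sv > tol * sv[0]))

superposition = np.array([1.0, 1.0]) / np.sqrt(2)  # not an eigenstate of A
eigenstate = np.array([1.0, 0.0])                  # eigenstate of A

print(schmidt_rank(superposition))  # rank 2: entangled, not of the form (12)
print(schmidt_rank(eigenstate))     # rank 1: product state
```

A Schmidt rank of 1 corresponds to the product form (12); in this sketch it occurs only when \(|I\rangle\) is an eigenstate of \(\hat {A}\).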

For this reason, neither the observed system nor the measuring device has a separate ensemble of its own. Therefore, we conclude that the operation of \(\langle F|\) on (6) changes the unified system, and that (9) is not the state of the measuring device right after their unitary interaction, i.e., right after \(t_{A}\).

Next, we clarify what the weak value is. This requires careful examination of the weak measurement, especially of the post-selection. To this end, we must consider two measuring devices: one weakly measures the observable \(\hat {A}\) and the other selects the post-state \(\langle F|\) via a projection measurement. Their interaction Hamiltonians are (2) and

$$\hat{H}_{F}=g_{F}\hat{F}\hat\pi_{F}, $$

where \(\hat {F}\equiv |F\rangle \langle F|\) and \(\hat \pi _{F}\) is the momentum of the pointer of the measuring device of \(\hat {F}\). The initial state of the unified system of the observed system and the two measuring devices is

$$|{\Phi}(0)\rangle=|I\rangle |\phi_{A}\rangle |\phi_{F}\rangle, $$

where \(|\phi_{F}\rangle\) is the initial state of the measuring device of \(\hat {F}\), whose wave function is assumed to be

$$ \phi_{F} (x)=\langle x_{F}|\phi_{F}\rangle =\left( {1\over \sqrt{2\pi}\sigma_{F}}\right)^{1/2}\exp\left( -{{x_{F}^{2}}\over 4{\sigma_{F}^{2}}}\right), $$
(13)

where \(x_{F}\) is the position of the pointer of the measuring device of \(\hat {F}\).

We weakly measure \(\hat {A}\) and then select the final state. Therefore, the state following the interaction between the observed system and the measuring devices is

$$ |{\Phi}(t)\rangle =\exp \left( -\frac{i\hat{H}_{F}t_{F}}{\hbar}\right)\exp \left( -\frac{i\hat{H}_{A}t_{A}}{\hbar}\right)|{\Phi}(0)\rangle, $$
(14)

where

$$t=t_{A}+t_{F}. $$

To first order in \(g_{A}t_{A}\),

$$ |{\Phi}(t)\rangle =\exp\left( -\frac{i\hat{H}_{F}t_{F}}{\hbar}\right)|\phi_{F}\rangle \left[|I\rangle |\phi_{A}\rangle -{ig_{A}t_{A}\over\hbar}\hat{A}|I\rangle\hat\pi_{A} |\phi_{A}\rangle\right] . $$
(15)

We define the partial density matrix \(\hat {\rho }^{(m)}(t)\) of the measuring devices as

$$ \hat{\rho}^{(m)}(t)=\text{Tr}^{(s)}\left[|{\Phi}(t)\rangle\langle{\Phi}(t)|\right], $$
(16)

where \(\text{Tr}^{(s)}\) is the partial trace over the observed system. By calculating the expectation value of either \(\hat {x}_{A}\) or \(\hat {x}_{F}\), we can obtain the expectation value of either \(\hat {A}\) or \(\hat {F}\) accurately as follows:

$$ \overline x_{A}\equiv \text{Tr}\left[\hat\rho^{(m)}(t)\hat{x}_{A}\right] =g_{A}t_{A}\langle I|\hat{A}|I\rangle, $$
(17)
$$ \overline{x}_{F}\equiv \text{Tr}\left[\hat\rho^{(m)}(t)\hat{x}_{F}\right]=g_{F}t_{F}\langle I|\hat{F}|I\rangle. $$
(18)

Because \(\hat {x}_{A}\) and \(\hat {x}_{F}\) commute, we can obtain their measured values \(X_{A}\) and \(X_{F}\) simultaneously. However, we cannot know the expectation values of both \(\hat {A}\) and \(\hat {F}\) simultaneously [17]. The reason is almost the same as in the previous discussion. Suppose that the ensembles \(M_{A}\) and \(M_{F}\) of the two measuring devices after their unitary interaction with the measured system are each obtained by combining the elements of sub-ensembles, each of which is described by its own ket. Then, each element of \(M_{A}\) belongs to one of the sub-ensembles \(E_{\alpha}\), \(\alpha =1,2,\cdots\), described by \(|a_{\alpha}\rangle\), and each element of \(M_{F}\) belongs to one of the sub-ensembles \(E_{\beta}\), \(\beta =1,2,\cdots\), described by \(|f_{\beta}\rangle\), such that the sub-ensemble \(\varepsilon_{\alpha ,\beta}\) of the combined measuring device, whose elements belong to both \(E_{\alpha}\) and \(E_{\beta}\), is described by the density matrix

$$\hat\rho_{\alpha ,\beta}=|f_{\beta}\rangle |a_{\alpha}\rangle\langle a_{\alpha}|\langle f_{\beta}|, $$

and the ensemble of the combined measuring device is described as the weighted sum of \(\hat \rho _{\alpha ,\beta }\):

$$ \hat\rho^{\prime\prime}=\sum\limits_{\alpha ,\beta}P_{\alpha ,\beta}\hat\rho_{\alpha ,\beta}, $$
(19)

where \(P_{\alpha ,\beta}\) are suitable weights. However, (16) does not take the form of (19) if \(\hat {F}\) and \(\hat {A}\) do not commute. Therefore, \(\hat {x}_{A}\) and \(\hat {x}_{F}\) are entangled, i.e., the position operators of the two measuring devices after the unitary interaction with the measured system do not have their own separate ensembles. We should regard the measurement of \(\hat {x}_{A}\) and \(\hat {x}_{F}\) as a single operation.

Next, we reconsider the process to determine what outcome we obtain, i.e., which observable of the unified measuring device we read in this operation and which observable of the observed system corresponds to that outcome.

Although both \(\hat {x}_{A}\) and \(\hat {x}_{F}\) are measured in the weak measurement with post-selection, their measured values \(X_{A}\) and \(X_{F}\) should not be treated separately, as shown above. Because \(\hat {F}\) is a projection operator, \(X_{F}\) is 1 or 0 and \({X_{F}^{n}}=X_{F}\ (n\ne 0)\). Here and hereafter, we put \(g_{F}t_{F}=1\). On the other hand, we can know only the sum of the post-selected (and not the unselected) \(X_{A}\)’s, so that the outcome must be regarded as linear in \(X_{A}\). Therefore, the outcome of the weak measurement with the post-selection is \(X_{A}X_{F}\) and the measured observable is \(\hat {x}_{A}\hat {x}_{F}\). Its expectation value is

$$\begin{array}{@{}rcl@{}} \overline{x_{A}x_{F}}&=&\text{Tr}\left[ \hat{x}_{F}\hat{x}_{A}\hat\rho^{(m)}(t)\right]\\ &=&\text{Tr}\left[ \hat{x}_{A}\hat{x}_{F}\hat\rho^{(m)}(t)\right]\\ &=&\frac{1}{2}g_{A}t_{A}\langle I|(\hat{F}\hat{A}+\hat{A}\hat{F})|I\rangle, \end{array} $$
(20)

which is equal to \(\langle X_{A}X_{F}\rangle\), the average of \(X_{A}X_{F}\). Because \(\hat {x}_{F}\) and \(\hat {x}_{A}\) are entangled and \(\overline {x_{A}x_{F}}\ne \overline x_{A}\cdot \overline x_{F}\), we cannot obtain the expectation value of \(\hat {A}\) if it does not commute with \(\hat {F}\). (We can approximately obtain the expectation value of \(\hat {F}\) because the first measurement is weak.) Instead, we can obtain the expectation value of \((1/2)(\hat {F}\hat {A}+\hat {A}\hat {F})\) via the weak measurement.
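Equation (20) can be checked numerically with the exact (not linearized) evolution (14). The following is a minimal sketch under stated assumptions: a two-level observed system, a hypothetical real symmetric \(\hat {A}\), real pre- and post-states, \(g_{F}t_{F}=1\), and a pointer \(x_{A}\) discretized on a grid. The integral over \(x_{F}\) is done analytically: the branch with eigenvalue \(f\) of \(\hat {F}\) has mean pointer position \(f\), so only the \(f=1\) branch, proportional to \(\langle F|\exp (-ig_{A}t_{A}\hat {A}\hat \pi_{A}/\hbar )|I\rangle \phi_{A}\), contributes.

```python
import numpy as np

sigma_A = 1.0   # pointer width (illustrative)
gt = 0.01       # g_A * t_A, weak: much smaller than sigma_A
x = np.linspace(-10.0, 10.0, 4001)
dx = x[1] - x[0]

def phi(x):
    """Gaussian pointer wavefunction (3)."""
    return (2 * np.pi * sigma_A**2) ** -0.25 * np.exp(-x**2 / (4 * sigma_A**2))

# Hypothetical observable and states (not taken from the text).
A = np.array([[1.0, 0.5], [0.5, 0.0]])
I_state = np.array([np.cos(0.3), np.sin(0.3)])
F_state = np.array([np.cos(1.2), np.sin(1.2)])
F_proj = np.outer(F_state, F_state.conj())

# Spectral decomposition of A: columns of V are the eigenvectors |a_k>.
a, V = np.linalg.eigh(A)

# Branch amplitudes <F|a_k><a_k|I>; each branch shifts the pointer by gt * a_k.
d = (F_state.conj() @ V) * (V.conj().T @ I_state)
psi_F = sum(d[k] * phi(x - gt * a[k]) for k in range(len(a)))

# <x_A x_F>: the x_F integral contributes f = 1 for this branch.
mean_xAxF = np.sum(x * np.abs(psi_F) ** 2) * dx

# Right-hand side of (20).
prediction = 0.5 * gt * np.vdot(I_state, (F_proj @ A + A @ F_proj) @ I_state).real

print(mean_xAxF, prediction)
```

The two numbers agree up to corrections of relative order \((g_{A}t_{A}/\sigma_{A})^{2}\), which are neglected in the first-order expansion (15).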

The physical meaning of post-selection should be considered carefully in this context. In the post-selection, we select the cases of \(X_{F}=1\), which is an approximate selection of the final state \(\langle F|\). Because the post-selection \(X_{F}=1\) (i.e., \(X_{F}\ne 0\)) implies \(X_{A}X_{F}\ne 0\) (if \(X_{A}\ne 0\)), the average of \(X_{A}X_{F}\) after the post-selection \(X_{F}=1\) is equal to the average of \(X_{A}\) after the post-selection:

$$ \langle X_{A}\rangle^{(p)}=\langle X_{A}X_{F}\rangle^{(p)}, $$
(21)

where \(\langle\ \rangle^{(p)}\) stands for the average after post-selection. Moreover, \(\langle X_{A}X_{F}\rangle^{(p)}\) is the sum of the post-selected \(X_{A}X_{F}\)’s, which is equal to the sum of all \(X_{A}X_{F}\)’s without any post-selection, divided by the number of post-selected data. It is therefore boosted by \(1/\langle X_{F}\rangle\):

$$ \frac{\langle X_{A}X_{F}\rangle^{(p)}}{\langle X_{A}X_{F}\rangle}=\frac{1}{\langle X_{F}\rangle}, $$
(22)

where \(\langle X_{F}\rangle\) is nearly equal to \(\overline x_{F}\), because the first measurement is weak. For example, if the measured values are

(23)

then \(\langle X_{A}X_{F}\rangle =0.4\), \(\langle X_{F}\rangle =0.2\), and \(\langle X_{A}X_{F}\rangle^{(p)}=2\).
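The boost relation (22) is plain arithmetic on the recorded data. The sketch below uses five invented records, chosen only so that the averages coincide with the values quoted above; they are hypothetical and are not the measured values of (23).

```python
import numpy as np

# Hypothetical records (NOT the data of (23)): one (X_A, X_F) pair per trial.
X_A = np.array([2.0, 1.0, 3.0, 0.0, 1.0])
X_F = np.array([1, 0, 0, 0, 0])  # 1 = post-selected, 0 = rejected

mean_prod = np.mean(X_A * X_F)        # <X_A X_F> over all trials
mean_F = np.mean(X_F)                 # <X_F>, fraction of post-selected trials
post_selected = X_A[X_F == 1].mean()  # <X_A>^(p) = <X_A X_F>^(p)

print(mean_prod, mean_F, post_selected)
```

The post-selected average equals \(\langle X_{A}X_{F}\rangle /\langle X_{F}\rangle\) because the rejected trials contribute zero to the sum of \(X_{A}X_{F}\) but are excluded from the count in the denominator.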

Gathering these pieces, we obtain

$$ \langle X_{A}\rangle^{(p)}=\frac{\overline{x_{A}x_{F}}}{\overline x_{F}}. $$
(24)

By means of (18) and (20), (24) becomes

$$ \frac{\langle X_{A}\rangle^{(p)}}{g_{A}t_{A}}=\frac{\langle I|(\hat{F}\hat{A}+\hat{A}\hat{F})|I\rangle}{2\langle I|\hat{F}|I\rangle}. $$
(25)

The right-hand side of (25) is the real part of the weak value (1). If some pairs of \(\hat {A}\), \(\hat {F}\), and \(|I\rangle\langle I|\) commute, it becomes \(\langle I|\hat {A}|I\rangle \) independently of the post-selection. If \(\left [\hat {A},[\hat {F},|I\rangle \langle I|]\right ]= 0\), it is proportional to \(\langle I|\hat {A}|I\rangle \). Otherwise, it is not an expectation value of \(\hat {A}\) in any sense, let alone the expectation value of \(\hat {A}\) after the post-selection \(\langle F|\). Rather, it is the expectation value of \((1/2)(\hat {F}\hat {A}+\hat {A}\hat {F})\) boosted by the post-selection. This is why the weak value of \(\hat {A}\) may lie outside the eigenvalue range of \(\hat {A}\).
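A short numerical sketch of how the right-hand side of (25) can leave the eigenvalue range, assuming \(\hat {A}=\sigma_{z}\) (eigenvalues \(\pm 1\)) and nearly orthogonal real pre- and post-states; the angles and the helper name `rhs_25` are illustrative assumptions.

```python
import numpy as np

A = np.array([[1.0, 0.0], [0.0, -1.0]])  # sigma_z, eigenvalues +1 and -1

def rhs_25(A, I_state, F_state):
    """Right-hand side of (25): <I|(FA+AF)|I> / (2 <I|F|I>)."""
    F_proj = np.outer(F_state, F_state.conj())
    num = np.vdot(I_state, (F_proj @ A + A @ F_proj) @ I_state).real
    den = 2 * np.vdot(I_state, F_proj @ I_state).real
    return num / den

def weak_value(A, I_state, F_state):
    """Weak value (1): <F|A|I> / <F|I>."""
    return np.vdot(F_state, A @ I_state) / np.vdot(F_state, I_state)

# Nearly orthogonal pre- and post-states (illustrative angles).
alpha, beta = np.pi / 4, -np.pi / 4 + 0.05
I_state = np.array([np.cos(alpha), np.sin(alpha)])
F_state = np.array([np.cos(beta), np.sin(beta)])

anomalous = rhs_25(A, I_state, F_state)
print(anomalous, weak_value(A, I_state, F_state))

# If |I> is an eigenstate of A, the result is <I|A|I> for any post-state.
I_eig = np.array([1.0, 0.0])
print(rhs_25(A, I_eig, F_state))
```

Here the small overlap \(\langle I|\hat {F}|I\rangle\) in the denominator boosts the expectation value of \((1/2)(\hat {F}\hat {A}+\hat {A}\hat {F})\), which itself stays bounded, far beyond the eigenvalues \(\pm 1\) of \(\hat {A}\).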

In summary, our main result comes down to (24), which clarifies that weak values can be completely understood within the framework of conventional quantum mechanics, that is, with the Born rule and general probability theory. Weak measurement with post-selection should be considered as a method to measure an observable that is the product of two observables, one of which is a projection operator, and to boost its measured value.