1 Introduction

One of the fundamental properties of quantum relative entropy is monotonicity under quantum channels, or the data processing inequality (DPI):

$$\begin{aligned} D(\Phi (\rho )\Vert \Phi (\sigma ))\le D(\rho \Vert \sigma ) \end{aligned}\tag{1}$$

for any pair of quantum states \(\rho ,\sigma \) and any completely positive trace preserving map \(\Phi \) [32, 52]. The DPI implies other important quantum entropic inequalities, such as the Holevo bound [18], strong subadditivity of von Neumann entropy (SSA) [31] and the joint convexity of relative entropy. In fact, SSA, joint convexity and DPI are all equivalent, see [44], and the proof of DPI in [32] is based on SSA.

The question when the data processing inequality becomes an equality for a completely positive map and a pair of states was first answered by Petz [38, 39], who proved that, provided the relative entropy \(D(\rho \Vert \sigma )\) is finite, equality occurs if and only if the two states can be fully recovered. This means that there exists a channel \(\Psi \) such that \(\Psi \circ \Phi (\rho )=\rho \) and \(\Psi \circ \Phi (\sigma )=\sigma \). In this case, we say that the channel \(\Phi \) is sufficient with respect to the pair of states \(\{\rho ,\sigma \}\), in analogy with the classical notion of a statistic sufficient with respect to a family of probability distributions. Moreover, Petz proved that there exists a universal recovery channel \(\Phi _\sigma \), such that \(\Phi _\sigma \circ \Phi (\sigma )=\sigma \) and we have \(\Phi _\sigma \circ \Phi (\rho )=\rho \) if and only if the channel \(\Phi \) is sufficient with respect to \(\{\rho ,\sigma \}\).

Sufficiency of channels, sometimes also called reversibility, was studied in a number of subsequent works and several characterizations and applications were found [17, 20, 24, 25, 33, 35, 46]. In particular, equality conditions for various forms of DPI were studied, e.g. [27, 30, 44], and their relation to sufficiency was examined for other information theoretic or statistical quantities, such as different versions of quantum f-divergences [15, 16], quantum Rényi relative entropies [15, 21,22,23], Holevo quantity [45], Fisher information and \(L_1\)-distance [20].

An approximate version of sufficiency, called (approximate) recoverability, is a much stronger result stating that if the decrease in the relative entropy is small, then there exists a channel that recovers \(\sigma \) perfectly while \(\rho \) is recovered up to a small error. The first result of this form was proved in the work of Fawzi and Renner [8], who considered approximate quantum Markov chains. This was soon extended to more general channels [28, 48, 49, 53] and a variety of quantities such as f-divergences [4, 5], optimized f-divergences [11] and Fisher information [12]. An important result in this context is the existence of a universal recovery channel \(\Phi ^u_\sigma \) depending only on the state \(\sigma \) such that [28]

$$\begin{aligned} D(\rho \Vert \sigma )-D(\Phi (\rho )\Vert \Phi (\sigma ))\ge -2\log F(\rho , \Phi ^u_\sigma \circ \Phi (\rho ))\ge \frac{1}{4}\Vert \rho -\Phi ^u_\sigma \circ \Phi (\rho )\Vert _1^2, \end{aligned}$$

where \(\Vert \cdot \Vert _1\) denotes the trace norm and \(F(\rho ,\sigma )=\Vert \rho ^{1/2}\sigma ^{1/2}\Vert _1\) is the fidelity. See also [6, 7, 11] for the respective results for normal states of von Neumann algebras.

In the simplest setting of quantum hypothesis testing, the null hypothesis \(H_0=\sigma \) is tested against the alternative \(H_1=\rho \). The tests are represented by operators \(0\le M\le I\), with the interpretation that \(\textrm{Tr}\,[\omega M]\) is the probability of rejecting the hypothesis if the true state is \(\omega \). For the test represented by M, the Bayes error probability for \(\lambda \in [0,1]\) is expressed as

$$\begin{aligned} P_e(\lambda ,\sigma ,\rho ,M)=\lambda \textrm{Tr}\,[\sigma M]+(1-\lambda )\textrm{Tr}\,[\rho (I-M)] \end{aligned}$$

and the test is Bayes optimal for \(\lambda \) if this error probability is minimal over all possible tests. It is quite clear that if we replace the states by \(\Phi (\sigma )\) and \(\Phi (\rho )\), the achievable error probabilities cannot be decreased. It is a natural question when the optimal error probabilities are preserved under \(\Phi \), which is equivalent to preservation of the \(L_1\)-norm:

$$\begin{aligned} \Vert \rho -s\sigma \Vert _1=\Vert \Phi (\rho )-s\Phi (\sigma )\Vert _1,\qquad \forall s. \end{aligned}\tag{2}$$

In classical statistics, the theorem of Pfanzagl [40, 47] states that if the achievable error probabilities for a pair of probability measures \(\{P_0,P_1\}\) do not increase after transformation by a statistic T, then T must be sufficient with respect to \(\{P_0,P_1\}\). The corresponding result for quantum channels was investigated in [20, 26], and in [34] where more general risk functions for decision problems were considered. The equivalent question of preservation of the \(L_1\)-distance, with applications to error correction, was studied in [2, 50]. In all these works, additional conditions were needed: for example, the equalities had to be assumed either for larger sets of states with a special structure, or for any number of copies of \(\rho \) and \(\sigma \). The case when \(\rho \) and \(\sigma \) commute, or the channel \(\Phi \) has commutative range, was solved in [20].

Many of the results on recoverability of channels rely on an integral representation of the relative entropies or other quantities in question such as f-divergences. These formulas are based on integral representation of operator convex functions. Recently, a new integral formula for the relative entropy of positive semidefinite matrices was proved in [9]. This formula can be easily extended to infinite dimensional Hilbert spaces and rewritten in terms of the optimal Bayes error probabilities. We use this formula for simple proofs of a characterization of recoverability of quantum channels by preserving hypothesis testing error probabilities, or equivalently \(L_1\)-distances, without any additional assumptions needed in the previous works.

2 Preliminaries

Throughout this paper, \(\mathcal {H}\) is a Hilbert space and we denote by \(\mathcal {T}(\mathcal {H})\) the set of trace class operators and by \(\mathcal {S}(\mathcal {H})\) the set of states (density operators) on \(\mathcal {H}\), that is, positive operators of trace 1. For a self-adjoint operator \(A\in B(\mathcal {H})\), \(A_\pm \) denotes the positive/negative part of A and for \(A\ge 0\), we denote the projection onto the support of A by \(\textrm{supp}(A)\). The \(L_1\)-norm in \(\mathcal {T}(\mathcal {H})\) is defined as

$$\begin{aligned} \Vert S\Vert _1:=\sup _{\Vert A\Vert \le 1} \textrm{Tr}\,[AS]=\textrm{Tr}\,|S|,\qquad S\in \mathcal {T}(\mathcal {H}). \end{aligned}$$

If \(S\in \mathcal {T}(\mathcal {H})\) is self-adjoint, then we have

$$\begin{aligned} \textrm{Tr}\,[S_+]=\sup _{0\le M\le I} \textrm{Tr}\,[MS],\quad \textrm{Tr}\,[S_-]=-\inf _{0\le M\le I} \textrm{Tr}\,[MS] \end{aligned}$$

and \(\textrm{Tr}\,[S_\pm ]=\tfrac{1}{2}(\Vert S\Vert _1\pm \textrm{Tr}\,[S])\).
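These identities are easy to check numerically. The following is a minimal sketch in Python, restricted (as an assumption made only for this illustration) to real symmetric \(2\times 2\) matrices stored as triples `(a, b, d)`, for which the eigenvalues have a closed form. The helper names and the example matrices are hypothetical.

```python
import math

def eigvals_sym2(a, b, d):
    """Eigenvalues of the real symmetric matrix [[a, b], [b, d]]."""
    mean = (a + d) / 2.0
    radius = math.hypot((a - d) / 2.0, b)
    return mean - radius, mean + radius

def trace_norm(S):
    """||S||_1 = Tr|S|, the sum of the absolute values of the eigenvalues."""
    return sum(abs(x) for x in eigvals_sym2(*S))

def trace_pos(S):
    """Tr[S_+], the sum of the positive eigenvalues."""
    return sum(max(x, 0.0) for x in eigvals_sym2(*S))

def trace_neg(S):
    """Tr[S_-], minus the sum of the negative eigenvalues."""
    return sum(max(-x, 0.0) for x in eigvals_sym2(*S))

# Hypothetical example: S = rho - s*sigma for two qubit states.
rho, sigma, s = (0.7, 0.2, 0.3), (0.5, 0.0, 0.5), 0.8
S = tuple(r - s * g for r, g in zip(rho, sigma))
tr_S = S[0] + S[2]

print(trace_pos(S), 0.5 * (trace_norm(S) + tr_S))  # the two values agree
print(trace_neg(S), 0.5 * (trace_norm(S) - tr_S))  # the two values agree
```

The same closed-form eigenvalue helper suffices for any pair of real qubit states; for larger dimensions a numerical eigensolver would be needed.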

A quantum channel \(\Phi \) is a completely positive trace preserving map \(\mathcal {T}(\mathcal {H})\rightarrow \mathcal {T}(\mathcal {K})\). The adjoint of \(\Phi \) is the map \(\Phi ^*:B(\mathcal {K})\rightarrow B(\mathcal {H})\), defined by

$$\begin{aligned} \textrm{Tr}\,[\Phi ^*(A)\rho ]=\textrm{Tr}\,[A\Phi (\rho )],\qquad A\in B(\mathcal {K}),\ \rho \in \mathcal {S}(\mathcal {H}). \end{aligned}$$

It is easily seen that \(\Phi ^*\) is completely positive and unital.

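For positive operators \(\rho ,\sigma \in \mathcal {T}(\mathcal {H})\), the quantum relative entropy is defined as

$$\begin{aligned} D(\rho \Vert \sigma )=\begin{cases} \textrm{Tr}\,[\rho (\log \rho -\log \sigma )], & \text {if } \textrm{supp}(\rho )\le \textrm{supp}(\sigma ),\\ \infty , & \text {otherwise.} \end{cases} \end{aligned}$$

Relative entropy satisfies the data processing inequality (1) which holds for any pair of states \(\rho ,\sigma \) and any quantum channel \(\Phi : \mathcal {T}(\mathcal {H})\rightarrow \mathcal {T}(\mathcal {K})\).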

2.1 Quantum hypothesis testing and \(L_1\)-distance

In the problem of hypothesis testing, the task is to test the hypothesis \(H_0=\sigma \) against the alternative \(H_1=\rho \). Any test is represented by an effect on \(\mathcal {H}\), that is, an operator \(0\le M\le I\), corresponding to rejecting the hypothesis. For a test M, the error probabilities are

$$\begin{aligned} \alpha (M)=\textrm{Tr}\,[\sigma M],\qquad \beta (M)= \textrm{Tr}\,[\rho (I-M)]. \end{aligned}$$

For \(\lambda \in (0,1)\), we define the Bayes optimal test as the minimizer of

$$\begin{aligned}{} & {} P_e(\lambda ,\sigma ,\rho ,M):=\lambda \alpha (M)+(1-\lambda )\beta (M)=(1-\lambda )(1-\textrm{Tr}\,[(\rho -s\sigma )M]),\qquad \\{} & {} \quad s=\frac{\lambda }{1-\lambda }. \end{aligned}$$

The proof of the following description of the Bayes optimal tests can be found in [26].

Lemma 1

(Quantum Neyman-Pearson) Let \(\rho ,\sigma \) be states, \(\lambda \in (0,1)\) and put \(s=\frac{\lambda }{1-\lambda }\). A test M is a Bayes optimal test for \(\lambda ,\sigma ,\rho \) if and only if

$$\begin{aligned} M=P_{s,+}+X_s,\qquad 0\le X_s\le P_{s,0}, \end{aligned}$$

where \(P_{s,\pm }=\textrm{supp}((\rho -s\sigma )_\pm )\) and \(P_{s,0}=I-P_{s,+}-P_{s,-}\). The optimal error probability is then

$$\begin{aligned} P_e(\lambda ,\sigma ,\rho ):=\min _M P_e(\lambda ,\sigma ,\rho ,M)&=(1-\lambda )(1-\textrm{Tr}\,[(\rho -s\sigma )_+])\\&=(1-\lambda )(s-\textrm{Tr}\,[(\rho -s\sigma )_-])\\&=\frac{1}{2}(1-(1-\lambda )\Vert \rho -s\sigma \Vert _1). \end{aligned}$$
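As an illustration of the lemma, the sketch below evaluates the three closed-form expressions for the optimal error probability and compares them with a brute-force minimum over a restricted family of tests: \(M=0\), \(M=I\) and a grid of real rank-one projections. For real symmetric qubit matrices (an assumption of this example, with matrices stored as triples `(a, b, d)`) this family contains the projection \(P_{s,+}\). All function names and numerical values are hypothetical.

```python
import math

def eigvals_sym2(a, b, d):
    """Eigenvalues of the real symmetric matrix [[a, b], [b, d]]."""
    mean = (a + d) / 2.0
    radius = math.hypot((a - d) / 2.0, b)
    return mean - radius, mean + radius

def optimal_error_formulas(lam, sigma, rho):
    """The three closed-form expressions for the optimal Bayes error."""
    s = lam / (1.0 - lam)
    eig = eigvals_sym2(*(r - s * g for r, g in zip(rho, sigma)))
    tr_pos = sum(max(x, 0.0) for x in eig)
    tr_neg = sum(max(-x, 0.0) for x in eig)
    norm1 = sum(abs(x) for x in eig)
    return ((1 - lam) * (1 - tr_pos),
            (1 - lam) * (s - tr_neg),
            0.5 * (1 - (1 - lam) * norm1))

def error_of_rank_one_test(lam, sigma, rho, theta):
    """P_e for the test M = |v><v| with v = (cos(theta), sin(theta))."""
    s = lam / (1.0 - lam)
    a, b, d = (r - s * g for r, g in zip(rho, sigma))
    c, sn = math.cos(theta), math.sin(theta)
    tr_SM = a * c * c + 2.0 * b * c * sn + d * sn * sn
    return (1 - lam) * (1 - tr_SM)

lam, sigma, rho = 0.4, (0.5, 0.0, 0.5), (0.7, 0.2, 0.3)
f_plus, f_minus, f_norm = optimal_error_formulas(lam, sigma, rho)

# Brute force: M = 0 gives 1 - lam, M = I gives lam, plus a grid of projections.
n = 20000
candidates = [1 - lam, lam]
candidates += [error_of_rank_one_test(lam, sigma, rho, math.pi * k / n)
               for k in range(n)]
brute = min(candidates)

print(f_plus, f_minus, f_norm, brute)
```

The brute-force value never falls below the closed-form value and approaches it as the grid is refined, in agreement with the lemma.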

It is easily seen that the error probabilities and the related quantities in the above lemma are monotone under channels, in particular,

$$\begin{aligned} P_e(\lambda ,\Phi (\sigma ),\Phi (\rho ))&\ge P_e(\lambda ,\sigma ,\rho ),\\ \Vert \Phi (\rho )-s\Phi (\sigma )\Vert _1&\le \Vert \rho -s\sigma \Vert _1,\\ \textrm{Tr}\,[(\Phi (\rho )-s\Phi (\sigma ))_-]&\le \textrm{Tr}\,[(\rho -s\sigma )_-]. \end{aligned}$$

In fact, monotonicity holds if \(\Phi \) is a positive trace preserving map, so complete positivity is not needed.
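A small numerical illustration of the monotonicity of the \(L_1\)-norm, assuming a qubit dephasing channel \(X\mapsto (1-p)X+p\,\textrm{diag}(X)\) acting on real symmetric matrices stored as triples `(a, b, d)`. The channel, the states and the grid of values of \(s\) are choices made for this sketch only.

```python
import math

def trace_norm_sym2(a, b, d):
    """Trace norm of the real symmetric matrix [[a, b], [b, d]]."""
    mean, radius = (a + d) / 2.0, math.hypot((a - d) / 2.0, b)
    return abs(mean - radius) + abs(mean + radius)

def dephase(X, p):
    """Dephasing channel: keeps the diagonal, damps the off-diagonal entry."""
    a, b, d = X
    return (a, (1.0 - p) * b, d)

rho, sigma, p = (0.7, 0.2, 0.3), (0.6, -0.1, 0.4), 0.5

gaps = []
for k in range(41):
    s = 0.1 * k
    before = trace_norm_sym2(*(r - s * g for r, g in zip(rho, sigma)))
    after = trace_norm_sym2(*(r - s * g for r, g in
                              zip(dephase(rho, p), dephase(sigma, p))))
    gaps.append(before - after)  # nonnegative by monotonicity

print(min(gaps), max(gaps))
```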

2.2 Integral formula for the relative entropy

The following new integral representation of the relative entropy was proved by Frenkel in [9], in the case \(\dim (\mathcal {H})<\infty \).

Theorem 1

Let \(\rho ,\sigma \) be positive operators in \(\mathcal {T}(\mathcal {H})\). Then

$$\begin{aligned} D(\rho \Vert \sigma )=\textrm{Tr}\,[\rho -\sigma ]+\int _{-\infty }^\infty \frac{dt}{|t|(1-t)^2}\textrm{Tr}\,[((1-t)\rho +t\sigma )_-]. \end{aligned}$$

Proof

By [9, Theorem 6], the equality holds if \(\dim (\mathcal {H})<\infty \). We will now prove that it can be extended to the case when \(\dim (\mathcal {H})=\infty \). Assume first that \(\textrm{supp}(\rho )\le \textrm{supp}(\sigma )\), so that we may assume that \(\sigma \) is faithful and \(\mathcal {H}\) is separable by restriction to the support of \(\sigma \). We will use a standard limiting argument to extend the finite dimensional result to the separable case.

Let \(P_n\) be an increasing sequence of finite rank projections such that \(\vee _n P_n=I\). Put \(\rho _n=P_n\rho P_n\), \(\sigma _n=P_n\sigma P_n\). Then restricted to the finite dimensional space \(P_n\mathcal {H}\), \(\rho _n\) and \(\sigma _n\) are positive semidefinite operators with \(\textrm{supp}(\rho _n)\le \textrm{supp}(\sigma _n)\). Moreover, \(\lim _n \textrm{Tr}\,[\rho _n]=\textrm{Tr}\,[\rho ]\), \(\lim _n\textrm{Tr}\,[\sigma _n]=\textrm{Tr}\,[\sigma ]\) and by [14, Theorem 4.5] we have \(D(\rho \Vert \sigma )=\lim _n D(\rho _n\Vert \sigma _n)\).

For \(t\in {\mathbb {R}}\) and \(n\in {\mathbb {N}}\), put

$$\begin{aligned} f_n(t):=\textrm{Tr}\,[((1-t)\rho _n+t\sigma _n)_-], \qquad f(t):=\textrm{Tr}\,[((1-t)\rho +t\sigma )_-]. \end{aligned}$$

Then

$$\begin{aligned} f_n(t)&=\textrm{Tr}\,[((t-1)\rho _n-t\sigma _n)_+]=\sup _{0\le M_n\le P_n} \textrm{Tr}\,[M_n((t-1)\rho -t\sigma )]\\ {}&\le \sup _{0\le M_{n+1}\le P_{n+1}}\textrm{Tr}\,[M_{n+1}((t-1)\rho -t\sigma )]=\textrm{Tr}\,[((1-t)\rho _{n+1}+t\sigma _{n+1})_-]\\ {}&=f_{n+1}(t), \end{aligned}$$

where the inequality follows from \(0\le M_n\le P_n\le P_{n+1}\). Furthermore, since \(P_n\rightarrow I\) in the strong operator topology, we have using [13, Theorem 1] that

$$\begin{aligned} \Vert P_n((1-t)\rho +t\sigma )P_n\Vert _1\rightarrow \Vert (1-t)\rho +t\sigma \Vert _1. \end{aligned}$$

It follows that

$$\begin{aligned} f_n(t)&=\tfrac{1}{2}(\Vert (1-t)\rho _n+t\sigma _n\Vert _1-\textrm{Tr}\,[(1-t)\rho _n+t\sigma _n])\\&\quad \rightarrow \tfrac{1}{2} (\Vert (1-t)\rho +t\sigma \Vert _1-\textrm{Tr}\,[(1-t)\rho +t\sigma ])\\&=f(t). \end{aligned}$$

Hence \(f_n\) is an increasing sequence of positive integrable functions converging pointwise to f. Since the integral formula holds in finite dimensions, we see using the Lebesgue monotone convergence theorem that

$$\begin{aligned} D(\rho \Vert \sigma )&=\lim _n D(\rho _n\Vert \sigma _n)=\lim _n \left( \textrm{Tr}\,[\rho _n-\sigma _n]+\int _{-\infty }^\infty \frac{dt}{|t|(1-t)^2}f_n(t)\right) \\&=\textrm{Tr}\,[\rho -\sigma ]+\int _{-\infty }^\infty \frac{dt}{|t|(1-t)^2}f(t). \end{aligned}$$

If \(\textrm{supp}(\rho )\not \le \textrm{supp}(\sigma )\), then there is some projection Q such that \(\textrm{Tr}\,[\sigma Q]=0\) and \(c:=\textrm{Tr}\,[\rho Q]>0\). Then for any \(t>1\) we have

$$\begin{aligned} \textrm{Tr}\,[((1-t)\rho +t\sigma )_-]\ge \textrm{Tr}\,[Q((t-1)\rho -t\sigma )]=(t-1)c \end{aligned}$$

and hence

$$\begin{aligned} \int _{-\infty }^{\infty }\frac{dt}{|t|(t-1)^2}\textrm{Tr}\,[((1-t)\rho +t\sigma )_-]\ge c\int _1^\infty \frac{dt}{t(t-1)}=\infty . \end{aligned}$$

In this case we also have \(D(\rho \Vert \sigma )=\infty \) by definition. \(\square \)

The integral formula leads to an easy proof of the fact that DPI for the relative entropy holds for all positive trace preserving maps. This fact was first proved in [36], using interpolation techniques.

For our purposes, the following form of the integral formula will be useful.

Corollary 1

Let \(\rho ,\sigma \in \mathcal {S}(\mathcal {H})\). Then for any \(\lambda ,\mu \ge 0\) such that \(\mu \sigma \le \rho \le \lambda \sigma \), we have

$$\begin{aligned} D(\rho \Vert \sigma )=\int _\mu ^\lambda \frac{ds}{s}\textrm{Tr}\,[(\rho -s\sigma )_-]+ \log (\lambda )+1-\lambda . \end{aligned}$$

Proof

Since \(((1-t)\rho +t\sigma )_-=0\) for \(t\in [0,1]\), the integral splits into two parts, integrating over \(t\le 0\) and \(t\ge 1\). For the first integral, since \(1-t>0\), we have \(((1-t)\rho +t\sigma )_-=(1-t)(\rho -\frac{t}{t-1}\sigma )_-\) and

$$\begin{aligned} \int _{-\infty }^0\frac{dt}{-t(1-t)^2}\textrm{Tr}\,[((1-t)\rho +t\sigma )_-]&=\int _{-\infty }^0 \frac{dt}{t(t-1)}\textrm{Tr}\,[(\rho -\frac{t}{t-1}\sigma )_-]\\&=\int _0^1\frac{ds}{s}\textrm{Tr}\,[(\rho -s\sigma )_-]\\&=\int _\mu ^1\frac{ds}{s}\textrm{Tr}\,[(\rho -s\sigma )_-]. \end{aligned}$$

For \(t\ge 1\), we use \(((1-t)\rho +t\sigma )_-=((t-1)\rho -t\sigma )_+=(t-1)(\rho -\frac{t}{t-1}\sigma )_+\) and inserting into the integral, we obtain

$$\begin{aligned} \int _1^\infty \frac{dt}{t(t-1)^2}\textrm{Tr}\,[((1-t)\rho +t\sigma )_-]= & {} \int _1^\infty \frac{ds}{s}\textrm{Tr}\,[(\rho -s\sigma )_+]\\ {}= & {} \int _1^\lambda \frac{ds}{s}\textrm{Tr}\,[(\rho -s\sigma )_+]. \end{aligned}$$

The proof is finished by using the equality \(\textrm{Tr}\,[(\rho -s\sigma )_+]=1-s +\textrm{Tr}\,[(\rho -s\sigma )_-]\).

\(\square \)
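The formula of Corollary 1 can be checked numerically on a non-commuting qubit example. The sketch below assumes a real symmetric \(\rho \) stored as a triple `(a, b, d)` and a diagonal \(\sigma \), so that \(\log \sigma \) is explicit. It uses \(\mu =0\), a midpoint rule for the integral, and the bound \(\lambda =1.5\), which satisfies \(\rho \le \lambda \sigma \) for the chosen states. All names and numbers are hypothetical.

```python
import math

def eigvals_sym2(a, b, d):
    """Eigenvalues of the real symmetric matrix [[a, b], [b, d]]."""
    mean = (a + d) / 2.0
    radius = math.hypot((a - d) / 2.0, b)
    return mean - radius, mean + radius

def trace_neg(rho, sigma, s):
    """Tr[(rho - s*sigma)_-] for matrices stored as (a, b, d)."""
    eig = eigvals_sym2(*(r - s * g for r, g in zip(rho, sigma)))
    return sum(max(-x, 0.0) for x in eig)

def relative_entropy_diag_sigma(rho, q):
    """D(rho||sigma) = Tr[rho(log rho - log sigma)], sigma = diag(q[0], q[1])."""
    tr_rho_log_rho = sum(x * math.log(x) for x in eigvals_sym2(*rho))
    tr_rho_log_sigma = rho[0] * math.log(q[0]) + rho[2] * math.log(q[1])
    return tr_rho_log_rho - tr_rho_log_sigma

def corollary_rhs(rho, sigma, lam, mu=0.0, n=20000):
    """Midpoint rule for the integral over [mu, lam], plus log(lam) + 1 - lam."""
    h = (lam - mu) / n
    integral = 0.0
    for k in range(n):
        s = mu + (k + 0.5) * h
        integral += trace_neg(rho, sigma, s) / s
    return integral * h + math.log(lam) + 1.0 - lam

rho = (0.7, 0.2, 0.3)
q = (0.6, 0.4)
sigma = (q[0], 0.0, q[1])
lam = 1.5  # assumed bound: rho <= 1.5 * sigma holds for this example

lhs = relative_entropy_diag_sigma(rho, q)
rhs = corollary_rhs(rho, sigma, lam)
print(lhs, rhs)  # the two values agree up to the quadrature error
```

Any larger admissible \(\lambda \) gives the same value, since the extra part of the integral is compensated by the change in \(\log (\lambda )+1-\lambda \).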

Remark 1

The smallest value of \(\lambda \) in the above expression is related to the quantum max-relative entropy defined as

$$\begin{aligned} D_{\max }(\rho \Vert \sigma ):=\log \min \{\lambda :\ \rho \le \lambda \sigma \}. \end{aligned}$$

Similarly, the largest value of \(\mu \) is \(e^{-D_{\max }(\sigma \Vert \rho )}\). An important related quantity is the Hilbert projective metric [3]

$$\begin{aligned} D_{\Omega }(\rho \Vert \sigma ):=D_{\max }(\rho \Vert \sigma )+D_{\max }(\sigma \Vert \rho ). \end{aligned}$$

See [41,42,43] for more details and interpretations in the context of quantum information theory. Note also that we may always put \(\mu =0\) and if \(\dim (\mathcal {H})<\infty \), then the condition that \(\rho \le \lambda \sigma \) for some \(\lambda >0\) is equivalent to \(\textrm{supp}(\rho )\le \textrm{supp}(\sigma )\), so it holds whenever \(D(\rho \Vert \sigma )\) is finite. In infinite dimensions, this condition is much more restrictive.
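For qubits, the optimal constants \(\mu \) and \(\lambda \) can be computed in closed form. The sketch below assumes real symmetric \(2\times 2\) matrices stored as triples `(a, b, d)` with \(\sigma \) faithful, in which case the optimal constants are the two roots of \(\det (x\sigma -\rho )=0\). The function name and the example states are hypothetical.

```python
import math

def extreme_ratios(rho, sigma):
    """Roots of det(x*sigma - rho) = 0 for real symmetric 2x2 rho and sigma > 0.

    The smaller root is the largest mu with mu*sigma <= rho, the larger root
    is the smallest lam with rho <= lam*sigma.
    """
    a, b, d = rho
    A, B, D = sigma
    c2 = A * D - B * B                 # det(sigma)
    c1 = A * d + a * D - 2.0 * B * b   # coefficient of the linear term
    c0 = a * d - b * b                 # det(rho)
    disc = math.sqrt(c1 * c1 - 4.0 * c2 * c0)
    return (c1 - disc) / (2.0 * c2), (c1 + disc) / (2.0 * c2)

rho, sigma = (0.7, 0.2, 0.3), (0.6, 0.0, 0.4)
mu, lam = extreme_ratios(rho, sigma)
d_max_rho_sigma = math.log(lam)    # D_max(rho||sigma)
d_max_sigma_rho = -math.log(mu)    # D_max(sigma||rho)
d_omega = d_max_rho_sigma + d_max_sigma_rho

print(mu, lam, d_omega)
```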

2.3 Sufficiency and recoverability for quantum channels

The following definition first appeared in [39] and can be seen as a quantum generalization of the classical notion of a sufficient statistic.

Definition 1

We say that a channel \(\Phi :\mathcal {T}(\mathcal {H})\rightarrow \mathcal {T}(\mathcal {K})\) is sufficient with respect to a set of states \(\mathcal {S}\subseteq \mathcal {S}(\mathcal {H})\) if there exists a channel \(\Psi :\mathcal {T}(\mathcal {K})\rightarrow \mathcal {T}(\mathcal {H})\) such that

$$\begin{aligned} \Psi \circ \Phi (\rho )=\rho ,\qquad \forall \rho \in \mathcal {S}. \end{aligned}$$

For a state \(\sigma \in \mathcal {S}(\mathcal {H})\), we define an inner product \(\langle \,\cdot ,\cdot \,\rangle _\sigma \) in \(B(\textrm{supp}(\sigma ))\) by

$$\begin{aligned} \langle \,A,B\,\rangle _\sigma :=\textrm{Tr}\,[A^*\sigma ^{1/2}B\sigma ^{1/2}],\qquad A,B\in B(\textrm{supp}(\sigma )). \end{aligned}$$

It was proved in [39] that the (unique) linear map \(\Phi _\sigma : \mathcal {T}(\textrm{supp}(\Phi (\sigma )))\rightarrow \mathcal {T}(\textrm{supp}(\sigma ))\) determined by

$$\begin{aligned} \langle \,\Phi ^*(B),A\,\rangle _\sigma =\langle \,B,\Phi _\sigma ^*(A)\,\rangle _{\Phi (\sigma )},\qquad A\in B(\textrm{supp}(\sigma )), B\in B(\textrm{supp}(\Phi (\sigma ))) \end{aligned}$$

is a channel, called the Petz dual of \(\Phi \) with respect to \(\sigma \) (or the Petz recovery map). Note that we always have \(\Phi _\sigma \circ \Phi (\sigma )=\sigma \) and as it was further proved in [39], if both \(\sigma \) and \(\Phi (\sigma )\) are faithful, then \(\Phi \) is sufficient with respect to \(\mathcal {S}\) if and only if \(\Phi _\sigma \circ \Phi (\rho )=\rho \) for all \(\rho \in \mathcal {S}\), so that \(\Phi _\sigma \) is a universal recovery channel.

Remark 2

If \(\dim (\mathcal {H})<\infty \), we obtain the familiar form of the Petz recovery channel:

$$\begin{aligned} \Phi _\sigma (\cdot )= \sigma ^{1/2}\Phi ^*(\Phi (\sigma )^{-1/2}\cdot \Phi (\sigma )^{-1/2})\sigma ^{1/2}. \end{aligned}$$

We also define

$$\begin{aligned} \Phi _{\sigma ,t}(\cdot )= \sigma ^{-it}\Phi _\sigma (\Phi (\sigma )^{it}\cdot \Phi (\sigma )^{-it})\sigma ^{it},\qquad t\in {\mathbb {R}} \end{aligned}$$

and

$$\begin{aligned} \Phi _{\sigma ,\mu }(\cdot )=\int _{-\infty }^\infty \Phi _{\sigma ,t}(\cdot )d\mu (t), \end{aligned}$$

for a probability measure \(\mu \) on \({\mathbb {R}}\). Clearly, all these maps are channels \(\mathcal {T}(\textrm{supp}(\Phi (\sigma )))\rightarrow \mathcal {T}(\textrm{supp}(\sigma ))\) that recover the state \(\sigma \).
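In the commuting (classical) case, states are probability vectors, a channel is a column-stochastic matrix, and on diagonal inputs the maps \(\Phi _{\sigma ,t}\) all coincide with \(\Phi _\sigma \), which reduces to Bayes inversion. The following sketch, with hypothetical names and an arbitrary example channel `T`, computes this classical Petz map and checks that it is stochastic and recovers \(\sigma \).

```python
def apply(T, p):
    """Apply a column-stochastic matrix T[y][x] to a probability vector p."""
    return [sum(T[y][x] * p[x] for x in range(len(p))) for y in range(len(T))]

def petz_map(T, q):
    """Classical Petz dual R[x][y] = q[x] T[y][x] / (T q)[y] (Bayes' rule)."""
    Tq = apply(T, q)
    return [[q[x] * T[y][x] / Tq[y] for y in range(len(T))]
            for x in range(len(q))]

# Hypothetical example: a noisy channel from 3 outcomes to 2 outcomes.
T = [[0.9, 0.5, 0.2],
     [0.1, 0.5, 0.8]]
q = [0.2, 0.4, 0.4]

R = petz_map(T, q)
recovered = apply(R, apply(T, q))
column_sums = [sum(R[x][y] for x in range(len(q))) for y in range(len(T))]

print(recovered)    # equals q up to rounding
print(column_sums)  # each column of R sums to 1
```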

Theorem 2

Assume that \(\rho ,\sigma \in \mathcal {S}(\mathcal {H})\) are such that \(D(\rho \Vert \sigma )<\infty \). Then the following are equivalent.

  (i) \(\Phi \) is sufficient with respect to \(\{\rho ,\sigma \}\);

  (ii) \(D(\Phi (\rho )\Vert \Phi (\sigma ))=D(\rho \Vert \sigma )\);

  (iii) \(\Phi _{\sigma ,t}\circ \Phi (\rho )=\rho \), for some \(t\in {\mathbb {R}}\);

  (iv) \(\Phi _{\sigma ,t}\circ \Phi (\rho )=\rho \), for all \(t\in {\mathbb {R}}\);

  (v) \(\Phi _{\sigma ,\mu }\circ \Phi (\rho )=\rho \) for some probability measure \(\mu \).

Proof

In finite dimensions the proof follows from [53, Theorem 3.3]. The proof in the general case will be given in the Appendix. \(\square \)

The following is an approximate version of sufficiency of channels, called recoverability of \(\Phi \).

Theorem 3

[28] Let \(\sigma \in \mathcal {S}(\mathcal {H})\). Then for any channel \(\Phi : \mathcal {T}(\mathcal {H})\rightarrow \mathcal {T}(\mathcal {K})\) there exists a channel \(\Phi ^u_\sigma : \mathcal {T}(\mathcal {K})\rightarrow \mathcal {T}(\mathcal {H})\) such that \(\Phi ^u_\sigma \circ \Phi (\sigma )=\sigma \) and for any \(\rho \in \mathcal {S}(\mathcal {H})\) we have

$$\begin{aligned} D(\rho \Vert \sigma ){} & {} \ge D(\Phi (\rho )\Vert \Phi (\sigma ))-2\log F(\rho ,\Phi ^u_\sigma \circ \Phi (\rho ))\\{} & {} \ge D(\Phi (\rho )\Vert \Phi (\sigma )) + \frac{1}{4} \Vert \rho -\Phi ^u_\sigma \circ \Phi (\rho )\Vert _1^2. \end{aligned}$$

In the above theorem, \(F(\rho _0,\rho _1)\) is the fidelity

$$\begin{aligned} F(\rho _0,\rho _1)=\Vert \rho _0^{1/2}\rho _1^{1/2}\Vert _1. \end{aligned}$$

The second inequality in Theorem 3 is obtained using the inequality \(-\log (x)\ge 1-x\) for \(x\in (0,1)\) and the Fuchs-van de Graaf inequality, [10].
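The chain \(-2\log F\ge 1-F^2\ge \frac{1}{4}\Vert \cdot \Vert _1^2\), obtained by applying \(-\log (x)\ge 1-x\) with \(x=F^2\), can be illustrated for commuting states, where the fidelity reduces to \(\sum _i\sqrt{p_iq_i}\). The probability vectors below are arbitrary choices for this sketch.

```python
import math

def fidelity(p, q):
    """F = ||p^{1/2} q^{1/2}||_1; for commuting states, sum_i sqrt(p_i q_i)."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))

def l1_distance(p, q):
    """||p - q||_1 for probability vectors."""
    return sum(abs(a - b) for a, b in zip(p, q))

p = [0.1, 0.2, 0.7]
q = [0.2, 0.4, 0.4]
F = fidelity(p, q)

# -2 log F >= 1 - F^2 >= (1/4) ||p - q||_1^2
chain = (-2.0 * math.log(F), 1.0 - F * F, 0.25 * l1_distance(p, q) ** 2)
print(chain)
```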

The universal recovery channel \(\Phi ^u_\sigma \) can be chosen as

$$\begin{aligned} \Phi ^u_\sigma (\cdot )=\Phi _{\sigma ,\beta _0}(P\cdot P)+\textrm{Tr}\,[(I-P)\cdot ], \end{aligned}\tag{3}$$

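where \(P=\textrm{supp}(\Phi (\sigma ))\), the scalar \(\textrm{Tr}\,[(I-P)\cdot ]\) is understood as multiplying a fixed state on \(\mathcal {H}\) so that \(\Phi ^u_\sigma \) is trace preserving, and \(\beta _0\) is the probability density function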

$$\begin{aligned} \beta _0(t)=\frac{\pi }{\cosh (2\pi t)+1}. \end{aligned}$$

Note that if \(\textrm{supp}(\rho )\le \textrm{supp}(\sigma )\), then \(\textrm{supp}(\Phi (\rho ))\le \textrm{supp}(\Phi (\sigma ))\), so that \(\Phi ^u_\sigma (\Phi (\rho ))=\Phi _{\sigma ,\beta _0}(\Phi (\rho ))\) and the statement in this case follows by [28, Theorem 2.1]. If \(\textrm{supp}(\rho )\not \le \textrm{supp}(\sigma )\), then \(D(\rho \Vert \sigma )=\infty \) and the inequality holds trivially.

3 Sufficiency and recoverability by hypothesis testing

The characterization in Theorem 2 and the integral formula in Corollary 1 now give an easy proof of a characterization of sufficiency and recoverability by quantities related to hypothesis testing. Note that here we do not have to make any further assumptions about the states.

Theorem 4

Let \(\Phi :\mathcal {T}(\mathcal {H})\rightarrow \mathcal {T}(\mathcal {K})\) be a channel and let \(\rho ,\sigma \in \mathcal {S}(\mathcal {H})\). Then the following are equivalent.

  (i) \(P_e(\lambda , \Phi (\rho ),\Phi (\sigma ))= P_e(\lambda ,\rho ,\sigma )\), for all \(\lambda \in [0,1]\);

  (ii) \(\Vert \Phi (\rho )-s\Phi (\sigma )\Vert _1= \Vert \rho -s\sigma \Vert _1\), for all \(s\ge 0\);

  (iii) \(\textrm{Tr}\,[(\Phi (\rho )-s\Phi (\sigma ))_{+}]= \textrm{Tr}\,[(\rho -s\sigma )_{+}]\), for all \(s\ge 0\);

  (iv) \(\textrm{Tr}\,[(\Phi (\rho )-s\Phi (\sigma ))_{-}]= \textrm{Tr}\,[(\rho -s\sigma )_{-}]\), for all \(s\ge 0\);

  (v) \(\Phi \) is sufficient with respect to \(\{\rho ,\sigma \}\).

Proof

The equivalences between (i)-(iv) are clear from Lemma 1. Assume that (iv) holds. Suppose first that \(\rho \le \lambda \sigma \) for some \(\lambda >0\). Then also \(\Phi (\rho )\le \lambda \Phi (\sigma )\) and we have by Corollary 1

$$\begin{aligned} D(\Phi (\rho )\Vert \Phi (\sigma ))&=\int _0^\lambda \frac{ds}{s}\textrm{Tr}\,[(\Phi (\rho )-s\Phi (\sigma ))_-]+\log (\lambda )+1-\lambda \\&= \int _0^\lambda \frac{ds}{s}\textrm{Tr}\,[(\rho -s\sigma )_-]+\log (\lambda )+1-\lambda =D(\rho \Vert \sigma ). \end{aligned}$$

By Theorem 2, this implies (v). In the general case, let \(\sigma _0=\frac{1}{2}(\rho +\sigma )\), then \(\rho \le 2\sigma _0\) and it is easily seen that the equality (ii) implies a similar equality with \(\sigma \) replaced by \(\sigma _0\). It follows that \(\Phi \) is sufficient with respect to \(\{\rho ,\sigma _0\}\), which implies (v). The implication (v) \(\implies \) (ii) follows from monotonicity of the \(L_1\)-distance. \(\square \)
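The equivalence of (ii) and (v) can be illustrated in the commuting case, where a channel is a column-stochastic matrix and the Petz map is Bayes inversion. In the sketch below, `T_good` merges two outcomes with equal likelihood ratio and preserves all \(L_1\)-distances, while `T_bad` merges outcomes with different ratios and does not. The distributions, the channels and the grid of values of \(s\) are hypothetical choices.

```python
def apply(T, p):
    """Apply a column-stochastic matrix T[y][x] to a probability vector p."""
    return [sum(T[y][x] * p[x] for x in range(len(p))) for y in range(len(T))]

def l1(p, q, s):
    """||p - s q||_1 for commuting (diagonal) states."""
    return sum(abs(a - s * b) for a, b in zip(p, q))

def petz_map(T, q):
    """Classical Petz dual R[x][y] = q[x] T[y][x] / (T q)[y]."""
    Tq = apply(T, q)
    return [[q[x] * T[y][x] / Tq[y] for y in range(len(T))]
            for x in range(len(q))]

p = [0.1, 0.2, 0.7]
q = [0.2, 0.4, 0.4]               # likelihood ratios p/q: 0.5, 0.5, 1.75
T_good = [[1, 1, 0], [0, 0, 1]]   # merges the two outcomes with equal ratio
T_bad = [[1, 0, 0], [0, 1, 1]]    # merges outcomes with different ratios

def max_l1_loss(T, grid):
    """Largest decrease of ||p - s q||_1 under T over the given grid of s."""
    return max(l1(p, q, s) - l1(apply(T, p), apply(T, q), s) for s in grid)

grid = [0.01 * k for k in range(301)]
loss_good = max_l1_loss(T_good, grid)
loss_bad = max_l1_loss(T_bad, grid)
rec_good = apply(petz_map(T_good, q), apply(T_good, p))
rec_bad = apply(petz_map(T_bad, q), apply(T_bad, p))

print(loss_good, rec_good)  # no loss, and p is recovered
print(loss_bad, rec_bad)    # positive loss, and p is not recovered
```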

We are now interested in a similar result for recoverability. Assume first that there is a channel \(\Lambda : \mathcal {T}(\mathcal {K})\rightarrow \mathcal {T}(\mathcal {H})\) such that \(\Lambda \circ \Phi (\sigma )=\sigma \) and \(\Vert \Lambda \circ \Phi (\rho )-\rho \Vert _1\le \epsilon \). We then have

$$\begin{aligned} \Vert \rho -s\sigma \Vert _1=\Vert \rho -\Lambda \circ \Phi (\rho )+\Lambda \circ \Phi (\rho )-s\Lambda \circ \Phi (\sigma )\Vert _1\le \Vert \Phi (\rho )-s\Phi (\sigma )\Vert _1+\epsilon . \end{aligned}\tag{4}$$

Using Lemma 1, we see that the resulting inequality in (4) is equivalent to any of the following inequalities

$$\begin{aligned} \textrm{Tr}\,[(\Phi (\rho )-s\Phi (\sigma ))_+]&\ge \textrm{Tr}\,[(\rho -s\sigma )_+]-\frac{\epsilon }{2},\qquad s\ge 0 \end{aligned}\tag{5}$$
$$\begin{aligned} \textrm{Tr}\,[(\Phi (\rho )-s\Phi (\sigma ))_-]&\ge \textrm{Tr}\,[(\rho -s\sigma )_-]-\frac{\epsilon }{2},\qquad s\ge 0 \end{aligned}\tag{6}$$
$$\begin{aligned} P_e(\lambda ,\Phi (\sigma ),\Phi (\rho ))&\le P_e(\lambda ,\sigma ,\rho )+\frac{1-\lambda }{2}\epsilon ,\qquad \lambda \in [0,1]. \end{aligned}\tag{7}$$

The following result gives the converse statement. Note that here we will need the assumption that the Hilbert projective metric \(D_\Omega (\rho \Vert \sigma )\) is finite, equivalently, that \(\mu \sigma \le \rho \le \lambda \sigma \) for some \(\mu ,\lambda >0\) (see Remark 1), to get a nontrivial result.

Theorem 5

Let \(\rho ,\sigma \in \mathcal {S}(\mathcal {H})\) and let \(\Phi :\mathcal {T}(\mathcal {H})\rightarrow \mathcal {T}(\mathcal {K})\) be a quantum channel. If

$$\begin{aligned} \Vert \Phi (\rho )-s\Phi (\sigma )\Vert _1\ge \Vert \rho -s\sigma \Vert _1-\epsilon ,\qquad \forall s\ge 0 \end{aligned}$$

holds for some \(\epsilon \ge 0\), then there exists a channel \(\Lambda :\mathcal {T}(\mathcal {K})\rightarrow \mathcal {T}(\mathcal {H})\) such that \(\Lambda \circ \Phi (\sigma )=\sigma \) and

$$\begin{aligned} \Vert \Lambda \circ \Phi (\rho )-\rho \Vert _1\le \sqrt{2\epsilon }D_\Omega (\rho \Vert \sigma )^{1/2}. \end{aligned}$$

Moreover, we may take \(\Lambda =\Phi ^u_\sigma \) as in (3).

Proof

The statement is trivial if \(D_\Omega (\rho \Vert \sigma )=\infty \), so assume that \(\mu \sigma \le \rho \le \lambda \sigma \) for \(\mu ,\lambda >0\), \(\mu =e^{-D_{\max }(\sigma \Vert \rho )}\) and \(\lambda =e^{D_{\max }(\rho \Vert \sigma )}\). Then also \(\mu \Phi (\sigma )\le \Phi (\rho )\le \lambda \Phi (\sigma )\). By the assumptions, inequality (6) holds. Using Corollary 1, we get

$$\begin{aligned} D(\rho \Vert \sigma )-D(\Phi (\rho )\Vert \Phi (\sigma ))&=\int _\mu ^\lambda \frac{ds}{s}\bigl (\textrm{Tr}\,[(\rho -s\sigma )_-]-\textrm{Tr}\,[(\Phi (\rho )-s\Phi (\sigma ))_-]\bigr )\\&\le \frac{\epsilon }{2} \int _\mu ^\lambda \frac{1}{s}ds=\frac{\epsilon }{2}\bigl (\log (\lambda )-\log (\mu )\bigr )=\frac{\epsilon }{2}D_\Omega (\rho \Vert \sigma ). \end{aligned}$$

The statement now follows by Theorem 3. \(\square \)
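The bound of Theorem 5 can be checked in the commuting case, where (for faithful \(\sigma \) and \(\Phi (\sigma )\)) the channel \(\Phi ^u_\sigma \) reduces on diagonal inputs to the classical Petz map. The sketch below computes the smallest admissible \(\epsilon \), the Hilbert projective metric and the recovery error for a hypothetical pair of distributions and a hypothetical channel `T`.

```python
import math

def apply(T, p):
    """Apply a column-stochastic matrix T[y][x] to a probability vector p."""
    return [sum(T[y][x] * p[x] for x in range(len(p))) for y in range(len(T))]

def l1(p, q, s):
    """||p - s q||_1 for commuting (diagonal) states."""
    return sum(abs(a - s * b) for a, b in zip(p, q))

def petz_map(T, q):
    """Classical Petz dual R[x][y] = q[x] T[y][x] / (T q)[y]."""
    Tq = apply(T, q)
    return [[q[x] * T[y][x] / Tq[y] for y in range(len(T))]
            for x in range(len(q))]

def epsilon(T, p, q):
    """Smallest eps with ||Tp - s Tq||_1 >= ||p - s q||_1 - eps for all s >= 0.

    The gap is piecewise linear in s and vanishes at s = 0 and for large s,
    so it suffices to check the breakpoints p_i/q_i and (Tp)_j/(Tq)_j.
    """
    Tp, Tq = apply(T, p), apply(T, q)
    breakpoints = [a / b for a, b in zip(p, q)] + [a / b for a, b in zip(Tp, Tq)]
    return max(0.0, max(l1(p, q, s) - l1(Tp, Tq, s) for s in breakpoints))

p = [0.1, 0.2, 0.7]
q = [0.2, 0.4, 0.4]
T = [[1, 0, 0], [0, 1, 1]]

eps = epsilon(T, p, q)
d_omega = (math.log(max(a / b for a, b in zip(p, q)))
           + math.log(max(b / a for a, b in zip(p, q))))
recovered = apply(petz_map(T, q), apply(T, p))
error = sum(abs(a - b) for a, b in zip(recovered, p))
bound = math.sqrt(2.0 * eps * d_omega)

print(eps, d_omega, error, bound)  # error does not exceed bound
```

In this example the bound is not tight, which is consistent with the theorem giving only an upper estimate of the recovery error.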

Remark 3

The recoverability result can also be formulated in the setting of comparison of statistical experiments, which is an extension of the classical theory of Blackwell [1], Torgersen [51] and Le Cam [29]. A (quantum) statistical experiment is any parametrized family of (quantum) states. For two experiments \({\mathcal {E}}\) and \({\mathcal {E}}_0\) with the same parameter set (not necessarily living on the same Hilbert space), we say that \({\mathcal {E}}_0\) is \((2,\epsilon )\)-deficient with respect to \({\mathcal {E}}\) if the error probabilities of testing problems involving elements of \({\mathcal {E}}_0\) are, up to \(\epsilon \), not worse than those of the corresponding testing problems for \({\mathcal {E}}\). See [19] for a more precise definition. In particular, for \({\mathcal {E}}_0=\{\rho _0,\sigma _0\}\) and \({\mathcal {E}}=\{\rho ,\sigma \}\), this amounts to the condition

$$\begin{aligned} P_e(\lambda ,\rho _0,\sigma _0)\le P_e(\lambda ,\rho ,\sigma )+\epsilon ,\qquad \forall \lambda \in [0,1]. \end{aligned}$$

Using Lemma 1, we see that this is equivalent to any of the inequalities

$$\begin{aligned} \Vert \rho _0-s\sigma _0\Vert _1&\ge \Vert \rho -s\sigma \Vert _1-2\epsilon (1+s),\qquad s\ge 0\\ \textrm{Tr}\,[(\rho _0-s\sigma _0)_{+}]&\ge \textrm{Tr}\,[(\rho -s\sigma )_{+}]-\epsilon (1+s),\qquad s\ge 0\\ \textrm{Tr}\,[(\rho _0-s\sigma _0)_{-}]&\ge \textrm{Tr}\,[(\rho -s\sigma )_{-}]-\epsilon (1+s), \qquad s\ge 0. \end{aligned}$$

It is easily seen that this is true if there is some channel \(\Lambda \) such that \(\Vert \Lambda (\rho _0)-\rho \Vert _1\le \epsilon \) and \(\Vert \Lambda (\sigma _0)-\sigma \Vert _1\le \epsilon \). In the classical case the converse holds, but in the quantum case this is not true. We can obtain some form of the converse statement if \(\rho _0=\Phi (\rho )\) and \(\sigma _0=\Phi (\sigma )\), in a similar way as in Theorem 5.