Abstract
In this paper, we develop a theory of learning nonlinear input–output maps with fading memory by dissipative quantum systems, as a quantum counterpart of the theory of approximating such maps using classical dynamical systems. The theory identifies the properties required for a class of dissipative quantum systems to be universal, in that any input–output map with fading memory can be approximated arbitrarily closely by an element of this class. We then introduce an example class of dissipative quantum systems that is provably universal. Numerical experiments illustrate that with a small number of qubits, this class can achieve comparable performance to classical learning schemes with a large number of tunable parameters. Further numerical analysis suggests that the exponentially increasing Hilbert space presents a potential resource for dissipative quantum systems to surpass classical learning schemes for input–output maps.
Change history
10 October 2019
Unfortunately, some errors were detected after publication of the article referred to in the title. This erratum corrects errors in the proofs of Lemma 5 on page 23 and Proposition 1 on page 26 of the article. The authors also noticed a weaker condition under which Lemma 1 on page 19 holds. The corrections and improvements do not affect the main results and conclusions of the article.
References
1. Preskill, J.: Quantum computing in the NISQ era and beyond (2018). arXiv preprint arXiv:1801.00862
2. Mills, M.: Hearing aids and the history of electronics miniaturization. IEEE Ann. Hist. Comput. 22(3), 24 (2011)
3. Aaronson, S., Arkhipov, A.: The computational complexity of linear optics. In: Proceedings of the 43rd ACM Symposium on Theory of Computing (STOC), pp. 333–342 (2011)
4. Lund, A.P., Bremner, M.J., Ralph, T.C.: Quantum sampling problems, Boson sampling and quantum supremacy. NPJ Quantum Inf. 3(1), 15 (2017)
5. Bremner, M.J., Jozsa, R., Shepherd, D.J.: Classical simulation of commuting quantum computations implies collapse of the polynomial hierarchy. Proc. R. Soc. A 467, 459 (2010)
6. Boixo, S., Isakov, S.V., Smelyanskiy, V.N., Babbush, R., Ding, N., Jiang, Z., Bremner, M.J., Martinis, J.M., Neven, H.: Characterizing quantum supremacy in near-term devices. Nat. Phys. 14(6), 595 (2018)
7. Biamonte, J., Wittek, P., Pancotti, N., Rebentrost, P., Wiebe, N., Lloyd, S.: Quantum machine learning. Nature 549(7671), 195 (2017)
8. Farhi, E., Goldstone, J., Gutmann, S.: A quantum approximate optimization algorithm (2014). arXiv preprint arXiv:1411.4028
9. Peruzzo, A., McClean, J., Shadbolt, P., Yung, M., Zhou, X., Love, P.J., Aspuru-Guzik, A., O’Brien, J.L.: A variational eigenvalue solver on a quantum processor. Nat. Commun. 5, 4213 (2014)
10. McClean, J.R., Romero, J., Babbush, R., Aspuru-Guzik, A.: The theory of variational hybrid quantum-classical algorithms. New J. Phys. 18, 023023 (2016)
11. Wang, D., Higgott, O., Brierley, S.: A generalised variational quantum eigensolver (2018). arXiv preprint arXiv:1802.00171
12. Mitarai, K., Negoro, M., Kitagawa, M., Fujii, K.: Quantum circuit learning. Phys. Rev. A 98(3), 032309 (2018)
13. Kandala, A., Mezzacapo, A., Temme, K., Takita, M., Brink, M., Chow, J.M., Gambetta, J.M.: Hardware-efficient variational quantum eigensolver for small molecules and quantum magnets. Nature 549, 242 (2017)
14. Otterbach, J.S., et al.: Unsupervised machine learning on a hybrid quantum computer (2017). arXiv preprint arXiv:1712.05771
15. Verstraete, F., Wolf, M.M., Cirac, J.I.: Quantum computation and quantum-state engineering driven by dissipation. Nat. Phys. 5(9), 633 (2009)
16. Alvarez-Rodriguez, U., Lamata, L., Escandell-Montero, P., Martín-Guerrero, J.D., Solano, E.: Supervised quantum learning without measurements. Sci. Rep. 7(1), 13645 (2017)
17. Fujii, K., Nakajima, K.: Harnessing disordered-ensemble quantum dynamics for machine learning. Phys. Rev. Appl. 8(2), 024030 (2017)
18. Nakajima, K., Fujii, K., Negoro, M., Mitarai, K., Kitagawa, M.: Boosting computational power through spatial multiplexing in quantum reservoir computing. Phys. Rev. Appl. 11(3), 034021 (2019)
19. Jaeger, H., Haas, H.: Harnessing nonlinearity: predicting chaotic systems and saving energy in wireless communications. Science 304(5667), 78 (2004)
20. Maass, W., Natschläger, T., Markram, H.: Real-time computing without stable states: a new framework for neural computation based on perturbations. Neural Comput. 14, 2531 (2002)
21. Lukoševičius, M., Jaeger, H.: Reservoir computing approaches to recurrent neural network training. Comput. Sci. Rev. 3(3), 127 (2009)
22. Pavlov, A., van de Wouw, N., Nijmeijer, H.: Convergent systems: analysis and synthesis. In: Meurer, T., Graichen, K., Gilles, E.D. (eds.) Control and Observer Design for Nonlinear Finite and Infinite Dimensional Systems. Lecture Notes in Control and Information Science, vol. 322, pp. 131–146. Springer, Berlin (2005)
23. Boyd, S., Chua, L.: Fading memory and the problem of approximating nonlinear operators with Volterra series. IEEE Trans. Circuits Syst. 32(11), 1150 (1985)
24. Appeltant, L., et al.: Information processing using a single dynamical node as complex system. Nat. Commun. 2, 468 (2011)
25. Torrejon, J., et al.: Neuromorphic computing with nanoscale spintronic oscillators. Nature 547, 428 (2017)
26. Grigoryeva, L., Ortega, J.P.: Echo state networks are universal. Neural Netw. 108, 495 (2018)
27. Grigoryeva, L., Ortega, J.P.: Universal discrete-time reservoir computers with stochastic inputs and linear readouts using non-homogeneous state-affine systems. J. Mach. Learn. Res. 19(1), 892 (2018)
28. Buehner, M., Young, P.: A tighter bound for the echo state property. IEEE Trans. Neural Netw. 17(3), 820 (2006)
29. Dieudonné, J.: Foundations of Modern Analysis. Read Books Ltd, Redditch (2013)
30. Ni, X., Verhaegen, M., Krijgsman, A.J., Verbruggen, H.B.: A new method for identification and control of nonlinear dynamic systems. Eng. Appl. Artif. Intell. 9(3), 231 (1996)
31. Atiya, A.F., Parlos, A.G.: New results on recurrent network training: unifying the algorithms and accelerating convergence. IEEE Trans. Neural Netw. 11(3), 697 (2000)
32. Dormand, J.R., Prince, P.J.: A family of embedded Runge–Kutta formulae. J. Comput. Appl. Math. 6(1), 19 (1980)
33. Lukoševičius, M.: A practical guide to applying echo state networks. In: Neural Networks: Tricks of the Trade, pp. 659–686. Springer (2012)
34. Trotter, H.F.: On the product of semi-groups of operators. Proc. Am. Math. Soc. 10(4), 545 (1959)
35. Suzuki, M.: Relationship among exactly soluble models of critical phenomena. I: 2D Ising model, dimer problem and the generalized XY-model. Prog. Theor. Phys. 46(5), 1337 (1971)
36. Vandersypen, L.M., Steffen, M., Breyta, G., Yannoni, C.S., Sherwood, M.H., Chuang, I.L.: Experimental realization of Shor’s quantum factoring algorithm using nuclear magnetic resonance. Nature 414(6866), 883 (2001)
37. IBM Q 20 Tokyo. https://www.research.ibm.com/ibm-q/technology/devices/. Accessed: 10 April 2019
38. Nielsen, M.A., Chuang, I.L.: Quantum Computation and Quantum Information: 10th Anniversary Edition, 10th edn. Cambridge University Press, New York (2011)
39. Friedman, J., Hastie, T., Tibshirani, R.: The Elements of Statistical Learning. Springer Series in Statistics New York, vol. 1. Springer, New York (2001)
40. Bouten, L., van Handel, R., James, M.R.: A discrete invitation to quantum filtering and feedback. SIAM Rev. 51(2), 239 (2009)
41. Gross, J.A., Caves, C.M., Milburn, G.J., Combes, J.: Qubit models of weak continuous measurements: Markovian conditional and open-system dynamics. Quantum Sci. Technol. 3(2), 024005 (2018)
42. Aleksandrowicz, G., et al.: Qiskit: an open-source framework for quantum computing (2019). https://doi.org/10.5281/zenodo.2562110
43. Richter, S., Werner, R.F.: Ergodicity of quantum cellular automata. J. Stat. Phys. 82(3–4), 963 (1996)
44. Rudin, W., et al.: Principles of Mathematical Analysis, vol. 3. McGraw-Hill, New York (1964)
45. Perez-Garcia, D., Wolf, M.M., Petz, D., Ruskai, M.B.: Contractivity of positive and trace-preserving maps under \(L_p\) norms. J. Math. Phys. 47(8), 083506 (2006)
46. Kubrusly, C.S.: A concise introduction to tensor product. Far East J. Math. Sci. 22(2), 137 (2006)
47. Lang, S.: Complex Analysis. Graduate Texts in Mathematics. Springer-Verlag (1985). https://books.google.com.au/books?id=7S7vAAAAMAAJ
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix
1.1 The convergence property
Recall from the main text that for a compact subset \(D \subseteq {\mathbb {R}}\) and \(L>0\), \(K_L(D)\) denotes the set of all real sequences \(u=\{u_k\}_{k \in {\mathbb {Z}}}\) taking values in \(D \cap [-L,L]\). Let \(K_{L}^{-}(D)\) and \(K_{L}^{+}(D)\) be subsets of input sequences in \(K_{L}(D)\) whose indices are restricted to \({\mathbb {Z}}^{-}=\{\ldots , -2, -1, 0\}\) and \({\mathbb {Z}}^{+} = \{1, 2, \ldots \}\), respectively. In the following, we write T for both input-independent and input-dependent CPTP maps. As in the main text, we write \(T(u_k)\) for a CPTP map that is determined by an input \(u_k\), and \(\Vert \cdot \Vert _{p}\) for any Schatten p-norm for \(p \in [1, \infty )\). All dissipative quantum systems considered here are finite-dimensional. We now state the definition of a convergent CPTP map with respect to \(K_{L}(D)\).
Definition 1
(Convergence) An input-dependent CPTP map T is convergent with respect to \(K_{L}(D)\) if there exists a sequence \(\delta = \{\delta _k\}_{k > 0}\) with \(\mathop {\lim }_{k \rightarrow \infty } \delta _{k} = 0\), such that for all \(u = \{u_{k}\}_{k \in {\mathbb {Z}}^{+}} \in K_{L}^{+}(D)\) and any two density operators \(\rho _{j, k}\) \((j = 1, 2)\) satisfying \(\rho _{j, k} = T(u_{k})\rho _{j, k-1}\), it holds that \(\left\| \rho _{1, k} - \rho _{2, k} \right\| _{2} \le \delta _{k}\). We call a dissipative quantum system whose dynamics is governed by a convergent CPTP map a convergent system.
The convergence property can be viewed as an extension of the mixing property for a noisy quantum channel described by an input-independent CPTP map [43].
Definition 2
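(Mixing) An n-qubit dissipative quantum system described by a CPTP map T is mixing if there exists a unique density operator \(\rho _{*}\) such that, for all \(\rho _{0} \in {\mathscr {D}}({\mathbb {C}}^{2^n})\),
$$\begin{aligned} \lim _{k \rightarrow \infty } \left\| T^{k}(\rho _{0}) - \rho _{*} \right\| _{2} = 0. \end{aligned}$$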
We will see later that if an input-dependent CPTP map \(T(u_{k})\) satisfies the sufficient condition in Theorem 1, then \(T(u_k)\) is mixing for each \(u_k \in D \cap [-L, L]\).
Theorem 1
(Convergence property) An n-qubit dissipative quantum system governed by an input-dependent CPTP map T is convergent with respect to \(K_{L}(D)\) if, for all \(u_{k} \in D \cap [-L, L]\), \(T(u_k)\) on the hyperplane \(H_{0}(2^n)\) of \(2^n \times 2^n\) traceless Hermitian operators satisfies \(\Vert T(u_k)|_{H_{0}(2^n)} \Vert _{2-2} :=\sup _{A \in H_{0}(2^n), A \ne 0} \frac{\Vert T(u_k) A\Vert _{2}}{\Vert A\Vert _{2}} \le 1 - \epsilon \) for some \(0 < \epsilon \le 1\) that does not depend on \(u_k\). Moreover, any pair of initial density operators converges uniformly to one another under T.
Proof
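Let \(\rho _{1, 0}\) and \(\rho _{2, 0}\) be two arbitrary initial density operators, so that \(\rho _{1, 0} - \rho _{2, 0}\) is a traceless Hermitian operator. Since each \(T(u_j)\) preserves the trace and Hermiticity, it maps \(H_{0}(2^n)\) into itself, and we have
$$\begin{aligned} \left\| \rho _{1, k} - \rho _{2, k} \right\| _{2} = \left\| \left( \overleftarrow{\prod }_{j=1}^{k} T(u_{j}) \right) (\rho _{1, 0} - \rho _{2, 0}) \right\| _{2} \le (1-\epsilon )^{k} \left\| \rho _{1, 0} - \rho _{2, 0} \right\| _{2} \le 2 (1-\epsilon )^{k}, \end{aligned}$$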
where the last inequality follows from the fact that for all \(\rho \in {\mathscr {D}}({\mathbb {C}}^{2^n})\), \(\left\| \rho \right\| _{2} \le 1\). \(\square \)
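The geometric contraction used in this proof can be checked numerically. The following Python sketch uses a hypothetical single-qubit map, \(T(u)\rho = (1-\epsilon ) U(u) \rho U(u)^{\dagger } + \epsilon I/2\) with \(U(u) = e^{-\mathrm{i} u X}\). This map is an assumption chosen only for illustration (it is not the class introduced in the main text); on traceless Hermitian operators it acts as \((1-\epsilon )\) times a unitary conjugation, so it meets the condition of Theorem 1 with equality.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
I2 = np.eye(2, dtype=complex)

def T(u, rho, eps=0.2):
    """Hypothetical input-dependent CPTP map (illustration only)."""
    U = np.cos(u) * I2 - 1j * np.sin(u) * X  # closed form of exp(-i u X)
    return (1 - eps) * U @ rho @ U.conj().T + eps * I2 / 2

def hs_norm(A):
    """Schatten 2-norm (Hilbert-Schmidt norm)."""
    return np.linalg.norm(A, "fro")

eps = 0.2
rng = np.random.default_rng(0)
u = rng.uniform(0.0, 1.0, size=30)  # inputs u_1, ..., u_30

rho1 = np.array([[1, 0], [0, 0]], dtype=complex)          # |0><0|
rho2 = np.array([[0.5, 0.5], [0.5, 0.5]], dtype=complex)  # |+><+|
d0 = hs_norm(rho1 - rho2)

distances = []
for uk in u:
    # Drive both states with the same input and record their distance.
    rho1, rho2 = T(uk, rho1, eps), T(uk, rho2, eps)
    distances.append(hs_norm(rho1 - rho2))

# Bound from the proof: (1 - eps)^k * ||rho_{1,0} - rho_{2,0}||_2
bounds = [(1 - eps) ** k * d0 for k in range(1, len(u) + 1)]
print(all(d <= b + 1e-12 for d, b in zip(distances, bounds)))
```

The distance between the two trajectories decays regardless of the input sequence, which is the sense in which the initial condition is forgotten.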
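We remark that for an n-qubit dissipative quantum system that satisfies the condition in Theorem 1, any initial density operator \(\rho _{0}\) reaches the state \(\lim _{k \rightarrow \infty } \left( \overleftarrow{\prod }_{j=1}^{k}T(u_j) \right) \left( \frac{I}{2^{n}}\right) \). To see this, let
$$\begin{aligned} \rho _{0} = \frac{I}{2^{n}} + \sum _{j_1 j_2 \ldots j_n \ne 0} c_{j_1 j_2 \ldots j_n} \bigotimes _{i=1}^{n} \sigma ^{(i)}_{j_i}, \end{aligned}$$
where \(c_{j_1 j_2 \ldots j_n}\) are real coefficients and \(\sigma ^{(i)}_{j_{i}}\) denotes, for qubit i, the identity operator I if \(j_{i}=0\), the Pauli X operator if \(j_{i} = 1\), the Pauli Y operator if \(j_{i}=2\) and the Pauli Z operator if \(j_{i}=3\). Since \(\bigotimes _{i=1}^{n} \sigma ^{(i)}_{j_i}\) for \(j_1j_2\ldots j_n\ne 0\) are all traceless Hermitian operators, \(\rho _{0} - \frac{I}{2^{n}}\) lies in \(H_{0}(2^n)\), and as \(k \rightarrow \infty \),
$$\begin{aligned} \left\| \left( \overleftarrow{\prod }_{j=1}^{k} T(u_{j}) \right) \left( \rho _{0} - \frac{I}{2^{n}} \right) \right\| _{2} \le (1-\epsilon )^{k} \left\| \rho _{0} - \frac{I}{2^{n}} \right\| _{2} \rightarrow 0. \end{aligned}$$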
1.2 The universality property
We now show the universality property of convergent dissipative quantum systems. Let \({\mathbb {R}}^{{\mathbb {Z}}}\) be the set of all real-valued infinite sequences. Consider an n-qubit convergent dissipative quantum system described by Eqs. (1) and (2) in the main text, whose dynamics and output are defined by a CPTP map T and a functional \(h: {\mathscr {D}}({\mathbb {C}}^{2^n}) \rightarrow {\mathbb {R}}\), respectively. We associate this quantum system with an induced filter \(M^{T}_{h}: K_{L}(D) \rightarrow {\mathbb {R}}^{{\mathbb {Z}}}\), such that for any initial condition \(\rho _{-\infty } \in {\mathscr {D}}({\mathbb {C}}^{2^n})\), when evaluated at time \(t = k \tau \),
$$\begin{aligned} M^{T}_{h}(u)_{k} = h\left( \left( \overrightarrow{\prod }_{j=0}^{\infty } T(u_{k-j}) \right) \rho _{-\infty } \right) , \end{aligned}$$
where \(\overrightarrow{\prod }_{j=0}^{\infty } T(u_{k-j}) = \mathop {\lim }_{N \rightarrow \infty } \overleftarrow{\prod }_{j=0}^{N} T(u_{k+(j-N)})=\lim _{N \rightarrow \infty } T(u_{k}) T(u_{k-1}) \cdots T(u_{k-N})\), and the limit is a point-wise limit. Lemma 1 states that this limit is well-defined.
Lemma 1
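The filter \(M_{h}^{T}: K_{L}(D) \rightarrow {\mathbb {R}}^{{\mathbb {Z}}}\) is well-defined. In particular, the limit
$$\begin{aligned} \lim _{N \rightarrow \infty } T(u_{k}) T(u_{k-1}) \cdots T(u_{k-N}) \rho _{-N} \end{aligned}$$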
exists and is independent of \(\rho _{-N}\).
Proof
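The set \({\mathscr {D}}({\mathbb {C}}^{2^n})\) equipped with the distance function induced by the norm \(\Vert \cdot \Vert _{2}\) is a complete metric space. Therefore, every Cauchy sequence converges to a point in \({\mathscr {D}}({\mathbb {C}}^{2^n})\) [44]. It remains to show that \(S_{n} = T(u_{k})T(u_{k-1}) \cdots T(u_{k-n}) \rho _{-n}\) is a Cauchy sequence. By hypothesis, for all \(u_{k} \in D \cap [-L, L]\), \(\Vert T(u_{k})|_{H_{0}(2^{n})}\Vert _{2-2} \le 1 - \epsilon \) for some \(0 < \epsilon \le 1\). For any \(\epsilon ' > 0\), choose \(N>0\) such that \((1-\epsilon )^{N} < \frac{\epsilon '}{2}\). Then for all \(n, m > N\) with \(n \le m\), writing \(\tilde{\rho }_{-n} = T(u_{k-n-1}) \cdots T(u_{k-m}) \rho _{-m}\) (with \(\tilde{\rho }_{-n} = \rho _{-m}\) if \(n = m\)),
$$\begin{aligned} \left\| S_{n} - S_{m} \right\| _{2} = \left\| T(u_{k}) \cdots T(u_{k-n}) \left( \rho _{-n} - \tilde{\rho }_{-n} \right) \right\| _{2} \le (1-\epsilon )^{n+1} \left\| \rho _{-n} - \tilde{\rho }_{-n} \right\| _{2} \le 2 (1-\epsilon )^{N} < \epsilon '. \end{aligned}$$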
\(\square \)
This filter is causal since given \(u, v \in K_{L}(D)\) satisfying \(u_{\tau } = v_{\tau }\) for \(\tau \le k\), \(M_{h}^{T}(u)_{k} = M_{h}^{T}(v)_{k}\). For any \(\tau \in {\mathbb {Z}}\), let \(M_{\tau }\) be the shift operator defined by \(M_{\tau }(u)_{k} = u_{k-\tau }\). A filter is said to be time invariant if it commutes with \(M_{\tau }\). It is straightforward to show that \(M_{h}^{T}\) is time invariant.
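Causality and time invariance are easy to test numerically on a toy filter. The sketch below does not simulate a quantum system: it uses a hypothetical scalar filter \(y_{k} = \sum _{j \ge 0} \theta ^{j} u_{k-j}\), and it assumes the input is zero before the start of the stored sequence so that the shift \(M_{\tau }\) can be applied to a finite array.

```python
import numpy as np

def toy_filter(u, theta=0.5):
    """y[k] = sum_{j=0}^{k} theta**j * u[k-j]; zero input assumed before index 0."""
    y = np.zeros(len(u))
    acc = 0.0
    for k, uk in enumerate(u):
        acc = theta * acc + uk  # recursive form of the geometric sum
        y[k] = acc
    return y

def shift(u, tau):
    """Delay by tau >= 1 steps: (M_tau u)[k] = u[k - tau], zero-padded at the start."""
    return np.concatenate([np.zeros(tau), u[:-tau]])

rng = np.random.default_rng(1)
u = rng.uniform(0.0, 1.0, size=50)
tau = 3

# Time invariance: filtering a delayed input equals delaying the filtered output.
time_invariant = np.allclose(toy_filter(shift(u, tau)), shift(toy_filter(u), tau))

# Causality: changing inputs after k = 20 leaves outputs up to k = 20 unchanged.
v = u.copy()
v[21:] = rng.uniform(0.0, 1.0, size=len(u) - 21)
causal = np.allclose(toy_filter(u)[:21], toy_filter(v)[:21])

print(time_invariant, causal)
```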
For a time-invariant and causal filter, there is a corresponding functional \(F_{h}^{T}: K_{L}^{-}(D) \rightarrow {\mathbb {R}}\) defined as \(F_{h}^{T}(u) = M_{h}^{T}(u)_0\) (see [23]). The corresponding filter can be recovered via \(M_{h}^{T}(u)_{k} = F_{h}^{T}(P \circ M_{-k}(u)) \), where P truncates u up to 0, that is \(P(u) = u|_0\). We say a filter \(M_{h}^{T}\) has the fading memory property if and only if \(F_{h}^{T}\) is continuous with respect to a weighted norm defined as follows.
Definition 3
(Weighted norm) For a null sequence \(w=\{w_{k}\}_{k \ge 0}\), that is \(w: \{0\} \cup {\mathbb {Z}}^{+} \rightarrow (0, 1]\) is decreasing and \(\mathop {\lim }_{k \rightarrow \infty } w_{k} = 0\), define a weighted norm \(\Vert \cdot \Vert _{w}\) on \(K^{-}_{L}(D)\) as \(\Vert u\Vert _{w} :=\mathop {\sup _{k \in {\mathbb {Z}}^{-}}} \left| u_{k} \right| w_{-k}\).
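As a concrete illustration of Definition 3, the following Python sketch evaluates the weighted norm on a finite stored history. The truncation to finitely many past samples and the choice \(w_{k} = 2^{-k}\) are assumptions made for the example. It shows that two sequences differing only in the distant past are close in \(\Vert \cdot \Vert _{w}\), while a difference at the present time is weighted fully.

```python
import numpy as np

def weighted_norm(u_past, w):
    """sup_k |u_{-k}| * w_k over the stored history; u_past[k] holds u_{-k}."""
    u_past = np.asarray(u_past, dtype=float)
    return float(np.max(np.abs(u_past) * w[: len(u_past)]))

K = 40
w = 0.5 ** np.arange(K)  # w_k = 2^{-k}: values in (0, 1], decreasing, tending to 0

u = np.full(K, 0.7)
v = u.copy()
v[30] = 0.0   # differs from u only 30 steps in the past
x = u.copy()
x[0] = 0.0    # differs from u at the present time

dist_far = weighted_norm(u - v, w)   # 0.7 * 2**-30
dist_near = weighted_norm(u - x, w)  # 0.7
print(dist_far, dist_near)
```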
Definition 4
(Fading memory) A time-invariant causal filter \(M: K_{L}(D) \rightarrow {\mathbb {R}}^{{\mathbb {Z}}}\) has the fading memory property with respect to a null sequence w if and only if its corresponding functional \(F: K_{L}^{-}(D) \rightarrow {\mathbb {R}}\) is continuous with respect to the weighted norm \(\Vert \cdot \Vert _{w}\).
To emphasize that the fading memory property is defined with respect to a null sequence w, we will say that M is a w-fading memory filter and the corresponding functional F is a w-fading memory functional. We state the following compactness result [27, Lemma 2] and the Stone–Weierstrass theorem [29, Theorem 7.3.1].
Lemma 2
(Compactness) For any null sequence w, \(K_{L}^{-}(D)\) is compact with the weighted norm \(\Vert \cdot \Vert _{w}\).
We write \((K_{L}^{-}(D), \Vert \cdot \Vert _{w})\) to denote the space \(K_{L}^{-}(D)\) equipped with the weighted norm \(\Vert \cdot \Vert _{w}\).
Theorem 2
(Stone–Weierstrass) Let E be a compact metric space and C(E) be the set of real-valued continuous functions defined on E. If a subalgebra A of C(E) contains the constant functions and separates points of E, then A is dense in C(E).
Let \(C(K_{L}^{-}(D), \Vert \cdot \Vert _{w})\) be the set of continuous functionals \(F: (K_{L}^{-}(D), \Vert \cdot \Vert _{w}) \rightarrow {\mathbb {R}}\). The following theorem is a result of the compactness of \((K_{L}^{-}(D), \Vert \cdot \Vert _{w})\) (Lemma 2) and the Stone–Weierstrass theorem (Theorem 2).
Theorem 3
Let w be a null sequence. For convergent CPTP maps T, let \({\mathscr {M}}_{w} = \{M_{h}^{T} \mid h : {\mathscr {D}}({\mathbb {C}}^{2^{n}}) \rightarrow {\mathbb {R}}\}\) be a set of w-fading memory filters. Let \({\mathscr {F}}_{w}\) be the family of corresponding w-fading memory functionals defined on \(K_{L}^{-}(D)\). If \({\mathscr {F}}_{w}\) forms a polynomial algebra of \(C(K_{L}^{-}(D), \Vert \cdot \Vert _{w})\), contains the constant functionals and separates points of \(K_{L}^{-}(D)\), then \({\mathscr {F}}_{w}\) is dense in \(C(K_{L}^{-}(D), \Vert \cdot \Vert _{w})\). That is for any w-fading memory filter \(M_{*}\) and any \(\epsilon > 0\), there exists \(M_{h}^{T} \in {\mathscr {M}}_{w}\) such that for all \(u \in K_{L}(D)\), \(\Vert M_{*}(u) - M_{h}^{T}(u)\Vert _{\infty } = \sup _{k \in {\mathbb {Z}}}|M_{*}(u)_{k} - M_{h}^{T}(u)_{k}| < \epsilon \).
Proof
The density of \({\mathscr {F}}_{w}\) follows from Lemma 2 and Theorem 2. To prove the second part of the theorem, since \({\mathscr {F}}_{w}\) is dense in \(C(K_{L}^{-}(D), \Vert \cdot \Vert _{w})\), for any w-fading memory functional \(F_{*}\) and any \(\epsilon >0\), there exists \(F_{h}^{T} \in {\mathscr {F}}_{w}\) such that for all \(u_{-} \in K_{L}^{-}(D)\), \( | F_{*}(u_{-}) - F_{h}^{T}(u_{-})| < \epsilon . \) For \(u \in K_{L}(D)\), notice that \(P \circ M_{-k}(u) \in K_{L}^{-}(D)\) for all \(k\in {\mathbb {Z}}\); hence,
$$\begin{aligned} \left| M_{*}(u)_{k} - M_{h}^{T}(u)_{k} \right| = \left| F_{*}(P \circ M_{-k}(u)) - F_{h}^{T}(P \circ M_{-k}(u)) \right| < \epsilon . \end{aligned}$$
Since this is true for all \(k \in {\mathbb {Z}}\), for all \(u \in K_{L}(D)\), \(\Vert M_{*}(u) - M_{h}^{T}(u) \Vert _{\infty } < \epsilon \). \(\square \)
1.3 Fading memory property and polynomial algebra
Before we prove the universality of the family of dissipative quantum systems introduced in Sect. 4 in the main text, we first establish two important observations regarding the multivariate polynomial output in Eq. (3).
We specify h to be the multivariate polynomial as in the right-hand side of Eq. (3) in the main text. For ease of notation, we drop the subscript h in \(F_{h}^{T}\) and \(M_{h}^{T}\). Let \({\mathscr {F}} = \{F^{T}\}\) be the set of functionals induced from dissipative quantum systems given by Eqs. (1) and (3) in the main text. We will show in Lemma 3 that the convergence and continuity of T are sufficient to guarantee the fading memory property of \(F^{T}\), and in Lemma 5 that \({\mathscr {F}}\) forms a polynomial algebra, made of fading memory functionals. In the following, let \({\mathscr {L}}({\mathbb {C}}^{2^n})\) be the set of linear operators on \({\mathbb {C}}^{2^n}\), and for a CPTP map T, for all \(u_k \in D \cap [-L, L]\), define \(\Vert T(u_k)\Vert _{2-2} :=\sup _{A \in {\mathscr {L}}({\mathbb {C}}^{2^n}), \Vert A\Vert _{2}=1} \Vert T(u_k) A\Vert _{2}\).
Lemma 3
(Fading memory property) Consider an n-qubit dissipative quantum system with dynamics Eq. (1) and output Eq. (3). Suppose that for all \(u_k \in D \cap [-L, L]\), the CPTP map \(T(u_k)\) satisfies the condition in Theorem 1, so that it is convergent. Suppose moreover that for any \(\epsilon > 0\), there exists \(\delta _{T}(\epsilon ) > 0\) such that \(\Vert T(x) - T(y)\Vert _{2-2} < \epsilon \) whenever \(|x - y| < \delta _{T}(\epsilon )\) for \(x, y \in D \cap [-L, L]\). Then for any null sequence w, the induced filter \(M^{T}\) and the corresponding functional \(F^{T}\) are w-fading memory.
Proof
We first state the boundedness of CPTP maps [45, Theorem 2.1].
Lemma 4
For a CPTP map \(T : {\mathscr {L}}({\mathbb {C}}^{2^n}) \rightarrow {\mathscr {L}}({\mathbb {C}}^{2^n})\), we have \(\left\| T \right\| _{2-2} \le \sqrt{2^n}\).
Moreover, recall that \({\mathrm{Tr}}(\cdot )\) is continuous, that is for any \(\epsilon >0\), there exists \(\delta _{{\mathrm{Tr}}}(\epsilon ) > 0\) such that \(|{\mathrm{Tr}}(A-B)|<\epsilon \) whenever \(\Vert A - B \Vert _{2} <\delta _{{\mathrm{Tr}}}(\epsilon )\) for any complex matrices A, B. Note that here \(\Vert \cdot \Vert _{2}\) denotes the Schatten 2-norm or the Hilbert–Schmidt norm.
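The bound in Lemma 4 can be checked on small examples. The Python sketch below builds the matrix of a channel from Kraus operators (acting on row-major vectorized operators, an implementation choice of this example) and takes its largest singular value, which equals \(\Vert T\Vert _{2-2}\) because vectorization preserves the Hilbert–Schmidt norm. The single-qubit reset channel \(\rho \mapsto {\mathrm{Tr}}(\rho ) |0\rangle \langle 0|\) is chosen here as an illustration in which the bound \(\sqrt{2^n}\) is attained for \(n = 1\).

```python
import numpy as np

def superoperator(kraus_ops):
    """Matrix of rho -> sum_i K_i rho K_i^dagger acting on row-major vec(rho)."""
    return sum(np.kron(K, K.conj()) for K in kraus_ops)

def norm_2_to_2(kraus_ops):
    """Induced Schatten 2 -> 2 norm: largest singular value of the superoperator."""
    return np.linalg.norm(superoperator(kraus_ops), 2)

# Reset channel rho -> Tr(rho)|0><0| on one qubit (n = 1).
K0 = np.array([[1, 0], [0, 0]], dtype=complex)  # |0><0|
K1 = np.array([[0, 1], [0, 0]], dtype=complex)  # |0><1|
reset_norm = norm_2_to_2([K0, K1])

# A unitary channel preserves the Hilbert-Schmidt norm, so its norm is 1.
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
unitary_norm = norm_2_to_2([H])

print(reset_norm, unitary_norm)
```

The reset channel maps the identity (norm \(\sqrt{2}\)) to \(2 |0\rangle \langle 0|\) (norm 2), which is why its induced norm exceeds 1 even though it is trace-preserving.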
Let w be an arbitrary null sequence. We will show the linear terms L(u) in the functional \(F^{T}\) are continuous with respect to \(\left\| \cdot \right\| _{w}\), and the continuity property of \(F^T\) follows from the fact that finite sums and products of continuous elements are also continuous.
For any \(u, v \in K_{L}^{-}(D)\),
Denote \(\rho _{u} = \left( \overrightarrow{\prod }_{k=N}^{\infty }T(u_{-k}) \right) \rho _{-\infty }\) and \(\rho _{v} = \left( \overrightarrow{\prod }_{k=N}^{\infty }T(v_{-k}) \right) \rho _{-\infty }\) for some \(0< N < \infty \),
Since T satisfies conditions in Theorem 1, any two density operators converge uniformly to one another. Therefore, for any \(\epsilon >0\), there exists \(N(\epsilon ) > 0\) such that for all \(N' > N(\epsilon )\),
Choose \(N' = N(\epsilon ) + 1\) and bound the first term inside the bracket on the right hand side of Eq. (4) by rewriting it as a telescopic sum:
where the last inequality follows from Lemma 4. We claim that for any \(\epsilon > 0\), if
then \(\left| L(u) - L(v) \right| < \epsilon \). Indeed, since w is decreasing, the above condition implies that
Since \(w_{N(\epsilon )} > 0\), for all \(0 \le l \le N(\epsilon )\),
By continuity of T, we bound Eq. (6) by
Since \(\left\| \rho _{u} \right\| _{2} \le 1\), Eqs. (4), (5) and (7) give
The result now follows from the continuity of \({\mathrm{Tr}}(\cdot )\). \(\square \)
Lemma 5
(Polynomial algebra) Let \({\mathscr {F}}=\{F^{T}\}\) be a family of functionals induced by dissipative quantum systems defined by Eqs. (1) and (3) in the main text. If for each member \(F^{T} \in {\mathscr {F}}\), T satisfies the conditions in Lemma 3, then for any null sequence w, \({\mathscr {F}}\) forms a polynomial algebra consisting of w-fading memory functionals.
Proof
Consider two dissipative quantum systems described by Eqs. (1) and (3), with \(n_{1}\) and \(n_{2}\) system qubits, respectively. Let \(\rho _{k}^{(m)} \in {\mathscr {D}}({\mathbb {C}}^{2^{n_m}})\) be the state and \(T^{(m)}\) be the CPTP map of the \(m{\mathrm{th}}\) system. Let \(j_{1} = 1, \ldots , n_{1}\) and \(j_{2} = 1, \ldots , n_{2}\) be the respective qubit indices for the two systems. For the observable \(Z^{(j_{m})}\) of qubit \(j_{m}\), notice that
where I is the identity operator. Therefore, we can relabel the qubit for the combined system described by the density operator \(\rho _{k}^{(1)} \otimes \rho _{k}^{(2)}\) by j, running from \(j = 1\) to \(j = n_{1} + n_{2}\). Using this notation, the above expectations can be re-expressed as
Following this idea, write out the outputs of two systems as follows,
For any \(\lambda \in {\mathbb {R}}\), let \(n = n_{1} + n_{2}\) and k denote the qubit index of the combined system running from \(k=1\) to \(k=n\), and \(R = \max \{R_1, R_2\}\), then
where the weights \({\bar{w}}_{k_1, \ldots , k_n}^{r_{k_1}, \ldots , r_{k_n}}\) are changed accordingly. For instance, if all \(k_{m} \le n_{1}\) for \(m=1,\ldots ,n\), then \({\bar{w}}_{k_1, \ldots , k_n}^{r_{k_1}, \ldots , r_{k_n}} = w_{i_1, \ldots , i_{n_1}}^{r_{i_1}, \ldots , r_{i_{n_1}}}\), corresponding to the weights for the output \({\bar{y}}^{(1)}_{k}\). Similarly, let \(R = R_1 + R_2\),
Therefore, \({\bar{y}}_{k}^{(1)} + \lambda {\bar{y}}_{k}^{(2)}\) and \({\bar{y}}_{k}^{(1)} {\bar{y}}_{k}^{(2)}\) again have the same form as the right-hand side of Eq. (3) in the main text. This implies that for any functionals \(F^{T^{(1)}}, F^{T^{(2)}} \in {\mathscr {F}}\), \(F^{T^{(1)}} + \lambda F^{T^{(2)}} \in {\mathscr {F}}\) and \(F^{T^{(1)}} F^{T^{(2)}} \in {\mathscr {F}}\). Thus, \({\mathscr {F}}\) forms a polynomial algebra.
It remains to show that for all \(u_k \in D \cap [-L, L]\), \(\Vert T(u_k) |_{H_0(2^{n})} \Vert _{2-2} = \Vert (T^{(1)}(u_k) \otimes T^{(2)}(u_k))|_{H_0(2^n)} \Vert _{2-2} \le 1-\epsilon \) for some \(0 < \epsilon \le 1\). This will imply that \(F^{T^{(1)}} + \lambda F^{T^{(2)}}\) and \(F^{T^{(1)}} F^{T^{(2)}}\) are w-fading memory by Lemma 3, and that \({\mathscr {F}}\) forms a polynomial algebra consisting of w-fading memory functionals. Suppose that for all \(u_k \in D \cap [-L, L]\), \(\Vert T(u_{k})|_{H_0(2^{n_m})}\Vert _{2-2} \le 1-\epsilon _{m}\) for \(m=1, 2\). Adopting the proof of [46, Proposition 3], let \(A = \sum _{i} A_{i} \otimes \tilde{A}_{i}\) be a traceless Hermitian operator. Without loss of generality, we assume that \(\{\tilde{A}_i\}\) is an orthonormal set with respect to the Hilbert–Schmidt inner product. Then \(\{A_{i} \otimes \tilde{A}_{i}\}\) and \(\{T^{(1)}(u_k)|_{H_0(2^{n_{1}})} A_{i} \otimes \tilde{A}_{i} \}\) are two orthogonal sets. By the Pythagoras theorem, \(T^{(1)}(u_k)|_{H_0(2^{n_{1}})} \otimes I\) on the hyperplane of traceless Hermitian operators satisfies
Therefore, \(\Vert T^{(1)}(u_k)|_{H_0(2^{n_1})} \otimes I \Vert _{2-2} \le \Vert T^{(1)}(u_k)|_{H_{0}(2^{n_1})}\Vert _{2-2}\). Similarly, a symmetric argument shows that \(\Vert I \otimes T^{(2)}(u_k)|_{H_0(2^{n_2})}\Vert _{2-2} \le \Vert T^{(2)}(u_k)|_{H_{0}(2^{n_2})}\Vert _{2-2}\). Therefore, when restricted to traceless Hermitian operators,
The convergence of T follows from Theorem 1. \(\square \)
1.4 A universal class
We now prove the universality of the class of dissipative quantum systems introduced in the main text. Recall that this class consists of N non-interacting quantum subsystems initialized in a product state of the N subsystems, where the dynamics of subsystem K with \(n_K\) qubits is governed by the CPTP map:
where
with \(J^{j_{1}, j_{2}}_{K}\) and \(\alpha \) being real-valued constants and \({\mathrm{Tr}}_{i_{0}^{K}}\) denoting the partial trace over the ancilla qubit. Let \({\overline{H}}_{K} = I \otimes \cdots \otimes H_{K} \otimes \cdots \otimes I\) with \(H_{K}\) in the Kth position, the total Hamiltonian of N subsystems is
Writing \(\rho _{k} = \bigotimes _{K=1}^{N} \rho _{k}^{K}\), the overall dynamics and the output are given by
where h is the multivariate polynomial defined by the right-hand side of Eq. (3) in the main text.
Proposition 1
Let \({\mathscr {M}}_{S}\) be the set of filters induced from dissipative quantum systems described by Eq. (11) such that each \(T_{K}\) \((K = 1, \ldots , N)\) satisfies conditions in Theorem 1. Then for any null sequence w, the corresponding family of functionals \({\mathscr {F}}_{S}\) is dense in \(C(K_{1}^{-}([0, 1]), \Vert \cdot \Vert _{w})\).
Proof
We first show \(T_{K}(x)\) satisfies the conditions in Lemma 3 for all \(x \in [0, 1]\). Let \(x, y \in [0, 1]\) and Z be the Pauli Z operator. By definition,
where \(\tilde{T}\) is an input-independent CPTP map.
Now, the same argument in the proof of Lemma 5 shows that \(T = T_{1} \otimes \cdots \otimes T_{N}\) is convergent given the assumptions on each \(T_K\). Furthermore, given two convergent systems whose dynamics are described by Eq. (11) with Hamiltonians \(H^{(1)}\) and \(H^{(2)}\), the total Hamiltonian of the combined system is \(H = H^{(1)} \otimes I + I \otimes H^{(2)}\), which again has the form Eq. (10). Therefore, by the above observation and Lemma 5, \({\mathscr {F}}_{S}\) forms a polynomial algebra, consisting of w-fading memory functionals for any null sequence w.
It remains to show \({\mathscr {F}}_{S}\) contains constants and separates points. Constants can be obtained by setting the weights \(w_{i_1, \ldots , i_{n}}^{r_{i_1}, \ldots , r_{i_n}}\) in the output to be zero. To show the family \({\mathscr {F}}_{S}\) separates points, we state the following lemma for later use, whose proof can be found in [47, Theorem 3.2].
Lemma 6
Let \(f(\theta ) = \sum _{n=0}^{\infty } x_{n} \theta ^{n}\) be a non-constant real power series, having a nonzero radius of convergence. If \(f(0) = 0\), then there exists \(\beta >0\) such that \(f(\theta ) \ne 0\) for all \(\theta \) with \(\left| \theta \right| \le \beta \) and \(\theta \ne 0\).
Consider a single-qubit system interacting with a single ancilla qubit whose dynamics is governed by Eq. (11). Order an orthogonal basis of \({\mathscr {L}}({\mathbb {C}}^{2})\) as \({\mathscr {B}} = \{I, Z, X, Y\}\). Recall that the normal representations of a CPTP map T and a density operator \(\rho \) are given by
where \(B_{i} \in {\mathscr {B}}\). Without loss of generality, let \(\tau = 1\) and set \(J_{1}^{j_1, j_2} = J \in {\mathbb {R}}\) for all \(j_{1}, j_{2}\) in the Hamiltonian given by Eq. (9). We obtain the normal representation of the CPTP map defined in Eq. (8) as
When restricted to the hyperplane of traceless Hermitian operators,
with \(\left\| {\overline{T}}|_{H_{0}(2)} \right\| _{2-2} = \sigma _{\max }({\overline{T}}|_{H_{0}(2)}) = |\cos (2J)|\). Here, \(\left\| \cdot \right\| _{2-2}\) is the matrix 2-norm and \(\sigma _{\max }(\cdot )\) is the maximum singular value. Choose \(J \ne \frac{z \pi }{2}\) for \(z \in {\mathbb {Z}}\), then \(|\cos (2J)| \le 1-\epsilon \) for some \(0<\epsilon \le 1\). By Theorem 1, T is convergent and we choose an arbitrary initial density operator \({\overline{\rho }}_{-\infty } = \begin{pmatrix} 1/2&1/2&0&0 \end{pmatrix}^{T}\), corresponding to \(\rho _{-\infty } = |0 \rangle \langle 0|\). If we only take the expectation \(\langle Z \rangle \) in the output Eq. (3) by setting the degree \(R=1\), then this single-qubit dissipative quantum system induces a functional
for all \(u \in K^{-}_{1}([0, 1])\) and \(w\ne 0\). Here, \([\cdot ]_{2}\) refers to the second element of the vector, corresponding to \(\langle Z \rangle \) given the order of the orthogonal basis elements in \({\mathscr {B}}\). Given two input sequences \(u \ne v\) in \(K^{-}_{1}([0, 1])\), consider two cases:
(i) If \(u_{0} \ne v_{0}\), choose \(J = \frac{\pi }{4}\) such that \(\cos ^2(2J) = 0\) and \(\sin ^{2}(2J)=1\). Then
$$\begin{aligned} F^{T}(u) - F^{T}(v) = w(u_{0} - v_{0}) \ne 0. \end{aligned}$$
(ii) If \(u_{0} = v_{0}\),
$$\begin{aligned} F^{T}(u) - F^{T}(v) = w \sin ^2(2J) \sum _{j=0}^{\infty } \left( \cos ^2(2J) \right) ^{j}(u_{-j} - v_{-j}). \end{aligned}$$
Let \(\theta = \cos ^{2}(2J)\), then given our choice of J, \(0 \le \theta \le 1-\epsilon \) and \(\sin ^2(2J) \ge \epsilon \) for some \(0<\epsilon \le 1\). Consider the power series
$$\begin{aligned} f(\theta ) = \sum _{j=0}^{\infty } \theta ^{j} (u_{-j} - v_{-j}). \end{aligned}$$
Since \(\left| u_{-j} - v_{-j} \right| \le 1\), \(f(\theta )\) has a radius of convergence of at least 1, so it converges for all \(\theta \in (-1, 1)\). Moreover, \(f(\theta )\) is non-constant because \(u \ne v\), and \(f(0) = u_{0} - v_{0} = 0\). The separation of points follows from invoking Lemma 6.
Finally, the universality property of \({\mathscr {F}}_{S}\) follows from Theorem 3. \(\square \)
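The two cases of the separation argument can be illustrated numerically. The Python sketch below evaluates the closed-form difference \(w \sin ^2(2J) \sum _{j} (\cos ^2(2J))^{j} (u_{-j} - v_{-j})\) from case (ii), truncated to a finite stored history; the truncation, the example sequences and the value \(J = 0.7\) are assumptions made for illustration. At \(J = \frac{\pi }{4}\) the same expression reduces to \(w(u_{0} - v_{0})\), as in case (i).

```python
import numpy as np

def functional_difference(u_past, v_past, J, w=1.0):
    """w * sin^2(2J) * sum_j theta^j * (u_{-j} - v_{-j}), with theta = cos^2(2J).

    u_past[j] and v_past[j] hold u_{-j} and v_{-j}; the sum is truncated
    to the stored history.
    """
    diff = np.asarray(u_past, dtype=float) - np.asarray(v_past, dtype=float)
    theta = np.cos(2 * J) ** 2
    j = np.arange(len(diff))
    return w * np.sin(2 * J) ** 2 * np.sum(theta ** j * diff)

# Case (i): the sequences differ at the present time; take J = pi / 4.
d1 = functional_difference([0.9, 0.2, 0.4], [0.1, 0.2, 0.4], np.pi / 4)

# Case (ii): the sequences agree at time 0 and differ one step in the past;
# a coupling J that is not a multiple of pi / 2 separates them.
d2 = functional_difference([0.3, 0.5, 0.1], [0.3, 0.2, 0.1], 0.7)

print(d1, d2)
```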
1.5 Detailed numerical experiment settings
In this section, we describe detailed formulas for the NARMA tasks, simulation of decoherence and experimental conditions for ESNs and the Volterra series.
1.5.1 The NARMA task
The general mth-order NARMA I/O map is described as [31]:
where \(\gamma \in {\mathbb {R}}\). In the main text, we consider \(\tau _{\mathrm{NARMA}} \in \{15, 20, 30, 40\}\). For \(\tau _{\mathrm{NARMA}} \in \{15, 20\}\), we set \(\gamma = 0.1\). For \(\tau _{\mathrm{NARMA}} = 30\) and \(\tau _{\mathrm{NARMA}} = 40\), \(\gamma \) is set to 0.05 and 0.04, respectively. A random input sequence \(u^{(r)}\), where each \(u_{k}^{(r)}\) is drawn uniformly at random from [0, 0.2], is used for all the computational tasks. This range is chosen to ensure stability of the NARMA tasks.
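For readers who wish to generate comparable data, the following Python sketch produces a NARMA sequence. Because the displayed recursion is not reproduced above, the sketch assumes a commonly used form, \(y_{k} = a y_{k-1} + b y_{k-1} \sum _{j=0}^{m-1} y_{k-1-j} + c u_{k-m+1} u_{k} + \gamma \), with coefficients \(a = 0.3\), \(b = 0.05\), \(c = 1.5\); these coefficients, the index convention and the zero initial condition are assumptions of this example and may differ from the settings used in the experiments. Only the input range [0, 0.2] and the values of \(\gamma \) are taken from the text.

```python
import numpy as np

def narma(u, order, gamma, a=0.3, b=0.05, c=1.5):
    """Assumed NARMA recursion (a, b, c and the indexing are assumptions):

    y[k] = a*y[k-1] + b*y[k-1]*sum_{j=0}^{order-1} y[k-1-j]
           + c*u[k-order+1]*u[k] + gamma
    """
    y = np.zeros(len(u))  # zero initial condition for the first `order` steps
    for k in range(order, len(u)):
        window = y[k - order:k]  # y[k-order], ..., y[k-1]
        y[k] = (a * y[k - 1] + b * y[k - 1] * window.sum()
                + c * u[k - order + 1] * u[k] + gamma)
    return y

rng = np.random.default_rng(0)
u = rng.uniform(0.0, 0.2, size=500)   # inputs drawn uniformly from [0, 0.2]
y = narma(u, order=15, gamma=0.1)     # settings quoted for the order-15 task
print(y.min(), y.max())
```

Under these assumed coefficients, an induction on k shows that the order-15 output started from zero never exceeds 0.4 when the inputs lie in [0, 0.2], which is consistent with the remark that this input range keeps the task stable.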
1.5.2 Decoherence
We consider the dephasing, decaying and generalized amplitude damping (GAD) noise, which are of experimental importance. The dephasing noise has the Kraus operators [38]:
where \(\sqrt{1-p} = e^{-2 \frac{\gamma }{S}\delta _{t}}\). We therefore implement a single-qubit phase flip on each of the system and ancilla qubits. That is, for \(j=1,\ldots , n+1\) the density operator \(\rho \) for the system and ancilla qubits undergoes the evolution:
where \(Z^{(j)}\) denotes the Pauli Z operator for qubit j.
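A minimal sketch of this qubit-by-qubit dephasing step is given below. It assumes the standard equivalence between phase damping and a phase flip: single-qubit coherences are multiplied by \(\sqrt{1-p}\), which corresponds to applying Z with probability \((1-\sqrt{1-p})/2\). The helper names and the numerical values of `gamma_over_S` and `delta_t` are placeholders, not the settings used in the experiments.

```python
import numpy as np

Z = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)

def op_on_qubit(op, j, n_qubits):
    """Embed a single-qubit operator on qubit j (0-based) of n_qubits."""
    full = np.array([[1.0 + 0j]])
    for q in range(n_qubits):
        full = np.kron(full, op if q == j else I2)
    return full

def dephase_all(rho, n_qubits, decay):
    """Apply a phase flip to each qubit in turn.

    decay plays the role of sqrt(1 - p): single-qubit coherences are
    multiplied by decay, i.e. Z is applied with probability (1 - decay) / 2.
    """
    q = (1.0 - decay) / 2.0
    for j in range(n_qubits):
        Zj = op_on_qubit(Z, j, n_qubits)
        rho = (1.0 - q) * rho + q * (Zj @ rho @ Zj)
    return rho

# Placeholder noise strength and timestep (assumptions for illustration).
gamma_over_S, delta_t = 0.01, 1.0
decay = np.exp(-2.0 * gamma_over_S * delta_t)

plus = np.array([1, 1], dtype=complex) / np.sqrt(2)
rho_plus = np.outer(plus, plus.conj())
rho_out = dephase_all(rho_plus, n_qubits=1, decay=decay)
```

Applied to the state \(|+\rangle \langle +|\), the populations are unchanged while the off-diagonal element shrinks from 0.5 to \(0.5\sqrt{1-p}\).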
The generalized amplitude damping (GAD) channel captures the effect of dissipation to an environment at a finite temperature \(\lambda \in [0, 1]\). Its Kraus operators are defined by
When \(\lambda =1\), the GAD channel corresponds to the amplitude damping channel (decaying noise). We simulate the generalized amplitude damping channel for \(\lambda \in \{0.2, 0.4, 0.6, 0.8\}\). To implement the GAD channel with the same noise strength as the dephasing channel, we set \(\sqrt{1-p} = e^{-2 \frac{\gamma }{S} \delta _{t}}\) and \(\sqrt{p} = \sqrt{1-e^{-4 \frac{\gamma }{S} \delta _t}}\).
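The following sketch constructs GAD Kraus operators in a standard textbook form, with \(\lambda \) weighting the damping branch so that \(\lambda = 1\) recovers plain amplitude damping. This parametrization and the basis ordering (index 0 for the ground state) are assumptions that may differ from the convention used in the article. The values of `gamma_over_S` and `delta_t` are placeholders.

```python
import numpy as np

def gad_kraus(p, lam):
    """Kraus operators of a generalized amplitude damping channel.

    p is the damping probability; lam = 1 gives plain amplitude damping.
    Basis ordering: index 0 = ground state, index 1 = excited state.
    """
    k0 = np.sqrt(lam) * np.array([[1.0, 0.0], [0.0, np.sqrt(1 - p)]])
    k1 = np.sqrt(lam) * np.array([[0.0, np.sqrt(p)], [0.0, 0.0]])
    k2 = np.sqrt(1 - lam) * np.array([[np.sqrt(1 - p), 0.0], [0.0, 1.0]])
    k3 = np.sqrt(1 - lam) * np.array([[0.0, 0.0], [np.sqrt(p), 0.0]])
    return [k0, k1, k2, k3]

def apply_channel(rho, kraus):
    """Return sum_i K_i rho K_i^dagger."""
    return sum(k @ rho @ k.conj().T for k in kraus)

# Placeholder noise strength and timestep (assumptions for illustration).
gamma_over_S, delta_t = 0.01, 1.0
p = 1.0 - np.exp(-4.0 * gamma_over_S * delta_t)   # so sqrt(1-p) = exp(-2*gamma/S*dt)

excited = np.array([[0.0, 0.0], [0.0, 1.0]])
rho_ad = apply_channel(excited, gad_kraus(p, lam=1.0))
```

With \(\lambda = 1\) the excited state decays to the ground state with probability p, and for general \(\lambda \) the diagonal state with populations \((\lambda , 1-\lambda )\) is left unchanged by the channel.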
Following the discussion in Sect. 5.2, Fig. 8 plots the average SA NMSE for the LRPO, Missile, NARMA15 and NARMA20 tasks under the GAD channel for all the chosen temperature parameters. Figures 9 and 10 plot the average sum of modulus of off-diagonal elements in the system density operator, for the last 50 timesteps of the SA samples, under all noise types discussed above.
1.5.3 The echo state networks
An ESN with m reservoir nodes is a type of recurrent neural network with an \(m \times 1\) input matrix \(W_{i}\), an \(m \times m\) reservoir matrix \(W_{r}\) and a \(1 \times m\) output matrix \(W_{o}\). The state evolution and output are given by [19]
$$\begin{aligned} x_{k} = \tanh (W_{r} x_{k-1} + W_{i} u_{k}), \quad {\hat{y}}_{k} = W_{o} x_{k} + w_{c}, \end{aligned}$$
where \(w_{c}\) is a tunable constant and \(\tanh (\cdot )\) is an element-wise operation.
In the numerical examples, lengths of washout, learning and evaluation phases for ESNs and SA are the same. Given an output sequence y to be learned, the output weights \(w_{c}\) and \(W_{o}\) are optimized via standard least squares to minimize \(\sum _{k}|y_{k} - {\hat{y}}_{k}|^2\), for timesteps k during the training phase. We now detail the experimental conditions for ESNs in various subsections of the numerical experiments (Sect. 5).
For the comparison given in Sect. 5.1, we set the reservoir size to be \(m \in {\mathscr {M}}=\{10, 20, 30, 40, 50, 100, 150, 200, 250, 300, 400, 500, 600, 700, 800\}\). Here, the number of computational nodes is \(m+1\) for each m. For each computational task and each m, the average NMSE of 100 ESNs is reported, obtained as follows. For each reservoir size m, we prepare 100 ESNs with the elements of \(W_{r}\) drawn uniformly at random from \([-2, 2]\). Let \({\mathscr {S}}\) denote the set of 10 points evenly spaced in [0.01, 0.99]. For each of the 100 ESNs and each \(s \in {\mathscr {S}}\), we scale \(W_{r}\) so that its maximum singular value is \(\sigma _{\max }(W_{r})=s\). This ensures the convergence and fading memory property of ESNs [26]. For each chosen s, the elements of \(W_{i}\) are drawn uniformly at random from \([-\delta , \delta ]\), where \(\delta \) is chosen from the set \({\mathscr {I}}\) of 10 points evenly spaced in [0.01, 1]. For the ith (\(i=1, \ldots , 100\)) ESN with parameters \((m, s, \delta )\), we denote its associated NMSE by \({\mathrm{NMSE}}_{(m, s, \delta , i)}\). For each reservoir size m, the average NMSE is computed as \(\frac{1}{\mathscr {|S|}} \frac{1}{\mathscr {|I|}} \frac{1}{100} \sum _{s \in {\mathscr {S}}}\sum _{\delta \in {\mathscr {I}}} \sum _{i=1}^{100}{\mathrm{NMSE}}_{(m, s, \delta , i)}\). Figure 11 summarizes the average ESN NMSE for the LRPO, Missile, NARMA15 and NARMA20 tasks.
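The procedure above can be sketched as follows for a single ESN. The sketch assumes the state update \(x_{k} = \tanh (W_{r} x_{k-1} + W_{i} u_{k})\) with a zero initial state, a toy target sequence, and a commonly used NMSE normalization (squared error divided by the squared deviation of the target from its mean). It omits the washout phase and evaluates on the training data for brevity. All function names are hypothetical.

```python
import numpy as np

def make_esn(m, s, delta, rng):
    """Random reservoir with largest singular value s and inputs in [-delta, delta]."""
    W_r = rng.uniform(-2.0, 2.0, size=(m, m))
    W_r *= s / np.linalg.svd(W_r, compute_uv=False)[0]
    W_i = rng.uniform(-delta, delta, size=m)
    return W_r, W_i

def run_esn(W_r, W_i, u):
    """Collect reservoir states x_k = tanh(W_r x_{k-1} + W_i u_k), x_{-1} = 0."""
    x = np.zeros(W_r.shape[0])
    states = np.empty((len(u), len(x)))
    for k, u_k in enumerate(u):
        x = np.tanh(W_r @ x + W_i * u_k)
        states[k] = x
    return states

def fit_readout(states, y):
    """Least-squares fit of the output weights W_o and the constant w_c."""
    A = np.hstack([states, np.ones((len(states), 1))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[:-1], coef[-1]

def nmse(y, y_hat):
    """Assumed NMSE definition: squared error over variance-like normalizer."""
    return np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

rng = np.random.default_rng(0)
u = rng.uniform(0.0, 0.2, size=300)
y = np.sin(5 * u) + 0.5 * np.roll(u, 1)      # toy target, not a task from the paper

S_grid = np.linspace(0.01, 0.99, 10)         # candidate singular values
I_grid = np.linspace(0.01, 1.0, 10)          # candidate input scalings

W_r, W_i = make_esn(m=10, s=S_grid[-1], delta=I_grid[-1], rng=rng)
states = run_esn(W_r, W_i, u)
W_o, w_c = fit_readout(states, y)
y_hat = states @ W_o + w_c
err = nmse(y, y_hat)
```

Averaging `err` over all pairs in `S_grid` and `I_grid` and over 100 random reservoirs would give the quantity reported for each reservoir size.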
For the further comparison in Sect. 5.4, ESNs are simulated to approximate the LRPO, Missile, NARMA15, NARMA20, NARMA30 and NARMA40 tasks. The reservoir size of ESNs for each task is set to be \(m \in {\mathscr {M}} = \{256, 300, 400, 500\}\). For each m, the number of computational nodes \({\mathscr {C}}\) for ESNs is
where \({\mathscr {N}}_{n}\) denotes the chosen numbers of computational nodes for n-qubit SA, defined as follows. Recall that in this experiment, 4-, 5- and 6-qubit SA with varying degrees R in the output are chosen. For 4-qubit SA, \(R_{4}=\{1, \ldots , 6\}\) corresponds to the numbers of computational nodes \({\mathscr {N}}_{4} = \{5, 15, 35, 70, 126, 210\}\). For 5-qubit SA, \(R_{5}=\{1, \ldots , 5\}\), such that \({\mathscr {N}}_{5} = \{6, 21, 56, 126, 252\}\). For 6-qubit SA, \(R_{6}=\{1, \ldots , 4\}\), such that \({\mathscr {N}}_{6} = \{7, 28, 84, 210\}\). To compute the output weights \(W_{o}\) and \(w_{c}\) when \({\mathscr {C}} < m+1\), we first optimize \(W_{o}\) and \(w_{c}\) by standard least squares. We then choose the \({\mathscr {C}}-1\) elements of \(W_{o}\) with the largest absolute values and their corresponding elements \(x_{k}'\) from the state \(x_k\). These \({\mathscr {C}}-1\) state elements \(x'_k\) are used to re-optimize the \({\mathscr {C}}-1\) elements \(W'_{o}\) of \(W_{o}\) and \(w'_{c}\) via standard least squares. At each timestep k, the full state \(x_k\) evolves, while the output is computed as \({\hat{y}}'_{k} = W'_{o} x'_{k} + w'_{c}\). For this numerical experiment, the chosen parameters \({\mathscr {S}}\) and \({\mathscr {I}}\) of ESNs are the same as above. For the ith ESN with parameters \((m, s, \delta )\), the number of computational nodes \({\mathscr {C}}\) varies. Let \({\mathrm{NMSE}}_{(m, {\mathscr {C}}, s, \delta , i)}\) denote the corresponding NMSE. For each m and each \({\mathscr {C}}\), we report the average NMSE computed as \(\frac{1}{\mathscr {|S|}} \frac{1}{\mathscr {|I|}} \frac{1}{100} \sum _{s \in {\mathscr {S}}}\sum _{\delta \in {\mathscr {I}}} \sum _{i=1}^{100}{\mathrm{NMSE}}_{(m, {\mathscr {C}}, s, \delta , i)}\).
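Two parts of this setup lend themselves to a short sketch. First, the listed node counts coincide with the binomial coefficients \(\binom{n+R}{R}\), the number of monomials of degree at most R in n variables; this is an observation about the listed values, and the code merely checks it. Second, the pruning and re-optimization of the readout can be written as a two-stage least-squares fit. The function name `prune_and_refit` and the random test data are hypothetical.

```python
from math import comb
import numpy as np

# Node counts for n-qubit SA with output degree R, checked against C(n + R, R).
N4 = [comb(4 + R, R) for R in range(1, 7)]
N5 = [comb(5 + R, R) for R in range(1, 6)]
N6 = [comb(6 + R, R) for R in range(1, 5)]

def prune_and_refit(states, y, C):
    """Keep the C - 1 state elements with the largest |W_o| and refit.

    states: (timesteps, m) array of reservoir states; requires 2 <= C <= m + 1.
    Returns the kept indices, the reduced weights W_o' and the constant w_c'.
    """
    ones = np.ones((len(states), 1))
    coef, *_ = np.linalg.lstsq(np.hstack([states, ones]), y, rcond=None)
    keep = np.sort(np.argsort(np.abs(coef[:-1]))[-(C - 1):])
    coef_sub, *_ = np.linalg.lstsq(np.hstack([states[:, keep], ones]), y, rcond=None)
    return keep, coef_sub[:-1], coef_sub[-1]

# Synthetic stand-in for reservoir states (illustration only).
rng = np.random.default_rng(1)
states = rng.standard_normal((50, 10))
y = 3.0 * states[:, 2] + 0.1 * rng.standard_normal(50)
keep, W_o_sub, w_c_sub = prune_and_refit(states, y, C=5)
y_hat = states[:, keep] @ W_o_sub + w_c_sub
```

The full state is still simulated at every timestep; only the readout is restricted to the retained elements.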
1.5.4 The Volterra series
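The discrete-time finite Volterra series with kernel order o and memory p is given by [23]
$$\begin{aligned} {\hat{y}}_{k} = h_{0} + \sum _{i=1}^{o} \sum _{j_{1}, \ldots , j_{i}=0}^{p-1} h_{i}^{j_{1}, \ldots , j_{i}} \prod _{l=1}^{i} u_{k-j_{l}}, \end{aligned}$$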
where \(u_{k - j}\) is the delayed input and \(h_{0}\) and \(h_{i}^{j_1, \ldots , j_i}\) are real-valued kernel coefficients (or output weights in our context). Notice that when memory \(p=1\), the Volterra series is a map from the current input \(u_{k}\) to the output \({\hat{y}}_{k}\). The kernel coefficients are optimized via linear least squares to minimize \(\sum _{k} |y_{k} - {\hat{y}}_{k}|^2\) during the training phase, where y is the target output sequence to be learned.
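As a sketch of the least-squares fit, the code below builds one regression column per kernel coefficient, assuming the series has the form \({\hat{y}}_{k} = h_{0} + \sum _{i=1}^{o} \sum _{j_{1}, \ldots , j_{i}=0}^{p-1} h_{i}^{j_{1}, \ldots , j_{i}} \prod _{l} u_{k-j_{l}}\), which matches the coefficient count stated in this section. The function name `volterra_features` and the toy target are hypothetical.

```python
from itertools import product
import numpy as np

def volterra_features(u, o, p):
    """Regression matrix for a Volterra series of kernel order o and memory p.

    Rows are timesteps k = p-1, ..., len(u)-1. Columns are the constant term,
    then every product u[k-j_1]*...*u[k-j_i] for i = 1..o and 0 <= j_l < p.
    """
    u = np.asarray(u, dtype=float)
    n_rows = len(u) - (p - 1)
    # delayed[:, j] holds u[k - j] for each retained timestep k
    delayed = np.column_stack([u[p - 1 - j: len(u) - j] for j in range(p)])
    cols = [np.ones(n_rows)]
    for i in range(1, o + 1):
        for js in product(range(p), repeat=i):
            cols.append(np.prod(delayed[:, list(js)], axis=1))
    return np.column_stack(cols)

rng = np.random.default_rng(0)
u = rng.uniform(0.0, 0.2, size=200)
Phi = volterra_features(u, o=2, p=3)

# Toy target that an order-2, memory-3 series represents exactly.
y = u[2:] * u[1:-1] + 0.2 * u[:-2]
h, *_ = np.linalg.lstsq(Phi, y, rcond=None)
y_hat = Phi @ h
```

Columns for permuted index tuples are duplicates (for example \(u_{k}u_{k-1}\) and \(u_{k-1}u_{k}\)), so the fit is rank deficient; `numpy.linalg.lstsq` returns the minimum-norm solution in that case, and the predictions are unaffected.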
The number of computational nodes, that is, the number of kernel coefficients \(h_{0}\) and \(h_{i}^{j_1, \ldots , j_i}\), is given by \((p^{o+1} - p) / (p - 1) + 1\). We vary the parameters of the Volterra series as follows: for each \(o \in \{2, \ldots , 8 \}\), we choose p from \(\{2, \ldots , 27\}\) such that the number of computational nodes does not exceed 801. Note that for \(o=1\), the output of the Volterra series is a linear function of delayed inputs. Since we are interested in nonlinear I/O maps, we choose \(o \ge 2\). Table 2 summarizes the number of computational nodes as o and p vary. Figure 12 shows the Volterra series NMSE according to the kernel order and memory.
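The node count and the resulting limits on p can be checked with a few lines. This is a sketch; the function names and the dictionary `limits` are introduced here for illustration only.

```python
def volterra_nodes(o, p):
    """1 + p + p^2 + ... + p^o, equal to (p**(o+1) - p)/(p - 1) + 1 for p > 1."""
    return 1 + sum(p ** i for i in range(1, o + 1))

def max_memory(o, budget=801, p_max=27):
    """Largest p in {2, ..., p_max} with volterra_nodes(o, p) <= budget."""
    feasible = [p for p in range(2, p_max + 1) if volterra_nodes(o, p) <= budget]
    return max(feasible) if feasible else None

# Largest admissible memory for each kernel order o = 2, ..., 8.
limits = {o: max_memory(o) for o in range(2, 9)}
```

For kernel order 2 the largest admissible memory is 27, giving 757 nodes, which agrees with the stated upper end of the range for p.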
Figure 12 shows that the Volterra series task performance does not improve as the kernel order increases. On the other hand, for kernel order 2, the performance improves as the memory increases. The improvement is particularly significant when the memory p matches the delay of the NARMA task, that is, when \(p = \tau _{\mathrm{NARMA}} + 1\).
Cite this article
Chen, J., Nurdin, H.I. Learning nonlinear input–output maps with dissipative quantum systems. Quantum Inf Process 18, 198 (2019). https://doi.org/10.1007/s11128-019-2311-9