Learning nonlinear input–output maps with dissipative quantum systems

Chen, Jiayin; Nurdin, Hendra I.

doi:10.1007/s11128-019-2311-9

Published: 15 May 2019

Volume 18, article number 198, (2019)

Quantum Information Processing

Jiayin Chen¹ &
Hendra I. Nurdin¹

A Correction to this article was published on 10 October 2019

This article has been updated

Abstract

In this paper, we develop a theory of learning nonlinear input–output maps with fading memory by dissipative quantum systems, as a quantum counterpart of the theory of approximating such maps using classical dynamical systems. The theory identifies the properties required for a class of dissipative quantum systems to be universal, in that any input–output map with fading memory can be approximated arbitrarily closely by an element of this class. We then introduce an example class of dissipative quantum systems that is provably universal. Numerical experiments illustrate that with a small number of qubits, this class can achieve comparable performance to classical learning schemes with a large number of tunable parameters. Further numerical analysis suggests that the exponentially increasing Hilbert space presents a potential resource for dissipative quantum systems to surpass classical learning schemes for input–output maps.

Change history

10 October 2019
Unfortunately, some errors were detected after publication of the article referred to in the title. This erratum corrects errors in the proofs of Lemma 5 on page 23 and Proposition 1 on page 26 of the article. The authors also noticed a weaker condition for Lemma 1 on page 19 to hold. The corrections and improvements do not impact the main results and conclusion of the article.

References

1. Preskill, J.: Quantum computing in the NISQ era and beyond (2018). ArXiv preprint arXiv:1801.00862
2. Mills, M.: Hearing aids and the history of electronics miniaturization. IEEE Ann. Hist. Comput. 22(3), 24 (2011)
3. Aaronson, S., Arkhipov, A.: The computational complexity of linear optics. In: Proceedings of the 43rd ACM Symposium on Theory of Computing (STOC), pp. 333–342 (2011)
4. Lund, A.P., Bremner, M.J., Ralph, T.C.: Quantum sampling problems, Boson sampling and quantum supremacy. NPJ Quantum Inf. 3(1), 15 (2017)
5. Bremner, M.J., Jozsa, R., Shepherd, D.J.: Classical simulation of commuting quantum computations implies collapse of the polynomial hierarchy. Proc. R. Soc. A 467, 459 (2010)
6. Boixo, S., Isakov, S.V., Smelyanskiy, V.N., Babbush, R., Ding, N., Jiang, Z., Bremner, M.J., Martinis, J.M., Neven, H.: Characterizing quantum supremacy in near-term devices. Nat. Phys. 14(6), 595 (2018)
7. Biamonte, J., Wittek, P., Pancotti, N., Rebentrost, P., Wiebe, N., Lloyd, S.: Quantum machine learning. Nature 549(7671), 195 (2017)
8. Farhi, E., Goldstone, J., Gutmann, S.: A quantum approximate optimization algorithm (2014). ArXiv preprint arXiv:1411.4028
9. Peruzzo, A., McLean, J., Shadbolt, P., Yung, M., Zhou, X., Love, P.J., Aspuru-Guzik, A., O’Brien, J.L.: A variational eigenvalue solver on a quantum processor. Nat. Commun. 5, 4213 (2013)
10. McClean, J.R., Romero, J., Babbush, R., Aspuru-Guzik, A.: The theory of variational hybrid quantum-classical algorithms. New J. Phys. 18, 023023 (2016)
11. Wang, D., Higgott, O., Brierley, S.: A generalised variational quantum eigensolver (2018). ArXiv preprint arXiv:1802.00171
12. Mitarai, K., Negoro, M., Kitagawa, M., Fujii, K.: Quantum circuit learning. Phys. Rev. A 98(3), 032309 (2018)
13. Kandala, A., Mezzacapo, A., Temme, K., Takita, M., Brink, M., Chow, J.M., Gambetta, J.M.: Hardware-efficient variational quantum eigensolver for small molecules and quantum magnets. Nature 549, 242 (2017)
14. Otterbach, J.S., et al.: Unsupervised machine learning on a hybrid quantum computer (2017). ArXiv preprint arXiv:1712.05771
15. Verstraete, F., Wolf, M.M., Cirac, J.I.: Quantum computation and quantum-state engineering driven by dissipation. Nat. Phys. 5(9), 633 (2009)
16. Alvarez-Rodriguez, U., Lamata, L., Escandell-Montero, P., Martín-Guerrero, J.D., Solano, E.: Supervised quantum learning without measurements. Sci. Rep. 7(1), 13645 (2017)
17. Fujii, K., Nakajima, K.: Harnessing disordered-ensemble quantum dynamics for machine learning. Phys. Rev. Appl. 8(2), 024030 (2017)
18. Nakajima, K., Fujii, K., Negoro, M., Mitarai, K., Kitagawa, M.: Boosting computational power through spatial multiplexing in quantum reservoir computing. Phys. Rev. Appl. 11(3), 034021 (2019)
19. Jaeger, H., Haas, H.: Harnessing nonlinearity: predicting chaotic systems and saving energy in wireless communications. Science 304, 5667 (2004)
20. Maass, W., Natschläger, T., Markram, H.: Real-time computing without stable states: a new framework for neural computation based on perturbations. Neural Comput. 14, 2531 (2002)
21. Lukoševičius, M., Jaeger, H.: Reservoir computing approaches to recurrent neural network training. Comput. Sci. Rev. 3(3), 127 (2009)
22. Pavlov, A., van de Wouw, N., Nijmeijer, H.: Convergent systems: analysis and synthesis. In: Meurer, T., Graichen, K., Gilles, E.D. (eds.) Control and Observer Design for Nonlinear Finite and Infinite Dimensional Systems. Lecture Notes in Control and Information Science, vol. 322, pp. 131–146. Springer, Berlin (2005)
23. Boyd, S., Chua, L.: Fading memory and the problem of approximating nonlinear operators with Volterra series. IEEE Trans. Circuits Syst. 32(11), 1150 (1985)
24. Appeltant, L., et al.: Information processing using a single dynamical node as complex system. Nat. Commun. 2, 468 (2011)
25. Torrejon, J., et al.: Neuromorphic computing with nanoscale spintronic oscillators. Nature 547, 428 (2017)
26. Grigoryeva, L., Ortega, J.P.: Echo state networks are universal. Neural Netw. 108, 495 (2018)
27. Grigoryeva, L., Ortega, J.P.: Universal discrete-time reservoir computers with stochastic inputs and linear readouts using non-homogeneous state-affine systems. J. Mach. Learn. Res. 19(1), 892 (2018)
28. Buehner, M., Young, P.: A tighter bound for the echo state property. IEEE Trans. Neural Netw. 17(3), 820 (2006)
29. Dieudonné, J.: Foundations of Modern Analysis. Read Books Ltd, Redditch (2013)
30. Ni, X., Verhaegen, M., Krijgsman, A.J., Verbruggen, H.B.: A new method for identification and control of nonlinear dynamic systems. Eng. Appl. Artif. Intell. 9(3), 231 (1996)
31. Atiya, A.F., Parlos, A.G.: New results on recurrent network training: unifying the algorithms and accelerating convergence. IEEE Trans. Neural Netw. 11(3), 697 (2000)
32. Dormand, J.R., Prince, P.J.: A family of embedded Runge–Kutta formulae. J. Comput. Appl. Math. 6(1), 19 (1980)
33. Lukoševičius, M.: A practical guide to applying echo state networks. In: Neural Networks: Tricks of the Trade, pp. 659–686. Springer (2012)
34. Trotter, H.F.: On the product of semi-groups of operators. Proc. Am. Math. Soc. 10(4), 545 (1959)
35. Suzuki, M.: Relationship among exactly soluble models of critical phenomena. I: 2D Ising model, dimer problem and the generalized XY-model. Prog. Theor. Phys. 46(5), 1337 (1971)
36. Vandersypen, L.M., Steffen, M., Breyta, G., Yannoni, C.S., Sherwood, M.H., Chuang, I.L.: Experimental realization of Shor’s quantum factoring algorithm using nuclear magnetic resonance. Nature 414(6866), 883 (2001)
37. IBM Q 20 Tokyo. https://www.research.ibm.com/ibm-q/technology/devices/. Accessed: 10 April 2019
38. Nielsen, M.A., Chuang, I.L.: Quantum computation and quantum information: 10th anniversary edition, 10th edn. Cambridge University Press, New York (2011)
39. Friedman, J., Hastie, T., Tibshirani, R.: The Elements of Statistical Learning. Springer Series in Statistics New York, vol. 1. Springer, New York (2001)
40. Bouten, L., van Handel, R., James, M.R.: A discrete invitation to quantum filtering and feedback. SIAM Rev. 51(2), 239 (2009)
41. Gross, J.A., Caves, C.M., Milburn, G.J., Combes, J.: Qubit models of weak continuous measurements: markovian conditional and open-system dynamics. Quantum Sci. Technol. 3(2), 024005 (2018)
42. Aleksandrowicz, G. et al.: Qiskit: an open-source framework for quantum computing (2019). https://doi.org/10.5281/zenodo.2562110
43. Richter, S., Werner, R.F.: Ergodicity of quantum cellular automata. J. Stat. Phys. 82(3–4), 963 (1996)
44. Rudin, W., et al.: Principles of Mathematical Analysis, vol. 3. McGraw-Hill, New York (1964)
45. Perez-Garcia, D., Wolf, M.M., Petz, D., Ruskai, M.B.: Contractivity of positive and trace-preserving maps under $L_p$ norms. J. Math. Phys. 47(8), 083506 (2006)
46. Kubrusly, C.S.: A concise introduction to tensor product. Far East J. Math. Sci. 22(2), 137 (2006)
47. Lang, S.: Complex Analysis. Graduate Texts in Mathematics. Springer-Verlag (1985). https://books.google.com.au/books?id=7S7vAAAAMAAJ


Author information

Authors and Affiliations

School of Electrical Engineering and Telecommunications, The University of New South Wales (UNSW), Sydney, NSW, 2052, Australia
Jiayin Chen & Hendra I. Nurdin

Corresponding author

Correspondence to Hendra I. Nurdin.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

1.1 The convergence property

Recall from the main text that for a compact subset $D \subseteq {\mathbb {R}}$ and $L>0$, $K_L(D)$ denotes the set of all real sequences $u=\{u_k\}_{k \in {\mathbb {Z}}}$ taking values in $D \cap [-L,L]$. Let $K_{L}^{-}(D)$ and $K_{L}^{+}(D)$ be subsets of input sequences in $K_{L}(D)$ whose indices are restricted to ${\mathbb {Z}}^{-}=\{\ldots , -2, -1, 0\}$ and ${\mathbb {Z}}^{+} = \{1, 2, \ldots \}$, respectively. In the following, we write T for both input-independent and input-dependent CPTP maps. As in the main text, we write $T(u_k)$ for a CPTP map that is determined by an input $u_k$, and $\Vert \cdot \Vert _{p}$ for any Schatten p-norm for $p \in [1, \infty )$. All dissipative quantum systems considered here are finite-dimensional. We now state the definition of a convergent CPTP map with respect to $K_{L}(D)$.

Definition 1

(Convergence) An input-dependent CPTP map T is convergent with respect to $K_{L}(D)$ if there exists a sequence $\delta = \{\delta _k\}_{k > 0}$ with $\mathop {\lim }_{k \rightarrow \infty } \delta _{k} = 0$, such that for all $u = \{u_{k}\}_{k \in {\mathbb {Z}}^{+}} \in K_{L}^{+}(D)$ and any two density operators $\rho _{j, k}$ $(j = 1, 2)$ satisfying $\rho _{j, k} = T(u_{k})\rho _{j, k-1}$, it holds that $\left\| \rho _{1, k} - \rho _{2, k} \right\| _{2} \le \delta _{k}$. We call a dissipative quantum system whose dynamics is governed by a convergent CPTP map a convergent system.

The convergence property can be viewed as an extension of the mixing property for a noisy quantum channel described by an input-independent CPTP map [43].

Definition 2

(Mixing) An n-qubit dissipative quantum system described by a CPTP map T is mixing if there exists a unique density operator $\rho _{*}$ such that, for all $\rho _{0} \in {\mathscr {D}}({\mathbb {C}}^{2^n})$,

$$\begin{aligned} \mathop {\lim }_{k \rightarrow \infty } \left\| \left( \prod _{j=1}^{k} T \right) \rho _{0} - \rho _{*} \right\| _{2} = 0. \end{aligned}$$

We will see later that if an input-dependent CPTP map $T(u_{k})$ satisfies the sufficient condition in Theorem 1, then $T(u_k)$ is mixing for each $u_k \in D \cap [-L, L]$.

Theorem 1

(Convergence property) An n-qubit dissipative quantum system governed by an input-dependent CPTP map T is convergent with respect to $K_{L}(D)$ if there exists $0 < \epsilon \le 1$ such that, for all $u_{k} \in D \cap [-L, L]$, the restriction of $T(u_k)$ to the hyperplane $H_{0}(2^n)$ of $2^n \times 2^n$ traceless Hermitian operators satisfies $\Vert T(u_k)|_{H_{0}(2^n)} \Vert _{2-2} :=\sup _{A \in H_{0}(2^n), A \ne 0} \frac{\Vert T(u_k) A\Vert _{2}}{\Vert A\Vert _{2}} \le 1 - \epsilon $. Moreover, any pair of initial density operators converge uniformly to one another under T.

Proof

Let $\rho _{1, 0}$ and $\rho _{2, 0}$ be two arbitrary initial density operators, so that $\rho _{1, 0} - \rho _{2, 0}$ is a traceless Hermitian operator. Since each $T(u_{j})$ preserves trace and Hermiticity, it maps $H_{0}(2^n)$ into itself, and we have

$$\begin{aligned} \begin{aligned} \left\| \rho _{1, k} - \rho _{2, k} \right\| _{2}&= \left\| \left( \overleftarrow{\prod }_{j=1}^{k} T(u_{j}) \right) (\rho _{1, 0} - \rho _{2, 0}) \right\| _{2} \\&= \left\| \left( \overleftarrow{\prod }_{j=1}^{k} T(u_{j}) |_{H_{0}(2^n)} \right) (\rho _{1, 0} - \rho _{2, 0}) \right\| _{2} \\&\le \overleftarrow{\prod }_{j=1}^{k} \left\| T(u_{j}) |_{H_{0}(2^n)} \right\| _{2-2} \left\| \rho _{1, 0} - \rho _{2, 0} \right\| _{2} \\&\le \overleftarrow{\prod }_{j=1}^{k} (1-\epsilon ) \left\| \rho _{1, 0} - \rho _{2, 0} \right\| _{2} \\&\le \overleftarrow{\prod }_{j=1}^{k} (1 - \epsilon ) (\left\| \rho _{1, 0} \right\| _{2} + \left\| \rho _{2, 0} \right\| _{2}) \\&\le 2(1-\epsilon )^{k}, \end{aligned} \end{aligned}$$

where the last inequality follows from the fact that for all $\rho \in {\mathscr {D}}({\mathbb {C}}^{2^n})$, $\left\| \rho \right\| _{2} \le 1$. $\square $
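The bound $2(1-\epsilon )^{k}$ can be checked numerically on a small example. The following Python sketch is only an illustration under stated assumptions, not the system class studied in the main text: it uses a hypothetical single-qubit map $T(u)\rho = (1-\epsilon ) U(u) \rho U(u)^{\dagger } + \epsilon |0\rangle \langle 0|$, where $U(u)$ is a rotation by angle $u$ about the y axis, written in Bloch-vector form. For this toy map the restriction to traceless Hermitian operators has norm exactly $1-\epsilon $. The names `apply_channel`, `hs_distance` and `trajectory_gaps`, and the values of `EPS` and `RESET`, are assumptions introduced for the example.

```python
import math
import random

EPS = 0.2                # contraction margin epsilon (hypothetical value)
RESET = (0.0, 0.0, 1.0)  # Bloch vector of the reset state |0><0| (assumed)

def apply_channel(r, u, eps=EPS):
    """Bloch-vector form of the toy map
    T(u) rho = (1 - eps) U(u) rho U(u)^dagger + eps |0><0|,
    where U(u) rotates by angle u about the y axis."""
    x, y, z = r
    c, s = math.cos(u), math.sin(u)
    rotated = (c * x + s * z, y, -s * x + c * z)
    return tuple((1 - eps) * a + eps * b for a, b in zip(rotated, RESET))

def hs_distance(r1, r2):
    """Schatten 2-norm of rho_1 - rho_2 for qubit states: |r1 - r2| / sqrt(2)."""
    return math.dist(r1, r2) / math.sqrt(2)

def trajectory_gaps(r1, r2, inputs, eps=EPS):
    """Drive two initial states with the same inputs and record their distance."""
    gaps = []
    for u in inputs:
        r1, r2 = apply_channel(r1, u, eps), apply_channel(r2, u, eps)
        gaps.append(hs_distance(r1, r2))
    return gaps

rng = random.Random(0)
inputs = [rng.uniform(-1.0, 1.0) for _ in range(30)]
gaps = trajectory_gaps((0.0, 0.0, 1.0), (1.0, 0.0, 0.0), inputs)
for k in (1, 10, 30):
    print(f"k={k:2d}  distance={gaps[k - 1]:.3e}  bound={2 * (1 - EPS) ** k:.3e}")
```

In this sketch the distance between the two trajectories stays below the bound for every input sequence, because the rotation preserves the length of the difference of Bloch vectors and the factor $1-\epsilon $ shrinks it at each step.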

We remark that for an n-qubit dissipative quantum system that satisfies the condition in Theorem 1, the state $\rho _{k} = \left( \overleftarrow{\prod }_{j=1}^{k}T(u_j) \right) \rho _{0}$ evolved from any initial density operator $\rho _{0}$ approaches, as $k \rightarrow \infty $, the state $\left( \overleftarrow{\prod }_{j=1}^{k}T(u_j) \right) \left( \frac{I}{2^{n}}\right) $ evolved from the maximally mixed state. To see this, let

$$\begin{aligned} \rho _{0} = \frac{I}{2^n} + \sum _{\begin{array}{c} j_1, j_2,\ldots ,j_n \in \{0, 1, 2, 3\} \\ j_1j_2\ldots j_n \ne 0 \end{array}} \alpha _{j_1j_2\ldots j_n}\bigotimes _{i=1}^{n} \sigma ^{(i)}_{j_i}, \end{aligned}$$

where $\sigma ^{(i)}_{j_{i}}$ denotes, for qubit i, the identity operator I if $j_{i}=0$, the Pauli X operator if $j_{i} = 1$, the Pauli Y operator if $j_{i}=2$ and the Pauli Z operator if $j_{i}=3$. Since $\bigotimes _{i=1}^{n} \sigma ^{(i)}_{j_i}$ for $j_1j_2\ldots j_n\ne 0$ are all traceless Hermitian operators, $\rho _{0} - \frac{I}{2^n} \in H_{0}(2^n)$, and the bound in the proof of Theorem 1 gives, as $k \rightarrow \infty $,

$$\begin{aligned} \begin{aligned} \left\| \rho _{k} - \left( \overleftarrow{\prod }_{j=1}^{k} T(u_j) \right) \left( \frac{I}{2^n} \right) \right\| _{2} \rightarrow 0. \end{aligned} \end{aligned}$$

1.2 The universality property

We now establish the universality property of convergent dissipative quantum systems. Let ${\mathbb {R}}^{{\mathbb {Z}}}$ be the set of all real-valued infinite sequences. Consider an n-qubit convergent dissipative quantum system described by Eqs. (1) and (2) in the main text, whose dynamics and output are defined by a CPTP map T and a functional $h: {\mathscr {D}}({\mathbb {C}}^{2^n}) \rightarrow {\mathbb {R}}$, respectively. We associate this quantum system with an induced filter $M^{T}_{h}: K_{L}(D) \rightarrow {\mathbb {R}}^{{\mathbb {Z}}}$, such that for any initial condition $\rho _{-\infty } \in {\mathscr {D}}({\mathbb {C}}^{2^n})$, when evaluated at time $t = k \tau $,

$$\begin{aligned} M_{h}^{T}(u)_{k} = h\left( \left( \overrightarrow{\prod }_{j=0}^{\infty } T(u_{k-j}) \right) \rho _{-\infty } \right) , \end{aligned}$$

where $\overrightarrow{\prod }_{j=0}^{\infty } T(u_{k-j}) = \mathop {\lim }_{N \rightarrow \infty } \overleftarrow{\prod }_{j=0}^{N} T(u_{k+(j-N)})=\lim _{N \rightarrow \infty } T(u_{k}) T(u_{k-1}) \cdots T(u_{k-N})$, and the limit is a point-wise limit. Lemma 1 states that this limit is well-defined.

Lemma 1

The filter $M_{h}^{T}: K_{L}(D) \rightarrow {\mathbb {R}}^{{\mathbb {Z}}}$ is well-defined. In particular, the limit

$$\begin{aligned} \lim _{N \rightarrow \infty } T(u_{k})T(u_{k-1}) \cdots T(u_{k-N})\rho _{-N} \end{aligned}$$

exists and is independent of $\rho _{-N}$.

Proof

The set ${\mathscr {D}}({\mathbb {C}}^{2^n})$ equipped with the distance function induced by the norm $\Vert \cdot \Vert _{2}$ is a complete metric space. Therefore, every Cauchy sequence converges to a point in ${\mathscr {D}}({\mathbb {C}}^{2^n})$ [44]. It remains to show that $S_{n} = T(u_{k})T(u_{k-1}) \cdots T(u_{k-n}) \rho _{-n}$ is a Cauchy sequence. By hypothesis, for all $u_{k} \in D \cap [-L, L]$, $\Vert T(u_{k})|_{H_{0}(2^{n})}\Vert _{2-2} \le 1 - \epsilon $ for some $0 < \epsilon \le 1$. For any $\epsilon ' > 0$, let $N>0$ be such that $(1-\epsilon )^{N} < \frac{\epsilon '}{2}$. Then for all $n, m > N$ with $n \le m$,

$$\begin{aligned} \begin{aligned} \Vert S_{n} - S_{m}\Vert _{2}&= \left\| T(u_{k})T(u_{k-1})\cdots T(u_{k-n}) \left( \rho _{-n} - T(u_{k-n-1})\cdots T(u_{k-m}) \rho _{-m}\right) \right\| _{2} \\&\le (1-\epsilon )^{n+1} \left( \Vert \rho _{-n}\Vert _{2} +\Vert \left( T(u_{k-n-1})\cdots T(u_{k-m}) \right) \rho _{-m}\Vert _{2} \right) \\&\le 2(1-\epsilon )^{N} < \epsilon ' \end{aligned} \end{aligned}$$

$\square $

This filter is causal since given $u, v \in K_{L}(D)$ satisfying $u_{\tau } = v_{\tau }$ for $\tau \le k$, $M_{h}^{T}(u)_{k} = M_{h}^{T}(v)_{k}$. For any $\tau \in {\mathbb {Z}}$, let $M_{\tau }$ be the shift operator defined by $M_{\tau }(u)_{k} = u_{k-\tau }$. A filter is said to be time invariant if it commutes with $M_{\tau }$. It is straightforward to show that $M_{h}^{T}$ is time invariant.
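Causality and time invariance can be illustrated on a simple classical filter. The sketch below is a hypothetical example and not the quantum filter $M_{h}^{T}$ itself: it represents a bi-infinite sequence as a Python function from integers to floats, implements the shift operator $M_{\tau }$, and uses an assumed toy filter $y_{k} = \sum _{j \ge 0} 0.5^{j} u_{k-j}^{2}$ truncated after `depth` terms. The names `shift`, `toy_filter`, `u` and `v` are introduced only for this illustration.

```python
import math

def shift(tau, u):
    """Shift operator M_tau: (M_tau u)_k = u_{k - tau}."""
    return lambda k: u(k - tau)

def toy_filter(u, k, depth=40):
    """Hypothetical filter y_k = sum_{j=0}^{depth-1} 0.5**j * u_{k-j}**2.
    It only reads inputs at times <= k."""
    return sum(0.5 ** j * u(k - j) ** 2 for j in range(depth))

def u(k):
    return math.sin(0.3 * k)

def v(k):
    """Agrees with u for all times <= 5 and differs afterwards."""
    return u(k) if k <= 5 else 1.0

# Causality: inputs that agree up to time k give the same output at time k.
causal_ok = toy_filter(u, 5) == toy_filter(v, 5)

# Time invariance: filtering the shifted input equals shifting the filtered output.
tau, k = 3, 7
invariant_ok = toy_filter(shift(tau, u), k) == toy_filter(u, k - tau)

print(causal_ok, invariant_ok)
```

Both checks compare exactly equal floats here because the two sides evaluate the same input samples in the same order; for a filter computed along different code paths a tolerance would be the safer choice.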

For a time-invariant and causal filter, there is a corresponding functional $F_{h}^{T}: K_{L}^{-}(D) \rightarrow {\mathbb {R}}$ defined as $F_{h}^{T}(u) = M_{h}^{T}(u)_0$ (see [23]). The corresponding filter can be recovered via $M_{h}^{T}(u)_{k} = F_{h}^{T}(P \circ M_{-k}(u)) $, where P truncates u up to 0, that is $P(u) = u|_0$. We say a filter $M_{h}^{T}$ has the fading memory property if and only if $F_{h}^{T}$ is continuous with respect to a weighted norm defined as follows.

Definition 3

(Weighted norm) For a null sequence $w=\{w_{k}\}_{k \ge 0}$, that is $w: \{0\} \cup {\mathbb {Z}}^{+} \rightarrow (0, 1]$ is decreasing and $\mathop {\lim }_{k \rightarrow \infty } w_{k} = 0$, define a weighted norm $\Vert \cdot \Vert _{w}$ on $K^{-}_{L}(D)$ as $\Vert u\Vert _{w} :=\mathop {\sup _{k \in {\mathbb {Z}}^{-}}} \left| u_{k} \right| w_{-k}$.
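To make the weighted norm concrete, the following Python sketch evaluates it on a finite window of the past. This is an assumption-laden illustration: a true element of $K_{L}^{-}(D)$ is left-infinite, the window length is arbitrary, and the geometric null sequence $w_{k} = 0.5^{k}$ is just one admissible choice. The helper names `weighted_norm`, `geometric_weight` and `difference` are hypothetical.

```python
def weighted_norm(u_past, w):
    """Finite-window version of ||u||_w = sup_{k <= 0} |u_k| * w_{-k}.
    u_past[j] holds u_{-j} (j = 0 is the present) and w(j) returns w_j."""
    return max(abs(x) * w(j) for j, x in enumerate(u_past))

def geometric_weight(j, rate=0.5):
    """A null sequence w_j = rate**j: decreasing, in (0, 1], with limit 0."""
    return rate ** j

def difference(a, b):
    """Entrywise difference of two equally long windows."""
    return [x - y for x, y in zip(a, b)]

u = [0.3, -0.2, 0.9, 0.1]          # u_0, u_{-1}, u_{-2}, u_{-3}
v_recent = [0.8, -0.2, 0.9, 0.1]   # differs from u at time 0 by 0.5
v_remote = [0.3, -0.2, 0.9, 0.6]   # differs from u at time -3 by 0.5

near = weighted_norm(difference(u, v_recent), geometric_weight)
far = weighted_norm(difference(u, v_remote), geometric_weight)
print(near, far)
```

The same perturbation counts for less when it lies further in the past, which is the sense in which continuity with respect to $\Vert \cdot \Vert _{w}$ encodes fading memory.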

Definition 4

(Fading memory) A time-invariant causal filter $M: K_{L}(D) \rightarrow {\mathbb {R}}^{{\mathbb {Z}}}$ has the fading memory property with respect to a null sequence w if and only if its corresponding functional $F: K_{L}^{-}(D) \rightarrow {\mathbb {R}}$ is continuous with respect to the weighted norm $\Vert \cdot \Vert _{w}$.

To emphasize that the fading memory property is defined with respect to a null sequence w, we will say that M is a w-fading memory filter and the corresponding functional F is a w-fading memory functional. We state the following compactness result [27, Lemma 2] and the Stone–Weierstrass theorem [29, Theorem 7.3.1].

Lemma 2

(Compactness) For any null sequence w, $K_{L}^{-}(D)$ is compact with the weighted norm $\Vert \cdot \Vert _{w}$.

We write $(K_{L}^{-}(D), \Vert \cdot \Vert _{w})$ to denote the space $K_{L}^{-}(D)$ equipped with the weighted norm $\Vert \cdot \Vert _{w}$.

Theorem 2

(Stone–Weierstrass) Let E be a compact metric space and C(E) be the set of real-valued continuous functions defined on E. If a subalgebra A of C(E) contains the constant functions and separates points of E, then A is dense in C(E).

Let $C(K_{L}^{-}(D), \Vert \cdot \Vert _{w})$ be the set of continuous functionals $F: (K_{L}^{-}(D), \Vert \cdot \Vert _{w}) \rightarrow {\mathbb {R}}$. The following theorem is a result of the compactness of $(K_{L}^{-}(D), \Vert \cdot \Vert _{w})$ (Lemma 2) and the Stone–Weierstrass theorem (Theorem 2).

Theorem 3

Let w be a null sequence. For convergent CPTP maps T, let ${\mathscr {M}}_{w} = \{M_{h}^{T} \mid h : {\mathscr {D}}({\mathbb {C}}^{2^{n}}) \rightarrow {\mathbb {R}}\}$ be a set of w-fading memory filters. Let ${\mathscr {F}}_{w}$ be the family of corresponding w-fading memory functionals defined on $K_{L}^{-}(D)$. If ${\mathscr {F}}_{w}$ forms a polynomial algebra of $C(K_{L}^{-}(D), \Vert \cdot \Vert _{w})$, contains the constant functionals and separates points of $K_{L}^{-}(D)$, then ${\mathscr {F}}_{w}$ is dense in $C(K_{L}^{-}(D), \Vert \cdot \Vert _{w})$. That is for any w-fading memory filter $M_{*}$ and any $\epsilon > 0$, there exists $M_{h}^{T} \in {\mathscr {M}}_{w}$ such that for all $u \in K_{L}(D)$, $\Vert M_{*}(u) - M_{h}^{T}(u)\Vert _{\infty } = \sup _{k \in {\mathbb {Z}}}|M_{*}(u)_{k} - M_{h}^{T}(u)_{k}| < \epsilon $.

Proof

That ${\mathscr {F}}_{w}$ is dense follows from Lemma 2 and Theorem 2. To prove the second part of the theorem, note that since ${\mathscr {F}}_{w}$ is dense in $C(K_{L}^{-}(D), \Vert \cdot \Vert _{w})$, for any w-fading memory functional $F_{*}$ and any $\epsilon >0$, there exists $F_{h}^{T} \in {\mathscr {F}}_{w}$ such that for all $u_{-} \in K_{L}^{-}(D)$, $ | F_{*}(u_{-}) - F_{h}^{T}(u_{-})| < \epsilon . $ For $u \in K_{L}(D)$, notice that $P \circ M_{-k}(u) \in K_{L}^{-}(D)$ for all $k\in {\mathbb {Z}}$; hence,

$$\begin{aligned} \begin{aligned} \left| F_{*}(P \circ M_{-k}(u)) - F_{h}^{T}(P \circ M_{-k}(u)) \right| = \left| M_{*}(u)_{k} - M_{h}^{T}(u)_{k} \right| < \epsilon . \end{aligned} \end{aligned}$$

Since this is true for all $k \in {\mathbb {Z}}$, it follows that for all $u \in K_{L}(D)$, $\Vert M_{*}(u) - M_{h}^{T}(u) \Vert _{\infty } < \epsilon $. $\square $

1.3 Fading memory property and polynomial algebra

Before we prove the universality of the family of dissipative quantum systems introduced in Sect. 4 in the main text, we first establish two important observations regarding the multivariate polynomial output in Eq. (3).

We specify h to be the multivariate polynomial as in the right-hand side of Eq. (3) in the main text. For ease of notation, we drop the subscript h in $F_{h}^{T}$ and $M_{h}^{T}$. Let ${\mathscr {F}} = \{F^{T}\}$ be the set of functionals induced from dissipative quantum systems given by Eqs. (1) and (3) in the main text. We will show in Lemma 3 that the convergence and continuity of T are sufficient to guarantee the fading memory property of $F^{T}$, and in Lemma 5 that ${\mathscr {F}}$ forms a polynomial algebra, made of fading memory functionals. In the following, let ${\mathscr {L}}({\mathbb {C}}^{2^n})$ be the set of linear operators on ${\mathbb {C}}^{2^n}$, and for a CPTP map T, for all $u_k \in D \cap [-L, L]$, define $\Vert T(u_k)\Vert _{2-2} :=\sup _{A \in {\mathscr {L}}({\mathbb {C}}^{2^n}), \Vert A\Vert _{2}=1} \Vert T(u_k) A\Vert _{2}$.

Lemma 3

(Fading memory property) Consider an n-qubit dissipative quantum system with dynamics Eq. (1) and output Eq. (3). Suppose that for all $u_k \in D \cap [-L, L]$, the CPTP map $T(u_k)$ satisfies the condition in Theorem 1, so that it is convergent. Suppose moreover that for any $\epsilon > 0$, there exists $\delta _{T}(\epsilon ) > 0$ such that $\Vert T(x) - T(y)\Vert _{2-2} < \epsilon $ whenever $|x - y| < \delta _{T}(\epsilon )$ for $x, y \in D \cap [-L, L]$. Then for any null sequence w, the induced filter $M^{T}$ and the corresponding functional $F^{T}$ are w-fading memory.

Proof

We first state the boundedness of CPTP maps [45, Theorem 2.1].

Lemma 4

For a CPTP map $T : {\mathscr {L}}({\mathbb {C}}^{2^n}) \rightarrow {\mathscr {L}}({\mathbb {C}}^{2^n})$, we have $\left\| T \right\| _{2-2} \le \sqrt{2^n}$.

Moreover, recall that ${\mathrm{Tr}}(\cdot )$ is continuous, that is for any $\epsilon >0$, there exists $\delta _{{\mathrm{Tr}}}(\epsilon ) > 0$ such that $|{\mathrm{Tr}}(A-B)|<\epsilon $ whenever $\Vert A - B \Vert _{2} <\delta _{{\mathrm{Tr}}}(\epsilon )$ for any complex matrices A, B. Note that here $\Vert \cdot \Vert _{2}$ denotes the Schatten 2-norm or the Hilbert–Schmidt norm.

Let w be an arbitrary null sequence. We will show the linear terms L(u) in the functional $F^{T}$ are continuous with respect to $\left\| \cdot \right\| _{w}$, and the continuity property of $F^T$ follows from the fact that finite sums and products of continuous elements are also continuous.

For any $u, v \in K_{L}^{-}(D)$,

$$\begin{aligned} \begin{aligned} \left| L(u) - L(v) \right| = \left| {\mathrm{Tr}} \left( Z^{(i_{1})} \left( \left( \overrightarrow{\prod }_{k=0}^{\infty }T(u_{-k}) \right) \rho _{-\infty } - \left( \overrightarrow{\prod }_{k=0}^{\infty }T(v_{-k}) \right) \rho _{-\infty } \right) \right) \right| . \end{aligned} \end{aligned}$$

Denote $\rho _{u} = \left( \overrightarrow{\prod }_{k=N}^{\infty }T(u_{-k}) \right) \rho _{-\infty }$ and $\rho _{v} = \left( \overrightarrow{\prod }_{k=N}^{\infty }T(v_{-k}) \right) \rho _{-\infty }$ for some $0< N < \infty $,

$$\begin{aligned} \begin{aligned}&\left\| Z^{(i_{1})} \left( \left( \overrightarrow{\prod }_{k=0}^{\infty } T(u_{-k}) \right) \rho _{-\infty } - \left( \overrightarrow{\prod }_{k=0}^{\infty } T(v_{-k}) \right) \rho _{-\infty } \right) \right\| _{2} \\&\quad \le \left\| Z^{(i_{1})} \right\| _{2} \left( \left\| \overrightarrow{\prod }_{k=0}^{N-1} T(u_{-k}) - \overrightarrow{\prod }_{k=0}^{N-1}T(v_{-k}) \right\| _{2-2} \left\| \rho _{u} \right\| _{2} \right. \\&\qquad + \left. \left\| \left( \overrightarrow{\prod }_{k=0}^{N-1} T(v_{-k}) \right) (\rho _{u} - \rho _{v}) \right\| _{2} \right) . \end{aligned} \end{aligned}$$

(4)

Since T satisfies the conditions in Theorem 1, any two density operators converge uniformly to one another. Therefore, for any $\epsilon >0$, there exists $N(\epsilon ) > 0$ such that for all $N' > N(\epsilon )$,

$$\begin{aligned} \left\| \left( \overrightarrow{\prod }_{k=0}^{N'-1}T(v_{-k}) \right) (\rho _{u} - \rho _{v}) \right\| _{2} < \frac{\delta _{{\mathrm{Tr}}}(\epsilon )}{2 \left\| Z^{(i_{1})} \right\| _{2}}. \end{aligned}$$

(5)

Choose $N' = N(\epsilon ) + 1$ and bound the first term inside the bracket on the right-hand side of Eq. (4) by rewriting it as a telescoping sum:

$$\begin{aligned} \begin{aligned}&\left\| \overrightarrow{\prod }_{k=0}^{N(\epsilon )}T(u_{-k}) - \overrightarrow{\prod }_{k=0}^{N(\epsilon )}T(v_{-k}) \right\| _{2-2} \\&\quad = \left\| \sum _{l=0}^{N(\epsilon )} \left( T(v_{0}) \cdots T(v_{-(l- 1)}) T(u_{-l}) T(u_{-(l+1)}) \cdots T(u_{-N(\epsilon )}) \right. \right. \\&\qquad \left. - T(v_{0}) \cdots T(v_{-(l-1)}) T(v_{-l}) T(u_{-(l+1)}) \cdots T(u_{-N(\epsilon )}) \right) \bigg \Vert _{2-2} \\&\quad \le \sum _{l=0}^{N(\epsilon )} \left\| T(v_{0}) \cdots T(v_{-(l-1)}) \right\| _{2-2} \left\| T(u_{-l}) - T(v_{-l}) \right\| _{2-2} \left\| T(u_{-(l+1)}) \cdots T(u_{-N(\epsilon )}) \right\| _{2-2} \\&\quad \le 2^{n} \sum _{l=0}^{N(\epsilon )} \left\| T(u_{-l}) - T(v_{-l}) \right\| _{2-2}, \end{aligned} \end{aligned}$$

(6)

where the last inequality follows from Lemma 4, since any composition of CPTP maps is again a CPTP map. We claim that for any $\epsilon > 0$, if

$$\begin{aligned} \Vert u - v \Vert _{w} = \mathop {\sup _{k \in {\mathbb {Z}}^{-}}} \left| u_{k} - v_{k} \right| w_{-k} < \delta _{T}\left( \frac{\delta _{{\mathrm{Tr}}} (\epsilon )}{2^{n+1} \left\| Z^{(i_1)} \right\| _{2} (N(\epsilon )+1)} \right) w_{N(\epsilon )} \end{aligned}$$

then $\left| L(u) - L(v) \right| < \epsilon $. Indeed, since w is decreasing, the above condition implies that

$$\begin{aligned} \begin{aligned} \mathop {\max _{0 \le l \le N(\epsilon )}} \left| u_{-l} - v_{-l} \right| w_{N(\epsilon )} < \delta _{T}\left( \frac{\delta _{{\mathrm{Tr}}} (\epsilon )}{2^{n+1} \left\| Z^{(i_1)} \right\| _{2} (N(\epsilon )+1)} \right) w_{N(\epsilon )}. \end{aligned} \end{aligned}$$

Since $w_{N(\epsilon )} > 0$, for all $0 \le l \le N(\epsilon )$,

$$\begin{aligned} |u_{-l} - v_{-l}| < \delta _{T}\left( \frac{\delta _{{\mathrm{Tr}}} (\epsilon )}{2^{n+1} \left\| Z^{(i_1)} \right\| _{2} (N(\epsilon )+1)} \right) . \end{aligned}$$

By continuity of T, we bound Eq. (6) by

$$\begin{aligned} 2^{n} \sum _{l=0}^{N(\epsilon )}\left\| T(u_{-l}) - T(v_{-l}) \right\| _{2-2} < 2^{n} \sum _{l=0}^{N(\epsilon )} \frac{\delta _{{\mathrm{Tr}}} (\epsilon )}{2^{n+1} \left\| Z^{(i_1)} \right\| _{2} (N(\epsilon )+1)} = \frac{\delta _{{\mathrm{Tr}}} (\epsilon )}{2 \left\| Z^{(i_1)} \right\| _{2}}. \end{aligned}$$

(7)

Since $\left\| \rho _{u} \right\| _{2} \le 1$, Eqs. (4), (5) and (7) give

$$\begin{aligned} \begin{aligned}&\left\| Z^{(i_1)} \right\| _{2} \left( \left\| \overrightarrow{\prod }_{k=0}^{N(\epsilon )}T(u_{-k}) - \overrightarrow{\prod }_{k=0}^{N(\epsilon )}T(v_{-k}) \right\| _{2-2} \left\| \rho _{u} \right\| _{2} + \left\| \left( \overrightarrow{\prod }_{k=0}^{N(\epsilon )}T(v_{-k}) \right) (\rho _{u} - \rho _{v}) \right\| _{2} \right) \\&\quad < \delta _{{\mathrm{Tr}}}(\epsilon ). \end{aligned} \end{aligned}$$

The result now follows from the continuity of ${\mathrm{Tr}}(\cdot )$. $\square $
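The fading memory property can be observed numerically on a toy example. The Python sketch below is an illustration under assumptions and not the construction of Sect. 4: it uses a hypothetical single-qubit map $T(u)\rho = (1-\epsilon ) U(u) \rho U(u)^{\dagger } + \epsilon |0\rangle \langle 0|$ in Bloch-vector form, with $U(u)$ a rotation by angle $u$ about the y axis, and reads out the linear term $L(u) = {\mathrm{Tr}}(Z \rho _{0})$ from a finite input window. The names `step`, `output` and `perturb`, the window length 60 and the values of `EPS` and `DELTA` are all choices made for the example.

```python
import math

EPS = 0.3      # contraction margin epsilon (hypothetical value)
DELTA = 0.4    # size of the input perturbation (hypothetical value)

def step(r, u, eps=EPS):
    """Bloch-vector form of T(u) rho = (1-eps) U(u) rho U(u)^dagger + eps |0><0|,
    where U(u) rotates by angle u about the y axis (toy model)."""
    x, y, z = r
    c, s = math.cos(u), math.sin(u)
    return ((1 - eps) * (c * x + s * z),
            (1 - eps) * y,
            (1 - eps) * (-s * x + c * z) + eps)

def output(u_past, r_init=(0.0, 0.0, 0.0)):
    """Finite-window approximation of L(u) = Tr(Z rho_0).
    u_past[j] holds u_{-j}, so the oldest input (last entry) is applied first."""
    r = r_init
    for u in reversed(u_past):
        r = step(r, u)
    return r[2]  # <Z> is the z component of the Bloch vector

def perturb(u_past, lag, delta=DELTA):
    """Copy of the window with the input at time -lag changed by delta."""
    v = list(u_past)
    v[lag] += delta
    return v

base = [0.5 * math.sin(0.7 * j) for j in range(60)]
gaps = {lag: abs(output(base) - output(perturb(base, lag))) for lag in (0, 5, 20, 40)}
for lag, gap in gaps.items():
    print(f"perturbation at time -{lag:2d}: |L(u) - L(v)| = {gap:.3e}")
```

For this toy map a perturbation of size $\delta $ applied $l$ steps in the past changes the output by at most $\delta (1-\epsilon )^{l+1}$, so inputs that differ only in the remote past give nearly identical outputs, in line with continuity with respect to the weighted norm.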

Lemma 5

(Polynomial algebra) Let ${\mathscr {F}}=\{F^{T}\}$ be a family of functionals induced by dissipative quantum systems defined by Eqs. (1) and (3) in the main text. If for each member $F^{T} \in {\mathscr {F}}$, T satisfies the conditions in Lemma 3, then for any null sequence w, ${\mathscr {F}}$ forms a polynomial algebra consisting of w-fading memory functionals.

Proof

Consider two dissipative quantum systems described by Eqs. (1) and (3), with $n_{1}$ and $n_{2}$ system qubits, respectively. Let $\rho _{k}^{(m)} \in {\mathscr {D}}({\mathbb {C}}^{2^{n_m}})$ be the state and $T^{(m)}$ be the CPTP map of the $m$th system. Let $j_{1} = 1, \ldots , n_{1}$ and $j_{2} = 1, \ldots , n_{2}$ be the respective qubit indices for the two systems. For the observable $Z^{(j_{m})}$ of qubit $j_{m}$, notice that

$$\begin{aligned} \begin{aligned} {\mathrm{Tr}}\left( Z^{(j_{1})} \rho _{k}^{(1)} \right)&= {\mathrm{Tr}} \left( (Z^{(j_{1})} \otimes I) (\rho _{k}^{(1)} \otimes \rho _{k}^{(2)}) \right) , \\ {\mathrm{Tr}}\left( Z^{(j_{2})} \rho _{k}^{(2)} \right)&= {\mathrm{Tr}} \left( ( I \otimes Z^{(j_{2})}) (\rho _{k}^{(1)} \otimes \rho _{k}^{(2)}) \right) , \end{aligned} \end{aligned}$$

where I is the identity operator. Therefore, we can relabel the qubits of the combined system described by the density operator $\rho _{k}^{(1)} \otimes \rho _{k}^{(2)}$ with a single index j, running from $j = 1$ to $j = n_{1} + n_{2}$. Using this notation, the above expectations can be re-expressed as

$$\begin{aligned} \begin{aligned} {\mathrm{Tr}}\left( Z^{(j_{1})} \rho _{k}^{(1)} \right)&= {\mathrm{Tr}}\left( Z^{(j)} \rho _{k}^{(1)} \otimes \rho _{k}^{(2)} \right) , \quad j = j_{1}\\ {\mathrm{Tr}}\left( Z^{(j_{2})} \rho _{k}^{(2)} \right)&= {\mathrm{Tr}}\left( Z^{(j)} \rho _{k}^{(1)} \otimes \rho _{k}^{(2)} \right) , \quad j = n_{1} + j_{2}. \end{aligned} \end{aligned}$$

Following this idea, write out the outputs of two systems as follows,

$$\begin{aligned} {\bar{y}}_{k}^{(1)}= & {} C_{1} + \sum _{d_1 = 1 }^{R_1} \sum _{i_1 = 1}^{n_1} \cdots \sum _{i_{n_1} = i_{n_1 - 1} + 1 }^{n_1} \sum _{r_{i_1} + \cdots +r_{i_{n_1}} = d_1} w_{i_1, \ldots , i_{n_1}}^{r_{i_1}, \ldots , r_{i_{n_1}}} \langle Z^{(i_1)} \rangle _{k}^{r_{i_1}} \cdots \langle Z^{(i_{n_1})} \rangle _{k}^{r_{i_{n_1}}}, \\ {\bar{y}}_{k}^{(2)}= & {} C_{2} + \sum _{d_2 = 1 }^{R_2} \sum _{j_1 = 1}^{n_2} \cdots \sum _{j_{n_2} = j_{n_2 - 1} + 1 }^{n_2} \sum _{r_{j_1} + \cdots +r_{j_{n_2}} = d_2} w_{j_1, \ldots , j_{n_2}}^{r_{j_1}, \ldots , r_{j_{n_2}}} \langle Z^{(j_1)} \rangle _{k}^{r_{j_1}} \cdots \langle Z^{(j_{n_2})} \rangle _{k}^{r_{j_{n_2}}}. \end{aligned}$$

For any $\lambda \in {\mathbb {R}}$, let $n = n_{1} + n_{2}$, let $k_{1}, \ldots , k_{n}$ denote qubit indices of the combined system, each taking values in $\{1, \ldots , n\}$ (not to be confused with the time index k), and let $R = \max \{R_1, R_2\}$. Then

$$\begin{aligned} {\bar{y}}_{k}^{(1)} + \lambda {\bar{y}}_{k}^{(2)}= & {} C_{1} + \lambda C_{2} + \sum _{d=1}^{R} \sum _{k_{1} = 1}^{n} \cdots \sum _{k_{n} = k_{n-1} + 1}^{n} \sum _{r_{k_{1}} + \cdots + r_{k_n} = d} {\bar{w}}_{k_1, \ldots , k_n}^{r_{k_1}, \ldots , r_{k_n}} \langle Z^{(k_1)} \rangle _{k}^{r_{k_1}} \cdots \langle Z^{(k_n)} \rangle _{k}^{r_{k_n}}, \end{aligned}$$

where the weights ${\bar{w}}_{k_1, \ldots , k_n}^{r_{k_1}, \ldots , r_{k_n}}$ are changed accordingly. For instance, if all $k_{m} \le n_{1}$ for $m=1,\ldots ,n$, then ${\bar{w}}_{k_1, \ldots , k_n}^{r_{k_1}, \ldots , r_{k_n}} = w_{i_1, \ldots , i_{n_1}}^{r_{i_1}, \ldots , r_{i_{n_1}}}$, corresponding to the weights for the output ${\bar{y}}^{(1)}_{k}$. Similarly, let $R = R_1 + R_2$,

$$\begin{aligned} {\bar{y}}_{k}^{(1)} {\bar{y}}_{k}^{(2)}= & {} C_{1} C_{2} + \sum _{d=1}^{R} \sum _{k_{1} = 1}^{n} \cdots \sum _{k_{n} = k_{n-1} + 1}^{n} \sum _{r_{k_{1}} + \cdots + r_{k_n} = d} {\hat{w}}_{k_1, \ldots , k_n}^{r_{k_1}, \ldots , r_{k_n}} \langle Z^{(k_1)} \rangle _{k}^{r_{k_1}} \cdots \langle Z^{(k_n)} \rangle _{k}^{r_{k_n}}. \end{aligned}$$

Therefore, ${\bar{y}}_{k}^{(1)} + \lambda {\bar{y}}_{k}^{(2)}$ and ${\bar{y}}_{k}^{(1)} {\bar{y}}_{k}^{(2)}$ again have the same form as the right-hand side of Eq. (3) in the main text. This implies that for any functionals $F^{T^{(1)}}, F^{T^{(2)}} \in {\mathscr {F}}$, $F^{T^{(1)}} + \lambda F^{T^{(2)}} \in {\mathscr {F}}$ and $F^{T^{(1)}} F^{T^{(2)}} \in {\mathscr {F}}$. Thus, ${\mathscr {F}}$ forms a polynomial algebra.

It remains to show that for all $u_k \in D \cap [-L, L]$, $\Vert T(u_k) |_{H_0(2^{n})} \Vert _{2-2} = \Vert (T^{(1)}(u_k) \otimes T^{(2)}(u_k))|_{H_0(2^n)} \Vert _{2-2} \le 1-\epsilon $ for some $0 < \epsilon \le 1$. This will imply that $F^{T^{(1)}} + \lambda F^{T^{(2)}}$ and $F^{T^{(1)}} F^{T^{(2)}}$ are w-fading memory by Lemma 3, and that ${\mathscr {F}}$ forms a polynomial algebra consisting of w-fading memory functionals. Suppose that for all $u_k \in D \cap [-L, L]$, $\Vert T^{(m)}(u_{k})|_{H_0(2^{n_m})}\Vert _{2-2} \le 1-\epsilon _{m}$ with $0 < \epsilon _{m} \le 1$ for $m=1, 2$. Following the proof of [46, Proposition 3], let $A = \sum _{i} A_{i} \otimes \tilde{A}_{i}$ be a traceless Hermitian operator. Without loss of generality, we assume that $\{\tilde{A}_i\}$ is an orthonormal set with respect to the Hilbert–Schmidt inner product. Then $\{A_{i} \otimes \tilde{A}_{i}\}$ and $\{T^{(1)}(u_k)|_{H_0(2^{n_{1}})} A_{i} \otimes \tilde{A}_{i} \}$ are two orthogonal sets. By the Pythagorean theorem, $T^{(1)}(u_k)|_{H_0(2^{n_{1}})} \otimes I$ on the hyperplane of traceless Hermitian operators satisfies

$$\begin{aligned} \begin{aligned}&\Vert (T^{(1)}(u_k)|_{H_0(2^{n_{1}})} \otimes I ) \sum _{i} A_{i} \otimes \tilde{A}_{i} \Vert _{2}^2 =\sum _{i} \Vert T^{(1)}(u_k)|_{H_{0}(2^{n_{1}})} A_{i} \otimes \tilde{A}_i \Vert _{2}^2 \\&\quad = \sum _{i} \Vert T^{(1)}(u_k)|_{H_{0}(2^{n_{1}})}A_{i} \Vert _2^{2} \Vert \tilde{A}_i \Vert _2^2 \le \Vert T^{(1)}(u_k)|_{H_{0}(2^{n_{1}})} \Vert _{2-2}^2 \sum _{i} \Vert A_i \Vert _2^2 \Vert \tilde{A}_i \Vert _2^2 \\&\quad = \Vert T^{(1)}(u_k)|_{H_{0}(2^{n_{1}})} \Vert _{2-2}^2 \sum _{i}\Vert A_i\otimes \tilde{A}_i\Vert _2^2 = \Vert T^{(1)}(u_k)|_{H_{0}(2^{n_{1}})} \Vert _{2-2}^2 \Vert \sum _{i} A_i\otimes \tilde{A}_i\Vert _2^2 . \end{aligned} \end{aligned}$$

Therefore, $\Vert T^{(1)}(u_k)|_{H_0(2^{n_1})} \otimes I \Vert _{2-2} \le \Vert T^{(1)}(u_k)|_{H_{0}(2^{n_1})}\Vert _{2-2}$. Similarly, a symmetric argument shows that $\Vert I \otimes T^{(2)}(u_k)|_{H_0(2^{n_2})}\Vert _{2-2} \le \Vert T^{(2)}(u_k)|_{H_{0}(2^{n_2})}\Vert _{2-2}$. Therefore, when restricted to traceless Hermitian operators,

$$\begin{aligned} \begin{aligned}&\Vert (T^{(1)}(u_k) \otimes T^{(2)}(u_k))|_{H_0(2^{n})}\Vert _{2-2} = \Vert (T^{(1)}(u_k)|_{H_0(2^{n_1})} \otimes I) (I \otimes T^{(2)}(u_k)|_{H_0(2^{n_2})})\Vert _{2-2} \\&\quad \le \Vert T^{(1)}(u_k)|_{H_0(2^{n_1})} \otimes I\Vert _{2-2} \Vert I \otimes T^{(2)}(u_k)|_{H_0(2^{n_2})} \Vert _{2-2} \\&\quad \le \Vert T^{(1)}(u_k)|_{H_{0}(2^{n_1})}\Vert _{2-2} \Vert T^{(2)}(u_k)|_{H_{0}(2^{n_2})}\Vert _{2-2} \le (1-\epsilon _{1})(1-\epsilon _{2}). \end{aligned} \end{aligned}$$

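Since $(1-\epsilon _{1})(1-\epsilon _{2}) = 1-\epsilon $ with $\epsilon = \epsilon _{1} + \epsilon _{2} - \epsilon _{1}\epsilon _{2} \in (0, 1]$, the convergence of T follows from Theorem 1. $\square $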

1.4 A universal class

We now prove the universality of the class of dissipative quantum systems introduced in the main text. Recall that this class consists of N non-interacting quantum subsystems initialized in a product state of the N subsystems, where the dynamics of subsystem K with $n_K$ qubits is governed by the CPTP map:

$$\begin{aligned} T_{K}(u_k) \rho _{k-1}^{K} = {\mathrm{Tr}}_{i_{0}^{K}} (e^{-i H_{K} \tau } \rho _{k-1}^{K} \otimes \rho _{i_0, k}^{K} e^{i H_{K} \tau }), \end{aligned}$$

(8)

where

$$\begin{aligned} \rho _{i_{0}, k}^{K}= & {} u_{k} |0 \rangle \langle 0| + (1-u_{k}) |1 \rangle \langle 1|, \quad 0 \le u_{k} \le 1\nonumber \\ H_{K}= & {} \sum _{j_1 = 0}^{n_{K}} \sum _{j_2 = j_1 + 1}^{n_K} J_{K}^{j_1, j_2} (X^{(i^K_{j_{1}})} X^{(i^K_{j_{2}})} + Y^{(i^K_{j_{1}})} Y^{(i^K_{j_{2}})}) + \sum _{j=0}^{n_{K}} \alpha Z^{(i_j^K)}, \end{aligned}$$

(9)

with $J^{j_{1}, j_{2}}_{K}$ and $\alpha $ being real-valued constants and ${\mathrm{Tr}}_{i_{0}^{K}}$ denoting the partial trace over the ancilla qubit. Letting ${\overline{H}}_{K} = I \otimes \cdots \otimes H_{K} \otimes \cdots \otimes I$, with $H_{K}$ in the Kth position, the total Hamiltonian of the N subsystems is

$$\begin{aligned} \begin{aligned} H&= \sum _{K = 1}^{N} {\overline{H}}_{K}. \end{aligned} \end{aligned}$$

(10)

Writing $\rho _{k} = \bigotimes _{K=1}^{N} \rho _{k}^{K}$, the overall dynamics and the output are given by

$$\begin{aligned} {\left\{ \begin{array}{ll} \rho _{k} = T(u_{k}) \rho _{k-1} = \bigotimes _{K=1}^{N} T_{K}(u_{k}) \rho ^{K}_{k-1}\\ {\bar{y}}_{k} = h(\rho _{k}), \end{array}\right. } \end{aligned}$$

(11)

where h is the multivariate polynomial defined by the right-hand side of Eq. (3) in the main text.

Proposition 1

Let ${\mathscr {M}}_{S}$ be the set of filters induced from dissipative quantum systems described by Eq. (11) such that each $T_{K}$ $(K = 1, \ldots , N)$ satisfies conditions in Theorem 1. Then for any null sequence w, the corresponding family of functionals ${\mathscr {F}}_{S}$ is dense in $C(K_{1}^{-}([0, 1]), \Vert \cdot \Vert _{w})$.

Proof

We first show $T_{K}(x)$ satisfies the conditions in Lemma 3 for all $x \in [0, 1]$. Let $x, y \in [0, 1]$ and Z be the Pauli Z operator. By definition,

$$\begin{aligned} \begin{aligned} \Vert T_{K} (x) - T_{K}(y) \Vert _{2-2}&= \sup _{\begin{array}{c} A \in {\mathscr {L}}({\mathbb {C}}^{2^n}) \\ \Vert A \Vert _{2} = 1 \end{array}} \Vert (T_{K}(x) - T_{K}(y)) A \Vert _{2} \\&= \sup _{\begin{array}{c} A \in {\mathscr {L}}({\mathbb {C}}^{2^{n}}) \\ \Vert A\Vert _{2} = 1 \end{array}} \Vert {\mathrm{Tr}}_{i_{0}}^{K}(e^{-i H_{K} \tau } A \otimes (x-y) Z e^{i H_{K} \tau }) \Vert _{2} \\&= |x-y| \sup _{\begin{array}{c} A \in {\mathscr {L}}({\mathbb {C}}^{2^{n}}) \\ \Vert A \Vert _{2} = 1 \end{array}} \Vert {\mathrm{Tr}}_{i_{0}}^{K}(e^{-i H_{K} \tau } A \otimes Z e^{i H_{K} \tau }) \Vert _{2} \\&= |x - y| \Vert \tilde{T} \Vert _{2-2}, \end{aligned} \end{aligned}$$

where $\tilde{T}: A \mapsto {\mathrm{Tr}}_{i_{0}}^{K}(e^{-i H_{K} \tau } A \otimes Z e^{i H_{K} \tau })$ is an input-independent bounded linear map.

Now, the same argument as in the proof of Lemma 5 shows that $T = T_{1} \otimes \cdots \otimes T_{N}$ is convergent given the assumptions on each $T_K$. Furthermore, given two convergent systems whose dynamics are described by Eq. (11) with Hamiltonians $H^{(1)}$ and $H^{(2)}$, the total Hamiltonian of the combined system is $H = H^{(1)} \otimes I + I \otimes H^{(2)}$, which again has the form of Eq. (10). Therefore, by the above observation and Lemma 5, ${\mathscr {F}}_{S}$ forms a polynomial algebra consisting of w-fading memory functionals for any null sequence w.

It remains to show ${\mathscr {F}}_{S}$ contains constants and separates points. Constants can be obtained by setting the weights $w_{i_1, \ldots , i_{n}}^{r_{i_1}, \ldots , r_{i_n}}$ in the output to be zero. To show the family ${\mathscr {F}}_{S}$ separates points, we state the following lemma for later use, whose proof can be found in [47, Theorem 3.2].

Lemma 6

Let $f(\theta ) = \sum _{n=0}^{\infty } x_{n} \theta ^{n}$ be a non-constant real power series, having a nonzero radius of convergence. If $f(0) = 0$, then there exists $\beta >0$ such that $f(\theta ) \ne 0$ for all $\theta $ with $\left| \theta \right| \le \beta $ and $\theta \ne 0$.

Consider a single-qubit system interacting with a single ancilla qubit whose dynamics is governed by Eq. (11). Order an orthogonal basis of ${\mathscr {L}}({\mathbb {C}}^{2})$ as ${\mathscr {B}} = \{I, Z, X, Y\}$. Recall that the normal representations of a CPTP map T and a density operator $\rho $ are given by

$$\begin{aligned} {\overline{T}}_{i,j} = \frac{{\mathrm{Tr}}\left( B_{i} T(B_{j}) \right) }{2} \qquad \text {and} \qquad {\overline{\rho }}_{i} = \frac{{\mathrm{Tr}}(\rho B_{i})}{2}, \end{aligned}$$

where $B_{i} \in {\mathscr {B}}$. Without loss of generality, let $\tau = 1$ and set $J_{1}^{j_1, j_2} = J \in {\mathbb {R}}$ for all $j_{1}, j_{2}$ in the Hamiltonian given by Eq. (9). We obtain the normal representation of the CPTP map defined in Eq. (8) as

$$\begin{aligned} {\overline{T}}(u_{k}) = \begin{pmatrix} 1 &{}\quad 0 &{}\quad 0 &{}\quad 0 \\ \sin ^2(2J)(2u_{k}-1) &{}\quad \cos ^{2}(2J) &{}\quad 0 &{}\quad 0\\ 0 &{}\quad 0 &{}\quad \cos (2J)\cos (2\alpha ) &{}\quad -\cos (2J)\sin (2\alpha ) \\ 0 &{}\quad 0 &{}\quad \cos (2J)\sin (2\alpha ) &{}\quad \cos (2J)\cos (2\alpha ) \end{pmatrix}. \end{aligned}$$
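The matrix above can be reproduced numerically. The following is a minimal sketch, assuming one system qubit and one ancilla qubit, $\tau = 1$, and arbitrarily chosen values of $u_k$, J and $\alpha $; the function names are hypothetical. It builds the Hamiltonian of Eq. (9), applies the map of Eq. (8) to each basis element of ${\mathscr {B}}$ by linearity, and compares the resulting normal representation with the closed form.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
BASIS = [I2, Z, X, Y]  # ordered as in the text

def unitary(J, alpha, tau=1.0):
    """exp(-i H tau) with H = J (XX + YY) + alpha (Z on system + Z on ancilla)."""
    H = J * (np.kron(X, X) + np.kron(Y, Y)) + alpha * (np.kron(Z, I2) + np.kron(I2, Z))
    w, V = np.linalg.eigh(H)                      # H is Hermitian
    return (V * np.exp(-1j * w * tau)) @ V.conj().T

def apply_map(A, u, U):
    """Eq. (8) applied to a 2x2 operator A: attach the ancilla, evolve, trace it out."""
    ancilla = np.diag([u, 1.0 - u]).astype(complex)
    joint = U @ np.kron(A, ancilla) @ U.conj().T
    # Indices after reshape: (system, ancilla, system', ancilla').
    return np.einsum("iaja->ij", joint.reshape(2, 2, 2, 2))

def normal_representation(u, J, alpha):
    """Entries Tr(B_i T(B_j)) / 2 for B_i, B_j in the ordered basis."""
    U = unitary(J, alpha)
    T = np.zeros((4, 4))
    for j, Bj in enumerate(BASIS):
        out = apply_map(Bj, u, U)
        for i, Bi in enumerate(BASIS):
            T[i, j] = np.trace(Bi @ out).real / 2
    return T

def closed_form(u, J, alpha):
    """The matrix stated in the text."""
    c, s2 = np.cos(2 * J), np.sin(2 * J) ** 2
    ca, sa = np.cos(2 * alpha), np.sin(2 * alpha)
    return np.array([[1, 0, 0, 0],
                     [s2 * (2 * u - 1), c**2, 0, 0],
                     [0, 0, c * ca, -c * sa],
                     [0, 0, c * sa, c * ca]])

u, J, alpha = 0.3, 0.7, 0.4        # arbitrary test values
T_num = normal_representation(u, J, alpha)
T_ref = closed_form(u, J, alpha)
# Largest singular value of the block acting on traceless operators (Z, X, Y).
sigma_max = np.linalg.svd(T_num[1:, 1:], compute_uv=False).max()
```

In this sketch `sigma_max` plays the role of the norm of the map restricted to traceless Hermitian operators, which for these parameter values equals $|\cos (2J)|$.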

When restricted to the hyperplane of traceless Hermitian operators,

$$\begin{aligned} {\overline{T}}|_{H_{0}(2)} = \begin{pmatrix} \cos ^{2}(2J) &{}\quad 0 &{}\quad 0\\ 0 &{}\quad \cos (2J)\cos (2\alpha ) &{}\quad -\cos (2J)\sin (2\alpha ) \\ 0 &{}\quad \cos (2J)\sin (2\alpha ) &{}\quad \cos (2J)\cos (2\alpha ) \end{pmatrix} \end{aligned}$$

with $\left\| {\overline{T}}|_{H_{0}(2)} \right\| _{2-2} = \sigma _{\max }({\overline{T}}|_{H_{0}(2)}) = |\cos (2J)|$. Here, $\left\| \cdot \right\| _{2-2}$ is the matrix 2-norm and $\sigma _{\max }(\cdot )$ is the maximum singular value. Choose $J \ne \frac{z \pi }{2}$ for $z \in {\mathbb {Z}}$, then $|\cos (2J)| \le 1-\epsilon $ for some $0<\epsilon \le 1$. By Theorem 1, T is convergent and we choose an arbitrary initial density operator ${\overline{\rho }}_{-\infty } = \begin{pmatrix} 1/2&1/2&0&0 \end{pmatrix}^{T}$, corresponding to $\rho _{-\infty } = |0 \rangle \langle 0|$. If we only take the expectation $\langle Z \rangle $ in the output Eq. (3) by setting the degree $R=1$, then this single-qubit dissipative quantum system induces a functional

$$\begin{aligned} \begin{aligned} F^{T}(u) = w \left[ \left( \overrightarrow{\prod }_{j=0}^{\infty } {\overline{T}}(u_{-j}) \right) {\overline{\rho }}_{-\infty } \right] _{2} + C, \end{aligned} \end{aligned}$$

for all $u \in K^{-}_{1}([0, 1])$ and $w\ne 0$. Here, $[\cdot ]_{2}$ refers to the second element of the vector, corresponding to $\langle Z \rangle $ given the order of the orthogonal basis elements in ${\mathscr {B}}$. Given two input sequences $u \ne v$ in $K^{-}_{1}([0, 1])$, consider two cases:

(i)
If $u_{0} \ne v_{0}$, choose $J = \frac{\pi }{4}$ such that $\cos ^2(2J) = 0$ and $\sin ^{2}(2J)=1$. Then
$$\begin{aligned} F^{T}(u) - F^{T}(v) = w(u_{0} - v_{0}) \ne 0. \end{aligned}$$
(ii)
If $u_{0} = v_{0}$,
$$\begin{aligned} F^{T}(u) - F^{T}(v) = w \sin ^2(2J) \sum _{j=0}^{\infty } \left( \cos ^2(2J) \right) ^{j}(u_{-j} - v_{-j}). \end{aligned}$$
Let $\theta = \cos ^{2}(2J)$, then given our choice of J, $0 \le \theta \le 1-\epsilon $ and $\sin ^2(2J) \ge \epsilon $ for some $0<\epsilon \le 1$. Consider the power series
$$\begin{aligned} f(\theta ) = \sum _{j=0}^{\infty } \theta ^{j} (u_{-j} - v_{-j}), \end{aligned}$$
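since $\left| u_{-j} - v_{-j} \right| \le 1$, $f(\theta )$ has radius of convergence at least 1 and therefore converges for all $\theta \in (-1, 1)$. Moreover, $f(\theta )$ is non-constant because $u \ne v$, and $f(0) = u_{0} - v_{0} = 0$. By Lemma 6, there exists $\beta > 0$ such that $f(\theta ) \ne 0$ whenever $0 < \left| \theta \right| \le \beta $. Choosing J such that $0 < \cos ^{2}(2J) \le \beta $ and $\sin ^{2}(2J) > 0$ then gives $F^{T}(u) - F^{T}(v) = w \sin ^2(2J) f(\cos ^2(2J)) \ne 0$, so ${\mathscr {F}}_{S}$ separates points.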
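Case (ii) can be illustrated with a small numerical example. The sketch below assumes two hypothetical, finitely supported input histories that agree at $j = 0$ and $j = 1$ (all later differences are taken to be zero), and evaluates the power series $f(\theta )$ on a grid. For this particular pair, $f(\theta ) = \theta ^{2}(0.5 - \theta )$, so f vanishes at $\theta = 0.5$ but is nonzero on a punctured neighborhood of zero, as Lemma 6 guarantees.

```python
import numpy as np

def truncated_series(theta, u, v):
    """f(theta) = sum_j theta**j * (u_{-j} - v_{-j}); entry j of each list stores u_{-j}."""
    d = np.asarray(u, dtype=float) - np.asarray(v, dtype=float)
    return float(np.sum(theta ** np.arange(len(d)) * d))

# Hypothetical input histories in [0, 1] with u_0 = v_0 (illustration only).
u = [0.5, 0.5, 0.75, 0.0]
v = [0.5, 0.5, 0.25, 1.0]

# Evaluate f on a grid of small positive theta = cos^2(2J).
thetas = np.linspace(0.01, 0.4, 40)
values = np.array([truncated_series(t, u, v) for t in thetas])

# f has a zero away from the origin, so theta must be chosen small enough.
value_at_half = truncated_series(0.5, u, v)
```

The zero at $\theta = 0.5$ shows why the argument selects J so that $\cos ^{2}(2J)$ lies within the neighborhood provided by Lemma 6 rather than using an arbitrary J.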

Finally, the universality property of ${\mathscr {F}}_{S}$ follows from Theorem 3. $\square $

1.5 Detailed numerical experiment settings

In this section, we describe detailed formulas for the NARMA tasks, simulation of decoherence and experimental conditions for ESNs and the Volterra series.

1.5.1 The NARMA task

The general NARMA I/O map of order $\tau _{\mathrm{NARMA}}$ is described as [31]:

$$\begin{aligned} y_{k} = 0.3 y_{k-1} + 0.05 y_{k-1} \left( \sum _{j=0}^{\tau _{\mathrm{NARMA}}-1} y_{k-j-1} \right) + 1.5 u_{k-\tau _{\mathrm{NARMA}}} u_{k} + \gamma , \end{aligned}$$

where $\gamma \in {\mathbb {R}}$. In the main text, we consider $\tau _{\mathrm{NARMA}} \in \{15, 20, 30, 40\}$. For $\tau _{\mathrm{NARMA}} \in \{15, 20\}$, we set $\gamma = 0.1$. For $\tau _{\mathrm{NARMA}} = 30$ and $\tau _{\mathrm{NARMA}} = 40$, $\gamma $ is set to 0.05 and 0.04, respectively. A random input sequence $u^{(r)}$, where each $u_{k}^{(r)}$ is drawn uniformly at random from [0, 0.2], is used for all the computational tasks. This range is chosen to ensure stability of the NARMA tasks.
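The NARMA recursion can be generated as follows. This is a minimal sketch rather than the code used for the experiments: it assumes zero initial conditions ($y_k = 0$ for the first $\tau _{\mathrm{NARMA}}$ timesteps), and the function name `narma`, the sequence length and the random seed are hypothetical choices.

```python
import numpy as np

def narma(u, tau, gamma):
    """Generate the NARMA output for input u, assuming y_k = 0 for k < tau."""
    y = np.zeros(len(u))
    for k in range(tau, len(u)):
        window = y[k - tau:k]          # y_{k-1}, ..., y_{k-tau}
        y[k] = (0.3 * y[k - 1]
                + 0.05 * y[k - 1] * window.sum()
                + 1.5 * u[k - tau] * u[k]
                + gamma)
    return y

rng = np.random.default_rng(1)
u = rng.uniform(0.0, 0.2, size=2000)   # inputs drawn uniformly from [0, 0.2]
y15 = narma(u, tau=15, gamma=0.1)      # NARMA15 target sequence
```

With inputs restricted to [0, 0.2] the generated sequence stays bounded in this sketch; larger input ranges can make the quadratic feedback term diverge.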

1.5.2 Decoherence

We consider the dephasing, decaying and generalized amplitude damping (GAD) noise, which are of experimental importance. The dephasing noise has the Kraus operators [38]:

$$\begin{aligned} M_{0} = \sqrt{\frac{1 + \sqrt{1-p}}{2}} I, \quad M_{1} = \sqrt{\frac{1 - \sqrt{1-p}}{2}} Z, \end{aligned}$$

where $\sqrt{1-p} = e^{-2 \frac{\gamma }{S}\delta _{t}}$. Therefore, we implement single-qubit phase-flip for all n system and ancilla qubits. That is, for $j=1,\ldots , n+1$ the density operator $\rho $ for the system and ancilla qubits undergoes the evolution:

$$\begin{aligned} \rho \rightarrow \frac{1+e^{-2 \frac{\gamma }{S} \delta _{t}}}{2} \rho + \frac{1-e^{-2 \frac{\gamma }{S} \delta _{t}}}{2} Z^{(j)} \rho Z^{(j)}, \end{aligned}$$

where $Z^{(j)}$ denotes the Pauli Z operator for qubit j.

The generalized amplitude damping (GAD) channel captures the effect of dissipation to an environment at a finite temperature $\lambda \in [0, 1]$. Its Kraus operators are defined by

$$\begin{aligned} \begin{aligned}&M_{0} = \sqrt{\lambda }\begin{pmatrix} 1 &{} 0 \\ 0 &{} \sqrt{1-p} \end{pmatrix}, \quad M_{1} = \sqrt{\lambda }\begin{pmatrix}0 &{} \sqrt{p}\\ 0 &{} 0 \end{pmatrix}, \\&M_{2} = \sqrt{1-\lambda } \begin{pmatrix} \sqrt{1-p} &{} 0 \\ 0 &{} 1 \end{pmatrix}, \quad M_{3} = \sqrt{1-\lambda } \begin{pmatrix} 0 &{} 0 \\ \sqrt{p} &{} 0\end{pmatrix}. \end{aligned} \end{aligned}$$

When $\lambda =1$, the GAD channel reduces to the amplitude damping channel (decaying noise). We simulate the GAD channel for $\lambda \in \{0.2, 0.4, 0.6, 0.8\}$. To give the GAD channel the same noise strength as the dephasing channel, we set $\sqrt{1-p} = e^{-2 \frac{\gamma }{S} \delta _{t}}$, so that $\sqrt{p} = \sqrt{1-e^{-4 \frac{\gamma }{S} \delta _t}}$.
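The two noise models can be written down and sanity-checked in a few lines. The sketch below is illustrative only: the single parameter `rate` stands for the product $\frac{\gamma }{S}\delta _{t}$ and, like the function names, is a hypothetical choice. It constructs the Kraus operators for one qubit and applies them to a density operator.

```python
import numpy as np

I2 = np.eye(2)
Z = np.diag([1.0, -1.0])

def dephasing_kraus(rate):
    """Kraus operators of the dephasing channel with sqrt(1 - p) = exp(-2 * rate)."""
    s = np.exp(-2.0 * rate)
    return [np.sqrt((1 + s) / 2) * I2, np.sqrt((1 - s) / 2) * Z]

def gad_kraus(rate, lam):
    """Kraus operators of the GAD channel with the same sqrt(1 - p) and parameter lam."""
    p = 1.0 - np.exp(-4.0 * rate)
    return [
        np.sqrt(lam) * np.array([[1.0, 0.0], [0.0, np.sqrt(1 - p)]]),
        np.sqrt(lam) * np.array([[0.0, np.sqrt(p)], [0.0, 0.0]]),
        np.sqrt(1 - lam) * np.array([[np.sqrt(1 - p), 0.0], [0.0, 1.0]]),
        np.sqrt(1 - lam) * np.array([[0.0, 0.0], [np.sqrt(p), 0.0]]),
    ]

def apply_channel(kraus, rho):
    """Return sum_i M_i rho M_i^dagger."""
    return sum(M @ rho @ M.conj().T for M in kraus)

def completeness(kraus):
    """Return sum_i M_i^dagger M_i, which equals the identity for a valid channel."""
    return sum(M.conj().T @ M for M in kraus)

rate = 0.05                                   # hypothetical noise strength
plus = np.full((2, 2), 0.5)                   # |+><+|, maximal coherence
dephased = apply_channel(dephasing_kraus(rate), plus)
excited = np.diag([0.0, 1.0])                 # |1><1|
damped = apply_channel(gad_kraus(rate, lam=1.0), excited)
```

In this sketch dephasing scales the off-diagonal elements by $e^{-2 \cdot \mathrm{rate}}$ and leaves populations unchanged, while the GAD channel with $\lambda = 1$ transfers population p from $|1\rangle $ to $|0\rangle $.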

Following the discussion in Sect. 5.2, Fig. 8 plots the average SA NMSE for the LRPO, Missile, NARMA15 and NARMA20 tasks under the GAD channel for all the chosen temperature parameters. Figures 9 and 10 plot the average sum of modulus of off-diagonal elements in the system density operator, for the last 50 timesteps of the SA samples, under all noise types discussed above.

1.5.3 The echo state networks

An ESN with m reservoir nodes is a type of recurrent neural network with an $m \times 1$ input matrix $W_{i}$, an $m \times m$ reservoir matrix $W_{r}$ and a $1 \times m$ output matrix $W_{o}$. The state evolution and output are given by [19]

$$\begin{aligned} {\left\{ \begin{array}{ll} x_{k} = \tanh (W_{r}x_{k-1} + W_{i}u_{k}) \\ {\hat{y}}_{k} = W_{o} x_k + w_{c}, \end{array}\right. } \end{aligned}$$

where $w_{c}$ is a tunable constant and $\tanh (\cdot )$ is an element-wise operation.

In the numerical examples, lengths of washout, learning and evaluation phases for ESNs and SA are the same. Given an output sequence y to be learned, the output weights $w_{c}$ and $W_{o}$ are optimized via standard least squares to minimize $\sum _{k}|y_{k} - {\hat{y}}_{k}|^2$, for timesteps k during the training phase. We now detail the experimental conditions for ESNs in various subsections of the numerical experiments (Sect. 5).

For the comparison given in Sect. 5.1, we set the reservoir size to be $m \in {\mathscr {M}}=\{10, 20, 30, 40, 50, 100, 150, 200, 250, 300, 400, 500, 600, 700, 800\}$. Here, the number of computational nodes is $m+1$ for each m. For each computational task and each m, the average NMSE of 100 ESNs is reported. The average NMSE for ESNs is obtained as follows. For each reservoir size m, we prepare 100 ESNs with the elements of $W_{r}$ drawn uniformly at random from $[-2, 2]$. Let ${\mathscr {S}}$ denote the set of 10 points evenly spaced in [0.01, 0.99]. For each of the 100 ESNs, we scale the maximum singular value of $W_{r}$ to $\sigma _{\max }(W_{r})=s$ for all $s \in {\mathscr {S}}$. This ensures the convergence and fading memory property of ESNs [26]. For each of the chosen s, the elements of $W_{i}$ are drawn uniformly at random from $[-\delta , \delta ]$, where $\delta $ is chosen from the set ${\mathscr {I}}$ of 10 points evenly spaced in [0.01, 1]. For the ith ($i=1, \ldots , 100$) ESN with parameters $(m, s, \delta )$, we denote its associated NMSE by ${\mathrm{NMSE}}_{(m, s, \delta , i)}$. For each reservoir size m, the average NMSE is computed as $\frac{1}{|{\mathscr {S}}|} \frac{1}{|{\mathscr {I}}|} \frac{1}{100} \sum _{s \in {\mathscr {S}}}\sum _{\delta \in {\mathscr {I}}} \sum _{i=1}^{100}{\mathrm{NMSE}}_{(m, s, \delta , i)}$. Figure 11 summarizes the average ESN NMSE for the LRPO, Missile, NARMA15 and NARMA20 tasks.
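The ESN construction and readout training described above can be sketched as follows. This is an illustrative outline under stated assumptions, not the code used for the reported results: the initial state is taken to be zero, the washout phase is omitted, the target sequence is a toy signal, and all function names are hypothetical.

```python
import numpy as np

def make_esn(m, s, delta, rng):
    """Draw W_r uniformly from [-2, 2], rescale it so sigma_max(W_r) = s, and draw W_i."""
    W_r = rng.uniform(-2.0, 2.0, size=(m, m))
    W_r *= s / np.linalg.norm(W_r, 2)          # matrix 2-norm = largest singular value
    W_i = rng.uniform(-delta, delta, size=m)
    return W_r, W_i

def run_esn(W_r, W_i, u):
    """Iterate x_k = tanh(W_r x_{k-1} + W_i u_k) from x = 0 and return all states."""
    x = np.zeros(W_r.shape[0])
    states = np.empty((len(u), len(x)))
    for k, uk in enumerate(u):
        x = np.tanh(W_r @ x + W_i * uk)
        states[k] = x
    return states

def fit_readout(states, y):
    """Least-squares fit of y_k ~ W_o x_k + w_c; returns (W_o, w_c)."""
    A = np.hstack([states, np.ones((len(states), 1))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[:-1], coef[-1]

rng = np.random.default_rng(2)
m, s, delta = 20, 0.5, 0.5                      # one (m, s, delta) setting
W_r, W_i = make_esn(m, s, delta, rng)
u = rng.uniform(0.0, 0.2, size=500)
y = np.concatenate([[0.0], u[1:] * u[:-1]])     # toy target: y_k = u_k * u_{k-1}
states = run_esn(W_r, W_i, u)
W_o, w_c = fit_readout(states, y)
y_hat = states @ W_o + w_c
```

Repeating this for every $s \in {\mathscr {S}}$, $\delta \in {\mathscr {I}}$ and 100 random draws, and averaging the resulting errors, mirrors the averaging procedure in the text.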

Table 2 Values of o and p for the Volterra series and the corresponding number of computational nodes


For the further comparison in Sect. 5.4, ESNs are simulated to approximate the LRPO, Missile, NARMA15, NARMA20, NARMA30 and NARMA40 tasks. The reservoir size of ESNs for each task is set to be $m \in {\mathscr {M}} = \{256, 300, 400, 500\}$. For each m, the number of computational nodes ${\mathscr {C}}$ for ESNs is

$$\begin{aligned} {\mathscr {C}} \in {\mathscr {N}}_{4} \cup {\mathscr {N}}_{5} \cup {\mathscr {N}}_{6} = \{5, 6, 7, 15, 21, 28, 35, 56, 70, 84, 126, 210, 252\}, \end{aligned}$$

where ${\mathscr {N}}_{n}$ denotes the chosen numbers of computational nodes for n-qubit SA, defined as follows. Recall that in this experiment, 4-, 5- and 6-qubit SA with varying degrees R in the output are chosen. For 4-qubit SA, $R_{4} \in \{1, \ldots , 6\}$ corresponds to the numbers of computational nodes ${\mathscr {N}}_{4} = \{5, 15, 35, 70, 126, 210\}$. For 5-qubit SA, $R_{5} \in \{1, \ldots , 5\}$, such that ${\mathscr {N}}_{5} = \{6, 21, 56, 126, 252\}$. For 6-qubit SA, $R_{6} \in \{1, \ldots , 4\}$, such that ${\mathscr {N}}_{6} = \{7, 28, 84, 210\}$. To compute the output weights $W_{o}$ and $w_{c}$ when ${\mathscr {C}} < m+1$, we first optimize $W_{o}$ and $w_{c}$ by standard least squares. We then choose the ${\mathscr {C}}-1$ elements of $W_{o}$ with the largest absolute values and their corresponding elements $x_{k}'$ from the state $x_k$. These ${\mathscr {C}}-1$ state elements $x'_k$ are used to re-optimize the ${\mathscr {C}}-1$ elements $W'_{o}$ of $W_{o}$ and $w'_{c}$ via standard least squares. At each timestep k, the full state $x_k$ evolves, while the output is computed as ${\hat{y}}'_{k} = W'_{o} x'_{k} + w'_{c}$. For this numerical experiment, the chosen parameter sets ${\mathscr {S}}$ and ${\mathscr {I}}$ of ESNs are the same as above. For the ith ESN with parameters $(m, s, \delta )$, the number of computational nodes ${\mathscr {C}}$ varies. Let ${\mathrm{NMSE}}_{(m, {\mathscr {C}}, s, \delta , i)}$ denote the corresponding NMSE. For each m and each ${\mathscr {C}}$, we report the average NMSE computed as $\frac{1}{|{\mathscr {S}}|} \frac{1}{|{\mathscr {I}}|} \frac{1}{100} \sum _{s \in {\mathscr {S}}}\sum _{\delta \in {\mathscr {I}}} \sum _{i=1}^{100}{\mathrm{NMSE}}_{(m, {\mathscr {C}}, s, \delta , i)}$.
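The node counts listed above agree with the number of monomials of degree at most R in n variables, including the constant term, which is the binomial coefficient $\binom{n+R}{R}$. The short sketch below reproduces the three sets and their union under that reading; the function name `num_nodes` is a hypothetical choice.

```python
from math import comb

def num_nodes(n_qubits, degree):
    """Number of monomials of degree <= `degree` in `n_qubits` variables (constant included)."""
    return comb(n_qubits + degree, degree)

N4 = [num_nodes(4, R) for R in range(1, 7)]   # 4-qubit SA, R = 1, ..., 6
N5 = [num_nodes(5, R) for R in range(1, 6)]   # 5-qubit SA, R = 1, ..., 5
N6 = [num_nodes(6, R) for R in range(1, 5)]   # 6-qubit SA, R = 1, ..., 4

# Union of all node counts used for the ESN comparison.
all_counts = sorted(set(N4) | set(N5) | set(N6))
```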

1.5.4 The Volterra series

The discrete-time finite Volterra series with kernel order o and memory p is given by [23]

$$\begin{aligned} {\hat{y}}_k = h_0 + \sum _{i=1}^{o} \sum _{j_1, \ldots , j_i = 0}^{p-1} h_{i}^{j_1, \ldots , j_i} \prod _{l = 1}^{i} u_{k-j_l}, \end{aligned}$$

where $u_{k - j}$ is the delayed input and $h_{0}$ and $h_{i}^{j_1, \ldots , j_i}$ are real-valued kernel coefficients (or output weights in our context). Notice that when memory $p=1$, the Volterra series is a map from the current input $u_{k}$ to the output ${\hat{y}}_{k}$. The kernel coefficients are optimized via linear least squares to minimize $\sum _{k} |y_{k} - {\hat{y}}_{k}|^2$ during the training phase, where y is the target output sequence to be learned.

The number of computational nodes, that is, the number of kernel coefficients $h_{0}$ and $h_{i}^{j_1, \ldots , j_i}$, is given by $(p^{o+1} - p) / (p - 1) + 1$. We vary the parameters of the Volterra series as follows: for each $o \in \{2, \ldots , 8 \}$, choose p from $\{2, \ldots , 27\}$ such that the number of computational nodes does not exceed 801. Note that for $o=1$, the output of the Volterra series is a linear function of delayed inputs. Since we are interested in nonlinear I/O maps, we choose $o \ge 2$. Table 2 summarizes the number of computational nodes as o and p vary. Figure 12 shows the Volterra series NMSE according to the kernel order and memory.
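The node count and the feature construction for the Volterra series can be sketched as follows. The function names are hypothetical and the code is only meant to illustrate the formulas above. As in the defining sum, every index tuple $(j_1, \ldots , j_i)$ gets its own coefficient, so permuted tuples are counted separately.

```python
import itertools
import numpy as np

def num_volterra_nodes(o, p):
    """Number of kernel coefficients: 1 + p + p**2 + ... + p**o."""
    return sum(p**i for i in range(1, o + 1)) + 1

def max_memory(o, limit=801, candidates=range(2, 28)):
    """Largest memory p among the candidates whose node count stays within the limit."""
    return max(p for p in candidates if num_volterra_nodes(o, p) <= limit)

def volterra_features(u, k, o, p):
    """Regressor vector at time k (requires k >= p - 1): constant plus all delayed-input products."""
    delayed = [u[k - j] for j in range(p)]
    feats = [1.0]
    for i in range(1, o + 1):
        for idx in itertools.product(range(p), repeat=i):
            feats.append(float(np.prod([delayed[j] for j in idx])))
    return np.array(feats)

rng = np.random.default_rng(3)
u = rng.uniform(0.0, 0.2, size=50)
phi = volterra_features(u, k=10, o=2, p=3)     # 1 + 3 + 9 = 13 features
```

Stacking `volterra_features` over the training timesteps gives a design matrix, and the kernel coefficients then follow from an ordinary linear least-squares fit against the target sequence.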

It is observed in Fig. 12 that as the kernel order increases, the Volterra series task performance does not improve. On the other hand, as the memory increases for kernel order 2, the Volterra series task performance improves. The improvement is particularly significant as the memory p coincides with the delay for NARMA tasks, that is when $p = \tau _{\mathrm{NARMA}} + 1$.


About this article

Cite this article

Chen, J., Nurdin, H.I. Learning nonlinear input–output maps with dissipative quantum systems. Quantum Inf Process 18, 198 (2019). https://doi.org/10.1007/s11128-019-2311-9


Received: 17 October 2018
Accepted: 03 May 2019
Published: 15 May 2019
DOI: https://doi.org/10.1007/s11128-019-2311-9

