Taking Gradients Through Experiments: LSTMs and Memory Proximal Policy Optimization for Black-Box Quantum Control
Abstract
In this work we introduce a general method for solving quantum control tasks, framing them as a reinforcement learning problem that has not yet been discussed in the machine learning community. We analyze the structure of the reinforcement learning problems that typically arise in quantum physics and argue that agents parameterized by long short-term memory (LSTM) networks and trained via stochastic policy gradients provide a versatile method for solving them. Building on this analysis, we introduce a variant of the proximal policy optimization (PPO) algorithm called memory proximal policy optimization (MPPO). We argue that, by design, our method can easily be combined with numerical simulations as well as real experiments that provide the reward signal. We demonstrate how the method can incorporate physical domain knowledge and present results of numerical experiments showing that it achieves state-of-the-art performance on several learning tasks in quantum control with discrete and continuous control parameters.
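For orientation, the clipped surrogate objective of standard PPO, on which MPPO is said to be based, can be sketched as follows. This is a minimal illustration of plain PPO only; the memory-specific modifications of MPPO are not described in this abstract and are not reproduced here. The function name `ppo_clipped_surrogate`, the argument names, and the default `clip_eps=0.2` are assumptions made for the example, and the advantages are assumed to come from a reward returned by a simulation or an experiment minus some baseline.

```python
import math


def ppo_clipped_surrogate(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective of standard PPO (to be maximized).

    new_logp, old_logp: log-probabilities of the sampled actions under the
        current policy and under the policy that collected the data.
    advantages: advantage estimates, e.g. black-box reward minus a baseline
        (an assumption of this sketch).
    clip_eps: clipping range for the probability ratio (hypothetical default).
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        # Probability ratio between the current and the data-collecting policy.
        ratio = math.exp(lp_new - lp_old)
        # Restrict the ratio to [1 - clip_eps, 1 + clip_eps].
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Pessimistic bound: take the smaller of the two surrogate terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

The objective only needs action log-probabilities and scalar rewards, never gradients of the environment itself, which is why such a method can treat a simulation or a laboratory experiment as a black box.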
Keywords
Reinforcement learning, Quantum control, Numerical simulation