Abstract
In recent years, deep learning has been combined with reinforcement learning to solve practical problems. However, owing to the characteristics of neural networks, such methods easily fall into local minima on small-scale discrete-space path planning problems. In addition, traditional reinforcement learning continuously updates a single agent during execution, which leads to slow convergence. To address these problems, we combine asynchronous methods with existing tabular reinforcement learning algorithms, propose a parallel architecture for discrete-space path planning, and present four new variants of asynchronous reinforcement learning algorithms. We apply these algorithms to the FrozenLake problem, a standard reinforcement learning environment, and the experimental results show that they solve discrete-space path planning problems efficiently. One of them, Asynchronous Dyna-Q, surpasses existing asynchronous reinforcement learning algorithms and balances exploration and exploitation well.
Acknowledgments
This work is supported by the National Natural Science Foundation of China (Nos. 61672522, 61379101).
© 2019 Springer Nature Switzerland AG
Zhao, X., Ding, S., An, Y. (2019). A New Asynchronous Architecture for Tabular Reinforcement Learning Algorithms. In: Cao, J., Vong, C., Miche, Y., Lendasse, A. (eds) Proceedings of ELM-2017. ELM 2017. Proceedings in Adaptation, Learning and Optimization, vol 10. Springer, Cham. https://doi.org/10.1007/978-3-030-01520-6_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-01519-0
Online ISBN: 978-3-030-01520-6
eBook Packages: Intelligent Technologies and Robotics (R0)