Abstract
In recent years, deep learning has been combined with reinforcement learning to solve practical problems. However, owing to the characteristics of neural networks, such methods easily fall into local minima on small-scale discrete-space path planning problems. In addition, traditional reinforcement learning continuously updates a single agent during execution, which leads to slow convergence. To address these problems, we combine asynchronous methods with existing tabular reinforcement learning algorithms, propose a parallel architecture for discrete-space path planning, and present four new variants of asynchronous reinforcement learning algorithms. We apply these algorithms to the FrozenLake problem, a standard reinforcement learning environment, and the experimental results show that they solve discrete-space path planning problems efficiently. One of them, Asynchronous Dyna-Q, surpasses existing asynchronous reinforcement learning algorithms and balances exploration and exploitation well.
Acknowledgments
This work is supported by the National Natural Science Foundation of China (Nos. 61672522, 61379101).
© 2019 Springer Nature Switzerland AG
Zhao, X., Ding, S., An, Y. (2019). A New Asynchronous Architecture for Tabular Reinforcement Learning Algorithms. In: Cao, J., Vong, C., Miche, Y., Lendasse, A. (eds) Proceedings of ELM-2017. ELM 2017. Proceedings in Adaptation, Learning and Optimization, vol 10. Springer, Cham. https://doi.org/10.1007/978-3-030-01520-6_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-01519-0
Online ISBN: 978-3-030-01520-6
eBook Packages: Intelligent Technologies and Robotics (R0)