A New Asynchronous Architecture for Tabular Reinforcement Learning Algorithms

Conference paper

Part of the book series: Proceedings in Adaptation, Learning and Optimization ((PALO,volume 10))

Abstract

In recent years, deep learning has been combined with reinforcement learning to solve practical problems. However, owing to the characteristics of neural networks, such methods easily fall into local minima on small-scale discrete-space path planning problems. Traditional reinforcement learning also relies on the continual updates of a single agent during execution, which leads to slow convergence. To address these problems, we combine asynchronous methods with existing tabular reinforcement learning algorithms, propose a parallel architecture for discrete-space path planning, and present four new variants of asynchronous reinforcement learning algorithms. We apply these algorithms to the standard reinforcement learning environment FrozenLake, and the experimental results show that they solve discrete-space path planning problems efficiently. One of them, Asynchronous Dyna-Q, surpasses existing asynchronous reinforcement learning algorithms and balances exploration and exploitation well.
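The core idea the abstract describes — several agents sharing one Q-table and updating it in parallel, rather than a single agent updating sequentially — can be sketched as below. This is an illustrative sketch only, not the authors' implementation: the toy 4x4 gridworld (standing in for FrozenLake, with deterministic moves rather than FrozenLake's slippery dynamics), the hole layout, the hyperparameters, and the lock-free shared updates are all assumptions made for the example.

```python
import threading
import random

# Toy 4x4 gridworld standing in for FrozenLake (assumption: deterministic
# moves; holes end the episode with reward 0; the goal gives reward 1).
N = 4                          # grid side length
HOLES = {5, 7, 11, 12}         # episode-ending cells
GOAL = N * N - 1               # bottom-right corner
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

def step(state, action):
    """Apply one deterministic move; return (next_state, reward, done)."""
    r, c = divmod(state, N)
    dr, dc = ACTIONS[action]
    r = min(max(r + dr, 0), N - 1)
    c = min(max(c + dc, 0), N - 1)
    nxt = r * N + c
    if nxt == GOAL:
        return nxt, 1.0, True
    if nxt in HOLES:
        return nxt, 0.0, True
    return nxt, 0.0, False

# Shared tabular value function: one Q-table for all asynchronous agents.
Q = [[0.0] * len(ACTIONS) for _ in range(N * N)]

def worker(seed, episodes=3000, alpha=0.1, gamma=0.95, eps=0.2):
    """One asynchronous agent: acts in its own copy of the environment
    but reads and writes the shared Q-table without a lock."""
    rng = random.Random(seed)
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # Epsilon-greedy action selection on the shared table.
            if rng.random() < eps:
                a = rng.randrange(len(ACTIONS))
            else:
                a = max(range(len(ACTIONS)), key=lambda x: Q[s][x])
            s2, reward, done = step(s, a)
            # Standard one-step Q-learning update.
            target = reward if done else reward + gamma * max(Q[s2])
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2

threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Follow the greedy policy from the start state to check convergence.
s, path, done = 0, [0], False
while not done and len(path) < 20:
    a = max(range(len(ACTIONS)), key=lambda x: Q[s][x])
    s, _, done = step(s, a)
    path.append(s)
print(path)
```

After training, the greedy rollout should end at the goal cell while avoiding the holes, since hole-entering actions keep a value of zero while the safe path accumulates discounted reward. The paper's Asynchronous Dyna-Q variant would additionally interleave model-based planning updates (Dyna-style replay from a learned model) with these direct updates.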



Acknowledgments

This work is supported by the National Natural Science Foundation of China (Nos. 61672522, 61379101).

Author information

Correspondence to Shifei Ding.


Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Zhao, X., Ding, S., An, Y. (2019). A New Asynchronous Architecture for Tabular Reinforcement Learning Algorithms. In: Cao, J., Vong, C., Miche, Y., Lendasse, A. (eds) Proceedings of ELM-2017. ELM 2017. Proceedings in Adaptation, Learning and Optimization, vol 10. Springer, Cham. https://doi.org/10.1007/978-3-030-01520-6_15
