Integrating Learning and Planning

Zhang, Huaqing; Huang, Ruitong; Zhang, Shanghang

doi:10.1007/978-981-15-4095-0_9

Abstract

In this chapter, reinforcement learning is analyzed from the perspective of learning and planning. We first introduce the concepts of a model and of model-based methods, highlighting the advantages of planning with a model. To combine the benefits of model-based and model-free methods, we then present an architecture that integrates learning and planning, illustrated in detail with the Dyna-Q algorithm. Finally, we analyze simulation-based search as an application of this integration.
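
As a rough illustration of the integration the abstract describes, the following is a minimal tabular Dyna-Q sketch in Python. It is not taken from the chapter: the function names (`dyna_q`, `corridor_step`), the five-cell corridor environment, the deterministic table model, and all hyperparameter values are assumptions chosen for brevity. Each real step is used three ways: for a direct Q-learning update, for updating the learned model, and as a seed for several simulated planning updates drawn from that model.

```python
import random
from collections import defaultdict


def dyna_q(step, start, actions, episodes=50, planning_steps=10,
           alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
    """Tabular Dyna-Q sketch; `step(s, a)` returns (reward, next_state, done)."""
    rng = random.Random(seed)
    q = defaultdict(float)   # action values, keyed by (state, action)
    model = {}               # assumed deterministic model: (s, a) -> (r, s', done)

    def greedy(s):
        # Greedy action with random tie-breaking.
        best = max(q[(s, a)] for a in actions)
        return rng.choice([a for a in actions if q[(s, a)] == best])

    def update(s, a, r, s2, done):
        # One-step Q-learning backup, used for real and simulated experience.
        target = r if done else r + gamma * max(q[(s2, b)] for b in actions)
        q[(s, a)] += alpha * (target - q[(s, a)])

    for _ in range(episodes):
        s, done = start, False
        while not done:
            a = rng.choice(actions) if rng.random() < epsilon else greedy(s)
            r, s2, done = step(s, a)        # real experience
            update(s, a, r, s2, done)       # direct reinforcement learning
            model[(s, a)] = (r, s2, done)   # model learning
            for _ in range(planning_steps):
                # Planning: replay a previously seen (state, action) pair
                # through the learned model instead of the environment.
                ps, pa = rng.choice(list(model))
                update(ps, pa, *model[(ps, pa)])
            s = s2
    return q, model


def corridor_step(s, a):
    """Hypothetical corridor: states 0..4, goal at 4, actions -1 (left), +1 (right)."""
    s2 = min(max(s + a, 0), 4)
    done = s2 == 4
    return (1.0 if done else 0.0), s2, done


q, model = dyna_q(corridor_step, start=0, actions=[-1, 1], episodes=200)
policy = {s: max([-1, 1], key=lambda a: q[(s, a)]) for s in range(4)}
print(policy)
```

Setting `planning_steps=0` reduces the sketch to plain one-step Q-learning, which gives a simple way to compare the model-free baseline against the integrated version on the same environment.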

Notes

1. Another option is to add all the new nodes in the trajectory to the search tree.

Author information

Authors and Affiliations

Google LLC, Mountain View, CA, USA
Huaqing Zhang
Borealis AI, Toronto, ON, Canada
Ruitong Huang
University of California, Berkeley, CA, USA
Shanghang Zhang

Editor information

Editors and Affiliations

EECS, Peking University, Beijing, China
Hao Dong
CS, Imperial College London, London, UK
Zihan Ding
EECS, University of California, Berkeley, Berkeley, USA
Shanghang Zhang

About this chapter

Cite this chapter

Zhang, H., Huang, R., Zhang, S. (2020). Integrating Learning and Planning. In: Dong, H., Ding, Z., Zhang, S. (eds) Deep Reinforcement Learning. Springer, Singapore. https://doi.org/10.1007/978-981-15-4095-0_9

DOI: https://doi.org/10.1007/978-981-15-4095-0_9
Published: 30 June 2020
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-4094-3
Online ISBN: 978-981-15-4095-0
eBook Packages: Mathematics and Statistics; Mathematics and Statistics (R0)
