Abstract
Modern reinforcement learning algorithms effectively exploit experience data sampled from an unknown controlled dynamical system to compute a good control policy, but to obtain the necessary data they typically rely on naive exploration mechanisms or human domain knowledge. Approaches that first learn a model offer improved exploration in finite problems, but discrete model representations do not extend directly to continuous problems. This paper develops a method for approximating continuous models by fitting data to a finite sample of states, leading to finite representations compatible with existing model-based exploration mechanisms. Experiments with the resulting family of fitted-model reinforcement learning algorithms reveal the critical importance of how the continuous model is generalized from finite data. This paper demonstrates instantiations of fitted-model algorithms that learn faster on benchmark problems than contemporary model-free RL algorithms, which apply generalization only in estimating action values. Finally, the paper concludes that in continuous problems, the exploration-exploitation tradeoff is better construed as a balance between exploration and generalization.
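The details of the fitted-model construction are in the paper itself; as a rough illustration of the idea the abstract describes, the sketch below projects observed continuous transitions onto a finite sample of states via kernel-weighted averaging, and marks under-explored state-action pairs as optimistic in the style of R-max. The function name, the Gaussian kernel, and the "known" threshold are all assumptions for illustration, not the authors' exact method.

```python
import numpy as np

def fit_finite_model(transitions, sample_states, n_actions,
                     bandwidth=0.5, r_max=1.0, min_weight=1.0):
    """Fit a finite MDP (P, R) over a fixed sample of states from
    continuous transition data (x, a, r, x').  State-action pairs with
    too little nearby data stay 'unknown' and receive an optimistic
    R-max self-loop, so a planner will drive the agent toward them."""
    n = len(sample_states)
    P = np.zeros((n_actions, n, n))       # transition probabilities
    R = np.full((n_actions, n), r_max)    # optimistic by default
    known = np.zeros((n_actions, n), dtype=bool)

    def kernel(x, y):
        # Gaussian similarity between two continuous states
        return np.exp(-np.sum((x - y) ** 2) / (2 * bandwidth ** 2))

    def successor_weights(x):
        # Spread a continuous successor state's mass over the sample states
        w = np.array([kernel(x, s) for s in sample_states])
        return w / w.sum()

    for a in range(n_actions):
        for i, s in enumerate(sample_states):
            w_tot, r_acc, p_acc = 0.0, 0.0, np.zeros(n)
            for (x, act, r, x2) in transitions:
                if act != a:
                    continue
                w = kernel(s, x)           # how relevant this datum is to s
                w_tot += w
                r_acc += w * r
                p_acc += w * successor_weights(x2)
            if w_tot >= min_weight:        # enough nearby data: known
                known[a, i] = True
                R[a, i] = r_acc / w_tot
                P[a, i] = p_acc / w_tot
            else:                          # unknown: optimistic self-loop
                P[a, i, i] = 1.0
    return P, R, known
```

Solving the resulting finite MDP with ordinary value iteration then yields a policy that is optimistic wherever the kernel-weighted data are sparse, which is what makes directed, model-based exploration possible in the continuous setting.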
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
Cite this paper
Jong, N.K., Stone, P. (2007). Model-Based Exploration in Continuous State Spaces. In: Miguel, I., Ruml, W. (eds) Abstraction, Reformulation, and Approximation. SARA 2007. Lecture Notes in Computer Science, vol. 4612. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73580-9_21
DOI: https://doi.org/10.1007/978-3-540-73580-9_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-73579-3
Online ISBN: 978-3-540-73580-9
eBook Packages: Computer Science (R0)