Model-Based Exploration in Continuous State Spaces

  • Conference paper
Abstraction, Reformulation, and Approximation (SARA 2007)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4612)

Abstract

Modern reinforcement learning algorithms effectively exploit experience data sampled from an unknown controlled dynamical system to compute a good control policy, but to obtain the necessary data they typically rely on naive exploration mechanisms or human domain knowledge. Approaches that first learn a model offer improved exploration in finite problems, but discrete model representations do not extend directly to continuous problems. This paper develops a method for approximating continuous models by fitting data to a finite sample of states, leading to finite representations compatible with existing model-based exploration mechanisms. Experiments with the resulting family of fitted-model reinforcement learning algorithms reveal the critical importance of how the continuous model is generalized from finite data. This paper demonstrates instantiations of fitted-model algorithms that lead to faster learning on benchmark problems than contemporary model-free RL algorithms that only apply generalization in estimating action values. Finally, the paper concludes that in continuous problems, the exploration-exploitation tradeoff is better construed as a balance between exploration and generalization.
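The abstract's key idea is to fit a finite model to a sample of states so that discrete model-based exploration machinery (such as R-max-style optimism for under-visited state-actions) can be reused in a continuous problem. The sketch below is an illustrative reconstruction under assumptions, not the paper's algorithm: the Gaussian kernel, the bandwidth, the known-ness threshold, and the choice of representative states are all placeholders introduced here for demonstration.

```python
# Illustrative sketch (assumed details, not the authors' exact method):
# approximate a continuous MDP with a finite model over representative states,
# generalize each observed transition to nearby representatives with a Gaussian
# kernel, and use R-max-style optimism for under-visited state-action pairs.

import numpy as np


def gaussian_weights(x, reps, bandwidth):
    """Normalized kernel weights of point x over the representative states."""
    d2 = np.sum((reps - x) ** 2, axis=1)
    w = np.exp(-d2 / (2.0 * bandwidth ** 2)) + 1e-12
    return w / w.sum()


def fit_finite_model(transitions, reps, n_actions, bandwidth=0.1,
                     known_threshold=1.0, r_max=1.0):
    """Build a finite MDP (P, R) over len(reps) states from continuous data.

    transitions: list of (s, a, r, s_next) with s, s_next as 1-D arrays.
    Returns P[a, i, j], R[a, i], and a boolean 'known' mask per (a, i).
    """
    n = len(reps)
    counts = np.zeros((n_actions, n))
    r_sum = np.zeros((n_actions, n))
    p_sum = np.zeros((n_actions, n, n))

    for s, a, r, s_next in transitions:
        w_s = gaussian_weights(s, reps, bandwidth)       # credit source states
        w_next = gaussian_weights(s_next, reps, bandwidth)  # spread successor mass
        counts[a] += w_s
        r_sum[a] += w_s * r
        p_sum[a] += np.outer(w_s, w_next)

    known = counts >= known_threshold
    P = np.zeros((n_actions, n, n))
    R = np.full((n_actions, n), r_max)       # optimistic reward when unknown
    for a in range(n_actions):
        for i in range(n):
            if known[a, i]:
                P[a, i] = p_sum[a, i] / counts[a, i]
                R[a, i] = r_sum[a, i] / counts[a, i]
            else:
                P[a, i, i] = 1.0             # unknown: absorbing, max-reward state
    return P, R, known


def value_iteration(P, R, gamma=0.99, iters=500):
    """Solve the finite approximate MDP; returns state values and a greedy policy."""
    V = np.zeros(P.shape[1])
    for _ in range(iters):
        Q = R + gamma * P @ V                # Q[a, i]
        V = Q.max(axis=0)
    return V, Q.argmax(axis=0)
```

In practice the representative states could be a coarse grid over the state bounds or a subsample of visited states; how broadly the kernel spreads each observed transition is exactly the exploration-versus-generalization balance the abstract highlights.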



Author information

Nicholas K. Jong and Peter Stone, The University of Texas at Austin

Editor information

Ian Miguel, Wheeler Ruml


Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Jong, N.K., Stone, P. (2007). Model-Based Exploration in Continuous State Spaces. In: Miguel, I., Ruml, W. (eds) Abstraction, Reformulation, and Approximation. SARA 2007. Lecture Notes in Computer Science, vol 4612. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73580-9_21

  • DOI: https://doi.org/10.1007/978-3-540-73580-9_21

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-73579-3

  • Online ISBN: 978-3-540-73580-9

  • eBook Packages: Computer Science, Computer Science (R0)
