Abstract
Although direct, model-free reinforcement learning often outperforms model-based approaches in practice, to date only the latter enjoy theoretical guarantees of finite-sample convergence. A major difficulty in analyzing the direct approach in an online setting is the absence of a definitive exploration strategy. We extend the notion of admissibility to direct reinforcement learning and show that standard Q-learning with optimistic initial values and a constant learning rate is admissible. Admissibility justifies a purely greedy strategy that, we argue, performs well in practice, and it provides a theoretical foothold for deriving finite-sample convergence results for direct reinforcement learning. We present empirical evidence supporting this idea.
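To make the setting concrete, the following is a minimal sketch of the algorithm the abstract describes: tabular Q-learning with optimistic initial values, a constant learning rate, and purely greedy action selection. The 3-state chain MDP, the `chain_step` function, and all parameter values (`q_init=10.0`, `alpha=0.1`, and so on) are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def greedy_q_learning(n_states, n_actions, step, alpha=0.1, gamma=0.9,
                      q_init=10.0, episodes=200, horizon=50, seed=0):
    """Tabular Q-learning: optimistic init, constant alpha, greedy policy."""
    rng = np.random.default_rng(seed)
    # Optimistic initialization: every entry starts above any attainable
    # return, so untried actions look best and the greedy policy is
    # driven to explore them without any explicit randomization.
    Q = np.full((n_states, n_actions), q_init)
    for _ in range(episodes):
        s = 0
        for _ in range(horizon):
            a = int(np.argmax(Q[s]))            # greedy: no epsilon, no decay
            s_next, r, done = step(s, a, rng)
            target = r + (0.0 if done else gamma * np.max(Q[s_next]))
            Q[s, a] += alpha * (target - Q[s, a])  # constant learning rate
            if done:
                break
            s = s_next
    return Q

# Hypothetical 3-state chain: action 1 moves right, action 0 resets to the
# start; reaching the last state pays +1 and ends the episode.
def chain_step(s, a, rng):
    if a == 1:
        s_next = s + 1
        if s_next == 2:
            return s_next, 1.0, True
        return s_next, 0.0, False
    return 0, 0.0, False

Q = greedy_q_learning(n_states=3, n_actions=2, step=chain_step)
print(Q)  # the greedy policy should come to prefer action 1 in states 0 and 1
```

In this sketch the optimism does the exploring: once a greedy action is tried, its overestimated value is pulled down toward the bootstrapped target, so the still-optimistic untried actions get their turn, which is the intuition behind analyzing the greedy strategy directly.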
© 2005 Springer-Verlag Berlin Heidelberg
Cite this paper
Lim, S.H., DeJong, G. (2005). Towards Finite-Sample Convergence of Direct Reinforcement Learning. In: Gama, J., Camacho, R., Brazdil, P.B., Jorge, A.M., Torgo, L. (eds) Machine Learning: ECML 2005. Lecture Notes in Computer Science, vol 3720. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11564096_25