Abstract
Although tabular reinforcement learning (RL) methods have been proven to converge to an optimal policy, combining particular conventional RL techniques with function approximators can lead to divergence. In this paper we show why off-policy RL methods combined with linear function approximators can diverge. Furthermore, we analyze two different types of updates: standard and averaging RL updates. Although averaging RL methods cannot diverge, we show that they can converge to incorrect value functions. In our experiments we compare standard to averaging value iteration (VI) with CMACs; the results show that for small values of the discount factor averaging VI works better, whereas for large values of the discount factor standard VI performs better, although it does not always converge.
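To make the two kinds of updates concrete, below is a minimal Python sketch; it is not the paper's CMAC setup, and the MDPs, function names, and step sizes are illustrative assumptions. The standard update takes a semi-gradient step toward the bootstrapped target and can diverge on the well-known two-state linear-approximation counterexample of Tsitsiklis and Van Roy; the averaging update pulls each weight toward the target by a convex combination, so it stays bounded, but with aggregated features it settles on a value that is wrong for every individual state.

```python
import numpy as np

# --- Standard (extrapolating) update can diverge ---------------------------
# Two-state counterexample (a stand-in, not the paper's experiment):
# V(s1) = w, V(s2) = 2w, zero rewards, transitions s1 -> s2 and s2 -> s2.
# The true value function is w = 0.
def standard_vi(gamma=0.95, alpha=0.1, sweeps=100):
    phi = np.array([1.0, 2.0])   # linear feature of each state
    nxt = np.array([1, 1])       # both states transition to s2
    w = 1.0                      # initial (wrong) parameter
    for _ in range(sweeps):
        for s in range(2):
            target = gamma * phi[nxt[s]] * w      # r = 0 everywhere
            td_err = target - phi[s] * w
            w += alpha * phi[s] * td_err          # semi-gradient step
    # For small alpha the per-sweep multiplier is roughly 1 + alpha*(6*gamma - 5),
    # so w diverges whenever gamma > 5/6 in this example.
    return w

# --- Averaging update cannot diverge, but can be wrong ---------------------
# Two terminal states share one aggregated weight: r(A) = 0, r(B) = 1,
# so the true values are V(A) = 0 and V(B) = 1.
def averaging_vi(alpha=0.1, sweeps=200):
    rewards = np.array([0.0, 1.0])
    w = 0.0                      # one weight covers both states
    for _ in range(sweeps):
        for s in range(2):
            w += alpha * (rewards[s] - w)   # convex step toward the target
    return w

print("standard VI, gamma=0.95:", standard_vi())   # |w| grows without bound
print("averaging VI:", averaging_vi())             # ~0.5: bounded but wrong
```

Running the sketch, the standard update's parameter grows without bound at a discount factor of 0.95, while the averaging update oscillates near 0.5, which is far from the true values V(A) = 0 and V(B) = 1 of both aggregated states.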
Keywords
- Optimal Policy
- Discount Factor
- Markov Decision Process
- Policy Iteration
- Reinforcement Learning Algorithm
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
Cite this paper
Wiering, M.A. (2004). Convergence and Divergence in Standard and Averaging Reinforcement Learning. In: Boulicaut, J.-F., Esposito, F., Giannotti, F., Pedreschi, D. (eds.) Machine Learning: ECML 2004. Lecture Notes in Computer Science, vol. 3201. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30115-8_44
DOI: https://doi.org/10.1007/978-3-540-30115-8_44
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23105-9
Online ISBN: 978-3-540-30115-8