A general convergence method for Reinforcement Learning in the continuous case

  • Rémi Munos
Reinforcement Learning
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1398)


In this paper, we propose a general method for designing convergent Reinforcement Learning algorithms in the case of continuous state-space and time variables. The method is based on the discretization of the continuous process by convergent approximation schemes: the Hamilton-Jacobi-Bellman equation is replaced by a Dynamic Programming (DP) equation for some Markovian Decision Process (MDP). If the data of the MDP were known, we could solve the DP equation by applying DP updating rules. However, in the Reinforcement Learning (RL) approach, the state dynamics as well as the reinforcement function are a priori unknown, making it impossible to apply DP rules directly.
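The DP updating rules mentioned above can be illustrated by value iteration on a small discretized MDP. The following sketch uses a randomly generated toy MDP (the transition probabilities and reinforcements are illustrative assumptions, not data from the paper); the contraction property of the Bellman backup guarantees convergence of the iterates.

```python
import numpy as np

# Toy discretized MDP (illustrative only, not from the paper):
# P[a, s, s'] = probability of moving s -> s' under action a,
# R[s, a]     = immediate reinforcement for taking a in s.
n_states, n_actions, gamma = 5, 2, 0.9
rng = np.random.default_rng(0)
P = rng.random((n_actions, n_states, n_states))
P /= P.sum(axis=2, keepdims=True)  # normalize rows into transition matrices
R = rng.random((n_states, n_actions))

V = np.zeros(n_states)
for _ in range(1000):
    # Bellman backup: Q(s,a) = R(s,a) + gamma * E[V(s') | s, a]
    Q = R + gamma * np.einsum('ast,t->sa', P, V)
    V_new = Q.max(axis=1)          # greedy improvement over actions
    if np.max(np.abs(V_new - V)) < 1e-10:
        break                      # fixed point of the DP equation reached
    V = V_new
```

Such rules require `P` and `R` to be known; the RL setting of the paper is precisely the case where they are not, and must be learned from interaction.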



Copyright information

© Springer-Verlag Berlin Heidelberg 1998

Authors and Affiliations

  • Rémi Munos
  1. CEMAGREF, LISC, Antony Cedex, France