Using MDP Characteristics to Guide Exploration in Reinforcement Learning

  • Bohdana Ratitch
  • Doina Precup
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2837)


We present a new approach to exploration in Reinforcement Learning (RL) based on certain properties of Markov Decision Processes (MDPs). Our strategy facilitates more uniform visitation of the state space and more extensive sampling of actions whose action-value estimates may have high variance, and it encourages the RL agent to focus on states where it has the most control over the outcomes of its actions. The strategy can be combined with other existing exploration techniques, and we demonstrate experimentally that it can improve the performance of both undirected and directed exploration methods. In contrast to other directed methods, the exploration-relevant information can be precomputed and then used during learning at no additional computational cost.
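The abstract gives no formulas, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes a known tabular transition model `P[s, a, s']`. As illustrative proxies it uses next-state entropy for "potentially high variance" and a normalized mutual-information measure (between a uniformly chosen action and the next state) for "control over outcomes". These choices, the weights `w_entropy` and `w_control`, and the helper names `exploration_bonus` and `select_action` are all hypothetical. The bonus table is computed once before learning and then simply added to the action values inside an undirected ε-greedy rule, which matches the abstract's point that no extra per-step computation is needed.

```python
import numpy as np


def entropy(dist):
    """Shannon entropy (nats) along the last axis, treating 0 * log 0 as 0."""
    log_d = np.log(dist, out=np.zeros_like(dist), where=dist > 0)
    return -(dist * log_d).sum(axis=-1)


def controllability(P):
    """Hypothetical per-state measure: I(A; S' | s) / H(S' | s) with uniform actions."""
    marginal = P.mean(axis=1)                   # (S, S): Pr(s' | s) if actions are uniform
    h_next = entropy(marginal)                  # H(S' | s)
    h_next_given_a = entropy(P).mean(axis=1)    # H(S' | s, A)
    mutual_info = h_next - h_next_given_a       # how much the action choice matters
    # States whose next state is fully determined regardless of action get 0.
    return np.divide(mutual_info, h_next, out=np.zeros_like(h_next), where=h_next > 0)


def exploration_bonus(P, w_entropy=1.0, w_control=1.0):
    """Precompute a bonus table B[s, a] once, before learning starts (assumed weighting)."""
    assert np.allclose(P.sum(axis=2), 1.0), "each P[s, a] must be a distribution"
    H = entropy(P)              # (S, A): spread of outcomes for each state-action pair
    C = controllability(P)      # (S,): controllability of each state
    FC = P @ C                  # (S, A): expected controllability of the next state
    return w_entropy * H + w_control * FC


def select_action(q_row, bonus_row, epsilon, rng):
    """Undirected epsilon-greedy selection over bonus-augmented action values."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_row)))
    return int(np.argmax(q_row + bonus_row))


# Toy 2-state, 2-action MDP. In state 0, action 0 stays put and action 1 moves
# to either state with probability 0.5. State 1 is absorbing under both actions.
P = np.zeros((2, 2, 2))
P[0, 0] = [1.0, 0.0]
P[0, 1] = [0.5, 0.5]
P[1, 0] = [0.0, 1.0]
P[1, 1] = [0.0, 1.0]

bonus = exploration_bonus(P)        # computed once, reused at every step
rng = np.random.default_rng(0)
action = select_action(np.zeros(2), bonus[0], epsilon=0.0, rng=rng)
```

With all action values still at zero, the greedy choice in state 0 is decided by the precomputed bonus alone. Because the bonus is an additive table, the same idea could instead be plugged into a Boltzmann rule or a directed method.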


Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Bohdana Ratitch, McGill University, Montreal, Canada
  • Doina Precup, McGill University, Montreal, Canada