A New Way to Introduce Knowledge into Reinforcement Learning

  • Pascal Garcia
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2837)


We present a method for introducing a priori knowledge into reinforcement learning using temporally extended actions, with the aim of reducing the learning time of the Q-learning algorithm. The initial knowledge is introduced by constraining the set of actions available in certain states; at the same time, we can designate particular states, called exception states, in which those constraints are relaxed. We also define a propagation mechanism that allows the agent to escape blocked situations induced by the initial constraints. We give some formal properties of our method and test it on a complex grid-world task, comparing it against Q-learning. On this task the learning time is drastically reduced using very simple initial knowledge, which by itself would not suffice to solve the task without the exception states and the propagation mechanism.
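As an illustration only (our own sketch, not the paper's implementation, whose details are not given in this abstract), the core idea of restricting the actions available in each state and relaxing that restriction in exception states can be grafted onto tabular Q-learning roughly as follows. The toy corridor environment, the action names, and all parameters below are hypothetical:

```python
import random

# Sketch (ours, not the paper's code) of tabular Q-learning where a
# priori knowledge restricts the actions available in each state, and
# designated "exception states" relax that restriction.

def q_learning(n_states, n_actions, step, allowed_actions,
               exception_states, episodes=300, max_steps=100,
               alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    Q = [[0.0] * n_actions for _ in range(n_states)]

    def actions(s):
        # In an exception state, all primitive actions become
        # available again; elsewhere the prior constraints apply.
        return (list(range(n_actions)) if s in exception_states
                else allowed_actions(s))

    for _ in range(episodes):
        s = 0
        for _ in range(max_steps):
            acts = actions(s)
            a = (rng.choice(acts) if rng.random() < epsilon
                 else max(acts, key=lambda a: Q[s][a]))
            s2, r, done = step(s, a)
            target = r if done else r + gamma * max(Q[s2][b] for b in actions(s2))
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
            if done:
                break
    return Q

# Hypothetical 5-state corridor: action 0 = "step right", action 1 =
# "jump two cells". Stepping right from state 2 teleports back to the
# start, so the prior "always step right" gets the agent stuck there.
def step(s, a):
    s2 = 0 if (s == 2 and a == 0) else min(s + (1 if a == 0 else 2), 4)
    return s2, (1.0 if s2 == 4 else 0.0), s2 == 4

# Prior knowledge: only "step right" is allowed -- except in state 2,
# declared an exception state, where the jump becomes available.
Q = q_learning(5, 2, step, allowed_actions=lambda s: [0],
               exception_states={2})
```

Under the prior alone the agent would cycle 2 → 0 forever; relaxing the constraint in state 2 lets it learn the jump 2 → 4 to the goal. Note that this simple local relaxation stands in for the paper's propagation mechanism, which the abstract describes but does not specify.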



Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Pascal Garcia
  1. INSA de Rennes / IRISA, Rennes Cedex, France
