Abstract
In this paper we present a method for introducing a priori knowledge into reinforcement learning using temporally extended actions, with the aim of reducing the learning time of the Q-learning algorithm. Initial knowledge is introduced by constraining the set of actions available in certain states. At the same time, we can specify that these constraints must be relaxed whenever the agent reaches certain designated states, called exception states. We also define a propagation mechanism that lets the agent escape blocked situations induced by the initial-knowledge constraints. We establish formal properties of the method and evaluate it on a complex grid-world task, comparing it with standard Q-learning. The results show that learning time is drastically reduced even with very simple initial knowledge, knowledge that would not by itself be sufficient to solve the task without the exception states and the propagation mechanism.
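To make the idea concrete, here is a minimal sketch of how constrained action sets, exception states, and a propagation-style relaxation might fit into tabular Q-learning. All names (`ConstrainedQLearner`, `allowed`, `propagate`) and the exact relaxation rule are illustrative assumptions; the paper's precise definitions of exception states and the propagation mechanism are not reproduced here.

```python
import random
from collections import defaultdict

class ConstrainedQLearner:
    """Hypothetical sketch: tabular Q-learning whose exploration is pruned
    by a priori action constraints, with constraints lifted in exception
    states. Not the paper's exact algorithm."""

    def __init__(self, actions, allowed, alpha=0.1, gamma=0.95, epsilon=0.1):
        self.actions = list(actions)     # full action set
        self.allowed = allowed           # dict: state -> subset of actions (a priori knowledge)
        self.exceptions = set()          # states where constraints are relaxed
        self.q = defaultdict(float)      # Q-values keyed by (state, action)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def available(self, s):
        # In an exception state the full action set is restored;
        # otherwise only the a-priori-allowed actions are offered.
        if s in self.exceptions:
            return self.actions
        return self.allowed.get(s, self.actions)

    def act(self, s):
        # Epsilon-greedy choice restricted to the currently available actions.
        acts = self.available(s)
        if random.random() < self.epsilon:
            return random.choice(acts)
        return max(acts, key=lambda a: self.q[(s, a)])

    def update(self, s, a, r, s2):
        # Standard Q-learning backup, with the max taken over the
        # actions available in the successor state.
        best_next = max(self.q[(s2, a2)] for a2 in self.available(s2))
        td_error = r + self.gamma * best_next - self.q[(s, a)]
        self.q[(s, a)] += self.alpha * td_error

    def propagate(self, s):
        # One plausible reading of the propagation mechanism (an assumption):
        # when the agent detects it is blocked in s, declare s an exception
        # state so constraints no longer apply there; repeated blocking can
        # spread this relaxation to further states along the trajectory.
        self.exceptions.add(s)
```

The point the sketch captures is that the a priori knowledge only prunes exploration: the Q-learning update itself is unchanged, so as constraints are relaxed the learner falls back toward ordinary Q-learning rather than being permanently restricted by an incomplete initial policy.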
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Garcia, P. (2003). A New Way to Introduce Knowledge into Reinforcement Learning. In: Lavrač, N., Gamberger, D., Blockeel, H., Todorovski, L. (eds) Machine Learning: ECML 2003. ECML 2003. Lecture Notes in Computer Science(), vol 2837. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39857-8_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20121-2
Online ISBN: 978-3-540-39857-8
eBook Packages: Springer Book Archive