Abstract
This paper introduces an approach to improving an approximate solution in reinforcement learning by augmenting it with a small overriding patch. Many approximate solutions are smaller and easier to produce than a flat solution, but the best solution within the constraints of the approximation may fall well short of global optimality. We present a technique for efficiently learning a small patch to reduce this gap. Empirical evaluation demonstrates the effectiveness of patching, producing combined solutions that are much closer to global optimality.
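The core idea the abstract describes — an approximate base solution combined with a small overriding patch — can be sketched in a few lines. This is a minimal illustrative sketch only, not the authors' algorithm: the policy representation, the `make_patched_policy` helper, and the grid-world states are all assumptions made for illustration.

```python
# Sketch: an approximate policy overridden by a small learned patch.
# The patch is a sparse mapping from states to actions; everywhere
# else, the (cheap but suboptimal) approximate solution is followed.

def make_patched_policy(approx_policy, patch):
    """Return a policy that follows `patch` where it is defined,
    falling back to the approximate solution elsewhere."""
    def policy(state):
        return patch.get(state, approx_policy(state))
    return policy

# Hypothetical example: a coarse policy that always moves right,
# patched in two states where that action falls short of optimal.
approx = lambda state: "right"
patch = {(0, 3): "up", (1, 3): "up"}

policy = make_patched_policy(approx, patch)
print(policy((0, 3)))  # overridden by the patch
print(policy((5, 5)))  # falls back to the approximation
```

Because the patch is small relative to a flat solution, the combined policy keeps most of the approximation's compactness while closing part of the gap to global optimality.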
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
Cite this paper
Kim, M.S., Uther, W. (2006). Patching Approximate Solutions in Reinforcement Learning. In: Fürnkranz, J., Scheffer, T., Spiliopoulou, M. (eds) Machine Learning: ECML 2006. Lecture Notes in Computer Science, vol 4212. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11871842_27
DOI: https://doi.org/10.1007/11871842_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-45375-8
Online ISBN: 978-3-540-46056-5