Abstract
Temporally extended actions are often effective in speeding up reinforcement learning. In this paper we present a mechanism for automatically constructing such actions, expressed as options [24], in a finite Markov Decision Process (MDP). To do this, we compute a bisimulation metric [7] between the states of a small MDP and the states of a large MDP, which we want to solve. The shape of this metric is then used to completely define a set of options for the large MDP. We demonstrate empirically that our approach improves the speed of reinforcement learning and is largely insensitive to parameter tuning.
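The construction described above relies on the bisimulation metric of Ferns et al. [7], defined as the fixed point of an operator that combines reward differences with a Kantorovich distance between transition distributions. The sketch below illustrates that fixed-point iteration on a hypothetical three-state, two-action MDP (all numbers are made up for illustration, not taken from the paper). To stay self-contained it substitutes the independent-coupling upper bound for the exact Kantorovich distance, which would require solving a linear program; this bound coincides with the exact distance for deterministic transitions, as in this toy example.

```python
from itertools import product

# Hypothetical toy MDP: 3 states, 2 actions (not from the paper).
# P[a][s] is the transition distribution over next states; R[a][s] is the reward.
states = [0, 1, 2]
actions = [0, 1]
P = {
    0: {0: [1.0, 0.0, 0.0], 1: [0.0, 1.0, 0.0], 2: [0.0, 0.0, 1.0]},
    1: {0: [0.0, 1.0, 0.0], 1: [0.0, 0.0, 1.0], 2: [0.0, 0.0, 1.0]},
}
R = {
    0: {0: 0.0, 1: 0.0, 2: 1.0},
    1: {0: 0.0, 1: 1.0, 2: 1.0},
}
c_r, c_t = 0.5, 0.5  # reward / transition weights, as in Ferns et al. [7]

def coupling_bound(p, q, d):
    # Independent coupling: a valid transport plan, hence an upper bound
    # on the Kantorovich distance W_d(p, q); exact here because p, q are
    # point masses (deterministic transitions).
    return sum(p[u] * q[v] * d[u][v] for u, v in product(states, states))

# Iterate the metric operator F(d)(s,t) = max_a [ c_r|R(s,a)-R(t,a)| + c_t W_d ]
# from the zero metric until (approximate) convergence.
d = [[0.0] * len(states) for _ in states]
for _ in range(200):
    new_d = [[max(c_r * abs(R[a][s] - R[a][t])
                  + c_t * coupling_bound(P[a][s], P[a][t], d)
                  for a in actions)
              for t in states] for s in states]
    delta = max(abs(new_d[s][t] - d[s][t]) for s in states for t in states)
    d = new_d
    if delta < 1e-9:
        break
```

The resulting matrix `d` is a pseudometric on states: it is symmetric, zero on the diagonal, and small exactly where two states behave similarly under all actions. In the paper's setting, the analogous metric is computed between states of the small source MDP and states of the large target MDP, and its shape determines the option set.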
References
Anderson, J.R.: ACT: A simple theory of complex cognition. American Psychologist 51, 355–365 (1996)
Barto, A.G., Mahadevan, S.: Recent advances in hierarchical reinforcement learning. Discrete Event Dynamic Systems 13(4), 341–379 (2003)
Castro, P.S., Precup, D.: Using bisimulation for policy transfer in MDPs. In: Proceedings of the 24th AAAI Conference on Artificial Intelligence (AAAI 2010), pp. 1065–1070 (2010)
Comanici, G., Precup, D.: Optimal policy switching algorithms in reinforcement learning. In: Proceedings of AAMAS (2010)
Dietterich, T.G.: Hierarchical reinforcement learning with the MAXQ value function decomposition. Journal of Artificial Intelligence Research 13, 227–303 (2000)
Ferns, N., Castro, P.S., Precup, D., Panangaden, P.: Methods for computing state similarity in Markov Decision Processes. In: Proceedings of the 22nd Annual Conference on Uncertainty in Artificial Intelligence (UAI 2006), pp. 174–181 (2006)
Ferns, N., Panangaden, P., Precup, D.: Metrics for finite Markov decision processes. In: Proceedings of the 20th Annual Conference on Uncertainty in Artificial Intelligence (UAI 2004), pp. 162–169 (2004)
Givan, R., Dean, T., Greig, M.: Equivalence notions and model minimization in Markov decision processes. Artificial Intelligence 147(1-2), 163–223 (2003)
Jonsson, A., Barto, A.G.: Causal graph based decomposition of factored MDPs. Journal of Machine Learning Research 7, 2259–2301 (2006)
Konidaris, G., Kuindersma, S., Barto, A.G., Grupen, R.A.: Constructing skill trees for reinforcement learning agents from demonstration trajectories. In: Advances in Neural Information Processing Systems 23, pp. 1162–1170 (2010)
Laird, J., Bloch, M.K., the Soar Group: Soar home page (2011)
Mannor, S., Menache, I., Hoze, A., Klein, U.: Dynamic abstraction in reinforcement learning via clustering. In: Proceedings of the 21st International Conference on Machine Learning, ICML 2004 (2004)
McGovern, A., Barto, A.G.: Automatic discovery of subgoals in reinforcement learning using diverse density. In: Proceedings of the 18th International Conference on Machine Learning, ICML 2001 (2001)
Mehta, N., Ray, S., Tadepalli, P., Dietterich, T.: Automatic discovery and transfer of MAXQ hierarchies. In: Proceedings of the 25th International Conference on Machine Learning, ICML 2008 (2008)
Mugan, J., Kuipers, B.: Autonomously learning an action hierarchy using a learned qualitative state representation. In: Proceedings of the 21st International Joint Conference on Artificial Intelligence (2009)
Parr, R., Russell, S.: Reinforcement learning with hierarchies of machines. In: Advances in Neural Information Processing Systems, NIPS 1998 (1998)
Precup, D.: Temporal Abstraction in Reinforcement Learning. PhD thesis, University of Massachusetts, Amherst (2000)
Ravindran, B., Barto, A.G.: Relativized options: Choosing the right transformation. In: Proceedings of the 20th International Conference on Machine Learning, ICML 2003 (2003)
Soni, V., Singh, S.: Using homomorphism to transfer options across reinforcement learning domains. In: Proceedings of AAAI Conference on Artificial Intelligence, AAAI 2006 (2006)
Sorg, J., Singh, S.: Transfer via soft homomorphisms. In: Proceedings of the 8th International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2009 (2009)
Stolle, M., Precup, D.: Learning options in reinforcement learning. In: Koenig, S., Holte, R.C. (eds.) SARA 2002. LNCS (LNAI), vol. 2371, p. 212. Springer, Heidelberg (2002)
Stone, P., Sutton, R.S., Kuhlmann, G.: Reinforcement learning for RoboCup soccer keepaway. Adaptive Behavior 13(3), 165–188 (2005)
Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press, Cambridge (1998)
Sutton, R.S., Precup, D., Singh, S.: Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial Intelligence 112, 181–211 (1999)
Taylor, J., Precup, D., Panangaden, P.: Bounding performance loss in approximate MDP homomorphisms. In: Proceedings of the Conference on Advances in Neural Information Processing Systems, NIPS 2009 (2009)
Taylor, M.E., Stone, P.: Transfer learning for reinforcement learning domains: A survey. Journal of Machine Learning Research 10, 1633–1685 (2009)
Šimšek, Ö., Wolfe, A.P., Barto, A.G.: Identifying useful subgoals in reinforcement learning by local graph partitioning. In: Proceedings of the 22nd International Conference on Machine Learning, ICML 2005 (2005)
Watkins, C.J.C.H., Dayan, P.: Q-learning. Machine Learning 8, 279–292 (1992)
Wolfe, A.P., Barto, A.G.: Defining object types and options using MDP homomorphisms. In: Proceedings of the ICML 2006 Workshop on Structural Knowledge Transfer for Machine Learning (2006)
Zang, P., Zhou, P., Minnen, D., Isbell, C.: Discovering options from example trajectories. In: Proceedings of the 26th International Conference on Machine Learning, ICML 2009 (2009)
© 2012 Springer-Verlag Berlin Heidelberg
Cite this paper
Castro, P.S., Precup, D. (2012). Automatic Construction of Temporally Extended Actions for MDPs Using Bisimulation Metrics. In: Sanner, S., Hutter, M. (eds) Recent Advances in Reinforcement Learning. EWRL 2011. Lecture Notes in Computer Science(), vol 7188. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29946-9_16
Print ISBN: 978-3-642-29945-2
Online ISBN: 978-3-642-29946-9