
Theoretical results on reinforcement learning with temporally abstract options

  • Doina Precup
  • Richard S. Sutton
  • Satinder Singh
Reinforcement Learning
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1398)

Abstract

We present new theoretical results on planning within the framework of temporally abstract reinforcement learning (Precup & Sutton, 1997; Sutton, 1995). Temporal abstraction is a key step in any decision making system that involves planning and prediction. In temporally abstract reinforcement learning, the agent is allowed to choose among “options”, whole courses of action that may be temporally extended, stochastic, and contingent on previous events. Examples of options include closed-loop policies such as picking up an object, as well as primitive actions such as joint torques. Knowledge about the consequences of options is represented by special structures called multi-time models. In this paper we focus on the theory of planning with multi-time models. We define new Bellman equations that are satisfied for sets of multi-time models. As a consequence, multi-time models can be used interchangeably with models of primitive actions in a variety of well-known planning methods including value iteration, policy improvement and policy iteration.
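The following is a minimal sketch, not the paper's own algorithm or notation, of what "using multi-time models interchangeably with models of primitive actions" can look like in tabular value iteration. It assumes each option is summarized by a reward-prediction vector and a discounted terminal-state-distribution matrix (here called R and P, illustrative names), and applies the corresponding Bellman backup over the set of options.

```python
import numpy as np

def option_value_iteration(R, P, n_states, tol=1e-8, max_iters=10_000):
    """Value iteration over a set of option models (hypothetical tabular setup).

    R: dict mapping option id -> array of shape (n_states,)
       expected cumulative (discounted) reward from starting the option in each state
    P: dict mapping option id -> array of shape (n_states, n_states)
       discounted probability of the option terminating in each state
    """
    V = np.zeros(n_states)
    for _ in range(max_iters):
        # Bellman backup with option models in place of primitive-action models:
        #   V(s) <- max_o [ R_o(s) + sum_{s'} P_o(s, s') V(s') ]
        Q = np.stack([R[o] + P[o] @ V for o in R])  # shape (n_options, n_states)
        V_new = Q.max(axis=0)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
    return V

# Tiny usage example: two states, two options, discount folded into P.
R = {0: np.array([1.0, 0.0]), 1: np.array([0.0, 2.0])}
P = {0: 0.9 * np.array([[0.0, 1.0], [1.0, 0.0]]),
     1: 0.9 * np.array([[1.0, 0.0], [0.0, 1.0]])}
print(option_value_iteration(R, P, n_states=2))
```

Because primitive actions can be represented as one-step options with the same model structure, the same backup covers both cases, which is the sense in which the two kinds of models are interchangeable.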

Keywords

Optimal Policy · Goal State · Markov Decision Process · Policy Iteration · Reward Prediction

References

  1. Dimitri P. Bertsekas. Dynamic Programming: Deterministic and Stochastic Models. Prentice Hall, Englewood Cliffs, NJ, 1987.
  2. Peter Dayan. Improving generalization for temporal difference learning: The successor representation. Neural Computation, 5:613–624, 1993.
  3. Peter Dayan and Geoff E. Hinton. Feudal reinforcement learning. In Advances in Neural Information Processing Systems, volume 5, pages 271–278, Cambridge, MA, 1993. MIT Press.
  4. Thomas G. Dietterich. Hierarchical reinforcement learning with MAXQ value function decomposition. Technical report, Computer Science Department, Oregon State University, 1997.
  5. Manfred Huber and Roderic A. Grupen. Learning to coordinate controllers — reinforcement learning on a control basis. In Proceedings of the Fifteenth International Joint Conference on Artificial Intelligence (IJCAI-97), San Francisco, CA, 1997. Morgan Kaufmann.
  6. Leslie P. Kaelbling. Hierarchical learning in stochastic domains: Preliminary results. In Proceedings of the Tenth International Conference on Machine Learning (ICML'93), pages 167–173, San Mateo, CA, 1993. Morgan Kaufmann.
  7. Richard E. Korf. Learning to Solve Problems by Searching for Macro-Operators. Pitman Publishing Ltd, London, 1985.
  8. John E. Laird, Paul S. Rosenbloom, and Allen Newell. Chunking in SOAR: The anatomy of a general learning mechanism. Machine Learning, 1:11–46, 1986.
  9. Sridhar Mahadevan and Jonathan Connell. Automatic programming of behavior-based robots using reinforcement learning. Artificial Intelligence, 55(2–3):311–365, 1992.
  10. Amy McGovern, Richard S. Sutton, and Andrew H. Fagg. Roles of macro-actions in accelerating reinforcement learning. In Grace Hopper Celebration of Women in Computing, pages 13–18, 1997.
  11. Andrew W. Moore and Chris G. Atkeson. Prioritized sweeping: Reinforcement learning with less data and less real time. Machine Learning, 13:103–130, 1993.
  12. Ronald Parr and Stuart Russell. Reinforcement learning with hierarchies of machines. In Advances in Neural Information Processing Systems, volume 10, Cambridge, MA, 1998. MIT Press.
  13. Jing Peng and Ronald J. Williams. Efficient learning and planning within the Dyna framework. Adaptive Behavior, 4:323–334, 1993.
  14. Doina Precup and Richard S. Sutton. Multi-time models for temporally abstract planning. In Advances in Neural Information Processing Systems, volume 10, Cambridge, MA, 1998. MIT Press.
  15. Martin L. Puterman. Markov Decision Processes. Wiley-Interscience, New York, NY, 1994.
  16. Earl D. Sacerdoti. A Structure for Plans and Behavior. Elsevier, North-Holland, NY, 1977.
  17. Satinder P. Singh. Scaling reinforcement learning by learning variable temporal resolution models. In Proceedings of the Ninth International Conference on Machine Learning (ICML'92), pages 202–207, San Mateo, CA, 1992. Morgan Kaufmann.
  18. Richard S. Sutton. Integrating architectures for learning, planning, and reacting based on approximating dynamic programming. In Proceedings of the Seventh International Conference on Machine Learning (ICML'90), pages 216–224, San Mateo, CA, 1990. Morgan Kaufmann.
  19. Richard S. Sutton. TD models: Modeling the world as a mixture of time scales. In Proceedings of the Twelfth International Conference on Machine Learning (ICML'95), pages 531–539, San Mateo, CA, 1995. Morgan Kaufmann.
  20. Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA, 1998.
  21. Richard S. Sutton and Brian Pinette. The learning of world models by connectionist networks. In Proceedings of the Seventh Annual Conference of the Cognitive Science Society, pages 54–64, 1985.
  22. Christopher J. C. H. Watkins. Learning from Delayed Rewards. PhD thesis, Cambridge University, 1989.

Copyright information

© Springer-Verlag Berlin Heidelberg 1998

Authors and Affiliations

  • Doina Precup (1)
  • Richard S. Sutton (1)
  • Satinder Singh (2)
  1. Department of Computer Science, University of Massachusetts, Amherst
  2. Department of Computer Science, University of Colorado, Boulder
