
Part of the book series: NATO ASI Series F (volume 144)

Abstract

This paper surveys the historical basis of reinforcement learning and some of the current work from a computer scientist's point of view. It is an outgrowth of a number of talks given by the authors, including a NATO Advanced Study Institute and tutorials at AAAI '94 and Machine Learning '94. Reinforcement learning is a popular model of the learning problem faced by an agent that learns behavior through trial-and-error interactions with a dynamic environment. It has a strong family resemblance to work in psychology, but differs considerably in the details and in the use of the word "reinforcement." It is appropriately thought of as a class of problems, rather than as a set of techniques. The paper addresses a variety of subproblems in reinforcement learning, including exploration vs. exploitation, learning from delayed reinforcement, learning and using models, generalization and hierarchy, and hidden state. It concludes with a survey of some practical systems and an assessment of the practical utility of current reinforcement-learning systems.
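Learning from delayed reinforcement, one of the subproblems listed above, is commonly illustrated with tabular Q-learning. The sketch below is a minimal, illustrative example only; the environment interface (reset/step/actions) and the parameter values are assumptions made for the example, not taken from the paper.

```python
# Minimal sketch of tabular Q-learning, a representative algorithm for
# learning from delayed reinforcement.  The `env` interface and the
# parameter values are illustrative assumptions, not from the paper.
import random
from collections import defaultdict

def q_learning(env, episodes=500, alpha=0.1, gamma=0.95, epsilon=0.1):
    Q = defaultdict(float)  # Q[(state, action)] -> estimated long-run return
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # Epsilon-greedy action choice: the exploration vs. exploitation
            # trade-off mentioned in the abstract.
            if random.random() < epsilon:
                action = random.choice(env.actions(state))
            else:
                action = max(env.actions(state), key=lambda a: Q[(state, a)])
            next_state, reward, done = env.step(action)
            # One-step temporal-difference backup propagates delayed reward
            # information back through the state-action values.
            best_next = 0.0 if done else max(
                Q[(next_state, a)] for a in env.actions(next_state))
            Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
            state = next_state
    return Q
```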




Copyright information

© 1995 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kaelbling, L.P., Littman, M.L., Moore, A.W. (1995). An Introduction to Reinforcement Learning. In: Steels, L. (ed.) The Biology and Technology of Intelligent Autonomous Agents. NATO ASI Series, vol 144. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-79629-6_5


  • DOI: https://doi.org/10.1007/978-3-642-79629-6_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-79631-9

  • Online ISBN: 978-3-642-79629-6

  • eBook Packages: Springer Book Archive
