Abstract
This paper surveys the historical basis of reinforcement learning and some of the current work from a computer scientist's point of view. It is an outgrowth of a number of talks given by the authors, including lectures at a NATO Advanced Study Institute and tutorials at AAAI '94 and Machine Learning '94. Reinforcement learning is a popular model of the learning problems encountered by an agent that learns behavior through trial-and-error interactions with a dynamic environment. It has a strong family resemblance to work in psychology, but differs considerably in the details and in the use of the word "reinforcement." It is appropriately thought of as a class of problems, rather than as a set of techniques. The paper addresses a variety of subproblems in reinforcement learning, including exploration vs. exploitation, learning from delayed reinforcement, learning and using models, generalization and hierarchy, and hidden state. It concludes with a survey of some practical systems and an assessment of the practical utility of current reinforcement-learning systems.
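The abstract frames reinforcement learning as an agent learning behavior through trial-and-error interactions with a dynamic environment, balancing exploration against exploitation and coping with delayed reinforcement. The following is a minimal sketch of that interaction loop using tabular Q-learning with epsilon-greedy action selection; Q-learning is one standard algorithm from this literature, and the environment interface (`reset`, `step`) and all constants here are illustrative assumptions, not details taken from the paper.

```python
import random

def q_learning(env, n_states, n_actions, episodes=500,
               alpha=0.1, gamma=0.95, epsilon=0.1):
    # Q[s][a] estimates the long-run discounted return of taking action a
    # in state s. All hyperparameters here are illustrative defaults.
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s = env.reset()                 # assumed interface: returns a state index
        done = False
        while not done:
            # Exploration vs. exploitation: with probability epsilon,
            # act at random; otherwise act greedily on current estimates.
            if random.random() < epsilon:
                a = random.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda ai: Q[s][ai])
            s2, r, done = env.step(a)   # assumed interface: (next state, reward, done)
            # Delayed reinforcement: back up the immediate reward plus the
            # discounted value of the best action in the successor state.
            target = r + (0.0 if done else gamma * max(Q[s2]))
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
    return Q
```

The epsilon-greedy rule is only one of the exploration strategies the paper surveys; the point of the sketch is the shape of the loop, in which value estimates are improved incrementally from sampled interaction rather than from a known model of the environment.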
Copyright information
© 1995 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kaelbling, L.P., Littman, M.L., Moore, A.W. (1995). An Introduction to Reinforcement Learning. In: Steels, L. (ed.) The Biology and Technology of Intelligent Autonomous Agents. NATO ASI Series, vol 144. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-79629-6_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-79631-9
Online ISBN: 978-3-642-79629-6