Abstract
Reinforcement-Learning is learning how to best-react to situations, through trial and error. In the Machine-Learning community Reinforcement-Learning is researched with respect to artificial (machine) decision-makers, referred to as agents. The agents are assumed to be situated within an environment which behaves as a Markov Decision Process. This chapter provides a brief introduction to Reinforcement-Learning, and establishes its relation to Data-Mining. Specifically, the Reinforcement-Learning problem is defined; a few key ideas for solving it are described; the relevance to Data-Mining is explained; and an instructive example is presented.
This is a preview of subscription content, log in via an institution.
Buying options
Tax calculation will be finalised at checkout
Purchases are for personal use only
Learn about institutional subscriptionsPreview
Unable to display preview. Download preview PDF.
References
Bellman R. Dynamic Programming. Princeton University Press, 1957.
Bertsekas D.P. Dynamic Programming: Deterministic and Stochastic Models. Prentice-Hall, 1987.
Bertsekas D.P., Tsitsiklis J.N. Neuro-Dynamic Programming. Athena Scientific, 1996.
Claus C, Boutilier, C. The Dynamics of Reinforcement Learning in Cooperative Multiagent Systems. AAAI-97 Workshop on Multiagent Learning, 1998.
Crites R.H., Barto A.G. Improving Elevator Performance Using Reinforcement Learning. Advances in Neural Information Processing Systems: Proceedings of the 1995 Conference, 1996.
Filar J., Vriez K. Competitive Markov Decision Processes. Springer, 1997.
Hong J, Prabhu V.V. Distributed Reinforcement Learning for Batch Sequencing and Sizing in Just-In-Time Manufacturing Systems. Applied Intelligence, 2004; 20:71–87.
Howard, R.A. Dynamic Programming and Markov Processes, M.I.T Press, 1960.
Hu J., Wellman M.R Multiagent Reinforcement Learning: Theoretical Framework and Algorithm. In Proceedings of the 15th International Conference on Machine Learning, 1998.
Jaakkola T, Jordan M.I., Singh S.P. On the Convergence of Stochastic Iterative Dynamic Programming Algorithms. Neural Computation, 1994; 6:1185–201.
Kaelbling L.P., Littman L.M., Moore A.W. Reinforcement Learning: a Survey. Journal of Artificial Intelligence Research 1996; 4:237–85.
Littman M.L., Boyan J.A. A Distributed Reinforcement Learning Scheme for Network Routing. In Proceedings of the International Workshop on Applications of Neural Networks to Telecommunications, 1993.
Littman M.L. Markov Games as a Framework for Multi-Agent Reinforcement Learning. In Proceedings of the 7th International Conference on Machine Learning, 1994.
Littman M. L. Friend-or-Foe Q-Learning in General-Sum Games. Proceedings of the 18th International Conference on Machine Learning, 2001.
Pednault E., Abe N., Zadrozny B. Sequential Cost-Sensitive Decision making with Reinforcement-Learning. In Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2002.
Puterman M.L. Markov Decision Processes. Wiley, 1994
Ross S. Introduction to Stochastic Dynamic Programming. Academic Press. 1983.
Sen S., Sekaran M., Hale J. Learning to Coordinate Without Sharing Information. In Proceedings of the Twelfth National Conference on Artificial Intelligence, 1994.
Sutton R.S., Barto A.G. Reinforcement Learning, an Introduction. MIT Press, 1998.
Szepesvári C, Littman M.L. A Unified Analysis of Value-Function-Based Reinforcement-Learning Algorithms. Neural Computation, 1999; 11: 2017–60.
Tesauro G.T. TD-Gammon, a Self Teaching Backgammon Program, Achieves Master Level Play. Neural Computation, 1994; 6:215–19.
Tesauro G.T. Temporal Difference Learning and TD-Gammon. Communications of the ACM, 1995; 38:58–68.
Watkins C.J.C.H. Learning from Delayed Rewards. Ph.D. thesis; Cambridge University, 1989.
Watkins C.J.C.H., Dayan P. Technical Note: Q-Learning. Machine Learning, 1992; 8:279–92.
Zhang W., Dietterich T.G. High Performance Job-Shop Scheduling With a Time Delay TD(A) Network. Advances in Neural Information Processing Systems, 1996; 8:1024–30.
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2005 Springer Science+Business Media, Inc.
About this chapter
Cite this chapter
Cohen, S., Maimon, O. (2005). Reinforcement-Learning: An Overview from a Data Mining Perspective. In: Maimon, O., Rokach, L. (eds) Data Mining and Knowledge Discovery Handbook. Springer, Boston, MA. https://doi.org/10.1007/0-387-25465-X_21
Download citation
DOI: https://doi.org/10.1007/0-387-25465-X_21
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-24435-8
Online ISBN: 978-0-387-25465-4
eBook Packages: Computer ScienceComputer Science (R0)