Reinforcement-Learning: An Overview from a Data Mining Perspective

Cohen, Shahar; Maimon, Oded

doi:10.1007/0-387-25465-X_21

Abstract

Reinforcement-Learning is learning how to react best to situations through trial and error. In the Machine-Learning community, Reinforcement-Learning is studied with respect to artificial (machine) decision-makers, referred to as agents. The agents are assumed to be situated within an environment that behaves as a Markov Decision Process. This chapter provides a brief introduction to Reinforcement-Learning and establishes its relation to Data-Mining. Specifically, the Reinforcement-Learning problem is defined; a few key ideas for solving it are described; the relevance to Data-Mining is explained; and an instructive example is presented.
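
To make the trial-and-error setting concrete, the following is a minimal sketch of tabular Q-learning, one well-known Reinforcement-Learning method, on a toy Markov Decision Process. The abstract does not say which algorithms or examples the chapter uses, so everything here is an assumption made for illustration: the four-state corridor environment, the function names (step, q_learning, greedy_policy), and the parameter values are all hypothetical.

```python
import random

N_STATES = 4          # states 0..3; state 3 is the terminal goal (assumed toy setup)
ACTIONS = (0, 1)      # 0 = move left, 1 = move right


def step(state, action):
    """Hypothetical environment: a deterministic corridor with one rewarded goal."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn action values by trial and error with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                # Explore: pick any action uniformly at random.
                action = rng.choice(ACTIONS)
            else:
                # Exploit: pick a best-known action, breaking ties at random.
                best = max(q[state])
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            next_state, reward, done = step(state, action)
            # One-step target: immediate reward plus discounted best next value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Best-known action in each non-terminal state."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
```

Calling greedy_policy(q_learning()) returns the action the agent has learned to prefer in each non-terminal state. In this corridor the agent is never told that moving right leads to the reward; it discovers this only from the rewards it observes while interacting with the environment.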

Author information

Authors and Affiliations

Tel-Aviv University, Israel
Shahar Cohen & Oded Maimon

Editor information

Editors and Affiliations

Dept. of Industrial Engineering, Tel-Aviv University, 69978, Ramat-Aviv, Israel
Oded Maimon & Lior Rokach

About this chapter

Cite this chapter

Cohen, S., Maimon, O. (2005). Reinforcement-Learning: An Overview from a Data Mining Perspective. In: Maimon, O., Rokach, L. (eds) Data Mining and Knowledge Discovery Handbook. Springer, Boston, MA. https://doi.org/10.1007/0-387-25465-X_21

DOI: https://doi.org/10.1007/0-387-25465-X_21
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-24435-8
Online ISBN: 978-0-387-25465-4
eBook Packages: Computer Science, Computer Science (R0)
