
Evaluation of Batch-Mode Reinforcement Learning Methods for Solving DEC-MDPs with Changing Action Sets

  • Conference paper
Recent Advances in Reinforcement Learning (EWRL 2008)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5323)

Abstract

DEC-MDPs with changing action sets and partially ordered transition dependencies have recently been suggested as a sub-class of general DEC-MDPs that features provably lower complexity. In this paper, we investigate the usability of a coordinated batch-mode reinforcement learning algorithm for this class of distributed problems. Our agents acquire their local policies independently of the other agents through repeated interaction with the DEC-MDP, concurrently evolving their policies; the learning approach employed builds upon a specialized variant of the neural fitted Q iteration algorithm, enhanced for use in multi-agent settings. We applied our learning approach to various scheduling benchmark problems and obtained encouraging results showing that problems of current benchmark difficulty can be approximately solved very well and, in some cases, even optimally.
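To make the batch-mode learning idea in the abstract concrete, the following is a minimal sketch of fitted Q iteration for a single agent whose set of available actions changes from state to state. It is an illustration under assumptions, not the authors' implementation: the feature encoding (encode), the constants GAMMA and N_ITERATIONS, the toy transition batch, and the use of scikit-learn's MLPRegressor as a stand-in for the neural fitted Q iteration network are all hypothetical choices. Only the overall structure (repeatedly re-fitting a Q-function approximator to targets that maximize over the actions actually available in the successor state) reflects the approach the abstract describes.

```python
"""
Minimal sketch of batch-mode fitted Q iteration for one agent with a
state-dependent (changing) action set. Environment, features, constants,
and the MLPRegressor stand-in are illustrative assumptions.
"""
import random
import numpy as np
from sklearn.neural_network import MLPRegressor

GAMMA = 0.9          # discount factor (assumed)
N_ITERATIONS = 20    # number of fitted Q iterations (assumed)


def encode(state, action):
    """Hypothetical feature encoding of a (state, action) pair."""
    return np.array([state, action], dtype=float)


def fitted_q_iteration(transitions):
    """
    transitions: list of (s, a, r, s_next, actions_available_in_s_next).
    The changing action set enters through the max over only those actions
    that are actually available in the successor state.
    """
    q = None  # no Q-function yet; the first iteration fits raw rewards
    for _ in range(N_ITERATIONS):
        X, y = [], []
        for s, a, r, s_next, next_actions in transitions:
            if q is None or not next_actions:
                target = r
            else:
                # Maximize only over the actions available in s_next.
                q_next = q.predict(
                    np.array([encode(s_next, a2) for a2 in next_actions]))
                target = r + GAMMA * float(np.max(q_next))
            X.append(encode(s, a))
            y.append(target)
        # Re-fit the Q-function approximator on the full batch of targets.
        q = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000)
        q.fit(np.array(X), np.array(y))
    return q


if __name__ == "__main__":
    # Toy batch of transitions with state-dependent action sets (made up).
    random.seed(0)
    batch = [(s, a, random.random(), s + 1, list(range((s + 1) % 3 + 1)))
             for s in range(20) for a in range((s % 3) + 1)]
    q_function = fitted_q_iteration(batch)
    print(q_function.predict(np.array([encode(0, 0)])))
```

In the decentralized setting of the paper, each agent would run such a loop on its own local transition data, so that the agents' local policies evolve concurrently, as the abstract notes.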

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Gabel, T., Riedmiller, M. (2008). Evaluation of Batch-Mode Reinforcement Learning Methods for Solving DEC-MDPs with Changing Action Sets. In: Girgin, S., Loth, M., Munos, R., Preux, P., Ryabko, D. (eds) Recent Advances in Reinforcement Learning. EWRL 2008. Lecture Notes in Computer Science, vol 5323. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-89722-4_7

  • DOI: https://doi.org/10.1007/978-3-540-89722-4_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-89721-7

  • Online ISBN: 978-3-540-89722-4

  • eBook Packages: Computer Science, Computer Science (R0)
