Abstract
We propose joint equilibrium policy search as a multi-agent learning algorithm for decentralized Markov decision processes with changing action sets. In its basic form, it relies on stochastic agent-specific policies, parameterized by probability distributions defined for every state, and on a heuristic that indicates whether a joint equilibrium can be obtained. We also suggest an extended version in which each agent employs a global policy parameterization, which renders the approach applicable to larger-scale problems. Joint equilibrium policy search is well suited for production planning, traffic control, and similar application problems. In support of this, we apply our algorithms to a number of challenging scheduling benchmark problems and find that solutions of very high quality can be obtained.
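To make the basic policy representation more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a tabular policy that stores one probability distribution per state and restricts it to whichever actions are currently available, which is one simple way to handle changing action sets. The class name `TabularStochasticPolicy`, its methods, the learning rate, and the linear reinforcement rule are illustrative assumptions; the heuristic for detecting a joint equilibrium and the global parameterization of the extended version are not modeled here.

```python
import random


class TabularStochasticPolicy:
    """One agent's stochastic policy: a probability table per state (illustrative)."""

    def __init__(self, learning_rate=0.1, seed=None):
        self.learning_rate = learning_rate  # assumed step size, not from the paper
        self.weights = {}                   # state -> {action: non-negative weight}
        self.rng = random.Random(seed)

    def distribution(self, state, available_actions):
        """Return probabilities over the actions that are available right now."""
        table = self.weights.setdefault(state, {})
        for action in available_actions:
            table.setdefault(action, 1.0)   # unseen actions start with equal weight
        total = sum(table[a] for a in available_actions)
        return {a: table[a] / total for a in available_actions}

    def sample(self, state, available_actions):
        """Draw one available action according to the current distribution."""
        dist = self.distribution(state, available_actions)
        actions = list(dist)
        return self.rng.choices(actions, weights=[dist[a] for a in actions])[0]

    def reinforce(self, state, available_actions, chosen):
        """Shift probability mass toward `chosen` (assumed linear update rule)."""
        dist = self.distribution(state, available_actions)
        lr = self.learning_rate
        for action, prob in dist.items():
            target = 1.0 if action == chosen else 0.0
            self.weights[state][action] = (1.0 - lr) * prob + lr * target


# Hypothetical usage: an agent choosing which waiting job to process next.
policy = TabularStochasticPolicy(learning_rate=0.5, seed=0)
waiting_jobs = ["job_a", "job_b"]
state = frozenset(waiting_jobs)             # assumed state encoding
picked = policy.sample(state, waiting_jobs)
policy.reinforce(state, waiting_jobs, "job_a")
updated = policy.distribution(state, waiting_jobs)
```

In a multi-agent setting, each agent would hold its own such table and would reinforce its action only when the joint outcome (for example, the resulting schedule) is judged good enough; how that judgment is made is exactly what the heuristic mentioned above addresses.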
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Gabel, T., Riedmiller, M. (2008). Joint Equilibrium Policy Search for Multi-Agent Scheduling Problems. In: Bergmann, R., Lindemann, G., Kirn, S., Pěchouček, M. (eds) Multiagent System Technologies. MATES 2008. Lecture Notes in Computer Science, vol 5244. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-87805-6_7
DOI: https://doi.org/10.1007/978-3-540-87805-6_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-87804-9
Online ISBN: 978-3-540-87805-6
eBook Packages: Computer Science (R0)