Abstract
We present an anytime multiagent learning approach for satisfying any given optimality criterion in repeated-game self-play. We contrast our approach with classical learning approaches for repeated games, namely equilibrium learning, Pareto-efficient learning, and their variants. The comparison is made from a practical (engineering) standpoint: that of a multiagent system designer whose goal is to maximize the system's overall performance with respect to a given optimality criterion. Extensive experiments across a wide variety of repeated games demonstrate the efficacy of our approach.
© 2009 Springer-Verlag Berlin Heidelberg
Cite this paper
Burkov, A., Chaib-draa, B. (2009). Anytime Self-play Learning to Satisfy Functional Optimality Criteria. In: Rossi, F., Tsoukias, A. (eds) Algorithmic Decision Theory. ADT 2009. Lecture Notes in Computer Science(), vol 5783. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04428-1_39
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04427-4
Online ISBN: 978-3-642-04428-1