Multi-agent Learning and the Reinforcement Gradient

  • Conference paper

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 7541))

Abstract

This article shows that seemingly diverse implementations of multi-agent reinforcement learning share the same basic building block in their learning dynamics: a mathematical term that is closely related to the gradient of the expected reward. Gradient ascent on the expected reward has been used to derive strong convergence results in two-player two-action games, at the expense of strong assumptions such as full information about the game being played. Variations of gradient ascent, such as Infinitesimal Gradient Ascent (IGA), Win-or-Learn-Fast IGA, and Weighted Policy Learning (WPL), assume a known value function for which the reinforcement gradient can be computed directly. In contrast, independent multi-agent reinforcement learning algorithms that assume less information about the game being played, such as Cross learning, variations of Q-learning, and regret minimization, base their learning on feedback from discrete interactions with the environment, requiring neither an explicit representation of the value function nor its gradient. Despite this much stricter limitation on the information available to these algorithms, they yield dynamics that are very similar to gradient ascent and exhibit equivalent convergence behavior. In addition to the formal derivation, directional field plots of the learning dynamics in representative classes of two-player two-action games illustrate the similarities and strengthen the theoretical findings.
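The abstract's core claim — that a value-free learner such as Cross learning recovers the same gradient-related term as gradient ascent — can be sketched in a few lines. The snippet below is illustrative only and not taken from the paper: the payoff matrix and mixed strategies are arbitrary placeholders, and the function names are my own. It computes the expected one-step Cross-learning update for the row player in a two-action game and checks that it coincides with the replicator-dynamics term x_i((Au)_i − xᵀAu), the building block the paper identifies.

```python
# Sketch (assumptions: rewards scaled to [0, 1], two actions per player).
# Cross learning: after playing action i and receiving reward r,
#   x_i <- x_i + r * (1 - x_i)   and   x_j <- x_j - r * x_j  for j != i.
# Taking the expectation over the played action and the stochastic reward
# yields the replicator-dynamics term, which carries the reward gradient.

def expected_cross_update(x, payoff, opponent):
    """Expected one-step change of the row player's policy under Cross learning."""
    # Expected reward of each pure action against the opponent's mixture.
    u = [sum(payoff[i][j] * opponent[j] for j in range(2)) for i in range(2)]
    delta = [0.0, 0.0]
    for i in range(2):        # action actually played, with probability x[i]
        for j in range(2):    # policy component being updated
            if i == j:
                delta[j] += x[i] * u[i] * (1 - x[j])   # reinforce played action
            else:
                delta[j] += x[i] * (-u[i] * x[j])      # suppress the others
    return delta

def replicator_term(x, payoff, opponent):
    """Replicator dynamics: x_i * (u_i - average reward)."""
    u = [sum(payoff[i][j] * opponent[j] for j in range(2)) for i in range(2)]
    avg = sum(x[i] * u[i] for i in range(2))
    return [x[i] * (u[i] - avg) for i in range(2)]
```

For any valid strategies and [0, 1]-valued payoffs, the two functions return the same vector, mirroring the paper's observation that Cross learning follows the reinforcement gradient without ever representing the value function explicitly.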




Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kaisers, M., Bloembergen, D., Tuyls, K. (2012). Multi-agent Learning and the Reinforcement Gradient. In: Cossentino, M., Kaisers, M., Tuyls, K., Weiss, G. (eds) Multi-Agent Systems. EUMAS 2011. Lecture Notes in Computer Science(), vol 7541. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34799-3_10


  • Print ISBN: 978-3-642-34798-6

  • Online ISBN: 978-3-642-34799-3

  • eBook Packages: Computer Science (R0)
