Multi-agent Reinforcement Learning in Stochastic Single and Multi-stage Games

  • Katja Verbeeck
  • Ann Nowé
  • Maarten Peeters
  • Karl Tuyls
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3394)


This paper reports on a solution method for coordination, one of the most challenging problems in multi-agent reinforcement learning. In previous work we introduced a coordinated exploration technique for individual reinforcement learners, called Exploring Selfish Reinforcement Learning (ESRL). With this technique, agents may exclude one or more actions from their private action space, so as to coordinate their exploration in a shrinking joint action space. More recently we adapted this mechanism to tree-structured common interest multi-stage games. This paper summarizes the results for stochastic single-stage and multi-stage common interest games.
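The exclusion idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a two-agent, single-stage common interest game whose entries are success probabilities, uses linear reward-inaction learning automata as the individual learners, and has each agent exclude the action it settled on after every exploration phase. The names `lri_phase` and `esrl_sketch`, the learning rate, and the phase length are hypothetical choices made for illustration only.

```python
import random


def lri_phase(payoff, actions_a, actions_b, steps=2000, rate=0.05, rng=random):
    """One exploration phase (illustrative): two linear reward-inaction
    automata learn independently over their currently allowed actions."""
    probs = [{a: 1.0 / len(acts) for a in acts} for acts in (actions_a, actions_b)]
    total = 0.0
    for _ in range(steps):
        choice = [rng.choices(list(p), weights=list(p.values()))[0] for p in probs]
        # Common interest: both agents receive the same Bernoulli reward.
        reward = 1.0 if rng.random() < payoff[choice[0]][choice[1]] else 0.0
        total += reward
        for p, chosen in zip(probs, choice):
            for a in p:
                # Reward-inaction update: move towards the chosen action
                # only when a reward was received.
                target = 1.0 if a == chosen else 0.0
                p[a] += rate * reward * (target - p[a])
    # Treat the most probable action of each agent as the one it settled on.
    joint = tuple(max(p, key=p.get) for p in probs)
    return joint, total / steps


def esrl_sketch(payoff, phases=None, seed=0):
    """Alternate exploration phases with private action exclusion (assumed scheme)."""
    rng = random.Random(seed)
    n_a, n_b = len(payoff), len(payoff[0])
    allowed = [list(range(n_a)), list(range(n_b))]
    best_joint, best_avg = None, -1.0
    for _ in range(phases or max(n_a, n_b)):
        joint, avg = lri_phase(payoff, allowed[0], allowed[1], rng=rng)
        if avg > best_avg:
            best_joint, best_avg = joint, avg
        # Each agent privately excludes the action it settled on, so the
        # joint action space shrinks; an emptied action set is restored.
        for i, n in enumerate((n_a, n_b)):
            allowed[i].remove(joint[i])
            if not allowed[i]:
                allowed[i] = list(range(n))
    return best_joint, best_avg
```

Calling `esrl_sketch([[0.9, 0.1], [0.1, 0.6]])` runs two phases, so both diagonal joint actions are likely to be visited, and the one with the higher average payoff is returned. Each agent only ever modifies its own action set, which matches the abstract's description of private exclusion without communication of joint actions.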


Nash Equilibrium, Joint Action, Multiagent System, Equilibrium Path, Learning Automaton
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.




Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Katja Verbeeck (1)
  • Ann Nowé (1)
  • Maarten Peeters (1)
  • Karl Tuyls (2)
  1. Computational Modeling Lab, Vrije Universiteit Brussel, Belgium
  2. Theoretical Computer Science Group, University of Limburg, Belgium
