FMR-GA – A Cooperative Multi-agent Reinforcement Learning Algorithm Based on Gradient Ascent

Zhang, Zhen; Wang, Dongqing; Zhao, Dongbin; Song, Tingting

doi:10.1007/978-3-319-70087-8_86


Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10634)

Included in the following conference series:

International Conference on Neural Information Processing


Abstract

Gradient ascent methods combined with Multi-Agent Reinforcement Learning (MARL) have been studied for years as a potential direction for designing new MARL algorithms. This paper proposes a gradient-based MARL algorithm, Frequency of the Maximal Reward based on Gradient Ascent (FMR-GA). The aim is to reach the maximal total reward in repeated games. To achieve this goal and simplify the stability analysis, we make two design choices. First, the probability of getting the maximal total reward is selected as the objective function, which simplifies the expression of the gradient and makes the learning goal easier to reach. Second, a factor is designed and added to the gradient, which produces the desired stable critical points corresponding to the optimal joint strategy. We first propose a MARL algorithm called Probability of Maximal Reward based on Infinitesimal Gradient Ascent (PMR-IGA) and analyze its convergence in two-player two-action and two-player three-action repeated games. We then derive a practical MARL algorithm, FMR-GA, from PMR-IGA. Theoretical and simulation results show that FMR-GA converges to the optimal strategy in the cases presented in this paper.
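To make the first design choice concrete, the following is a minimal sketch of plain projected gradient ascent on the probability of receiving the maximal reward in a two-player two-action cooperative game. It is not PMR-IGA or FMR-GA: the factor that the abstract says is added to the gradient is not specified here, so it is omitted. The function names, the reward matrix, the step size, and the starting strategies are all hypothetical choices made for illustration.

```python
import numpy as np


def optimal_indicator(reward):
    """Return a 0/1 matrix marking the joint actions with maximal reward."""
    reward = np.asarray(reward, dtype=float)
    return (reward == reward.max()).astype(float)


def max_reward_probability(a, b, reward):
    """Objective J(a, b): probability that the joint action is optimal.

    a and b are the probabilities that player 1 and player 2 pick action 0.
    """
    p = np.array([a, 1.0 - a])
    q = np.array([b, 1.0 - b])
    return float(p @ optimal_indicator(reward) @ q)


def projected_gradient_ascent(reward, a, b, step=0.01, iters=5000):
    """Plain gradient ascent on J, clipped to [0, 1] (assumed update rule)."""
    m = optimal_indicator(reward)
    d = np.array([1.0, -1.0])  # derivative of [x, 1 - x] with respect to x
    for _ in range(iters):
        p = np.array([a, 1.0 - a])
        q = np.array([b, 1.0 - b])
        grad_a = float(d @ m @ q)  # dJ/da
        grad_b = float(p @ m @ d)  # dJ/db
        # Both players update simultaneously from the old strategies.
        a = min(1.0, max(0.0, a + step * grad_a))
        b = min(1.0, max(0.0, b + step * grad_b))
    return a, b


# Hypothetical cooperative game: joint action (0, 0) gives the maximal reward.
reward = [[10.0, 0.0], [0.0, 5.0]]
a, b = projected_gradient_ascent(reward, a=0.2, b=0.3)
print(a, b, max_reward_probability(a, b, reward))
```

In this example the objective reduces to J(a, b) = a·b, so the gradient is (b, a). Starting from a = b = 0, the gradient vanishes and this plain sketch would not move, which shows that the critical points of the unmodified objective need not all be optimal joint strategies.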



Acknowledgement

This work was supported by the National Natural Science Foundation of China (61573353, 61533017, 61573205) and the Foundation of Shandong Province under Grants ZR2017PF005 and ZR2015FM017.

Author information

Authors and Affiliations

School of Automation and Electrical Engineering, Qingdao University, Qingdao, 266071, China
Zhen Zhang, Dongqing Wang & Tingting Song
State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, 100190, China
Dongbin Zhao


Corresponding author

Correspondence to Zhen Zhang.

Editor information

Editors and Affiliations

Guangdong University of Technology, Guangzhou, China
Derong Liu
Guangdong University of Technology, Guangzhou, China
Shengli Xie
South China University of Technology, Guangzhou, China
Yuanqing Li
Institute of Automation, Chinese Academy of Sciences, Beijing, China
Dongbin Zhao
King Fahd University of Petroleum and Minerals, Dhahran, Saudi Arabia
El-Sayed M. El-Alfy

Cite this paper

Zhang, Z., Wang, D., Zhao, D., Song, T. (2017). FMR-GA – A Cooperative Multi-agent Reinforcement Learning Algorithm Based on Gradient Ascent. In: Liu, D., Xie, S., Li, Y., Zhao, D., El-Alfy, ES. (eds) Neural Information Processing. ICONIP 2017. Lecture Notes in Computer Science, vol 10634. Springer, Cham. https://doi.org/10.1007/978-3-319-70087-8_86


DOI: https://doi.org/10.1007/978-3-319-70087-8_86
Published: 24 October 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-70086-1
Online ISBN: 978-3-319-70087-8
eBook Packages: Computer Science, Computer Science (R0)
