SA-IGA: a multiagent reinforcement learning method towards socially optimal outcomes

Zhang, Chengwei; Li, Xiaohong; Hao, Jianye; Chen, Siqi; Tuyls, Karl; Xue, Wanli; Feng, Zhiyong

doi:10.1007/s10458-019-09411-3

Published: 15 May 2019

Volume 33, pages 403–429, (2019)

Autonomous Agents and Multi-Agent Systems

Chengwei Zhang¹,² (ORCID: orcid.org/0000-0002-9157-6050),
Xiaohong Li¹,
Jianye Hao¹,
Siqi Chen¹,
Karl Tuyls³,
Wanli Xue⁴ &
Zhiyong Feng¹

Abstract

In multiagent environments, the ability to learn is important for an agent to behave appropriately in the face of unknown opponents and dynamic environments. From the system designer's perspective, it is desirable for the agents to learn to coordinate towards socially optimal outcomes while also avoiding exploitation by selfish opponents. To this end, we propose a novel gradient-ascent-based algorithm (SA-IGA), which augments the basic gradient-ascent algorithm by incorporating social awareness into the policy update process. We theoretically analyze the learning dynamics of SA-IGA using dynamical system theory, and SA-IGA is shown to have linear dynamics for a wide range of games, including symmetric games. The learning dynamics of two representative games (the prisoner's dilemma game and the coordination game) are analyzed in detail. Based on the idea of SA-IGA, we further propose a practical multiagent learning algorithm, called SA-PGA, based on the Q-learning update rule. Simulation results show that an SA-PGA agent can achieve higher social welfare than the previous social-optimality-oriented Conditional Joint Action Learner (CJAL) and is also robust against individually rational opponents by reaching Nash equilibrium solutions.
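The abstract does not state the SA-IGA update rule, so the following is only a rough illustration of the general idea of mixing a social term into a gradient-ascent policy update. It is a minimal sketch under stated assumptions: the payoff values, the fixed weight `w`, the step size, and the function names are all hypothetical, and the way the actual algorithm adapts its social awareness is not reproduced here.

```python
# Sketch only: projected gradient ascent in a 2x2 prisoner's dilemma where each
# agent climbs a fixed mix of its own payoff and the average payoff.
# The payoffs and the fixed weight w are illustrative assumptions, not the paper's.

# Row player's payoffs; action 0 = cooperate, 1 = defect.
A = [[3.0, 0.0],
     [5.0, 1.0]]
# Column player's payoffs, indexed (row action, column action); the game is symmetric.
B = [[A[b][a] for b in range(2)] for a in range(2)]
# Average payoff of the two players for each joint action.
S = [[(A[a][b] + B[a][b]) / 2.0 for b in range(2)] for a in range(2)]


def transpose(M):
    """Swap the two indices of a 2x2 matrix."""
    return [[M[b][a] for b in range(2)] for a in range(2)]


def grad(M, other_coop):
    """Derivative of the expected value of M with respect to the agent's own
    cooperation probability, where M is indexed (own action, other's action)."""
    return (other_coop * (M[0][0] - M[1][0])
            + (1.0 - other_coop) * (M[0][1] - M[1][1]))


def clip01(x):
    """Project a value back onto the interval [0, 1]."""
    return min(1.0, max(0.0, x))


def simulate(w, steps=2000, lr=0.01, p=0.5, q=0.5):
    """Run simultaneous gradient ascent; w = 0 is purely selfish, w = 1 purely social.
    Returns the final cooperation probabilities (row player, column player)."""
    for _ in range(steps):
        gp = (1.0 - w) * grad(A, q) + w * grad(S, q)
        gq = (1.0 - w) * grad(transpose(B), p) + w * grad(transpose(S), p)
        p, q = clip01(p + lr * gp), clip01(q + lr * gq)
    return p, q


if __name__ == "__main__":
    print("selfish (w=0):", simulate(0.0))
    print("social  (w=1):", simulate(1.0))
```

With these assumed payoffs, purely selfish learners drift to mutual defection (the Nash equilibrium), while purely social learners drift to mutual cooperation (the welfare-maximizing outcome). This is the tension between individual rationality and social optimality that the abstract describes.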

Acknowledgements

This work is supported by the National Natural Science Foundation of China (Grant Nos. 61702362, U1836214, 61572349, 61872262, 61602391), Special Program of Artificial Intelligence, Tianjin Research Program of Application Foundation and Advanced Technology (No. 16JCQNJC00100), Special Program of Artificial Intelligence of Tianjin Municipal Science and Technology Commission (No. 17ZXRGGX00150), and the Fundamental Research Funds for the Central Universities (No. 3132019207).

Author information

Authors and Affiliations

College of Intelligence and Computing, Tianjin University, Tianjin, China
Chengwei Zhang, Xiaohong Li, Jianye Hao, Siqi Chen & Zhiyong Feng
College of Information Science and Technology, Dalian Maritime University, Dalian, China
Chengwei Zhang
University of Liverpool, Liverpool, UK
Karl Tuyls
School of Computer Science and Engineering, Tianjin University of Technology, Tianjin, China
Wanli Xue

Corresponding author

Correspondence to Jianye Hao.

About this article

Cite this article

Zhang, C., Li, X., Hao, J. et al. SA-IGA: a multiagent reinforcement learning method towards socially optimal outcomes. Auton Agent Multi-Agent Syst 33, 403–429 (2019). https://doi.org/10.1007/s10458-019-09411-3

Published: 15 May 2019
Issue Date: 01 July 2019
DOI: https://doi.org/10.1007/s10458-019-09411-3
