Adaptive Multiagent Reinforcement Learning with Non-positive Regret

Nguyen, Duong D.; White, Langford B.; Nguyen, Hung X.

doi:10.1007/978-3-319-50127-7_3

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9992)

Included in the following conference series:

Australasian Joint Conference on Artificial Intelligence

Abstract

We propose a novel adaptive reinforcement learning (RL) procedure for multiagent non-cooperative repeated games. Most existing regret-based algorithms use only positive regrets in updating their learning rules. In this paper, we use both positive and negative regrets in reinforcement learning to improve its convergence behaviour. We prove theoretically that the empirical distribution of the joint play converges to the set of correlated equilibria. Simulation results demonstrate that our proposed procedure outperforms the standard regret-based RL approach and a well-known state-of-the-art RL scheme in the literature in terms of both computational requirements and system fairness. Further experiments demonstrate that the performance of our solution is robust to variations in the total number of agents in the system, and that it achieves markedly better fairness than other relevant methods, especially in large-scale multiagent systems.
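
For orientation, the positive-regret baseline that the abstract contrasts against can be sketched roughly as follows. This is a minimal illustration in the style of the regret-matching procedure of Hart and Mas-Colell (2000), not the procedure proposed in this paper: the abstract does not specify how negative regrets enter the update, so that part is not shown. The two-player Chicken payoffs, the constant `MU`, and all function names are assumptions made for the example.

```python
import random

# Illustrative payoffs for two-player Chicken (assumed values, not from the paper).
# PAYOFF[i][a0][a1] is player i's payoff when player 0 plays a0 and player 1 plays a1.
PAYOFF = [
    [[6, 2], [7, 0]],  # player 0
    [[6, 7], [2, 0]],  # player 1
]
N_ACTIONS = 2
MU = 20.0  # assumed inertia constant, large enough to keep switch probabilities below 1


def payoff(player, own, other):
    """Payoff to `player` when it plays `own` and its opponent plays `other`."""
    if player == 0:
        return PAYOFF[0][own][other]
    return PAYOFF[1][other][own]


def regret_matching(rounds, seed=0):
    """Run positive-regret matching; return the empirical joint distribution of play."""
    rng = random.Random(seed)
    # diff[i][j][k]: cumulative gain player i would have obtained by playing k
    # in every past round in which it actually played j.
    diff = [[[0.0] * N_ACTIONS for _ in range(N_ACTIONS)] for _ in range(2)]
    actions = [rng.randrange(N_ACTIONS) for _ in range(2)]
    counts = {}
    for t in range(1, rounds + 1):
        joint = (actions[0], actions[1])
        counts[joint] = counts.get(joint, 0) + 1
        # Update conditional regrets using the joint action just played.
        for i in range(2):
            j, other = actions[i], actions[1 - i]
            for k in range(N_ACTIONS):
                diff[i][j][k] += payoff(i, k, other) - payoff(i, j, other)
        # Switch from j to k with probability proportional to the POSITIVE part
        # of the average regret; negative regrets are ignored in this baseline.
        next_actions = []
        for i in range(2):
            j = actions[i]
            probs = [0.0] * N_ACTIONS
            for k in range(N_ACTIONS):
                if k != j:
                    probs[k] = max(diff[i][j][k], 0.0) / (MU * t)
            probs[j] = 1.0 - sum(probs)  # stay with the remaining probability
            next_actions.append(rng.choices(range(N_ACTIONS), weights=probs)[0])
        actions = next_actions
    return {joint: c / rounds for joint, c in counts.items()}


if __name__ == "__main__":
    # Empirical frequencies of each joint action after 10,000 rounds.
    print(regret_matching(10_000))
```

The returned dictionary is the empirical distribution of joint play, which is the object whose convergence to the set of correlated equilibria the abstract refers to.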

Acknowledgment

This research is partially supported by the Australian Research Council Linkage Grant LP100200493.

Author information

Authors and Affiliations

School of Electrical and Electronic Engineering, The University of Adelaide, Adelaide, SA, 5005, Australia
Duong D. Nguyen & Langford B. White
Teletraffic Research Centre, The University of Adelaide, Adelaide, SA, 5005, Australia
Hung X. Nguyen

Corresponding author

Correspondence to Duong D. Nguyen.

Editor information

Editors and Affiliations

University of Tasmania, Hobart, Australia
Byeong Ho Kang
Auckland University of Technology, Auckland, New Zealand
Quan Bai

About this paper

Cite this paper

Nguyen, D.D., White, L.B., Nguyen, H.X. (2016). Adaptive Multiagent Reinforcement Learning with Non-positive Regret. In: Kang, B.H., Bai, Q. (eds) AI 2016: Advances in Artificial Intelligence. AI 2016. Lecture Notes in Computer Science, vol 9992. Springer, Cham. https://doi.org/10.1007/978-3-319-50127-7_3

DOI: https://doi.org/10.1007/978-3-319-50127-7_3
Published: 29 November 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-50126-0
Online ISBN: 978-3-319-50127-7
eBook Packages: Computer Science, Computer Science (R0)
