Abstract
Many multiagent Q-learning methods exist, and most of them aim to converge to a Nash equilibrium, which is not desirable in games like the Prisoner’s Dilemma (PD). The author previously proposed utility-based Q-learning (UB-Q) for PD, which uses utilities instead of rewards to maintain mutual cooperation once it has occurred. However, UB-Q must know the payoffs of the game to calculate the utilities, and it works only in PD. Since a Q-learning agent’s action depends on the relation between its Q-values, mutual cooperation can also be maintained by adjusting the learning rate. This paper therefore deals with the learning rate directly and introduces another Q-learning method, learning-rate adjusting Q-learning (LRA-Q). It calculates the learning rate from the received payoffs and works in other two-person two-action symmetric games as well as in PD. Numerical verification showed the success of LRA-Q, but it also revealed a side effect.
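The abstract describes the general mechanism but not the paper's actual update rule, so the following Python sketch only illustrates the idea under stated assumptions: a stateless Q-learner in a repeated Prisoner's Dilemma whose learning rate is derived from the received payoff. The rate rule `adjusted_learning_rate` is a hypothetical stand-in, not the paper's LRA-Q formula.

```python
# Illustrative sketch: Q-learning in a repeated Prisoner's Dilemma with a
# payoff-dependent learning rate. The rate rule below is hypothetical; the
# paper's actual LRA-Q formula is not reproduced in the abstract.

# Standard PD payoffs: T > R > P > S and 2R > T + S.
T, R, P, S = 5, 3, 1, 0
PAYOFF = {('C', 'C'): (R, R), ('C', 'D'): (S, T),
          ('D', 'C'): (T, S), ('D', 'D'): (P, P)}

def adjusted_learning_rate(payoff, base=0.1):
    """Hypothetical rule: learn more slowly from payoffs above the
    mutual-cooperation reward R, so a one-shot temptation payoff does
    not overwrite the learned value of cooperating."""
    return base * R / payoff if payoff > R else base

def q_update(q, action, payoff):
    """Stateless Q-learning update with a payoff-dependent learning rate."""
    alpha = adjusted_learning_rate(payoff)
    q[action] += alpha * (payoff - q[action])
    return q

# Two greedy agents that both start out preferring cooperation keep
# cooperating: each round yields payoff R, which nudges Q('C') toward R
# while Q('D') is never updated, so Q('C') > Q('D') is preserved.
q1 = {'C': 2.0, 'D': 1.0}
q2 = {'C': 2.0, 'D': 1.0}
for _ in range(50):
    a1 = max(q1, key=q1.get)
    a2 = max(q2, key=q2.get)
    r1, r2 = PAYOFF[(a1, a2)]
    q_update(q1, a1, r1)
    q_update(q2, a2, r2)

print(q1['C'] > q1['D'] and q2['C'] > q2['D'])  # prints True
```

The damping of the learning rate for above-`R` payoffs is one plausible way to make the relation between Q-values robust to a defection windfall; the paper derives its own rule from the payoffs of the game.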
This work was supported by KAKENHI No.18700145 from MEXT, Japan.
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
Cite this paper
Moriyama, K. (2009). Learning-Rate Adjusting Q-Learning for Two-Person Two-Action Symmetric Games. In: Håkansson, A., Nguyen, N.T., Hartung, R.L., Howlett, R.J., Jain, L.C. (eds) Agent and Multi-Agent Systems: Technologies and Applications. KES-AMSTA 2009. Lecture Notes in Computer Science, vol 5559. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01665-3_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-01664-6
Online ISBN: 978-3-642-01665-3
eBook Packages: Computer Science (R0)