Abstract
Many multiagent Q-learning methods exist, and most of them aim to converge to a Nash equilibrium, which is not desirable in games like the Prisoner’s Dilemma (PD). The author previously proposed utility-based Q-learning (UB-Q) for PD, which uses utilities instead of rewards to maintain mutual cooperation once it has occurred. However, UB-Q must know the payoffs of the game to calculate the utilities, and it works only in PD. Since a Q-learning agent’s action depends on the relation between its Q-values, mutual cooperation can also be maintained by adjusting the learning rate. This paper therefore deals with the learning rate directly and introduces another Q-learning method, learning-rate adjusting Q-learning (LRA-Q). It calculates the learning rate from the received payoffs and works in other two-person two-action symmetric games as well as in PD. Numerical verification showed the success of LRA-Q, but it also revealed a side effect.
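The abstract describes the general mechanism but not the paper's actual update rule, so the following Python sketch only illustrates the idea under stated assumptions: a stateless Q-learner in a repeated Prisoner's Dilemma whose learning rate is derived from the received payoff. The rate rule `adjusted_learning_rate` is a hypothetical stand-in, not the paper's LRA-Q formula.

```python
# Illustrative sketch: Q-learning in a repeated Prisoner's Dilemma with a
# payoff-dependent learning rate. The rate rule below is hypothetical; the
# paper's actual LRA-Q formula is not reproduced in the abstract.

# Standard PD payoffs: T > R > P > S and 2R > T + S.
T, R, P, S = 5, 3, 1, 0
PAYOFF = {('C', 'C'): (R, R), ('C', 'D'): (S, T),
          ('D', 'C'): (T, S), ('D', 'D'): (P, P)}

def adjusted_learning_rate(payoff, base=0.1):
    """Hypothetical rule: learn more slowly from payoffs above the
    mutual-cooperation reward R, so a one-shot temptation payoff does
    not overwrite the learned value of cooperating."""
    return base * R / payoff if payoff > R else base

def q_update(q, action, payoff):
    """Stateless Q-learning update with a payoff-dependent learning rate."""
    alpha = adjusted_learning_rate(payoff)
    q[action] += alpha * (payoff - q[action])
    return q

# Two greedy agents that both start out preferring cooperation keep
# cooperating: each round yields payoff R, which nudges Q('C') toward R
# while Q('D') is never updated, so Q('C') > Q('D') is preserved.
q1 = {'C': 2.0, 'D': 1.0}
q2 = {'C': 2.0, 'D': 1.0}
for _ in range(50):
    a1 = max(q1, key=q1.get)
    a2 = max(q2, key=q2.get)
    r1, r2 = PAYOFF[(a1, a2)]
    q_update(q1, a1, r1)
    q_update(q2, a2, r2)

print(q1['C'] > q1['D'] and q2['C'] > q2['D'])  # prints True
```

The damping of the learning rate for above-`R` payoffs is one plausible way to make the relation between Q-values robust to a defection windfall; the paper derives its own rule from the payoffs of the game.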
This work was supported by KAKENHI No.18700145 from MEXT, Japan.
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
Cite this paper
Moriyama, K. (2009). Learning-Rate Adjusting Q-Learning for Two-Person Two-Action Symmetric Games. In: Håkansson, A., Nguyen, N.T., Hartung, R.L., Howlett, R.J., Jain, L.C. (eds) Agent and Multi-Agent Systems: Technologies and Applications. KES-AMSTA 2009. Lecture Notes in Computer Science, vol 5559. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01665-3_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-01664-6
Online ISBN: 978-3-642-01665-3
eBook Packages: Computer Science (R0)