Abstract
Most multi-agent reinforcement learning algorithms aim to converge to a Nash equilibrium, but a Nash equilibrium is not necessarily a desirable outcome. Conversely, several methods aim to escape unfavorable Nash equilibria, but they are effective only in limited classes of games. Building on these methods, we previously proposed an agent that learns appropriate actions in both PD-like (Prisoner's Dilemma-like) and non-PD-like games through self-evaluations [11]. However, our earlier experiments were static: each had only a single state. Versatility across PD-like and non-PD-like games is indispensable in dynamic environments, where several states succeed one another within a single trial. We therefore conducted new experiments in which the agents played games with multiple states. The experiments involve two kinds of game: in one, the agents are notified of the current state; in the other, they are not. This paper reports the results.
References
Bowling, M., Veloso, M.: Multiagent learning using a variable learning rate. Artificial Intelligence 136, 215–250 (2002)
Claus, C., Boutilier, C.: The Dynamics of Reinforcement Learning in Cooperative Multiagent Systems. In: Proc. 15th National Conference on Artificial Intelligence, AAAI 1998, Madison, Wisconsin, U.S.A., pp. 746–752 (1998)
Hardin, G.: The Tragedy of the Commons. Science 162, 1243–1248 (1968)
Hu, J., Wellman, M.P.: Multiagent Reinforcement Learning: Theoretical Framework and an Algorithm. In: Proc. 15th International Conference on Machine Learning, ICML 1998, Madison, Wisconsin, U.S.A., pp. 242–250 (1998)
Ishida, T., Yokoi, H., Kakazu, Y.: Self-Organized Norms of Behavior under Interactions of Selfish Agents. In: Proc. 1999 IEEE International Conference on Systems, Man, and Cybernetics, Tokyo, Japan (1999)
Littman, M.L.: Markov games as a framework for multi-agent reinforcement learning. In: Proc. 11th International Conference on Machine Learning, ML 1994, New Brunswick, New Jersey, U.S.A., pp. 157–163 (1994)
Mikami, S., Kakazu, Y.: Co-operation of Multiple Agents Through Filtering Payoff. In: Proc. 1st European Workshop on Reinforcement Learning, EWRL-1, Brussels, Belgium, pp. 97–107 (1994)
Mikami, S., Kakazu, Y., Fogarty, T.C.: Co-operative Reinforcement Learning By Payoff Filters. In: Lavrač, N., Wrobel, S. (eds.) ECML 1995. LNCS (LNAI), vol. 912, pp. 319–322. Springer, Heidelberg (1995)
Moriyama, K., Numao, M.: Constructing an Autonomous Agent with an Interdependent Heuristics. In: Mizoguchi, R., Slaney, J.K. (eds.) PRICAI 2000. LNCS (LNAI), vol. 1886, pp. 329–339. Springer, Heidelberg (2000)
Moriyama, K., Numao, M.: Construction of a Learning Agent Handling Its Rewards According to Environmental Situations. In: Proc. 1st International Joint Conference on Autonomous Agents and Multi-Agent Systems, AAMAS 2002, Bologna, Italy, pp. 1262–1263 (2002)
Moriyama, K., Numao, M.: Generating Self-Evaluations to Learn Appropriate Actions in Various Games. Technical Report TR03-0002, Department of Computer Science, Tokyo Institute of Technology (2003)
Mundhe, M., Sen, S.: Evolving agent societies that avoid social dilemmas. In: Proc. Genetic and Evolutionary Computation Conference, GECCO 2000, Las Vegas, Nevada, U.S.A., pp. 809–816 (2000)
Nagayuki, Y., Ishii, S., Doya, K.: Multi-Agent Reinforcement Learning: An Approach Based on the Other Agent’s Internal Model. In: Proc. 4th International Conference on MultiAgent Systems, ICMAS 2000, Boston, Massachusetts, U.S.A., pp. 215–221 (2000)
Poundstone, W.: Prisoner’s Dilemma. Doubleday, New York (1992)
Rilling, J.K., Gutman, D.A., Zeh, T.R., Pagnoni, G., Berns, G.S., Kilts, C.D.: A Neural Basis for Social Cooperation. Neuron 35, 395–405 (2002)
Sakaguchi, Y., Takano, M.: Learning to Switch Behaviors for Different Environments: A Computational Model for Incremental Modular Learning. In: Proc. 2001 International Symposium on Nonlinear Theory and its Applications, NOLTA 2001, Zao, Miyagi, Japan, pp. 383–386 (2001)
Schmidhuber, J., Zhao, J., Schraudolph, N.N.: Reinforcement Learning with Self-Modifying Policies. In: Thrun, S., Pratt, L. (eds.) Learning to Learn, pp. 293–309. Kluwer Academic Publishers, Norwell (1997)
Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press, Cambridge (1998)
Thrun, S., Pratt, L. (eds.): Learning to Learn. Kluwer Academic Publishers, Norwell (1997)
Watkins, C.J.C.H., Dayan, P.: Technical Note: Q-learning. Machine Learning 8, 279–292 (1992)
Weibull, J.W.: Evolutionary Game Theory. MIT Press, Cambridge (1995)
Widmer, G., Kubat, M.: Learning in the Presence of Concept Drift and Hidden Contexts. Machine Learning 23, 69–101 (1996)
Wolpert, D.H., Tumer, K.: Collective Intelligence, Data Routing and Braess’ Paradox. Journal of Artificial Intelligence Research 16, 359–387 (2002)
Zhao, J., Schmidhuber, J.: Solving a Complex Prisoner’s Dilemma with Self-Modifying Policies. In: From Animals to Animats 5: Proc. 5th International Conference on Simulation of Adaptive Behavior, Zurich, Switzerland, pp. 177–182 (1998)
© 2003 Springer-Verlag Berlin Heidelberg
Cite this paper
Moriyama, K., Numao, M. (2003). Self-evaluated Learning Agent in Multiple State Games. In: Lavrač, N., Gamberger, D., Blockeel, H., Todorovski, L. (eds.) Machine Learning: ECML 2003. Lecture Notes in Computer Science, vol. 2837. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39857-8_27
DOI: https://doi.org/10.1007/978-3-540-39857-8_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20121-2
Online ISBN: 978-3-540-39857-8