Self-evaluated Learning Agent in Multiple State Games

  • Koichi Moriyama
  • Masayuki Numao
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2837)


Most multi-agent reinforcement learning algorithms aim to converge to a Nash equilibrium, but a Nash equilibrium does not necessarily yield a desirable outcome. On the other hand, several methods aim to depart from unfavorable Nash equilibria, but they are effective only in limited classes of games. Building on them, in a previous paper [11] we proposed an agent that learns appropriate actions in both PD-like and non-PD-like games through self-evaluations. However, the experiments we had conducted were static, with only one state. Versatility across PD-like and non-PD-like games is indispensable in dynamic environments, where several states succeed one another within a trial. We have therefore conducted new experiments in which the agents played games having multiple states. The experiments cover two kinds of game: one notifies the agents of the current state and the other does not. We report the results in this paper.
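To illustrate the problem the paper addresses, the sketch below shows ordinary independent Q-learning [20] (not the authors' self-evaluation method) in a hypothetical two-state game whose states alternate each round: state 0 is a Prisoner's Dilemma and state 1 is a coordination game. The payoff values and all parameter settings are illustrative assumptions, not taken from the paper. In the PD state, plain learners drift toward mutual defection, the unfavorable Nash equilibrium the proposed agent is designed to escape.

```python
import random

# Payoff tables, indexed by (row action, column action); 0 = cooperate, 1 = defect.
# Entries are (row payoff, column payoff). Values are illustrative only.
PAYOFFS = {
    0: {(0, 0): (3, 3), (0, 1): (0, 5), (1, 0): (5, 0), (1, 1): (1, 1)},  # PD-like state
    1: {(0, 0): (4, 4), (0, 1): (0, 0), (1, 0): (0, 0), (1, 1): (2, 2)},  # coordination state
}

class QAgent:
    """Tabular Q-learner over (state, action) pairs with epsilon-greedy choice."""
    def __init__(self, n_states=2, n_actions=2, alpha=0.1, gamma=0.9, eps=0.1):
        self.q = [[0.0] * n_actions for _ in range(n_states)]
        self.alpha, self.gamma, self.eps = alpha, gamma, eps

    def act(self, state):
        if random.random() < self.eps:
            return random.randrange(len(self.q[state]))
        row = self.q[state]
        return row.index(max(row))

    def update(self, state, action, reward, next_state):
        # Standard one-step Q-learning backup [20].
        target = reward + self.gamma * max(self.q[next_state])
        self.q[state][action] += self.alpha * (target - self.q[state][action])

def run(episodes=5000, seed=0):
    random.seed(seed)
    a, b = QAgent(), QAgent()
    state = 0
    for _ in range(episodes):
        ia, ib = a.act(state), b.act(state)
        ra, rb = PAYOFFS[state][(ia, ib)]
        next_state = (state + 1) % 2  # states alternate deterministically each round
        a.update(state, ia, ra, next_state)
        b.update(state, ib, rb, next_state)
        state = next_state
    return a, b
```

Because defection dominates in the PD state regardless of the opponent's choice, both Q-tables end up valuing action 1 over action 0 there; breaking out of this attractor is precisely what the self-evaluation mechanism targets.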




  1. Bowling, M., Veloso, M.: Multiagent learning using a variable learning rate. Artificial Intelligence 136, 215–250 (2002)
  2. Claus, C., Boutilier, C.: The Dynamics of Reinforcement Learning in Cooperative Multiagent Systems. In: Proc. 15th National Conference on Artificial Intelligence, AAAI 1998, Madison, Wisconsin, U.S.A., pp. 746–752 (1998)
  3. Hardin, G.: The Tragedy of the Commons. Science 162, 1243–1248 (1968)
  4. Hu, J., Wellman, M.P.: Multiagent Reinforcement Learning: Theoretical Framework and an Algorithm. In: Proc. 15th International Conference on Machine Learning, ICML 1998, Madison, Wisconsin, U.S.A., pp. 242–250 (1998)
  5. Ishida, T., Yokoi, H., Kakazu, Y.: Self-Organized Norms of Behavior under Interactions of Selfish Agents. In: Proc. 1999 IEEE International Conference on Systems, Man, and Cybernetics, Tokyo, Japan (1999)
  6. Littman, M.L.: Markov games as a framework for multi-agent reinforcement learning. In: Proc. 11th International Conference on Machine Learning, ML 1994, New Brunswick, New Jersey, U.S.A., pp. 157–163 (1994)
  7. Mikami, S., Kakazu, Y.: Co-operation of Multiple Agents Through Filtering Payoff. In: Proc. 1st European Workshop on Reinforcement Learning, EWRL-1, Brussels, Belgium, pp. 97–107 (1994)
  8. Mikami, S., Kakazu, Y., Fogarty, T.C.: Co-operative Reinforcement Learning By Payoff Filters. In: Lavrač, N., Wrobel, S. (eds.) ECML 1995. LNCS (LNAI), vol. 912, pp. 319–322. Springer, Heidelberg (1995)
  9. Moriyama, K., Numao, M.: Constructing an Autonomous Agent with an Interdependent Heuristics. In: Mizoguchi, R., Slaney, J.K. (eds.) PRICAI 2000. LNCS (LNAI), vol. 1886, pp. 329–339. Springer, Heidelberg (2000)
  10. Moriyama, K., Numao, M.: Construction of a Learning Agent Handling Its Rewards According to Environmental Situations. In: Proc. 1st International Joint Conference on Autonomous Agents and Multi-Agent Systems, AAMAS 2002, Bologna, Italy, pp. 1262–1263 (2002)
  11. Moriyama, K., Numao, M.: Generating Self-Evaluations to Learn Appropriate Actions in Various Games. Technical Report TR03-0002, Department of Computer Science, Tokyo Institute of Technology (2003)
  12. Mundhe, M., Sen, S.: Evolving agent societies that avoid social dilemmas. In: Proc. Genetic and Evolutionary Computation Conference, GECCO 2000, Las Vegas, Nevada, U.S.A., pp. 809–816 (2000)
  13. Nagayuki, Y., Ishii, S., Doya, K.: Multi-Agent Reinforcement Learning: An Approach Based on the Other Agent’s Internal Model. In: Proc. 4th International Conference on MultiAgent Systems, ICMAS 2000, Boston, Massachusetts, U.S.A., pp. 215–221 (2000)
  14. Poundstone, W.: Prisoner’s Dilemma. Doubleday, New York (1992)
  15. Rilling, J.K., Gutman, D.A., Zeh, T.R., Pagnoni, G., Berns, G.S., Kilts, C.D.: A Neural Basis for Social Cooperation. Neuron 35, 395–405 (2002)
  16. Sakaguchi, Y., Takano, M.: Learning to Switch Behaviors for Different Environments: A Computational Model for Incremental Modular Learning. In: Proc. 2001 International Symposium on Nonlinear Theory and its Applications, NOLTA 2001, Zao, Miyagi, Japan, pp. 383–386 (2001)
  17. Schmidhuber, J., Zhao, J., Schraudolph, N.N.: Reinforcement Learning with Self-Modifying Policies. In: [19], pp. 293–309 (1997)
  18. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press, Cambridge (1998)
  19. Thrun, S., Pratt, L. (eds.): Learning to Learn. Kluwer Academic Publishers, Norwell (1997)
  20. Watkins, C.J.C.H., Dayan, P.: Technical Note: Q-learning. Machine Learning 8, 279–292 (1992)
  21. Weibull, J.W.: Evolutionary Game Theory. MIT Press, Cambridge (1995)
  22. Widmer, G., Kubat, M.: Learning in the Presence of Concept Drift and Hidden Contexts. Machine Learning 23, 69–101 (1996)
  23. Wolpert, D.H., Tumer, K.: Collective Intelligence, Data Routing and Braess’ Paradox. Journal of Artificial Intelligence Research 16, 359–387 (2002)
  24. Zhao, J., Schmidhuber, J.: Solving a Complex Prisoner’s Dilemma with Self-Modifying Policies. In: From Animals to Animats 5: Proc. 5th International Conference on Simulation of Adaptive Behavior, Zurich, Switzerland, pp. 177–182 (1998)

Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  1. Koichi Moriyama, Department of Computer Science, Tokyo Institute of Technology, Tokyo, Japan
  2. Masayuki Numao, The Institute of Scientific and Industrial Research, Osaka University, Osaka, Japan
