Policy Gradient Approach for Learning of Soccer Player Agents

Pass Selection of Midfielders
  • Harukazu Igarashi
  • Hitoshi Fukuoka
  • Seiji Ishihara
Part of the Lecture Notes in Electrical Engineering book series (LNEE, volume 70)


This research develops a learning method for the pass-selection problem of midfielders in RoboCup Soccer Simulation games. A policy gradient method is applied because it can easily represent the various heuristics of pass selection in a policy function. We implement the learning function in the midfielder programs of a well-known team, UvA Trilearn Base 2003. Experimental results show that our method achieves effective pass selection by midfielders in full games. Moreover, within this framework, dribbling is learned as a special case of passing: in essence, a pass from the passer to itself. The improvement in pass selection obtained by our learning is also shown to make the team substantially stronger.


Multi-agent system · Pass selection · Policy gradient method · Reinforcement learning · RoboCup
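The approach summarized in the abstract selects a pass receiver with a policy function that weights hand-designed heuristics, then improves those weights from match rewards via a policy gradient update. A minimal sketch of this idea is shown below, assuming a softmax (Boltzmann) policy over candidate receivers and a REINFORCE-style weight update; the function names, feature encoding, and learning rate here are illustrative assumptions, not the paper's actual implementation.

```python
import math

def softmax_policy(features, weights):
    """Probability of choosing each pass candidate under a softmax policy.

    features: one feature vector per candidate receiver, where each entry
              is the value of a pass-selection heuristic (illustrative).
    weights:  learned weight per heuristic.
    """
    scores = [sum(w * f for w, f in zip(weights, fv)) for fv in features]
    m = max(scores)                      # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def reinforce_update(features, chosen, reward, weights, lr=0.1):
    """One REINFORCE step: grad log pi(chosen) = f(chosen) - E_pi[f].

    Returns updated weights; a positive reward makes the chosen
    candidate more likely under the new policy.
    """
    probs = softmax_policy(features, weights)
    expected = [sum(p * fv[i] for p, fv in zip(probs, features))
                for i in range(len(weights))]
    return [w + lr * reward * (features[chosen][i] - expected[i])
            for i, w in enumerate(weights)]
```

Because the passer itself can appear among the candidate receivers, a "pass to oneself" (dribbling, as in the abstract) fits the same update with no special handling.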



Copyright information

© Springer Science+Business Media B.V. 2011

Authors and Affiliations

  • Harukazu Igarashi (1)
  • Hitoshi Fukuoka (2)
  • Seiji Ishihara (3)
  1. Shibaura Institute of Technology, Tokyo, Japan
  2. Kinki University, Higashi-Hiroshima City, Hiroshima, Japan
  3. Kinki University, Higashi-Hiroshima City, Hiroshima, Japan
