Intelligent Control and Computer Engineering, pp. 137–148

# Policy Gradient Approach for Learning of Soccer Player Agents


## Abstract

This research develops a learning method for the pass-selection problem of midfielders in RoboCup Soccer Simulation games. A policy gradient method is applied as the learning method because it can easily represent the various heuristics of pass selection within a policy function. We implement the learning function in the midfielder programs of a well-known team, UvA Trilearn Base 2003. Experimental results show that our method effectively achieves clever pass selection by midfielders in full games. Moreover, within this framework, dribbling is learned as a special case of passing: in essence, a pass from the passer to itself. It is also shown that the improvement in pass selection achieved by our learning makes the team much stronger.
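The paper's actual policy function, features, and update rule are not given in this excerpt. As an illustration of the general idea only, the sketch below shows a REINFORCE-style policy gradient update for a softmax policy over candidate pass targets, where hand-crafted pass-selection heuristics enter as per-candidate features weighted by the learned parameters. All names, features, and rewards here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    z = x - x.max()
    e = np.exp(z)
    return e / e.sum()

class PassPolicy:
    """Softmax policy over candidate pass targets.

    Each candidate target is described by a feature vector encoding
    pass-selection heuristics (e.g. distance to the receiver, distance
    from the nearest opponent); theta weights those heuristics.
    """

    def __init__(self, n_features):
        self.theta = np.zeros(n_features)

    def probs(self, features):
        # features: (n_candidates, n_features) -> selection probabilities
        return softmax(features @ self.theta)

    def sample(self, features):
        p = self.probs(features)
        return rng.choice(len(p), p=p)

    def update(self, features, action, reward, lr=0.1):
        # REINFORCE: theta += lr * reward * grad log pi(action | state)
        p = self.probs(features)
        grad_log = features[action] - p @ features
        self.theta += lr * reward * grad_log

# Toy training loop: three candidate targets with two heuristic features
# each; candidate 0 is constructed to be the "good" pass (reward +1).
policy = PassPolicy(n_features=2)
features = np.array([[1.0, 0.0],
                     [0.0, 1.0],
                     [0.0, 1.0]])
for _ in range(200):
    a = policy.sample(features)
    reward = 1.0 if a == 0 else -1.0
    policy.update(features, a, reward)

print(policy.probs(features))  # probability mass shifts to candidate 0
```

Because the policy is an explicit parameterized function of the heuristic features, the gradient update adjusts how strongly each heuristic influences the choice, which is the property the abstract attributes to the policy gradient approach.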

## Keywords

Multi-agent system · Pass selection · Policy gradient method · Reinforcement learning · RoboCup