Making a Robot Learn to Play Soccer Using Reward and Punishment

Müller, Heiko; Lauer, Martin; Hafner, Roland; Lange, Sascha; Merke, Artur; Riedmiller, Martin

doi:10.1007/978-3-540-74565-5_18

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4667)

Included in the following conference series:

Annual Conference on Artificial Intelligence

Abstract

In this paper, we show how reinforcement learning can be applied to real robots to achieve optimal robot behavior. As an example, we enable an autonomous soccer robot to learn to intercept a rolling ball. The main focus is on how to adapt the Q-learning algorithm to the needs of learning strategies for real robots and how to transfer strategies learned in simulation onto real robots.
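
For readers unfamiliar with Q-learning, the following is a minimal sketch of the standard tabular update rule, not the adapted algorithm of the paper, whose details are not part of this abstract. The task is a hypothetical stand-in: a one-dimensional corridor in which an agent must reach a stationary "ball" in the last cell. The function name `q_learning`, the reward values, and all hyperparameters are assumptions chosen for illustration only.

```python
import random


def q_learning(n_states=6, episodes=500, alpha=0.5, gamma=0.9,
               epsilon=0.2, seed=0):
    """Tabular Q-learning on a toy 1-D corridor (hypothetical task).

    States are 0..n_states-1 and the "ball" sits in the last state.
    Actions: 0 = move left, 1 = move right.
    Reaching the ball yields +1 (reward); every other step costs
    -0.01 (punishment). All values here are illustrative assumptions.
    """
    rng = random.Random(seed)
    goal = n_states - 1
    q = [[0.0, 0.0] for _ in range(n_states)]

    for _ in range(episodes):
        s = rng.randrange(goal)          # random non-goal start state
        for _ in range(100):             # cap on episode length
            # Epsilon-greedy action selection (ties go to "right").
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                a = 0 if q[s][0] > q[s][1] else 1

            # Deterministic toy dynamics: the left wall blocks movement.
            s_next = max(0, s - 1) if a == 0 else s + 1
            done = s_next == goal
            r = 1.0 if done else -0.01

            # Standard Q-learning update:
            # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
            target = r if done else r + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])

            s = s_next
            if done:
                break
    return q


q_table = q_learning()
# Greedy action for each non-goal state (1 means "move toward the ball").
greedy_policy = [0 if left > right else 1 for left, right in q_table[:-1]]
```

In this toy setting the greedy policy read off `q_table` moves toward the ball from every cell. A real robot has continuous state and noisy sensing, so a table like this would not carry over directly; the sketch only illustrates the reward-and-punishment update that the abstract refers to.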

Author information

Authors and Affiliations

Neuroinformatics Group, Institute of Computer Science and Institute of Cognitive Science, University of Osnabrück, 49069 Osnabrück, Germany
Martin Lauer, Roland Hafner, Sascha Lange & Martin Riedmiller
Lehrstuhl Informatik 1, University of Dortmund, 44221 Dortmund, Germany
Heiko Müller & Artur Merke

Editor information

Joachim Hertzberg Michael Beetz Roman Englert

About this paper

Cite this paper

Müller, H., Lauer, M., Hafner, R., Lange, S., Merke, A., Riedmiller, M. (2007). Making a Robot Learn to Play Soccer Using Reward and Punishment. In: Hertzberg, J., Beetz, M., Englert, R. (eds) KI 2007: Advances in Artificial Intelligence. KI 2007. Lecture Notes in Computer Science, vol 4667. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74565-5_18

DOI: https://doi.org/10.1007/978-3-540-74565-5_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74564-8
Online ISBN: 978-3-540-74565-5
eBook Packages: Computer Science, Computer Science (R0)
