Abstract
In this paper, we show how reinforcement learning can be applied to real robots to achieve optimal robot behavior. As an example, we enable an autonomous soccer robot to learn to intercept a rolling ball. The main focus is on how to adapt the Q-learning algorithm to the needs of learning strategies for real robots and how to transfer strategies learned in simulation onto real robots.
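For orientation, the following is a minimal sketch of standard tabular Q-learning (Watkins and Dayan), not the adapted variant the paper develops. The one-dimensional field, the cell count, the goal cell, and the reward of -1 per step are all hypothetical choices made only to keep the example small; the abstract does not specify the real robot's state or action representation.

```python
import random

# All constants below are illustrative assumptions, not values from the paper.
N_CELLS = 7          # hypothetical 1-D field discretized into cells
GOAL = 5             # hypothetical cell where the ball is intercepted
ACTIONS = (-1, +1)   # move one cell left / one cell right


def step(state, action):
    """Toy transition model: move one cell, clipped to the field borders."""
    next_state = min(max(state + action, 0), N_CELLS - 1)
    done = next_state == GOAL
    reward = 0.0 if done else -1.0  # punishment for every step that misses the goal
    return next_state, reward, done


def train(episodes=2000, alpha=0.5, gamma=0.95, epsilon=0.1, seed=0):
    """Run epsilon-greedy tabular Q-learning and return the Q-table."""
    rng = random.Random(seed)
    q = [[0.0 for _ in ACTIONS] for _ in range(N_CELLS)]
    starts = [s for s in range(N_CELLS) if s != GOAL]
    for _ in range(episodes):
        state = rng.choice(starts)
        for _ in range(100):  # cap the episode length
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))  # explore
            else:
                a = max(range(len(ACTIONS)), key=lambda i: q[state][i])  # exploit
            next_state, reward, done = step(state, ACTIONS[a])
            # Q-learning target: reward plus discounted best value of the successor
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
            if done:
                break
    return q


def greedy_policy(q):
    """Return the greedy action (as a cell offset) for every state."""
    return [ACTIONS[max(range(len(ACTIONS)), key=lambda i: q[s][i])]
            for s in range(N_CELLS)]
```

Calling `greedy_policy(train())` yields one action per cell; in this toy setting the learned policy should move toward the goal cell from either side. A real robot has continuous states and delayed, noisy actuation, which is presumably why the paper focuses on adapting the algorithm rather than using it in this plain form.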
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Müller, H., Lauer, M., Hafner, R., Lange, S., Merke, A., Riedmiller, M. (2007). Making a Robot Learn to Play Soccer Using Reward and Punishment. In: Hertzberg, J., Beetz, M., Englert, R. (eds) KI 2007: Advances in Artificial Intelligence. KI 2007. Lecture Notes in Computer Science, vol 4667. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74565-5_18
DOI: https://doi.org/10.1007/978-3-540-74565-5_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74564-8
Online ISBN: 978-3-540-74565-5
eBook Packages: Computer Science (R0)