Abstract
We present a Bayesian reinforcement learning algorithm that addresses model bias and exploration overhead. The algorithm combines aspects of several state-of-the-art model-based reinforcement learning methods built on Gaussian processes, in order to make better use of online data samples. It uses a smooth reward function, requiring only that the reward value be derived from the environment state. It handles continuous states and actions in a coherent way, with a minimal need for expert knowledge in parameter tuning. We analyse and discuss the practical benefits of the selected approach in comparison with more traditional methodological choices, and illustrate the use of the algorithm on a motor control problem involving a simulated two-link arm.
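The two ingredients named in the abstract can be illustrated with a minimal sketch: a Gaussian-process one-step dynamics model fitted to (state, action) → next-state samples, and a smooth, saturating reward computed directly from the state. This is only an illustrative reconstruction of the general GP model-based setting the paper builds on, not the authors' implementation; the kernel, noise level, and reward width below are assumed values.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    # Squared-exponential kernel between row vectors of A (n, d) and B (m, d).
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * d2 / lengthscale ** 2)

class GPDynamicsModel:
    """One-step dynamics model: predicts next state from (state, action) pairs."""

    def __init__(self, lengthscale=1.0, noise=1e-2):
        self.lengthscale = lengthscale
        self.noise = noise

    def fit(self, X, Y):
        # X: (n, d) stacked state-action inputs; Y: (n, s) observed next states.
        self.X = X
        K = rbf_kernel(X, X, self.lengthscale) + self.noise * np.eye(len(X))
        self.alpha = np.linalg.solve(K, Y)  # cached weights for the GP mean
        return self

    def predict(self, Xs):
        # Posterior mean prediction of the next state for query inputs Xs.
        return rbf_kernel(Xs, self.X, self.lengthscale) @ self.alpha

def smooth_reward(state, target, width=1.0):
    # Saturating reward derived from the state: 1 at the target, falling
    # smoothly towards 0 with squared distance from it.
    d2 = ((state - target) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / width ** 2)

# Toy usage: learn the dynamics s' = s + a from three samples.
X = np.array([[0.0, 0.1], [0.5, -0.1], [1.0, 0.2]])   # columns: state, action
Y = X[:, :1] + X[:, 1:]                                # next states
model = GPDynamicsModel().fit(X, Y)
pred = model.predict(np.array([[0.5, -0.1]]))          # close to 0.4
```

Because both the GP posterior mean and the reward are smooth functions of the state, this kind of model supports gradient-based policy improvement with comparatively few real interactions, which is the data-efficiency argument behind GP model-based methods.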
Copyright information
© 2014 Springer International Publishing Switzerland
Cite this paper
Strahl, J., Honkela, T., Wagner, P. (2014). A Gaussian Process Reinforcement Learning Algorithm with Adaptability and Minimal Tuning Requirements. In: Wermter, S., et al. Artificial Neural Networks and Machine Learning – ICANN 2014. ICANN 2014. Lecture Notes in Computer Science, vol 8681. Springer, Cham. https://doi.org/10.1007/978-3-319-11179-7_47
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11178-0
Online ISBN: 978-3-319-11179-7