Toward Faster Reinforcement Learning for Robotics: Using Gaussian Processes

  • Ali Younes
  • Aleksandr I. Panov
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11866)


Standard robotic control works well under nominal conditions, but when the conditions change (e.g., one of the motors is damaged), the robot can no longer accomplish its task. We need an algorithm that gives the robot the ability to adapt to unforeseen situations. Reinforcement learning (RL) provides a framework that meets this requirement, but it needs large data sets to learn robotic tasks, which is impractical. We discuss using Gaussian processes (GPs) to improve the data efficiency of reinforcement learning: a GP learns a state transition model from data collected in the robot (interaction) phase, and the learned GP model is then used to simulate trajectories and optimize the robot’s controller in a (simulation) phase. The PILCO algorithm is considered the most data-efficient RL algorithm. It gives promising results on the cart-pole task, where a working controller was learned after only seconds of (interaction) on the real robot, although the total training time, including the training in the (simulation) phase, was longer. In this work, we try to leverage the abilities of computational graphs to produce a ROS-friendly Python implementation of PILCO, and we discuss a case study of a real-world robotic task.
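The two-phase loop described above can be illustrated with a minimal sketch. It is not PILCO itself and not the authors' implementation: it assumes a hypothetical one-dimensional toy system (`true_step`) in place of the real robot, uses only the GP posterior mean (PILCO propagates full predictive uncertainty), and replaces gradient-based policy optimization with a grid search over a single linear feedback gain. All names and parameter values below are illustrative assumptions.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=2.0, variance=1.0):
    """Squared-exponential kernel between row sets A (n, d) and B (m, d)."""
    sq_dist = (np.sum(A**2, axis=1)[:, None]
               + np.sum(B**2, axis=1)[None, :]
               - 2.0 * A @ B.T)
    return variance * np.exp(-0.5 * np.maximum(sq_dist, 0.0) / lengthscale**2)

class GPTransitionModel:
    """GP regression from (state, action) to state change; posterior mean only."""

    def __init__(self, X, y, noise=1e-4):
        self.X = X
        K = rbf_kernel(X, X) + noise * np.eye(len(X))
        self.alpha = np.linalg.solve(K, y)  # K^{-1} y, reused for every prediction

    def predict_delta(self, x, u):
        query = np.array([[x, u]])
        return (rbf_kernel(query, self.X) @ self.alpha).item()

def true_step(x, u):
    """Hypothetical stand-in for the real robot: a toy 1-D linear system."""
    return 0.9 * x + 0.5 * u

# Interaction phase: collect a small batch of transitions from the "robot".
rng = np.random.default_rng(0)
states = rng.uniform(-2.0, 2.0, size=40)
actions = rng.uniform(-2.0, 2.0, size=40)
X = np.column_stack([states, actions])
y = true_step(states, actions) - states  # learn the state difference
model = GPTransitionModel(X, y)

# Simulation phase: roll out the controller u = -gain * x on the learned model.
def simulated_cost(gain, x0=1.5, horizon=15):
    x, cost = x0, 0.0
    for _ in range(horizon):
        u = float(np.clip(-gain * x, -2.0, 2.0))  # stay inside the sampled action range
        x = x + model.predict_delta(x, u)
        cost += x**2  # quadratic distance to the target state 0
    return cost

# Crude policy search (assumption: grid search instead of analytic gradients).
gains = np.linspace(0.0, 2.0, 21)
best_gain = gains[np.argmin([simulated_cost(g) for g in gains])]
print("selected gain:", best_gain)
```

Only the 40 sampled transitions touch the toy system; every candidate controller is evaluated on the learned model, which is the source of the data efficiency the abstract refers to.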


Robot learning, Reinforcement learning, Gaussian process, Data efficient



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Bauman Moscow State Technical University, Moscow, Russia
  2. Artificial Intelligence Research Institute, Federal Research Center “Computer Science and Control” of the Russian Academy of Sciences, Moscow, Russia
  3. Moscow Institute of Physics and Technology, Moscow, Russia
