Autonomous Robots, Volume 43, Issue 4, pp 875–896

An ensemble inverse optimal control approach for robotic task learning and adaptation

  • Hang Yin
  • Francisco S. Melo
  • Ana Paiva
  • Aude Billard


This paper contributes a novel framework to efficiently learn cost-to-go function representations for robotic tasks with latent modes. The proposed approach relies on the principle behind ensemble methods, where improved performance is obtained by aggregating a group of simple models, each of which can be efficiently learned. The maximum-entropy approximation is adopted as an effective initialization, and the quality of this surrogate is guaranteed by a theoretical bound. Our approach also provides an alternative perspective on the popular mixture of Gaussians under the framework of inverse optimal control. We further propose to impose dynamics on the model ensemble, using Kalman estimation to infer and modulate model modes. This allows robots to exploit the demonstration redundancy and to adapt to human interventions, especially in tasks where sensory observations are non-Markovian. The framework is demonstrated with a synthetic inverted pendulum example and online adaptation tasks, which include robotic handwriting and mail delivery.
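As a rough illustration of the ensemble idea in the abstract, the sketch below aggregates several simple quadratic cost models into one multi-modal cost. It is not the authors' implementation. It assumes that, under a maximum-entropy view, a quadratic cost corresponds to a Gaussian density (cost equals negative log density up to a constant), so that a uniform mixture of Gaussians corresponds to a soft-min over the member costs. The function names, the synthetic two-mode data, and the assumption that each model is fitted to an already separated mode are all hypothetical choices made for the example.

```python
import numpy as np

def fit_quadratic_cost(samples):
    """Fit one simple model: a Gaussian whose negative log density is a quadratic cost."""
    mean = samples.mean(axis=0)
    # Small ridge term keeps the covariance invertible (illustrative choice).
    cov = np.cov(samples, rowvar=False) + 1e-6 * np.eye(samples.shape[1])
    precision = np.linalg.inv(cov)
    _, logdet = np.linalg.slogdet(cov)
    return mean, precision, logdet

def quadratic_cost(x, model):
    """Negative Gaussian log density at x, dropping the constant 0.5 * d * log(2 * pi)."""
    mean, precision, logdet = model
    diff = x - mean
    return 0.5 * diff @ precision @ diff + 0.5 * logdet

def ensemble_cost(x, models):
    """Soft-min aggregation: negative log of the uniform mixture of exp(-cost_k)."""
    costs = np.array([quadratic_cost(x, m) for m in models])
    c_min = costs.min()  # shift by the minimum for numerical stability
    return c_min - np.log(np.mean(np.exp(-(costs - c_min))))

# Hypothetical demonstrations with two latent modes (synthetic data, assumed pre-separated).
rng = np.random.default_rng(0)
mode_a = rng.normal(loc=[-2.0, 0.0], scale=0.1, size=(50, 2))
mode_b = rng.normal(loc=[2.0, 0.0], scale=0.1, size=(50, 2))
models = [fit_quadratic_cost(mode_a), fit_quadratic_cost(mode_b)]

cost_at_a = ensemble_cost(np.array([-2.0, 0.0]), models)
cost_at_b = ensemble_cost(np.array([2.0, 0.0]), models)
cost_between = ensemble_cost(np.array([0.0, 0.0]), models)
print(cost_at_a, cost_at_b, cost_between)
```

In this toy setting the aggregated cost is low near either demonstrated mode and high between them, which a single quadratic model could not represent. How the paper actually partitions demonstrations, weights ensemble members, and applies Kalman estimation to the modes is not covered by this sketch.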


Learning from demonstrations; Human-robot collaboration; Ensemble methods; Inverse optimal control



This work is partially funded by the Swiss National Center of Robotics Research and by national funds through Fundação para a Ciência e a Tecnologia (FCT) with reference UID/CEC/50021/2013 and the doctoral grant (ref. SFRH/BD/51933/2012) under the IST-EPFL Joint Doctoral Initiative.

Supplementary material

Supplementary material 1 (wmv 25252 KB)



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Learning Algorithms and Systems Laboratory, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
  2. Intelligent Agents and Synthetic Characters Group, INESC-ID and IST, University of Lisbon, Lisbon, Portugal
