Efficient and Robust Learning on Elaborated Gaits with Curriculum Learning

  • Bo Zhou
  • Hongsheng Zeng
  • Fan Wang
  • Rongzhong Lian
  • Hao Tian
Conference paper
Part of The Springer Series on Challenges in Machine Learning book series (SSCML)


Developing efficient walking gaits for biomechanical robots is a difficult task that requires optimizing parameters in a continuous, multidimensional space. In this paper we present a new framework for learning complex gaits with musculoskeletal models. We use Deep Deterministic Policy Gradient (DDPG), driven by an external control command, and apply curriculum learning to acquire a reasonable starting policy. We accelerate the learning process with large-scale distributed training and a bootstrapped deep exploration paradigm. As a result, our approach won the NeurIPS 2018: AI for Prosthetics competition, scoring more than 30 points higher than the second-place solution.
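To make the role of curriculum learning concrete, the following is a minimal sketch of a staged training loop in which the policy learned on one task is carried forward as the starting point for the next, harder task. It is an illustration only and not the implementation used in this work: the names `Stage`, `run_curriculum`, and `toy_train_one_round`, the `difficulty` and `pass_score` fields, and the toy scoring rule are all assumptions introduced here. In practice the training round would be an update of a reinforcement learning agent such as DDPG.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple


@dataclass
class Stage:
    """One step of the curriculum (all fields are hypothetical)."""
    name: str          # label for the task
    difficulty: float  # toy difficulty knob used by the toy trainer
    pass_score: float  # score required before moving to the next stage


# A trainer takes the current policy (None at the start) and a stage,
# and returns the updated policy together with an evaluation score.
Trainer = Callable[[Optional[float], Stage], Tuple[float, float]]


def run_curriculum(
    stages: Sequence[Stage],
    train_one_round: Trainer,
    max_rounds_per_stage: int = 50,
) -> Tuple[Optional[float], List[Tuple[str, int, float]]]:
    """Train through the stages in order, warm-starting each one
    from the policy produced by the previous stage."""
    policy: Optional[float] = None
    history: List[Tuple[str, int, float]] = []
    for stage in stages:
        rounds_used, score = 0, float("-inf")
        for rounds_used in range(1, max_rounds_per_stage + 1):
            policy, score = train_one_round(policy, stage)
            if score >= stage.pass_score:
                break  # good enough: advance to the next stage
        history.append((stage.name, rounds_used, score))
    return policy, history


def toy_train_one_round(policy: Optional[float], stage: Stage) -> Tuple[float, float]:
    """Toy stand-in for an RL update: the 'policy' is a single skill
    value that grows by 1 per round; the score is skill minus difficulty."""
    skill = (0.0 if policy is None else policy) + 1.0
    return skill, skill - stage.difficulty


stages = [
    Stage("easy", difficulty=0.0, pass_score=2.0),
    Stage("medium", difficulty=2.0, pass_score=2.0),
    Stage("hard", difficulty=4.0, pass_score=2.0),
]
final_policy, history = run_curriculum(stages, toy_train_one_round)
```

Because the policy is passed from stage to stage rather than reset, each harder task starts from a policy that already solves the easier one, which is the sense in which a curriculum provides a reasonable starting policy.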



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  • Bo Zhou (1)
  • Hongsheng Zeng (1)
  • Fan Wang (1)
  • Rongzhong Lian (1)
  • Hao Tian (1)

  1. Baidu Inc., Shenzhen, China
