Learning to Run Challenge Solutions: Adapting Reinforcement Learning Methods for Neuromusculoskeletal Environments

  • Łukasz Kidziński (email author)
  • Sharada Prasanna Mohanty
  • Carmichael F. Ong
  • Zhewei Huang
  • Shuchang Zhou
  • Anton Pechenko
  • Adam Stelmaszczyk
  • Piotr Jarosik
  • Mikhail Pavlov
  • Sergey Kolesnikov
  • Sergey Plis
  • Zhibo Chen
  • Zhizheng Zhang
  • Jiale Chen
  • Jun Shi
  • Zhuobin Zheng
  • Chun Yuan
  • Zhihui Lin
  • Henryk Michalewski
  • Piotr Milos
  • Blazej Osinski
  • Andrew Melnik
  • Malte Schilling
  • Helge Ritter
  • Sean F. Carroll
  • Jennifer Hicks
  • Sergey Levine
  • Marcel Salathé
  • Scott Delp
Conference paper
Part of The Springer Series on Challenges in Machine Learning book series (SSCML)

Abstract

In the NIPS 2017 Learning to Run challenge, participants were tasked with building a controller for a musculoskeletal model to make it run as fast as possible through an obstacle course. Top participants were invited to describe their algorithms. In this work, we present eight solutions that used deep reinforcement learning approaches based on algorithms such as Deep Deterministic Policy Gradient, Proximal Policy Optimization, and Trust Region Policy Optimization. Many solutions used similar relaxations and heuristics, such as reward shaping, frame skipping, discretization of the action space, exploitation of symmetry, and policy blending. However, each of the eight teams implemented different modifications of these known algorithms.
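
As an illustration of two of the heuristics listed above, the sketch below shows a minimal wrapper around a gym-style environment that combines frame skipping (repeating each chosen action for several simulator steps) with a simple reward-shaping bonus. It is only a schematic example under assumed interfaces: the reset/step API, the observation layout, and the names FrameSkipShapingWrapper and velocity_bonus are assumptions made for illustration, not code from any of the eight teams.

```python
# Illustrative sketch only: frame skipping plus reward shaping, two heuristics
# mentioned in the abstract. The environment API, observation layout, and all
# names here are assumptions, not code from the challenge participants.

class FrameSkipShapingWrapper:
    def __init__(self, env, skip=4, shaping_weight=0.1):
        self.env = env                        # gym-style environment (assumed API)
        self.skip = skip                      # simulator steps per agent action
        self.shaping_weight = shaping_weight  # scale of the shaping bonus

    def reset(self):
        return self.env.reset()

    def step(self, action):
        total_reward, obs, done, info = 0.0, None, False, {}
        for _ in range(self.skip):            # frame skipping: repeat the same action
            obs, reward, done, info = self.env.step(action)
            # reward shaping: add a small dense bonus, e.g. proportional to
            # forward pelvis velocity, on top of the environment's own reward
            total_reward += reward + self.shaping_weight * self.velocity_bonus(obs)
            if done:
                break
        return obs, total_reward, done, info

    def velocity_bonus(self, obs):
        # hypothetical helper: the index of the forward-velocity component in
        # the observation vector is an assumption made purely for this sketch
        return obs[4]
```

In the challenge, such a wrapper would be constructed around the simulation environment (osim-rl's RunEnv) before training; the actual skip lengths and shaping terms differed between teams.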


Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Łukasz Kidziński (1) (email author)
  • Sharada Prasanna Mohanty (3)
  • Carmichael F. Ong (1)
  • Zhewei Huang (4)
  • Shuchang Zhou (4)
  • Anton Pechenko (9)
  • Adam Stelmaszczyk (7)
  • Piotr Jarosik (10)
  • Mikhail Pavlov (5)
  • Sergey Kolesnikov (5)
  • Sergey Plis (5)
  • Zhibo Chen (6)
  • Zhizheng Zhang (6)
  • Jiale Chen (6)
  • Jun Shi (6)
  • Zhuobin Zheng (8)
  • Chun Yuan (8)
  • Zhihui Lin (8)
  • Henryk Michalewski (7)
  • Piotr Milos (7)
  • Blazej Osinski (7)
  • Andrew Melnik (11)
  • Malte Schilling (11)
  • Helge Ritter (11)
  • Sean F. Carroll (3)
  • Jennifer Hicks (1)
  • Sergey Levine (2)
  • Marcel Salathé (3)
  • Scott Delp (1)

  1. Department of Bioengineering, Stanford University, Stanford, USA
  2. Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, USA
  3. Ecole Polytechnique Federale de Lausanne, Lausanne, Switzerland
  4. Beijing University, Beijing, China
  5. reason8.ai, San Francisco, USA
  6. Immersive Media Computing Lab, University of Science and Technology of China, Hefei, China
  7. University of Warsaw, Warsaw, Poland
  8. Tunghai University, Taichung City, Taiwan
  9. Yandex, Moscow, Russia
  10. Institute of Fundamental Technological Research, Polish Academy of Sciences, Warsaw, Poland
  11. CITEC, Bielefeld University, Bielefeld, Germany
