Simulation and Transfer of Reinforcement Learning Algorithms for Autonomous Obstacle Avoidance

  • Max Lenk
  • Paula Hilsendegen
  • Silvan Michael Müller
  • Oliver Rettig
  • Marcus Strand
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 867)


The explicit programming of obstacle avoidance for an autonomous robot can be a computationally expensive undertaking. The application of reinforcement learning algorithms promises a reduction of programming effort. However, these algorithms rely on iterative training processes and are therefore time-consuming. To overcome this drawback, we propose moving the training process into abstract simulation scenarios. In this study we trained four reinforcement learning algorithms (Q-Learning, Deep Q-Learning, Deep Deterministic Policy Gradient, and Asynchronous Advantage Actor-Critic) in different abstract simulation scenarios and transferred the learning results to an autonomous robot. All algorithms except Asynchronous Advantage Actor-Critic achieved good obstacle avoidance in simulation. Without further real-world training, the policies learned by Q-Learning and Deep Q-Learning immediately achieved obstacle avoidance when transferred to an autonomous robot.


Keywords: Reinforcement learning · Machine learning · Obstacle avoidance · Collision avoidance · Simulation
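The train-in-simulation, then transfer-the-frozen-policy idea summarized in the abstract can be sketched with tabular Q-learning on a toy grid world. Everything below (grid layout, reward values, hyper-parameters, function names) is an illustrative assumption for exposition, not the authors' actual setup or robot environment.

```python
import random

# Toy stand-in for an abstract simulation scenario: a 4x4 grid in which the
# agent must reach GOAL without entering OBSTACLE. Illustrative values only.
SIZE = 4
START, GOAL, OBSTACLE = (0, 0), (3, 3), (1, 1)
ACTIONS = ((0, 1), (0, -1), (1, 0), (-1, 0))  # the four axis-aligned moves

def step(state, action):
    """Simulated environment step: clip to the grid, reward the outcome."""
    nxt = (min(SIZE - 1, max(0, state[0] + action[0])),
           min(SIZE - 1, max(0, state[1] + action[1])))
    if nxt == OBSTACLE:
        return nxt, -10.0, True   # collision: penalty, episode ends
    if nxt == GOAL:
        return nxt, 10.0, True    # goal reached
    return nxt, -1.0, False       # small cost per move encourages short paths

def train(episodes=2000, alpha=0.5, gamma=0.9, eps=0.2, seed=0):
    """Epsilon-greedy tabular Q-learning, run entirely in simulation."""
    rng = random.Random(seed)
    q = {}
    for _ in range(episodes):
        s, done = START, False
        while not done:
            if rng.random() < eps:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda a: q.get((s, a), 0.0))
            nxt, r, done = step(s, a)
            best_next = 0.0 if done else max(q.get((nxt, b), 0.0) for b in ACTIONS)
            # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
            q[(s, a)] = q.get((s, a), 0.0) + alpha * (r + gamma * best_next - q.get((s, a), 0.0))
            s = nxt
    return q

def deploy(q, max_steps=20):
    """Run the learned policy greedily, with no further training,
    mimicking transfer of the frozen policy to the robot."""
    s, path = START, [START]
    for _ in range(max_steps):
        a = max(ACTIONS, key=lambda a: q.get((s, a), 0.0))
        s, _, done = step(s, a)
        path.append(s)
        if done:
            break
    return path
```

The key design point mirrored here is the separation of `train` (cheap, iterative, simulation-only) from `deploy` (greedy execution of the frozen Q-table), which is what makes transfer without additional real-world training possible for the tabular methods.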



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Max Lenk (1)
  • Paula Hilsendegen (2, corresponding author)
  • Silvan Michael Müller (2)
  • Oliver Rettig (2)
  • Marcus Strand (2)

  1. SAP SE, Walldorf, Germany
  2. Department for Computer Science, Duale Hochschule Baden-Württemberg, Karlsruhe, Germany
