Towards Real-Time Path Planning through Deep Reinforcement Learning for a UAV in Dynamic Environments

  • Chao Yan
  • Xiaojia Xiang
  • Chang Wang


Path planning remains challenging for Unmanned Aerial Vehicles (UAVs) operating in dynamic environments with potential threats. In this paper, we propose a Deep Reinforcement Learning (DRL) approach to UAV path planning based on global situation information. We use the STAGE Scenario software to provide the simulation environment, in which a situation assessment model is developed that accounts for the UAV's survival probability under enemy radar detection and missile attack. We employ the dueling double deep Q-networks (D3QN) algorithm, which takes a set of situation maps as input and approximates the Q-values of all candidate actions. In addition, the ε-greedy strategy is combined with heuristic search rules to select actions. We demonstrate the performance of the proposed method under both static and dynamic task settings.


Keywords: Unmanned aerial vehicle (UAV) · Path planning · Reinforcement learning · Deep Q-network · STAGE scenario





Copyright information

© Springer Nature B.V. 2019

Authors and Affiliations

  1. College of Intelligence Science and Technology, National University of Defense Technology, Changsha, China
