Dyna-Q Algorithm for Path Planning of Quadrotor UAVs

  • Xin Huo
  • Tianze Zhang
  • Yuzhu Wang
  • Weizhen Liu
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 946)


In this paper, the problem of path planning for quadrotor unmanned aerial vehicles (UAVs) is investigated within the framework of reinforcement learning. With the environment abstracted as a 2D grid world, the design procedure is presented using the Dyna-Q algorithm, a reinforcement learning method that combines model-based and model-free frameworks. In this process, an optimal or suboptimal safe flight trajectory is obtained by learning continually from real experience and by planning with simulated experience, so that the cumulative reward is maximized efficiently. MATLAB is used to construct the maze and perform the computations, and the effectiveness of the proposed method is illustrated by two typical examples.


Keywords: Reinforcement learning · Dyna-Q algorithm · Path planning · Quadrotor UAVs
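The Dyna-Q loop summarized above, direct Q-learning updates from real moves in the grid world plus extra planning updates drawn from a learned model, can be sketched as follows. The 4x4 maze, reward scheme, and parameter values here are illustrative assumptions for the sketch, not details taken from the paper:

```python
import random

# Hypothetical 4x4 grid world (illustrative, not from the paper):
# the agent starts at (0, 0), the goal is the opposite corner,
# reward is 1 on reaching the goal and 0 otherwise.
ROWS, COLS = 4, 4
START, GOAL = (0, 0), (ROWS - 1, COLS - 1)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

def step(state, action):
    """Deterministic grid dynamics: move if the target cell is inside the maze."""
    r, c = state[0] + action[0], state[1] + action[1]
    nxt = (r, c) if 0 <= r < ROWS and 0 <= c < COLS else state
    return nxt, (1.0 if nxt == GOAL else 0.0)

def greedy(Q, s, rng):
    """Greedy action with random tie-breaking."""
    qs = [Q[(s, a)] for a in range(len(ACTIONS))]
    best = max(qs)
    return rng.choice([a for a, v in enumerate(qs) if v == best])

def dyna_q(episodes=200, n_planning=10, alpha=0.1, gamma=0.95, eps=0.1, seed=0):
    rng = random.Random(seed)
    states = [(r, c) for r in range(ROWS) for c in range(COLS)]
    Q = {(s, a): 0.0 for s in states for a in range(len(ACTIONS))}
    model = {}  # (state, action) -> (next_state, reward), from real experience
    for _ in range(episodes):
        s = START
        while s != GOAL:
            # epsilon-greedy action selection on the real environment
            a = rng.randrange(len(ACTIONS)) if rng.random() < eps else greedy(Q, s, rng)
            s2, r = step(s, ACTIONS[a])
            # direct reinforcement learning: one-step Q-learning update
            target = r + gamma * max(Q[(s2, x)] for x in range(len(ACTIONS)))
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            model[(s, a)] = (s2, r)  # record the (deterministic) model
            # planning: n extra updates from simulated experience
            for _ in range(n_planning):
                (ps, pa), (ps2, pr) = rng.choice(list(model.items()))
                ptarget = pr + gamma * max(Q[(ps2, x)] for x in range(len(ACTIONS)))
                Q[(ps, pa)] += alpha * (ptarget - Q[(ps, pa)])
            s = s2
    return Q

def extract_path(Q, max_steps=50, seed=1):
    """Roll out the greedy policy to read off the planned trajectory."""
    rng = random.Random(seed)
    s, path = START, [START]
    while s != GOAL and len(path) <= max_steps:
        s, _ = step(s, ACTIONS[greedy(Q, s, rng)])
        path.append(s)
    return path
```

The planning loop is what distinguishes Dyna-Q from plain Q-learning: each real transition is replayed many times through the learned model, which is why the value of the goal propagates back to the start state in far fewer real episodes.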



Copyright information

© Springer Nature Singapore Pte Ltd. 2018

Authors and Affiliations

  • Xin Huo (1), email author
  • Tianze Zhang (1, 2)
  • Yuzhu Wang (3)
  • Weizhen Liu (1)
  1. Harbin Institute of Technology, Harbin, China
  2. National Instruments China, Shanghai, China
  3. Helong Senior High School of Nongan, Changchun, China