A New Discrete-Time Iterative Adaptive Dynamic Programming Algorithm Based on Q-Learning

  • Conference paper
Advances in Neural Networks – ISNN 2015 (ISNN 2015)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9377)

Abstract

In this paper, a novel Q-learning-based policy iteration adaptive dynamic programming (ADP) algorithm is developed to solve optimal control problems for discrete-time nonlinear systems. The idea is to use a policy iteration ADP technique to construct an iterative control law that stabilizes the system and simultaneously minimizes the iterative Q function. The convergence property is analyzed to show that the iterative Q function is monotonically non-increasing and converges to the solution of the optimality equation. Finally, simulation results are presented to show the performance of the developed algorithm.
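The policy iteration idea summarized in the abstract can be illustrated with a small tabular sketch. This is not the authors' algorithm, whose full text is not available here; it is a toy example under stated assumptions: a hypothetical deterministic system with six integer states, three control moves, an undiscounted utility `x**2 + u**2`, and an initial control law that already drives every state to the equilibrium. All names (`step`, `utility`, `evaluate_policy`, `policy_iteration`) are illustrative.

```python
import numpy as np

# Hypothetical toy system (an assumption, not taken from the paper).
STATES = np.arange(0, 6)         # x in {0, ..., 5}; x = 0 is the equilibrium
ACTIONS = np.array([-2, -1, 0])  # admissible control moves


def step(x, u):
    """Deterministic dynamics x_{k+1} = max(x_k + u_k, 0)."""
    return int(max(x + u, 0))


def utility(x, u):
    """Positive-definite stage cost U(x, u) = x^2 + u^2."""
    return float(x**2 + u**2)


def evaluate_policy(policy, sweeps=50):
    """Policy evaluation: Q(x, u) = U(x, u) + Q(x', policy(x')).

    `policy[x]` is an index into ACTIONS. Fixed-point sweeps converge
    here because the policy reaches x = 0 in finitely many steps.
    """
    q = np.zeros((len(STATES), len(ACTIONS)))
    for _ in range(sweeps):
        q_new = np.empty_like(q)
        for x in STATES:
            for j, u in enumerate(ACTIONS):
                x_next = step(x, u)
                q_new[x, j] = utility(x, u) + q[x_next, policy[x_next]]
        q = q_new
    return q


def policy_iteration(initial_policy, iterations=10):
    """Alternate policy evaluation and greedy policy improvement."""
    policy = np.array(initial_policy)
    history = []
    for _ in range(iterations):
        q = evaluate_policy(policy)
        history.append(q)
        policy = np.argmin(q, axis=1)  # improvement: minimize Q over u
    return policy, history


# Initial stabilizing law: u = -1 for x > 0 and u = 0 at x = 0.
initial_policy = [2, 1, 1, 1, 1, 1]
policy, history = policy_iteration(initial_policy)
monotone = all(np.all(b <= a) for a, b in zip(history, history[1:]))
print("greedy actions:", [int(ACTIONS[j]) for j in policy])
print("Q non-increasing across iterations:", monotone)
```

In this toy setting each improved control law still reaches the equilibrium, and the successive Q tables can be compared entry by entry to check the non-increasing behaviour that the abstract describes.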

This work was supported in part by the National Natural Science Foundation of China under Grants 61273140, 61304086, 61374105, and 61233001, and in part by the Beijing Natural Science Foundation under Grant 4132078.

Author information

Corresponding author

Correspondence to Qinglai Wei.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Wei, Q., Liu, D. (2015). A New Discrete-Time Iterative Adaptive Dynamic Programming Algorithm Based on Q-Learning. In: Hu, X., Xia, Y., Zhang, Y., Zhao, D. (eds) Advances in Neural Networks – ISNN 2015. ISNN 2015. Lecture Notes in Computer Science, vol 9377. Springer, Cham. https://doi.org/10.1007/978-3-319-25393-0_6

  • DOI: https://doi.org/10.1007/978-3-319-25393-0_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-25392-3

  • Online ISBN: 978-3-319-25393-0

  • eBook Packages: Computer Science, Computer Science (R0)
