A Novel Policy Iteration-Based Deterministic Q-Learning for Discrete-Time Nonlinear Systems

Self-Learning Optimal Control of Nonlinear Systems

Part of the book series: Studies in Systems, Decision and Control (SSDC, volume 103)

Abstract

In this chapter, a novel iterative Q-learning algorithm, called the "policy iteration-based deterministic Q-learning algorithm," is developed to solve optimal control problems for discrete-time deterministic nonlinear systems. The idea is to use an iterative adaptive dynamic programming (ADP) technique to construct the iterative control law that optimizes the iterative Q function. Once the optimal Q function is obtained, the optimal control law can be achieved by directly minimizing the optimal Q function, without requiring a mathematical model of the system. A convergence analysis shows that the iterative Q function is monotonically nonincreasing and converges to the solution of the optimality equation. It is also proven that each of the iterative control laws is a stable control law. Neural networks are used to implement the algorithm by approximating the iterative Q function and the iterative control law, respectively. Finally, two simulation examples are presented to illustrate the performance of the developed algorithm.
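Although the chapter implements the iteration with neural network approximators, its structure can be illustrated numerically. The Python example below is a minimal sketch under stated assumptions: a hypothetical scalar system F(x, u) = 0.8 sin(x) + u, a quadratic utility U(x, u) = x^2 + u^2, and a nearest-neighbor grid discretization standing in for the neural networks. It alternates policy evaluation of the iterative Q function with policy improvement by direct minimization of that Q function; the minimization step itself needs no system model once Q is available.

```python
# Minimal sketch of policy iteration-based deterministic Q-learning on a
# coarse grid. The dynamics F, utility U, grid sizes, and initial control
# law are illustrative assumptions, not the chapter's implementation.
import numpy as np

def F(x, u):
    return 0.8 * np.sin(x) + u   # assumed nonlinear dynamics x_{k+1} = F(x_k, u_k)

def U(x, u):
    return x ** 2 + u ** 2       # assumed quadratic utility

xs = np.linspace(-1.0, 1.0, 41)  # discretized state space
us = np.linspace(-1.0, 1.0, 41)  # discretized action space

def nearest(grid, v):
    return int(np.argmin(np.abs(grid - v)))

# Precompute successor-state indices and one-step utilities on the grid.
nx, nu = len(xs), len(us)
next_ix = np.empty((nx, nu), dtype=int)
util = np.empty((nx, nu))
for ix, x in enumerate(xs):
    for iu, u in enumerate(us):
        next_ix[ix, iu] = nearest(xs, F(x, u))
        util[ix, iu] = U(x, u)

# Initial control law, assumed admissible (drives the grid state to the origin).
policy = np.array([nearest(us, -0.5 * x) for x in xs])

Q = np.zeros((nx, nu))
for iteration in range(30):                      # outer policy-iteration loop
    # Policy evaluation: sweep Q(x,u) <- U(x,u) + Q(x', v_i(x')) to a fixed point.
    for _ in range(500):
        Q_new = util + Q[next_ix, policy[next_ix]]
        if np.max(np.abs(Q_new - Q)) < 1e-9:
            Q = Q_new
            break
        Q = Q_new
    # Policy improvement: v_{i+1}(x) = argmin_u Q_i(x,u).
    new_policy = np.argmin(Q, axis=1)
    if np.array_equal(new_policy, policy):
        break                                    # iterative control law converged
    policy = new_policy

print("near-optimal control at x = 0.5:", us[policy[nearest(xs, 0.5)]])
```

Consistent with the abstract, the outer loop stops when the control law no longer changes between iterations; in the chapter this corresponds to the iterative Q function converging to the solution of the optimality equation.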



Author information

Corresponding author: Qinglai Wei.

Copyright information

© 2018 Science Press, Beijing and Springer Nature Singapore Pte Ltd.

About this chapter

Cite this chapter

Wei, Q., Song, R., Li, B., Lin, X. (2018). A Novel Policy Iteration-Based Deterministic Q-Learning for Discrete-Time Nonlinear Systems. In: Self-Learning Optimal Control of Nonlinear Systems. Studies in Systems, Decision and Control, vol 103. Springer, Singapore. https://doi.org/10.1007/978-981-10-4080-1_4

  • DOI: https://doi.org/10.1007/978-981-10-4080-1_4

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-4079-5

  • Online ISBN: 978-981-10-4080-1
