Abstract
In this chapter, a novel iterative Q-learning algorithm, the "policy iteration-based deterministic Q-learning algorithm," is developed to solve optimal control problems for discrete-time deterministic nonlinear systems. The idea is to use an iterative adaptive dynamic programming (ADP) technique to construct the iterative control law that optimizes the iterative Q function. Once the optimal Q function is obtained, the optimal control law can be achieved by directly minimizing it, so a mathematical model of the system is not required. A convergence analysis shows that the iterative Q function is monotonically nonincreasing and converges to the solution of the optimality equation. It is also proven that each of the iterative control laws is a stable control law. Neural networks are used to implement the algorithm by approximating the iterative Q function and the iterative control law, respectively. Finally, two simulation examples illustrate the performance of the developed algorithm.
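The policy iteration scheme described above alternates two steps: evaluate the Q function of the current control law, then improve the law by minimizing that Q function. The following is a minimal sketch of this idea on a hypothetical finite deterministic system (a random transition table and cost table, invented here purely for illustration); unlike the chapter's undiscounted setting with stabilizing control laws, a discount factor is assumed so that the Q function stays finite without a stability argument.

```python
import numpy as np

# Hypothetical finite deterministic system for illustration only:
# states and actions are indices; next_state[s, a] and cost[s, a]
# define the (deterministic) dynamics and the utility function.
n_s, n_a = 5, 3
rng = np.random.default_rng(0)
next_state = rng.integers(0, n_s, size=(n_s, n_a))
cost = rng.uniform(0.1, 1.0, size=(n_s, n_a))
gamma = 0.9  # assumed discount; the chapter instead relies on stable laws

def evaluate(policy, sweeps=500):
    """Policy evaluation: solve Q(s,a) = cost(s,a) + gamma*Q(s', policy(s'))
    for the current control law by fixed-point iteration."""
    Q = np.zeros((n_s, n_a))
    for _ in range(sweeps):
        Q = cost + gamma * Q[next_state, policy[next_state]]
    return Q

policy = np.zeros(n_s, dtype=int)   # an admissible initial control law
q_history = []
for i in range(20):
    Q = evaluate(policy)            # policy evaluation
    q_history.append(Q)
    policy = Q.argmin(axis=1)       # policy improvement: minimize Q
```

In this discounted finite setting the iterates exhibit the same qualitative behavior the chapter proves for its algorithm: each Q function is elementwise no larger than its predecessor, and the sequence converges to the solution of the Q-optimality equation, from which the optimal law is read off by direct minimization without using a model beyond the sampled transitions.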
© 2018 Science Press, Beijing and Springer Nature Singapore Pte Ltd.
Cite this chapter
Wei, Q., Song, R., Li, B., Lin, X. (2018). A Novel Policy Iteration-Based Deterministic Q-Learning for Discrete-Time Nonlinear Systems. In: Self-Learning Optimal Control of Nonlinear Systems. Studies in Systems, Decision and Control, vol 103. Springer, Singapore. https://doi.org/10.1007/978-981-10-4080-1_4
Print ISBN: 978-981-10-4079-5
Online ISBN: 978-981-10-4080-1
eBook Packages: Engineering (R0)