Abstract
In this chapter, a novel iterative Q-learning algorithm, the "policy iteration-based deterministic Q-learning algorithm," is developed to solve optimal control problems for discrete-time deterministic nonlinear systems. The idea is to use an iterative adaptive dynamic programming (ADP) technique to construct the iterative control law that optimizes the iterative Q function. Once the optimal Q function is obtained, the optimal control law can be achieved by directly minimizing it, so a mathematical model of the system is not required. A convergence analysis shows that the iterative Q function is monotonically nonincreasing and converges to the solution of the optimality equation. It is also proven that each of the iterative control laws is a stable control law. Neural networks are used to implement the algorithm by approximating the iterative Q function and the iterative control law, respectively. Finally, two simulation examples illustrate the performance of the developed algorithm.
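The policy iteration scheme described above alternates two steps: evaluate the Q function of the current control law, then improve the law by minimizing that Q function. The following is a minimal sketch of this idea on a hypothetical finite deterministic system (a random transition table and cost table, invented here purely for illustration); unlike the chapter's undiscounted setting with stabilizing control laws, a discount factor is assumed so that the Q function stays finite without a stability argument.

```python
import numpy as np

# Hypothetical finite deterministic system for illustration only:
# states and actions are indices; next_state[s, a] and cost[s, a]
# define the (deterministic) dynamics and the utility function.
n_s, n_a = 5, 3
rng = np.random.default_rng(0)
next_state = rng.integers(0, n_s, size=(n_s, n_a))
cost = rng.uniform(0.1, 1.0, size=(n_s, n_a))
gamma = 0.9  # assumed discount; the chapter instead relies on stable laws

def evaluate(policy, sweeps=500):
    """Policy evaluation: solve Q(s,a) = cost(s,a) + gamma*Q(s', policy(s'))
    for the current control law by fixed-point iteration."""
    Q = np.zeros((n_s, n_a))
    for _ in range(sweeps):
        Q = cost + gamma * Q[next_state, policy[next_state]]
    return Q

policy = np.zeros(n_s, dtype=int)   # an admissible initial control law
q_history = []
for i in range(20):
    Q = evaluate(policy)            # policy evaluation
    q_history.append(Q)
    policy = Q.argmin(axis=1)       # policy improvement: minimize Q
```

In this discounted finite setting the iterates exhibit the same qualitative behavior the chapter proves for its algorithm: each Q function is elementwise no larger than its predecessor, and the sequence converges to the solution of the Q-optimality equation, from which the optimal law is read off by direct minimization without using a model beyond the sampled transitions.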
© 2018 Science Press, Beijing and Springer Nature Singapore Pte Ltd.
Cite this chapter
Wei, Q., Song, R., Li, B., Lin, X. (2018). A Novel Policy Iteration-Based Deterministic Q-Learning for Discrete-Time Nonlinear Systems. In: Self-Learning Optimal Control of Nonlinear Systems. Studies in Systems, Decision and Control, vol 103. Springer, Singapore. https://doi.org/10.1007/978-981-10-4080-1_4
Print ISBN: 978-981-10-4079-5
Online ISBN: 978-981-10-4080-1
eBook Packages: Engineering (R0)