Abstract
In this chapter, a novel discrete-time \( Q \)-learning algorithm based on value iteration is developed. In each iteration of the developed algorithm, the iterative \( Q \) function is updated for all states and controls in the state and control spaces, rather than for a single state and a single control as in the traditional \( Q \)-learning algorithm. A new convergence criterion is established to guarantee that the iterative \( Q \) function converges to the optimum, which simplifies the learning-rate convergence conditions required by traditional \( Q \)-learning algorithms. In the convergence analysis, the upper and lower bounds of the iterative \( Q \) function are analyzed to obtain the convergence criterion, instead of analyzing the iterative \( Q \) function itself. For convenience of analysis, the convergence properties of the deterministic \( Q \)-learning algorithm are first established for the undiscounted case. Then, taking the discount factor into account, the convergence criterion for the discounted case is established. To facilitate the implementation of the deterministic \( Q \)-learning algorithm, neural networks (NNs) are used to approximate the iterative \( Q \) function and to compute the iterative control law, respectively. Finally, simulation results and comparisons are given to illustrate the performance of the developed algorithm.
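To make the full-space update concrete, the following is a minimal sketch of a value-iteration-based \( Q \)-learning loop of the kind the abstract describes, assuming a deterministic system \( x_{k+1} = F(x_k, u_k) \), a utility function \( U(x_k, u_k) \), a discount factor \( \gamma \), and finite grids over the state and control spaces. The names `F`, `U`, `X`, `U_grid`, `gamma`, and `tol` are illustrative assumptions, not identifiers from the chapter; a practical implementation would replace the tabular representation with the NN approximators mentioned above.

```python
import numpy as np

def q_value_iteration(F, U, X, U_grid, gamma=0.95, tol=1e-6, max_iters=1000):
    """Sketch of value-iteration-based Q-learning over finite state/control grids."""
    nx, nu = len(X), len(U_grid)
    Q = np.zeros((nx, nu))  # Q^0(x, u) = 0 for all (x, u)

    def nearest(x):
        # map a successor state back to its nearest grid index
        return int(np.argmin([np.linalg.norm(np.atleast_1d(x) - np.atleast_1d(xg))
                              for xg in X]))

    for _ in range(max_iters):
        Q_next = np.empty_like(Q)
        # update the iterative Q function for ALL states and controls,
        # rather than for a single visited (state, control) pair
        for ix, x in enumerate(X):
            for iu, u in enumerate(U_grid):
                x_next = F(x, u)
                Q_next[ix, iu] = U(x, u) + gamma * Q[nearest(x_next), :].min()
        if np.max(np.abs(Q_next - Q)) < tol:  # stop once the iteration has converged
            return Q_next
        Q = Q_next
    return Q

def control_law(Q, U_grid, ix):
    # iterative control law: v_i(x) = argmin_u Q_i(x, u)
    return U_grid[int(np.argmin(Q[ix, :]))]
```

Setting `gamma=1.0` corresponds to the undiscounted case analyzed first in the chapter, while `0 < gamma < 1` corresponds to the discounted case.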