Abstract
In this chapter, a novel discrete-time \( Q \)-learning algorithm based on value iteration is developed. In each iteration of the developed algorithm, the iterative \( Q \) function is updated for all states and controls in the state and control spaces, rather than for a single state and a single control as in the traditional \( Q \)-learning algorithm. A new convergence criterion is established to guarantee that the iterative \( Q \) function converges to the optimum, which simplifies the learning-rate convergence conditions required by traditional \( Q \)-learning algorithms. In the convergence analysis, the upper and lower bounds of the iterative \( Q \) function are analyzed to obtain the convergence criterion, instead of analyzing the iterative \( Q \) function itself. For convenience of analysis, the convergence properties of the deterministic \( Q \)-learning algorithm are first established for the undiscounted case. Then, taking the discount factor into account, the convergence criterion for the discounted case is established. To facilitate the implementation of the deterministic \( Q \)-learning algorithm, neural networks (NNs) are used to approximate the iterative \( Q \) function and to compute the iterative control law, respectively. Finally, simulation results and comparisons are given to illustrate the performance of the developed algorithm.
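To make the full-space update concrete, the following is a minimal sketch of a value-iteration-based \( Q \)-learning loop of the kind the abstract describes, assuming a deterministic system \( x_{k+1} = F(x_k, u_k) \), a utility function \( U(x_k, u_k) \), a discount factor \( \gamma \), and finite grids over the state and control spaces. The names `F`, `U`, `X`, `U_grid`, `gamma`, and `tol` are illustrative assumptions, not identifiers from the chapter; a practical implementation would replace the tabular representation with the NN approximators mentioned above.

```python
import numpy as np

def q_value_iteration(F, U, X, U_grid, gamma=0.95, tol=1e-6, max_iters=1000):
    """Sketch of value-iteration-based Q-learning over finite state/control grids."""
    nx, nu = len(X), len(U_grid)
    Q = np.zeros((nx, nu))  # Q^0(x, u) = 0 for all (x, u)

    def nearest(x):
        # map a successor state back to its nearest grid index
        return int(np.argmin([np.linalg.norm(np.atleast_1d(x) - np.atleast_1d(xg))
                              for xg in X]))

    for _ in range(max_iters):
        Q_next = np.empty_like(Q)
        # update the iterative Q function for ALL states and controls,
        # rather than for a single visited (state, control) pair
        for ix, x in enumerate(X):
            for iu, u in enumerate(U_grid):
                x_next = F(x, u)
                Q_next[ix, iu] = U(x, u) + gamma * Q[nearest(x_next), :].min()
        if np.max(np.abs(Q_next - Q)) < tol:  # stop once the iteration has converged
            return Q_next
        Q = Q_next
    return Q

def control_law(Q, U_grid, ix):
    # iterative control law: v_i(x) = argmin_u Q_i(x, u)
    return U_grid[int(np.argmin(Q[ix, :]))]
```

Setting `gamma=1.0` corresponds to the undiscounted case analyzed first in the chapter, while `0 < gamma < 1` corresponds to the discounted case.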