Discrete-Time Optimal Control of Nonlinear Systems via Value Iteration-Based \( Q \)-Learning

Chapter in:
Self-Learning Optimal Control of Nonlinear Systems

Part of the book series: Studies in Systems, Decision and Control (SSDC, volume 103)

Abstract

In this chapter, a novel discrete-time \( Q \)-learning algorithm based on value iteration is developed. In each iteration of the developed algorithm, the iterative \( Q \) function is updated for all states and controls in the state and control spaces, rather than for a single state-control pair as in the traditional \( Q \)-learning algorithm. A new convergence criterion is established to guarantee that the iterative \( Q \) function converges to the optimum; this criterion simplifies the learning-rate conditions required by traditional \( Q \)-learning algorithms. In the convergence analysis, upper and lower bounds of the iterative \( Q \) function are analyzed to obtain the convergence criterion, instead of analyzing the iterative \( Q \) function itself. For convenience of analysis, the convergence properties of the deterministic \( Q \)-learning algorithm are first developed for the undiscounted case; the convergence criterion for the discounted case is then established. To facilitate implementation of the deterministic \( Q \)-learning algorithm, neural networks (NNs) are used to approximate the iterative \( Q \) function and to compute the iterative control law. Finally, simulation results and comparisons are given to illustrate the performance of the developed algorithm.
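For a deterministic system \( x_{k+1} = F(x_k, u_k) \) with utility \( U(x_k, u_k) \), a value iteration-based \( Q \)-learning update of the kind the abstract describes typically takes the form \( Q_{i+1}(x_k, u_k) = U(x_k, u_k) + \gamma \min_{u_{k+1}} Q_i(x_{k+1}, u_{k+1}) \), applied over the whole state and control spaces at every iteration. The following is a minimal sketch of such a full-space update on discretized grids; the scalar dynamics `F`, utility `U`, grid ranges, discount factor, and stopping tolerance are illustrative assumptions, not the chapter's benchmark example.

```python
import numpy as np

# Minimal sketch of value iteration-based Q-learning for a deterministic
# discrete-time system x_{k+1} = F(x_k, u_k) with utility U(x_k, u_k).
# F, U, the grids, gamma, and the tolerance are illustrative assumptions,
# not the chapter's benchmark system.

def F(x, u):
    # hypothetical scalar nonlinear dynamics
    return 0.9 * x + 0.1 * (x ** 2) * u

def U(x, u):
    # quadratic utility (stage cost)
    return x ** 2 + u ** 2

x_grid = np.linspace(-1.0, 1.0, 51)  # discretized state space
u_grid = np.linspace(-1.0, 1.0, 51)  # discretized control space
gamma = 0.95                         # discount factor (gamma = 1: undiscounted case)

Q = np.zeros((x_grid.size, u_grid.size))  # initial iterative Q function, Q_0 = 0

def nearest(grid, value):
    # project a successor state back onto the grid
    return int(np.argmin(np.abs(grid - value)))

for iteration in range(500):
    Q_next = np.empty_like(Q)
    # Update the iterative Q function for ALL states and controls,
    # not for a single visited state-control pair.
    for i, x in enumerate(x_grid):
        for j, u in enumerate(u_grid):
            i_next = nearest(x_grid, F(x, u))
            Q_next[i, j] = U(x, u) + gamma * Q[i_next, :].min()
    converged = np.max(np.abs(Q_next - Q)) < 1e-6
    Q = Q_next
    if converged:
        break

# Greedy iterative control law: u(x) = argmin_u Q(x, u).
control_law = u_grid[np.argmin(Q, axis=1)]
```

In the chapter itself, the tabular \( Q \) above is replaced by neural network approximators, one for the iterative \( Q \) function and one for the iterative control law; the grid enumeration here only makes the full-space update explicit.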



Author information

Correspondence to Qinglai Wei.


Copyright information

© 2018 Science Press, Beijing and Springer Nature Singapore Pte Ltd.

About this chapter

Cite this chapter

Wei, Q., Song, R., Li, B., Lin, X. (2018). Discrete-Time Optimal Control of Nonlinear Systems via Value Iteration-Based \( Q \)-Learning. In: Self-Learning Optimal Control of Nonlinear Systems. Studies in Systems, Decision and Control, vol 103. Springer, Singapore. https://doi.org/10.1007/978-981-10-4080-1_3

  • DOI: https://doi.org/10.1007/978-981-10-4080-1_3

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-4079-5

  • Online ISBN: 978-981-10-4080-1

  • eBook Packages: Engineering, Engineering (R0)
