Control Theory and Technology, Volume 17, Issue 1, pp 73–84

Adaptive dynamic programming for finite-horizon optimal control of linear time-varying discrete-time systems

  • Bo Pang
  • Tao Bian
  • Zhong-Ping Jiang


This paper studies data-driven learning-based methods for the finite-horizon optimal control of linear time-varying discrete-time systems. First, a novel finite-horizon Policy Iteration (PI) method for linear time-varying discrete-time systems is presented. Its connections with existing infinite-horizon PI methods are discussed. Then, both data-driven off-policy PI and Value Iteration (VI) algorithms are derived to find approximate optimal controllers when the system dynamics is completely unknown. Under mild conditions, the proposed data-driven off-policy algorithms converge to the optimal solution. Finally, the effectiveness and feasibility of the developed methods are validated by a practical example of spacecraft attitude control.
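The finite-horizon PI scheme summarized above alternates policy evaluation (a backward cost-to-go recursion for a fixed gain sequence) with policy improvement (one-step greedy gains). The sketch below is a minimal model-based illustration of that structure for a linear time-varying discrete-time LQR; it assumes the dynamics matrices A_k, B_k are known, whereas the paper's data-driven off-policy algorithms are precisely designed to avoid this. All function names here are illustrative, not from the paper.

```python
import numpy as np

def evaluate_policy(A, B, Q, R, Qf, K):
    """Policy evaluation: cost-to-go matrices P_k for the fixed gain
    sequence K, via the backward Lyapunov-type recursion."""
    N = len(A)
    P = [None] * (N + 1)
    P[N] = Qf
    for k in range(N - 1, -1, -1):
        Ac = A[k] - B[k] @ K[k]  # closed-loop dynamics at step k
        P[k] = Q + K[k].T @ R @ K[k] + Ac.T @ P[k + 1] @ Ac
    return P

def improve_policy(A, B, R, P):
    """Policy improvement: one-step greedy gains w.r.t. the current
    cost-to-go matrices P."""
    N = len(A)
    return [np.linalg.solve(R + B[k].T @ P[k + 1] @ B[k],
                            B[k].T @ P[k + 1] @ A[k]) for k in range(N)]

def finite_horizon_pi(A, B, Q, R, Qf, iters=20):
    """Model-based finite-horizon policy iteration for an LTV
    discrete-time LQR with stage cost x'Qx + u'Ru and terminal
    cost x'Qf x. Returns the gain sequence and its cost-to-go."""
    N = len(A)
    n, m = B[0].shape
    K = [np.zeros((m, n)) for _ in range(N)]  # initial policy u_k = 0
    for _ in range(iters):
        P = evaluate_policy(A, B, Q, R, Qf, K)
        K = improve_policy(A, B, R, P)
    P = evaluate_policy(A, B, Q, R, Qf, K)  # cost-to-go of final gains
    return K, P
```

Because the terminal cost is policy-independent, each PI sweep renders at least one more tail gain optimal, so for a horizon of length N the iteration reproduces the difference Riccati equation solution after at most N sweeps.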


Keywords: optimal control; time-varying system; adaptive dynamic programming; policy iteration (PI); value iteration (VI)





Copyright information

© Editorial Board of Control Theory & Applications, South China University of Technology and Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  1. Control and Networks (CAN) Lab, Department of Electrical and Computer Engineering, Tandon School of Engineering, New York University, Brooklyn, USA
  2. Bank of America Merrill Lynch, One Bryant Park, New York, USA
