Off-Policy Actor-Critic Structure for Optimal Control of Unknown Systems with Disturbances

  • Ruizhuo Song
  • Qinglai Wei
  • Qing Li
Part of the Studies in Systems, Decision and Control book series (SSDC, volume 166)


This chapter develops an optimal control method for unknown continuous-time systems with unknown disturbances. An integral reinforcement learning (IRL) algorithm is presented to obtain the iterative control, and off-policy learning allows the system dynamics to be completely unknown. Neural networks (NNs) are used to construct the critic and actor networks. It is shown that, in the presence of unknown disturbances, off-policy IRL may fail to converge or may yield a biased solution. To reduce the influence of the unknown disturbances, a disturbance compensation controller is added. Based on Lyapunov techniques, the NN weight errors are proven to be uniformly ultimately bounded (UUB), and convergence of the Hamiltonian function is also established. A simulation study demonstrates the effectiveness of the proposed optimal control method for unknown systems with disturbances.
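The off-policy IRL idea summarized above — evaluate and improve a policy from data gathered under a separate exploratory behavior input, with the plant model never entering the learning step — can be sketched on a scalar, disturbance-free linear-quadratic example. The toy system, gains, probing signal, and all function names below are illustrative assumptions, not the chapter's algorithm; the true plant parameters are used only to simulate data:

```python
import numpy as np

# Off-policy IRL sketch for the scalar plant x' = a*x + b*u with cost
# integral of (q*x^2 + r*u^2) dt.  The hidden (a, b) appear ONLY in the
# simulator; the learning step uses trajectory data alone, mimicking the
# "completely unknown dynamics" setting (illustrative assumption).
a_true, b_true = -1.0, 1.0          # hidden plant (simulation only)
q, r = 1.0, 1.0                     # cost weights
dt, T = 1e-3, 0.05                  # Euler step and IRL integration window
steps = int(round(T / dt))

def explore(t):
    # Persistently exciting behavior input (illustrative choice).
    return 0.8 * (np.sin(7.0 * t) + np.sin(1.1 * t) + np.sin(13.7 * t))

def collect_windows(n_windows=200):
    """Run the plant ONCE under the behavior input and store, per window,
    the statistics the off-policy regression needs."""
    data, x, t = [], 1.0, 0.0
    for _ in range(n_windows):
        x0, Ixx, Iux = x, 0.0, 0.0
        for _ in range(steps):
            u = explore(t)                        # behavior input, not the target policy
            Ixx += x * x * dt                     # ∫ x^2 ds
            Iux += u * x * dt                     # ∫ u x ds
            x += (a_true * x + b_true * u) * dt   # plant step (unknown to the learner)
            t += dt
        data.append((x * x - x0 * x0, Ixx, Iux))
    return np.array(data)

def irl_iterate(data, k=0.0, n_iter=8):
    """Reuse the single data batch to evaluate every iterate: each pass
    solves  p*dx2 - 2r(Iux + k*Ixx)*k_next = -(q + r*k^2)*Ixx  jointly
    for the value coefficient p (V(x) = p*x^2) and the improved gain."""
    dx2, Ixx, Iux = data[:, 0], data[:, 1], data[:, 2]
    for _ in range(n_iter):
        A = np.column_stack([dx2, -2.0 * r * (Iux + k * Ixx)])
        b = -(q + r * k * k) * Ixx
        p, k = np.linalg.lstsq(A, b, rcond=None)[0]
    return p, k

p, k = irl_iterate(collect_windows())
```

For this plant the scalar algebraic Riccati equation gives the optimum p* = k* = sqrt(2) - 1, which the iteration recovers from data up to discretization error. Because the regressors depend on the current gain k only through Iux + k*Ixx, one batch of off-policy data serves every policy-evaluation step, which is the practical appeal of the off-policy formulation.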



Copyright information

© Science Press, Beijing and Springer Nature Singapore Pte Ltd. 2019

Authors and Affiliations

  1. University of Science and Technology Beijing, Beijing, China
  2. Institute of Automation, Chinese Academy of Sciences, Beijing, China
