Multiple Actor-Critic Optimal Control via ADP

  • Ruizhuo Song
  • Qinglai Wei
  • Qing Li
Part of the Studies in Systems, Decision and Control book series (SSDC, volume 166)


In industrial process control, there may be multiple performance objectives, depending on salient features of the input-output data. To address this situation, this chapter proposes multiple actor-critic structures that obtain the optimal control from input-output data for unknown nonlinear systems. A shunting inhibitory artificial neural network (SIANN) classifies the input-output data into one of several categories, and a different performance measure function may be defined for each category. Within each category, the adaptive dynamic programming (ADP) algorithm, which comprises a model module, a critic network and an action network, is used to establish the optimal control. A recurrent neural network (RNN) model reconstructs the unknown system dynamics from input-output data, and neural networks approximate the critic and the action networks. It is proven that the model error and the closed-loop unknown system are uniformly ultimately bounded (UUB). Simulation results demonstrate the performance of the proposed optimal control scheme for the unknown nonlinear system.
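
The routing idea in the abstract (classify the data, then apply the performance measure and controller of that category) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the chapter: the classifier is a simple threshold rule standing in for the SIANN, each actor is a fixed linear gain rather than a trained network, the performance measure is assumed to be a scalar quadratic utility, and the names `CategoryController`, `classify` and `select_and_act` are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class CategoryController:
    """Hypothetical container for one category's actor and performance measure."""
    q: float     # state weight of the assumed quadratic utility
    r: float     # control weight of the assumed quadratic utility
    gain: float  # stand-in linear actor, u = -gain * x (not a trained network)

    def act(self, x: float) -> float:
        return -self.gain * x

    def utility(self, x: float, u: float) -> float:
        # Assumed per-category utility: q * x^2 + r * u^2
        return self.q * x * x + self.r * u * u


def classify(sample: Sequence[float], thresholds: Sequence[float]) -> int:
    """Stand-in for the SIANN: bins the mean absolute value of the sample."""
    level = sum(abs(v) for v in sample) / len(sample)
    # The category index is the number of thresholds the level exceeds.
    return sum(level > t for t in thresholds)


def select_and_act(
    sample: Sequence[float],
    x: float,
    controllers: Sequence[CategoryController],
    thresholds: Sequence[float],
) -> Tuple[int, float, float]:
    """Pick the category, then apply that category's actor and utility."""
    k = classify(sample, thresholds)
    controller = controllers[k]
    u = controller.act(x)
    return k, u, controller.utility(x, u)


# Two illustrative categories with different performance measures.
controllers = [
    CategoryController(q=1.0, r=1.0, gain=0.5),
    CategoryController(q=10.0, r=0.1, gain=2.0),
]
thresholds = [1.0]

low = select_and_act([0.2, -0.4], 1.0, controllers, thresholds)
high = select_and_act([2.0, -3.0], 1.0, controllers, thresholds)
```

In the chapter's scheme, the fixed gains would be replaced by action networks trained by ADP, and the threshold rule by the trained SIANN; the sketch only shows how a category index selects both the cost and the controller.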



Copyright information

© Science Press, Beijing and Springer Nature Singapore Pte Ltd. 2019

Authors and Affiliations

  1. University of Science and Technology Beijing, Beijing, China
  2. Institute of Automation, Chinese Academy of Sciences, Beijing, China
