Integral Policy Iteration for Zero-Sum Games with Completely Unknown Nonlinear Dynamics

  • Hongliang Li
  • Derong Liu
  • Ding Wang
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8226)


In this paper, we develop a model-free integral policy iteration algorithm that learns online the Nash equilibrium solution of two-player zero-sum differential games with completely unknown nonlinear continuous-time dynamics. The algorithm updates the value function, the control policy, and the disturbance policy simultaneously. To implement it, three neural networks approximate the game value function, the control policy, and the disturbance policy, and the least squares method is used to estimate their unknown parameters. A simulation example demonstrates the effectiveness of the scheme.
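To make the least squares step concrete, the following is a minimal sketch of integral policy evaluation only, not the algorithm of the paper. It assumes a hypothetical scalar linear system dx/dt = a*x + b*u + d*w, fixed linear policies u = -k*x and w = l*x, a quadratic utility q*x^2 + r*u^2 - gamma^2*w^2, and a single-parameter value function V(x) = p*x^2 in place of a neural network. All names and numbers are illustrative assumptions. The estimator itself uses only sampled states and the integrated utility; the model appears only to generate the data and to check the result.

```python
import numpy as np

# Hypothetical scalar game (an assumption, not taken from the paper):
#   dx/dt = a*x + b*u + d*w, utility = q*x^2 + r*u^2 - gamma^2*w^2
a, b, d = -1.0, 1.0, 1.0
q, r, gamma = 1.0, 1.0, 2.0
k, l = 1.0, 0.5  # fixed policies u = -k*x (control), w = l*x (disturbance)


def collect_data(x0, T=0.1, n_intervals=20, n_sub=200):
    """Return regressors x(t)^2 - x(t+T)^2 and integrated utilities."""
    a_cl = a - b * k + d * l  # closed-loop pole, used only to simulate data
    phi, rho = [], []
    for i in range(n_intervals):
        t = np.linspace(i * T, (i + 1) * T, n_sub + 1)
        x = x0 * np.exp(a_cl * t)  # stands in for measured state samples
        u, w = -k * x, l * x
        utility = q * x**2 + r * u**2 - gamma**2 * w**2
        dt = t[1] - t[0]
        # Trapezoidal rule for the integral of the utility over [t, t+T].
        integral = dt * (utility.sum() - 0.5 * (utility[0] + utility[-1]))
        phi.append(x[0] ** 2 - x[-1] ** 2)
        rho.append(integral)
    return np.array(phi), np.array(rho)


# Integral Bellman relation for V(x) = p*x^2:
#   p * (x(t)^2 - x(t+T)^2) = integral of the utility over [t, t+T]
phi, rho = collect_data(x0=2.0)
solution, *_ = np.linalg.lstsq(phi[:, None], rho, rcond=None)
p_hat = float(solution[0])

# Closed-form value for comparison (requires the model; for checking only).
p_true = (q + r * k**2 - gamma**2 * l**2) / (-2.0 * (a - b * k + d * l))
print(p_hat, p_true)
```

In this toy setting the least squares estimate should agree closely with the closed-form value. A full implementation along the lines of the abstract would additionally approximate and update the control and disturbance policies, which this sketch leaves fixed.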


Keywords: Adaptive dynamic programming · Policy iteration · Neural networks · Zero-sum games





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Hongliang Li (1)
  • Derong Liu (1)
  • Ding Wang (1)
  1. The State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China
