An Iterative ADP Method to Solve for a Class of Nonlinear Zero-Sum Differential Games

  • Ruizhuo Song
  • Qinglai Wei
  • Qing Li
Chapter
Part of the Studies in Systems, Decision and Control book series (SSDC, volume 166)

Abstract

In this chapter, an iterative adaptive dynamic programming (ADP) method is presented to solve a class of continuous-time nonlinear two-person zero-sum differential games. The idea is to use the ADP technique to iteratively obtain the optimal control pair that makes the performance index function reach the saddle point of the zero-sum differential game. When the saddle point does not exist, a mixed optimal control pair is obtained so that the performance index function reaches the mixed optimum. Rigorous proofs are given to guarantee that the control pair stabilizes the nonlinear system, and the convergence of the performance index function is also proved. To facilitate the implementation of the iterative ADP method, neural networks are used to approximate the performance index function, compute the optimal control policy, and model the nonlinear system, respectively. Two examples are given to demonstrate the validity of the proposed method.
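
For readers unfamiliar with the saddle-point terminology, the standard condition can be sketched as follows. The notation is an assumption made here for illustration and is not taken from the chapter: \(x\) is the state, \(u\) the minimizing control, \(w\) the maximizing control, \(f\) the system dynamics, \(l\) the utility function, and \(V\) the performance index function.

```latex
% Assumed notation (illustrative only, not taken from the chapter):
% dynamics and performance index of a two-person zero-sum differential game
\dot{x}(t) = f\bigl(x(t), u(t), w(t)\bigr), \qquad x(0) = x_0,
\qquad
V(x_0, u, w) = \int_0^{\infty} l\bigl(x(t), u(t), w(t)\bigr)\,\mathrm{d}t .

% Upper and lower values of the game; the lower value never exceeds the upper value
\overline{V}(x_0) = \inf_{u} \sup_{w} V(x_0, u, w), \qquad
\underline{V}(x_0) = \sup_{w} \inf_{u} V(x_0, u, w), \qquad
\underline{V}(x_0) \le \overline{V}(x_0).

% A saddle point (u^*, w^*) exists when the two values coincide, in which case
V(x_0, u^*, w) \le V(x_0, u^*, w^*) \le V(x_0, u, w^*)
\quad \text{for all admissible } u,\, w .
```

When \(\underline{V}(x_0) < \overline{V}(x_0)\), no control pair satisfies the last inequality, which is the case the abstract addresses with the mixed optimal control pair.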

Copyright information

© Science Press, Beijing and Springer Nature Singapore Pte Ltd. 2019

Authors and Affiliations

  1. University of Science and Technology Beijing, Beijing, China
  2. Institute of Automation, Chinese Academy of Sciences, Beijing, China
