Abstract
This paper considers a discrete-time two-player zero-sum game for nonlinear systems, which is solved by a new iterative adaptive dynamic programming (ADP) method. The presented iterative ADP algorithm implements two iteration procedures, an upper iteration and a lower iteration, to obtain the upper and lower performance index functions, respectively. It is shown that, when initialized by an arbitrary positive semi-definite function, the iterative performance index functions converge to the optimal performance index function, provided that the optimal performance index function of the two-player zero-sum game exists. Finally, simulation results are given to illustrate the performance of the developed method.
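As a rough illustration of the upper and lower iterations mentioned in the abstract, the following minimal sketch runs both on a small finite game. It is not the authors' algorithm: the state grid, the two action sets, the dynamics `step`, the utility `utility`, and the reading of the upper iteration as min-max and the lower iteration as max-min are all assumptions made for this example. Both iterations start from the same positive semi-definite function, here V_0 = 0.

```python
# Toy sketch: upper (min-max) and lower (max-min) value iterations on a
# small finite zero-sum game. The dynamics, utility, and grids below are
# illustrative assumptions, not taken from the paper.

STATES = range(-2, 3)        # assumed state grid
CONTROLS = (-1, 0, 1)        # assumed actions of the minimizing player u
DISTURBANCES = (-1, 0, 1)    # assumed actions of the maximizing player w


def step(x, u, w):
    """Assumed dynamics x_{k+1} = F(x_k, u_k, w_k), clipped to the grid."""
    return max(-2, min(2, x + u + w))


def utility(x, u, w):
    """Assumed utility: penalizes state and control, charges w to the maximizer."""
    return x * x + u * u - 2 * w * w


def q_value(value, x, u, w):
    """One-step cost plus the current value estimate at the next state."""
    return utility(x, u, w) + value[step(x, u, w)]


def upper_update(value):
    """Upper iteration: the minimizer commits first, then the maximizer replies."""
    return {
        x: min(max(q_value(value, x, u, w) for w in DISTURBANCES) for u in CONTROLS)
        for x in STATES
    }


def lower_update(value):
    """Lower iteration: the maximizer commits first, then the minimizer replies."""
    return {
        x: max(min(q_value(value, x, u, w) for u in CONTROLS) for w in DISTURBANCES)
        for x in STATES
    }


def iterate(update, initial, sweeps):
    """Apply the given update a fixed number of times, starting from `initial`."""
    value = dict(initial)
    for _ in range(sweeps):
        value = update(value)
    return value


initial = {x: 0 for x in STATES}   # V_0 = 0 is positive semi-definite
upper = iterate(upper_update, initial, 20)
lower = iterate(lower_update, initial, 20)
```

Because both iterations start from the same function and min-max is never smaller than max-min, `upper[x] >= lower[x]` holds for every state after any number of sweeps. Whether the two sequences meet at a common limit depends on the game; the sketch makes no claim about that.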
Q. Wei—This work was supported in part by the National Natural Science Foundation of China under Grants 61533017, 61374105, 61233001, 61304086, 61503379, and 61273140.
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Wei, Q., Liu, D. (2016). Discrete-Time Two-Player Zero-Sum Games for Nonlinear Systems Using Iterative Adaptive Dynamic Programming. In: Cheng, L., Liu, Q., Ronzhin, A. (eds) Advances in Neural Networks – ISNN 2016. ISNN 2016. Lecture Notes in Computer Science, vol 9719. Springer, Cham. https://doi.org/10.1007/978-3-319-40663-3_31
DOI: https://doi.org/10.1007/978-3-319-40663-3_31
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-40662-6
Online ISBN: 978-3-319-40663-3
eBook Packages: Computer Science, Computer Science (R0)