Abstract
This chapter establishes an off-policy integral reinforcement learning (IRL) method for solving nonlinear continuous-time non-zero-sum (NZS) games with unknown system dynamics. The IRL algorithm is used to obtain the iterative control, and off-policy learning allows the system dynamics to be completely unknown. Off-policy IRL carries out both the policy evaluation and the policy improvement steps of the policy iteration (PI) algorithm. Critic and action networks approximate the performance index and the control of each player, and a gradient descent algorithm updates the critic and action weights simultaneously. A convergence analysis of the weights is given, and the asymptotic stability of the closed-loop system and the existence of the Nash equilibrium are proven. A simulation study demonstrates the effectiveness of the developed method for nonlinear continuous-time NZS games with unknown system dynamics.
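The abstract describes the learning loop only in words. The following is a minimal sketch of off-policy IRL policy iteration. It does not reproduce the chapter's method. It assumes a scalar two-player linear-quadratic game with value V_i = p_i x^2 and policy u_i = -k_i x instead of the chapter's nonlinear setting, and it swaps in batch least squares for the chapter's critic and action networks trained by gradient descent. The plant constants (`A_TRUE`, `B_TRUE`), the weights `Q` and `R`, the probing signals, and every function name (`collect_data`, `solve_least_squares`, `off_policy_irl`) are hypothetical. The plant is used only to simulate measurements. The learner sees only the state and the inputs that were actually applied. For each player, a single least-squares solve per iteration yields both the evaluated value weight (policy evaluation) and the improved gain (policy improvement).

```python
import math

# Sketch assumptions (not taken from the chapter): scalar linear plant
#   dx/dt = a*x + b1*u1 + b2*u2,
# player i's cost J_i = integral of Q[i]*x^2 + R[i][0]*u1^2 + R[i][1]*u2^2,
# value V_i = p_i*x^2, policy u_i = -k_i*x, and least squares in place of
# the chapter's gradient-descent-trained critic and action networks.
A_TRUE, B_TRUE = 0.5, (1.0, 1.0)   # used ONLY to simulate data, never by the learner
Q = (1.0, 2.0)                     # hypothetical state weights for players 1 and 2
R = ((1.0, 0.5), (0.5, 1.0))       # R[i][j]: weight on u_j in player i's cost
K_BEHAVIOR = (1.0, 1.0)            # stabilizing gains applied while collecting data


def exploration(t):
    """Hypothetical probing signals with distinct frequencies for rich data."""
    return (math.sin(2 * t) + 0.5 * math.sin(7 * t),
            math.cos(3 * t) + 0.5 * math.sin(11 * t))


def collect_data(x0=1.0, T=0.2, n_intervals=25, dt=1e-3):
    """Run the behavior policy once; return (x(t+T)^2 - x(t)^2, Ixx, Ixu1, Ixu2)."""
    def rhs(t, s):
        x = s[0]
        e1, e2 = exploration(t)
        u1 = -K_BEHAVIOR[0] * x + e1
        u2 = -K_BEHAVIOR[1] * x + e2
        dx = A_TRUE * x + B_TRUE[0] * u1 + B_TRUE[1] * u2
        return (dx, x * x, x * u1, x * u2)  # derivatives of [x, Ixx, Ixu1, Ixu2]

    steps = round(T / dt)
    t, x, data = 0.0, x0, []
    for _ in range(n_intervals):
        s = (x, 0.0, 0.0, 0.0)
        for _ in range(steps):  # classic fourth-order Runge-Kutta step
            f1 = rhs(t, s)
            f2 = rhs(t + dt / 2, tuple(si + dt / 2 * fi for si, fi in zip(s, f1)))
            f3 = rhs(t + dt / 2, tuple(si + dt / 2 * fi for si, fi in zip(s, f2)))
            f4 = rhs(t + dt, tuple(si + dt * fi for si, fi in zip(s, f3)))
            s = tuple(si + dt / 6 * (g1 + 2 * g2 + 2 * g3 + g4)
                      for si, g1, g2, g3, g4 in zip(s, f1, f2, f3, f4))
            t += dt
        data.append((s[0] ** 2 - x ** 2, s[1], s[2], s[3]))
        x = s[0]
    return data


def solve_least_squares(rows, rhs):
    """Minimize ||rows @ theta - rhs|| through the normal equations."""
    n = len(rows[0])
    M = [[sum(r[a] * r[b] for r in rows) for b in range(n)] for a in range(n)]
    v = [sum(r[a] * y for r, y in zip(rows, rhs)) for a in range(n)]
    for c in range(n):  # Gaussian elimination with partial pivoting
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p], v[c], v[p] = M[p], M[c], v[p], v[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            M[r] = [mr - f * mc for mr, mc in zip(M[r], M[c])]
            v[r] -= f * v[c]
    theta = [0.0] * n
    for c in reversed(range(n)):
        theta[c] = (v[c] - sum(M[c][j] * theta[j] for j in range(c + 1, n))) / M[c][c]
    return theta


def off_policy_irl(data, k=(1.0, 1.0), iterations=15):
    """Joint policy evaluation/improvement for both players on one data batch.

    Per interval, for player i (j = other player) with target gains k:
      p_i*dxx - 2*R_ii*k_i_next*int(x*(u_i + k_i*x))
              - 2*m_ij*int(x*(u_j + k_j*x)) = -c_i*int(x^2),
    where k_i_next = b_i*p_i/R_ii and m_ij = b_j*p_i are unknowns.
    """
    for _ in range(iterations):
        values, new_k = [], []
        for i in range(2):
            j = 1 - i
            c_i = Q[i] + R[i][0] * k[0] ** 2 + R[i][1] * k[1] ** 2
            rows, rhs = [], []
            for dxx, ixx, ixu1, ixu2 in data:
                ixu = (ixu1, ixu2)
                rows.append((dxx,
                             -2 * R[i][i] * (ixu[i] + k[i] * ixx),
                             -2 * (ixu[j] + k[j] * ixx)))
                rhs.append(-c_i * ixx)
            p_i, k_i_next, _ = solve_least_squares(rows, rhs)
            values.append(p_i)
            new_k.append(k_i_next)
        k = tuple(new_k)
    return values, k


if __name__ == "__main__":
    p, k = off_policy_irl(collect_data())
    print("value weights p:", p, "gains k:", k)
```

The data are collected once under the behavior gains and then reused at every iteration, so the target gains can change without running new experiments. This reuse is the practical benefit of the off-policy formulation in this sketch. A nonlinear version would replace the single quadratic basis x^2 with the critic and action networks that the chapter uses.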
Copyright information
© 2019 Science Press, Beijing and Springer Nature Singapore Pte Ltd.
About this chapter
Cite this chapter
Song, R., Wei, Q., Li, Q. (2019). Off-Policy Integral Reinforcement Learning Method for Multi-player Non-zero-Sum Games. In: Adaptive Dynamic Programming: Single and Multiple Controllers. Studies in Systems, Decision and Control, vol 166. Springer, Singapore. https://doi.org/10.1007/978-981-13-1712-5_12
DOI: https://doi.org/10.1007/978-981-13-1712-5_12
Published:
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-1711-8
Online ISBN: 978-981-13-1712-5
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)