Off-Policy Integral Reinforcement Learning Method for Multi-player Non-zero-Sum Games

Song, Ruizhuo; Wei, Qinglai; Li, Qing

doi:10.1007/978-981-13-1712-5_12


Part of the book series: Studies in Systems, Decision and Control (SSDC, volume 166)


Abstract

This chapter establishes an off-policy integral reinforcement learning (IRL) method for solving nonlinear continuous-time non-zero-sum (NZS) games with unknown system dynamics. An IRL algorithm is first presented to obtain the iterative control, and off-policy learning is then used so that the dynamics may be completely unknown. The off-policy IRL scheme carries out both the policy evaluation and the policy improvement steps of the policy iteration (PI) algorithm. For each player, a critic network approximates the performance index and an action network approximates the control. A gradient descent algorithm updates the critic and action weights simultaneously. A convergence analysis of the weights is given, and the asymptotic stability of the closed-loop system and the existence of a Nash equilibrium are proven. A simulation study demonstrates the effectiveness of the developed method for nonlinear continuous-time NZS games with unknown system dynamics.
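The alternation between policy evaluation and policy improvement mentioned in the abstract can be illustrated on a much simpler problem. The sketch below is not the chapter's off-policy IRL method: it assumes a scalar, linear, two-player game with known dynamics and quadratic costs, and it replaces the critic and action networks with closed-form updates. The function name `nzs_policy_iteration` and all parameter names are hypothetical, chosen only for this illustration.

```python
def nzs_policy_iteration(a, b, q, r, k0, iters=50):
    """Model-based policy iteration for a scalar two-player LQ non-zero-sum game.

    Assumed setup (illustrative, not taken from the chapter):
      dynamics        dx/dt = a*x + b[0]*u1 + b[1]*u2
      player i cost   integral of q[i]*x^2 + r[i][0]*u1^2 + r[i][1]*u2^2 dt
      policies        u_i = -k[i] * x, value functions V_i(x) = p[i] * x^2
    """
    k = list(k0)
    p = [0.0, 0.0]
    for _ in range(iters):
        # Closed-loop pole under the current pair of feedback gains.
        a_cl = a - b[0] * k[0] - b[1] * k[1]
        if a_cl >= 0.0:
            raise ValueError("current policies are not stabilizing")
        # Policy evaluation: solve 2*p_i*a_cl + q_i + sum_j r_ij*k_j^2 = 0 for p_i.
        p = [
            (q[i] + r[i][0] * k[0] ** 2 + r[i][1] * k[1] ** 2) / (-2.0 * a_cl)
            for i in range(2)
        ]
        # Policy improvement: u_i = -(1/2) * (1/r_ii) * b_i * dV_i/dx, so k_i = b_i*p_i/r_ii.
        k = [b[i] * p[i] / r[i][i] for i in range(2)]
    return p, k


if __name__ == "__main__":
    # Symmetric example: a = -1, unit input gains, unit state and own-control weights.
    p, k = nzs_policy_iteration(
        a=-1.0,
        b=(1.0, 1.0),
        q=(1.0, 1.0),
        r=((1.0, 0.0), (0.0, 1.0)),
        k0=(0.0, 0.0),
    )
    print(p, k)
```

For this symmetric example the coupled scalar Riccati equations reduce to 3p² + 2p − 1 = 0, so both value coefficients and both gains approach 1/3. Convergence of this kind of iteration is not guaranteed for arbitrary game parameters, and the chapter's method differs in that it works from measured data without knowledge of the dynamics.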



Author information

Authors and Affiliations

University of Science and Technology Beijing, Beijing, China
Ruizhuo Song & Qing Li
Institute of Automation, Chinese Academy of Sciences, Beijing, China
Qinglai Wei


Corresponding author

Correspondence to Ruizhuo Song.


About this chapter

Cite this chapter

Song, R., Wei, Q., Li, Q. (2019). Off-Policy Integral Reinforcement Learning Method for Multi-player Non-zero-Sum Games. In: Adaptive Dynamic Programming: Single and Multiple Controllers. Studies in Systems, Decision and Control, vol 166. Springer, Singapore. https://doi.org/10.1007/978-981-13-1712-5_12


DOI: https://doi.org/10.1007/978-981-13-1712-5_12
Published: 29 December 2018
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-1711-8
Online ISBN: 978-981-13-1712-5
eBook Packages: Intelligent Technologies and Robotics (R0)
