Off-Policy Reinforcement Learning for Partially Unknown Nonzero-Sum Games

Zhang, Qichao; Zhao, Dongbin; Zhang, Sibo

doi:10.1007/978-3-319-70087-8_84

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10634)

Included in the following conference series:

International Conference on Neural Information Processing

Abstract

This paper investigates the optimal control problem of nonzero-sum (NZS) games with partially unknown dynamics. An off-policy reinforcement learning (RL) method is proposed to approximate the solution of the coupled Hamilton-Jacobi (HJ) equations. A single critic network is constructed for each player using a neural network (NN). To improve the applicability of the off-policy RL method, tuning laws for the critic weights are designed for both offline and online learning. A simulation study demonstrates the effectiveness of the proposed algorithms.

Acknowledgements

This research is supported by the National Natural Science Foundation of China (NSFC) under Grants No. 61573353 and No. 61533017, and by the National Key Research and Development Plan under Grant No. 2016YFB0101000.

Author information

Authors and Affiliations

The State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, 100190, China
Qichao Zhang & Dongbin Zhao
University of Chinese Academy of Sciences, Beijing, 100049, China
Qichao Zhang & Dongbin Zhao
University of Illinois Urbana-Champaign, Champaign, IL, 61801, USA
Sibo Zhang

Corresponding author

Correspondence to Dongbin Zhao.

Editor information

Editors and Affiliations

Guangdong University of Technology, Guangzhou, China
Derong Liu
Guangdong University of Technology, Guangzhou, China
Shengli Xie
South China University of Technology, Guangzhou, China
Yuanqing Li
Institute of Automation, Chinese Academy of Sciences, Beijing, China
Dongbin Zhao
King Fahd University of Petroleum and Minerals, Dhahran, Saudi Arabia
El-Sayed M. El-Alfy

Cite this paper

Zhang, Q., Zhao, D., Zhang, S. (2017). Off-Policy Reinforcement Learning for Partially Unknown Nonzero-Sum Games. In: Liu, D., Xie, S., Li, Y., Zhao, D., El-Alfy, ES. (eds) Neural Information Processing. ICONIP 2017. Lecture Notes in Computer Science, vol 10634. Springer, Cham. https://doi.org/10.1007/978-3-319-70087-8_84

DOI: https://doi.org/10.1007/978-3-319-70087-8_84
Published: 24 October 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-70086-1
Online ISBN: 978-3-319-70087-8
eBook Packages: Computer Science, Computer Science (R0)
