Off-Policy Reinforcement Learning for Partially Unknown Nonzero-Sum Games

  • Conference paper
Neural Information Processing (ICONIP 2017)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10634)

Abstract

This paper investigates the optimal control problem of nonzero-sum (NZS) games with partially unknown dynamics. An off-policy reinforcement learning (RL) method is proposed to approximate the solution of the coupled Hamilton-Jacobi (HJ) equations. A single critic neural network (NN) is constructed for each player. To improve the applicability of the off-policy RL method, two tuning laws for the critic weights are designed, one based on offline learning and one based on online learning. A simulation study demonstrates the effectiveness of the proposed algorithms.
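
For orientation, the coupled HJ equations mentioned above are commonly written as follows for an \(N\)-player game. This is a sketch of the standard formulation from the differential-games literature, not notation taken from this paper: the dynamics \(\dot{x} = f(x) + \sum_{j=1}^{N} g_j(x) u_j\), the cost terms \(Q_i\) and \(R_{ij}\) (with \(R_{ii}\) symmetric positive definite), and the value functions \(V_i\) are all assumed symbols.

\[
\begin{aligned}
J_i(x_0) &= \int_0^\infty \Big( Q_i(x) + \sum_{j=1}^{N} u_j^\top R_{ij} u_j \Big)\, dt, \\
u_i^*(x) &= -\tfrac{1}{2} R_{ii}^{-1} g_i(x)^\top \nabla V_i(x), \\
0 &= Q_i(x) + \sum_{j=1}^{N} u_j^{*\top} R_{ij} u_j^* + \nabla V_i(x)^\top \Big( f(x) + \sum_{j=1}^{N} g_j(x) u_j^* \Big), \qquad V_i(0) = 0,
\end{aligned}
\]

for \(i = 1, \dots, N\). Each player's equation depends on every other player's policy through the terms \(u_j^*\), which is why the \(N\) equations must be solved jointly rather than one at a time.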

Acknowledgements

This research is supported by the National Natural Science Foundation of China (NSFC) under Grants No. 61573353 and No. 61533017, and by the National Key Research and Development Plan under Grant No. 2016YFB0101000.

Author information

Corresponding author

Correspondence to Dongbin Zhao.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Zhang, Q., Zhao, D., Zhang, S. (2017). Off-Policy Reinforcement Learning for Partially Unknown Nonzero-Sum Games. In: Liu, D., Xie, S., Li, Y., Zhao, D., El-Alfy, ES. (eds) Neural Information Processing. ICONIP 2017. Lecture Notes in Computer Science, vol 10634. Springer, Cham. https://doi.org/10.1007/978-3-319-70087-8_84

  • DOI: https://doi.org/10.1007/978-3-319-70087-8_84

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-70086-1

  • Online ISBN: 978-3-319-70087-8

  • eBook Packages: Computer Science (R0)
