Off-Policy Integral Reinforcement Learning Method for Multi-player Non-zero-Sum Games

Adaptive Dynamic Programming: Single and Multiple Controllers

Part of the book series: Studies in Systems, Decision and Control (SSDC, volume 166)

Abstract

This chapter establishes an off-policy integral reinforcement learning (IRL) method for solving nonlinear continuous-time non-zero-sum (NZS) games with unknown system dynamics. The IRL algorithm is presented to obtain the iterative control, and off-policy learning allows the dynamics to be completely unknown. Off-policy IRL performs the policy evaluation and policy improvement steps of the policy iteration (PI) algorithm. Critic and action networks approximate the performance index and the control of each player, and a gradient descent algorithm updates the critic and action weights simultaneously. A convergence analysis of the weights is given, and the asymptotic stability of the closed-loop system and the existence of a Nash equilibrium are proven. A simulation study demonstrates the effectiveness of the developed method for nonlinear continuous-time NZS games with unknown system dynamics.
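The policy evaluation and policy improvement steps named in the abstract can be illustrated on a much simpler problem. The sketch below is not the chapter's off-policy IRL method: it assumes a scalar, linear, two-player game with known dynamics and quadratic costs without cross-weighting terms, so both steps have closed forms and no critic or action network is needed. All numerical values and the names `policy_iteration` and `coupled_riccati_residuals` are hypothetical, and convergence of this iteration is not guaranteed for arbitrary parameters.

```python
# Model-based policy iteration for a scalar two-player linear-quadratic
# non-zero-sum game. Illustrative assumption only; the chapter's method is
# model-free and uses neural-network approximators instead.


def policy_iteration(a, b, q, r, k0=(0.0, 0.0), tol=1e-12, max_iter=200):
    """Alternate policy evaluation and improvement for u_i = -k_i * x.

    Dynamics:          dx/dt = a*x + b[0]*u_1 + b[1]*u_2
    Cost of player i:  integral of q[i]*x**2 + r[i]*u_i**2 dt
    """
    k = list(k0)
    p = [0.0, 0.0]
    for _ in range(max_iter):
        a_cl = a - b[0] * k[0] - b[1] * k[1]  # closed-loop pole
        if a_cl >= 0:
            raise ValueError("current policies do not stabilize the system")
        # Policy evaluation: V_i(x) = p_i * x**2 solves
        # 2*a_cl*p_i + q_i + r_i*k_i**2 = 0 for the current gains.
        p = [(q[i] + r[i] * k[i] ** 2) / (-2.0 * a_cl) for i in range(2)]
        # Policy improvement: u_i = -(b_i * p_i / r_i) * x.
        k_new = [b[i] * p[i] / r[i] for i in range(2)]
        converged = max(abs(k_new[i] - k[i]) for i in range(2)) < tol
        k = k_new
        if converged:
            break
    return k, p


def coupled_riccati_residuals(a, b, q, r, p):
    """Residuals of the coupled scalar Riccati equations at values p."""
    k = [b[i] * p[i] / r[i] for i in range(2)]
    a_cl = a - b[0] * k[0] - b[1] * k[1]
    return [2.0 * a_cl * p[i] + q[i] + r[i] * k[i] ** 2 for i in range(2)]


# Hypothetical example: an open-loop stable plant, so zero gains are admissible.
a, b, q, r = -1.0, (1.0, 1.0), (1.0, 2.0), (1.0, 1.0)
gains, values = policy_iteration(a, b, q, r)
residuals = coupled_riccati_residuals(a, b, q, r, values)
print("gains:", gains)
print("value coefficients:", values)
print("coupled Riccati residuals:", residuals)
```

At a fixed point of this iteration, the two value coefficients satisfy the coupled Riccati equations of the game, which is the linear-quadratic counterpart of the coupled Hamilton-Jacobi equations that arise in the nonlinear setting. The chapter's contribution, as the abstract states, is to carry out these two steps from measured data without knowing the dynamics.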

Author information

Corresponding author

Correspondence to Ruizhuo Song.

Copyright information

© 2019 Science Press, Beijing and Springer Nature Singapore Pte Ltd.

About this chapter

Cite this chapter

Song, R., Wei, Q., Li, Q. (2019). Off-Policy Integral Reinforcement Learning Method for Multi-player Non-zero-Sum Games. In: Adaptive Dynamic Programming: Single and Multiple Controllers. Studies in Systems, Decision and Control, vol 166. Springer, Singapore. https://doi.org/10.1007/978-981-13-1712-5_12
